cluster-pub

- Name: cluster-pub
- Version: 0.2.0
- Summary: CLI to cluster scientific papers
- Upload time: 2024-10-26 14:07:52
- Author: felipe barcelos
- Requires Python: <3.12,>=3.11
- Requirements: No requirements were recorded.

# ClusterPub

ClusterPub is a tool that helps researchers with bibliographic reviews.
It finds papers related to their areas of interest,
based on search results returned by paper repositories such as IEEE Xplore and PubMed.

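## Installation 🛠

ClusterPub requires Python 3.11: the package metadata declares `requires_python` as `<3.12,>=3.11`, so Python 3.12 and later are not supported by version 0.2.0. The package is published on PyPI under the name `cluster-pub`.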
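
The version constraint can be checked before installing. The following is a minimal sketch, not part of ClusterPub: the helper name `is_supported` is hypothetical, and the bounds are copied from the `requires_python` field of the package metadata.

```python
import sys

# Bounds taken from the package metadata: requires_python "<3.12,>=3.11".
MIN_VERSION = (3, 11)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported(version: tuple[int, ...]) -> bool:
    """Return True if `version` satisfies >=3.11 and <3.12."""
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {'.'.join(map(str, current))} is {status} by cluster-pub 0.2.0")
```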

## Run ClusterPub 🚀

To execute ClusterPub, run the following command:

#### Cluster publications present in a bibliographic file
```bash
cluster-pub {source_file} {result_file}
```

#### Note: The result_file name must include the desired extension.

The allowed extensions for the source file are:

- NBIB
- RIS
- BibTeX

The allowed extensions for the result file are:

- EPS
- JPEG
- PDF
- PGF
- PNG
- PS
- Raw (Binary)
- RGBA
- SVG
- SVGZ
- TIF
- TIFF
- WebP
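
Because the result format is taken from the file name, it can help to validate both paths before calling the CLI. The sketch below is an illustration only: `build_command` is a hypothetical helper, and the mapping from the format names listed above to file suffixes (in particular `.bib` for BibTeX) is an assumption that `cluster-pub --help` should confirm.

```python
from pathlib import Path

# Assumed mapping from the formats listed above to file suffixes.
# ".bib" for BibTeX is an assumption, not something the README states.
SOURCE_SUFFIXES = {".nbib", ".ris", ".bib"}
RESULT_SUFFIXES = {
    ".eps", ".jpeg", ".pdf", ".pgf", ".png", ".ps", ".raw",
    ".rgba", ".svg", ".svgz", ".tif", ".tiff", ".webp",
}


def build_command(source_file: str, result_file: str) -> list[str]:
    """Validate both suffixes and return the argument list for cluster-pub."""
    source_suffix = Path(source_file).suffix.lower()
    result_suffix = Path(result_file).suffix.lower()
    if source_suffix not in SOURCE_SUFFIXES:
        raise ValueError(f"unsupported source extension: {source_suffix!r}")
    if result_suffix not in RESULT_SUFFIXES:
        raise ValueError(f"unsupported result extension: {result_suffix!r}")
    return ["cluster-pub", source_file, result_file]
```

The returned list can be passed to `subprocess.run(..., check=True)` once ClusterPub is installed.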


#### To obtain help about the available parameters and options, execute the following command:
```bash
cluster-pub --help
```

The project directory contains a folder called `sample_files`, with files that can be used for test runs.

## Extract Clustering Metrics 📈

To calculate clustering metrics such as the Silhouette Score, Davies-Bouldin Score, and Calinski-Harabasz Score, run the following commands.

Note: The `number_of_clusters` argument is not the desired number of clusters;
it is the number of clusters/categories that might exist in the analysed dataset.

#### Calculate Davies-Bouldin Score
```bash
cluster-pub-metrics davies-bouldin-score {source_file} {number_of_clusters}
```

#### Calculate Calinski-Harabasz Score
```bash
cluster-pub-metrics calinski-harabasz-score {source_file} {number_of_clusters}
```

#### Calculate Silhouette Score
```bash
cluster-pub-metrics silhouette-score {source_file} {number_of_clusters} --distance-metric={distance_metric}
```
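
To give a sense of what the Silhouette Score measures and why it depends on a distance metric, here is a minimal pure-Python sketch of the standard definition. It is not ClusterPub's implementation. The function name, the `distance` parameter, and the Euclidean default are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Sequence

Vector = Sequence[float]


def silhouette_score(
    points: Sequence[Vector],
    labels: Sequence[int],
    distance: Callable[[Vector, Vector], float] = math.dist,
) -> float:
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_distance(i: int, members: list[int]) -> float:
        others = [j for j in members if j != i]
        return sum(distance(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

The score lies between -1 and 1. Labellings that group nearby points together score closer to 1, while labellings that mix distant points in the same cluster score below 0.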

#### To obtain help for the score commands listed above, run the following command:
```bash
cluster-pub-metrics {score_command} --help
```

## Background Information 🔍

The default hyperparameters and algorithms used in this project are:

- Word Embeddings Technique: Hash2Vec
- Dimensionality Reduction Technique: SVD
- Number of singular values used in SVD: 8
- Clustering Algorithm: Hierarchical Clustering
- Distance Metric: Cosine Similarity
- Linkage Method: Weighted
- Supported Languages: English
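
Two of the defaults above can be written down directly. The sketch below shows the conventional definitions of cosine distance (one minus cosine similarity) and of the weighted (WPGMA) linkage update. It is an illustration under those textbook definitions, not ClusterPub's code, and all function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance form used for clustering: 0 for parallel vectors, 1 for orthogonal."""
    return 1.0 - cosine_similarity(u, v)


def weighted_linkage_distance(dist_s_to_v: float, dist_t_to_v: float) -> float:
    """WPGMA update: distance from the merged cluster (s and t) to cluster v."""
    return (dist_s_to_v + dist_t_to_v) / 2.0
```

With weighted linkage, the two merged clusters contribute equally to the new distance regardless of how many documents each contains.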
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cluster-pub",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": "felipe barcelos",
    "author_email": "felipebarcelos@ufu.br",
    "download_url": "https://files.pythonhosted.org/packages/88/31/41c35275bd29e6ca9909c69adaa85220d07b213c988b447a739ceaea5c90/cluster_pub-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "CLI to cluster scientific papers",
    "version": "0.2.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "af4609f3162c05917118cfb613c409cc2355c9de757057d87ae4a78e232cddba",
                "md5": "9cc89bad2ae9fefeac416de660982b69",
                "sha256": "c22b192badfde47f37bcbb43a338e28c649bf978838eaeefc373bdc95f166083"
            },
            "downloads": -1,
            "filename": "cluster_pub-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9cc89bad2ae9fefeac416de660982b69",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.11",
            "size": 17124,
            "upload_time": "2024-10-26T14:07:50",
            "upload_time_iso_8601": "2024-10-26T14:07:50.715917Z",
            "url": "https://files.pythonhosted.org/packages/af/46/09f3162c05917118cfb613c409cc2355c9de757057d87ae4a78e232cddba/cluster_pub-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "883141c35275bd29e6ca9909c69adaa85220d07b213c988b447a739ceaea5c90",
                "md5": "403873341f1b4bf93f14e13437cbb175",
                "sha256": "e4ef3fa30903c711bdccca9a0a64844628a702be9f0006e300e0b7d52cc36fcc"
            },
            "downloads": -1,
            "filename": "cluster_pub-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "403873341f1b4bf93f14e13437cbb175",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.11",
            "size": 21900,
            "upload_time": "2024-10-26T14:07:52",
            "upload_time_iso_8601": "2024-10-26T14:07:52.181215Z",
            "url": "https://files.pythonhosted.org/packages/88/31/41c35275bd29e6ca9909c69adaa85220d07b213c988b447a739ceaea5c90/cluster_pub-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-26 14:07:52",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cluster-pub"
}
        