psicalc


| Field | Value |
| --- | --- |
| Name | psicalc |
| Version | 0.6.1 |
| home_page | None |
| Summary | Algorithm for clustering protein multiple sequence alignments using normalized mutual information. |
| upload_time | 2024-04-08 23:08:36 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.6 |
| license | None |
| keywords | bioinformatics |
| requirements | numpy, pandas, scikit-learn, scipy |
| Travis-CI | No |
| coveralls test coverage | No |

# PSICalc Algorithm Package

This package clusters Multiple Sequence Alignments (MSAs) using normalized mutual information to examine protein subdomains. A complete data visualization tool for psicalc is available on the releases page.

As an example:

```python
import psicalc as pc

file = "<your_fasta_file>"  # e.g. "PF02517_seed.txt"

# Read the alignment from a FASTA-formatted text file
data = pc.read_txt_file_format(file)

# Label the column indices, starting at 1
data = pc.durston_schema(data, 1)

# If you have multiple sequences or labels, merge them first
data = pc.merge_sequences([data], ['HIST'])

# Sample every column against the MSA
result = pc.find_clusters(1, data)

# Optionally write the result dictionary to CSV
pc.write_output_data(1, result)
```

The program runs and returns a CSV or XLSX file containing the strongest clusters found in the provided MSA.
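
For readers unfamiliar with the metric, normalized mutual information (NMI) between two alignment columns can be sketched in plain Python. This is an illustration only, not psicalc's implementation: the function names are hypothetical, and the normalization used here (mutual information divided by the mean of the two column entropies) is an assumption, so the package may normalize differently.

```python
from collections import Counter
from math import log


def entropy(column):
    """Shannon entropy (natural log) of the symbols in one column."""
    n = len(column)
    return sum((c / n) * log(n / c) for c in Counter(column).values())


def mutual_information(x, y):
    """Mutual information between two equal-length columns."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed pairs
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def normalized_mutual_information(x, y):
    """MI divided by the mean column entropy (assumed normalization)."""
    denom = (entropy(x) + entropy(y)) / 2
    if denom == 0:
        return 0.0  # both columns invariant: nothing to compare
    return mutual_information(x, y) / denom


# Each string holds one column: the residue at that position in six sequences.
col_a = "AAGGCC"
col_b = "LLVVII"  # changes whenever col_a changes
col_c = "ACACAC"  # varies independently of col_a

print(normalized_mutual_information(col_a, col_b))  # close to 1
print(normalized_mutual_information(col_a, col_c))  # close to 0
```

Columns that always change together score near 1, while columns that vary independently score near 0.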

Our initial publication can be found here: https://academic.oup.com/bioinformaticsadvances/article/2/1/vbac058/6671262

Following our initial publication, the program was found to associate invariant columns with variable columns in some cases. The invariant columns were the cause: because of their low entropy, invariant or nearly invariant positions offer little information for meaningful clustering. Therefore, in versions 0.5.1 and later, we have added the ability to filter out low-entropy columns using a sliding threshold from 0 to 0.25 (0-25%) entropy, where entropy is calculated from the number of different amino acids found in a column and the number of occurrences of each. Invariant columns (i.e., those with only one amino acid) have an entropy of 0. A report of the columns removed due to low entropy is included with the output data file.
As a result of these changes, results produced with the latest version will not match those in our initial paper, but should represent clusters based on meaningful relationships. In all cases, researchers are advised to inspect the outputs to confirm that the associations are meaningful.
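
The low-entropy filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the function names are hypothetical, entropy is assumed to be Shannon entropy scaled by `log2(20)` (a 20-letter amino acid alphabet) so that it can be read as a fraction, and a column is assumed to be removed when its entropy is at or below the threshold.

```python
from collections import Counter
from math import log2


def normalized_entropy(column, alphabet_size=20):
    """Shannon entropy of one column, scaled by log2(alphabet_size).

    Assumption: the scaling is illustrative; psicalc may scale differently.
    """
    n = len(column)
    h = sum((c / n) * log2(n / c) for c in Counter(column).values())
    return h / log2(alphabet_size)


def filter_low_entropy(msa, threshold=0.1):
    """Return (kept, removed) lists of 1-based column indices.

    msa is a list of equal-length aligned sequences (strings).
    """
    kept, removed = [], []
    for idx, column in enumerate(zip(*msa), start=1):
        if normalized_entropy(column) <= threshold:
            removed.append(idx)  # invariant or nearly invariant column
        else:
            kept.append(idx)
    return kept, removed


msa = ["ACDA", "ACEA", "ACFA", "AGGA"]
kept, removed = filter_low_entropy(msa, threshold=0.1)
print("kept:", kept)
print("removed:", removed)
```

With a threshold of 0.1, only the two invariant columns (1 and 4) are removed; raising the threshold to 0.25 also removes column 2, where three of the four residues are identical.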

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "psicalc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "bioinformatics",
    "author": null,
    "author_email": "Joe Deweese Lab <jdeweeselab@gmail.com>, Thomas Townsley <thomas@mandosoft.dev>, Marc Soda <4rqh5y4p@duck.com>",
    "download_url": "https://files.pythonhosted.org/packages/14/a0/a22e47602404d22f191ae02d98d1d0edcdf563be63c8ce314e238cc902ef/psicalc-0.6.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Algorithm for clustering protein multiple sequence alignments using normalized mutual information.",
    "version": "0.6.1",
    "project_urls": {
        "Homepage": "https://github.com/jdeweeselab/psicalc-package"
    },
    "split_keywords": [
        "bioinformatics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2975208fecb2b35850dedb9badb40c045f7c49cd1b21b68ac84aa4a34a7b0c4c",
                "md5": "85141c42ea928f9c1ff3c41035fc5051",
                "sha256": "8754c1943ebbd6b6e3758aa797abe0c067536ace9170d8fb85900e27e1718cf9"
            },
            "downloads": -1,
            "filename": "psicalc-0.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "85141c42ea928f9c1ff3c41035fc5051",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 13079,
            "upload_time": "2024-04-08T23:08:34",
            "upload_time_iso_8601": "2024-04-08T23:08:34.255279Z",
            "url": "https://files.pythonhosted.org/packages/29/75/208fecb2b35850dedb9badb40c045f7c49cd1b21b68ac84aa4a34a7b0c4c/psicalc-0.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "14a0a22e47602404d22f191ae02d98d1d0edcdf563be63c8ce314e238cc902ef",
                "md5": "8eb419fc54c06c7046ff746903852287",
                "sha256": "b98f3a7b14936ab2e6e14b424f363dfca1e2a1a92c4d95ddf2764f652a29cce7"
            },
            "downloads": -1,
            "filename": "psicalc-0.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "8eb419fc54c06c7046ff746903852287",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 18147,
            "upload_time": "2024-04-08T23:08:36",
            "upload_time_iso_8601": "2024-04-08T23:08:36.110877Z",
            "url": "https://files.pythonhosted.org/packages/14/a0/a22e47602404d22f191ae02d98d1d0edcdf563be63c8ce314e238cc902ef/psicalc-0.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-08 23:08:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jdeweeselab",
    "github_project": "psicalc-package",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.4.1.post1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.12.0"
                ]
            ]
        }
    ],
    "lcname": "psicalc"
}