MRItaxonomy


- Name: MRItaxonomy
- Version: 1.0.0
- Home page: https://github.com/mriglobal/MRItaxonomy
- Summary: MRIGlobal's taxonomy related operators
- Upload time: 2024-03-12 16:51:17
- Author: MRIGlobal Bioinformatics Team
- License: MIT
- Keywords: mriglobal, taxonomy, ncbi
- Requirements: none recorded

# MRItaxonomy - MRIGlobal's Taxonomy Library

A compendium of convenient taxonomy-related operations that interface with NCBI.

### installation
```
pip install MRItaxonomy
```

### import the whole module, or sub-modules
```
import MRItaxonomy
```
or
```
from MRItaxonomy import NCBI_fetch
from MRItaxonomy import accession2taxid
from MRItaxonomy import nnID
from MRItaxonomy import nucDL
from MRItaxonomy import protacc2taxid
from MRItaxonomy import slidingwindow
from MRItaxonomy import taxid
from MRItaxonomy import taxid2name
```

### NCBI_fetch
functions to initially set up and later update the data pulled from NCBI. running initialize() once after pip installation is recommended: the step is time consuming up front, but it does not have to be repeated unless you want to update the data
```
from MRItaxonomy import NCBI_fetch
NCBI_fetch.initialize()
```
to re-pull the latest data from NCBI (nothing is re-downloaded if the data has not changed)
```
NCBI_fetch.update()
```
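
The README does not say how `update()` decides that nothing has changed. One common way to make that decision is to compare a local file's checksum with a published one. The sketch below illustrates that general idea only; the helper names `file_md5` and `needs_download` are hypothetical and are not part of MRItaxonomy.
```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, remote_md5):
    """Return True when the local copy is missing or its checksum differs.

    remote_md5 is assumed to be the hex digest published next to the remote
    file; how it is obtained is outside the scope of this sketch.
    """
    local_path = Path(local_path)
    if not local_path.exists():
        return True
    return file_md5(local_path) != remote_md5.strip().lower()
```
With a check like this, a large download is skipped whenever the local copy already matches the published checksum.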

### accession2taxid
contains a function that loads the accession-to-taxid mapping data object, and a function that returns the taxid associated with the accession passed to it
```
from MRItaxonomy import accession2taxid
accession2taxid.load_trie()    # optional: runs automatically the first time get_taxid() is called
accession2taxid.get_taxid(accession)
```

### nnID
contains a function that, given a taxon, returns a list of taxa that are near neighbors of that taxon
```
from MRItaxonomy import nnID
nnID.get_id(taxon)
```

### nucDL
contains functions that access NCBI's FTP site and download nucleotide records. takes in a taxid, a thread count (for parallelism), a working directory, and a database choice of either genbank or refseq
```
from MRItaxonomy import nucDL
nucDL.dl(tax, threads, path, db='genbank')    # db is either 'genbank' or 'refseq'
```

### protacc2taxid
works like accession2taxid, but with protein accessions instead of nucleotide accessions
```
from MRItaxonomy import protacc2taxid
protacc2taxid.load_dataframe()    # optional: runs automatically the first time get_taxid() is called
protacc2taxid.get_taxid(prot_accession)
```

### slidingwindow
this module slides a window along each nucleotide record in a folder and outputs window-sized reads covering the length of the record. the suffix used for each chunked output file can be specified (default=.fna)
```
from MRItaxonomy import slidingwindow
slidingwindow.reads_generation(path_of_fasta_folder, window_size=150, extension='.fna')
```
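
The windowing idea itself can be sketched in a few lines of plain Python. This is a minimal illustration on an in-memory string, not the module's code: the `step` parameter is an assumption (the README does not state how far the window advances each time), and trailing bases that do not fill a whole window are simply skipped here.
```python
def sliding_windows(sequence, window_size=150, step=1):
    """Yield (start, read) for every full-length window over the sequence."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(sequence) - window_size
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + window_size]


# Example: 4-base windows over a 6-base sequence.
for start, read in sliding_windows("ACGTAC", window_size=4):
    print(start, read)
```
A sequence shorter than the window yields no reads at all, which is one reasonable convention among several.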

### taxid
this module handles operations having to do with the taxonomic trees via the NCBI nodes.dmp and merged.dmp files
```
from MRItaxonomy import taxid
taxid.load_dbs()    # optional: runs automatically the first time another MRItaxonomy.taxid function needs the databases

taxid.get_parent(tax_id)    # returns the parent taxid of the given taxid

taxid.get_rank(tax_id)    # returns the rank of the given taxid (superkingdom, kingdom, phylum, class, order, family, genus, species)

taxid.getnodeatrank(tax_id, selected_rank)    # returns the taxid at the selected taxonomic rank (superkingdom, kingdom, phylum, class, order, family, genus, species) in the lineage of the given taxid

taxid.get_merge(tax_id)    # returns the taxid itself if it is in the nodes.dmp database; otherwise returns the associated merged.dmp entry if there is one; otherwise returns 0
```
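
To make these operations concrete, here is a stand-alone sketch that works on a few lines in the `nodes.dmp` and `merged.dmp` layout (fields separated by tab, pipe, tab). It is not the module's code. The lineage is truncated for brevity, so taxid 9604 is attached directly to the root, and the `merged.dmp` row is a made-up example. The helper names `resolve` and `node_at_rank` are hypothetical.
```python
# First three columns of nodes.dmp: tax_id | parent tax_id | rank
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "9604\t|\t1\t|\tfamily\t|\n"  # toy shortcut: real lineage has more levels
    "207598\t|\t9604\t|\tsubfamily\t|\n"
    "9605\t|\t207598\t|\tgenus\t|\n"
    "9606\t|\t9605\t|\tspecies\t|\n"
)
# merged.dmp: old tax_id | new tax_id (this row is invented for the example)
MERGED_DMP = "999999999\t|\t9606\t|\n"


def parse_dmp(text):
    """Split each non-empty line on '|' and strip the surrounding tabs."""
    return [[field.strip() for field in line.split("|")]
            for line in text.splitlines() if line.strip()]


PARENT, RANK = {}, {}
for tax, parent, rank, *_ in parse_dmp(NODES_DMP):
    PARENT[int(tax)] = int(parent)
    RANK[int(tax)] = rank
MERGED = {int(old): int(new) for old, new, *_ in parse_dmp(MERGED_DMP)}


def resolve(tax_id):
    """Return tax_id if current, its replacement if merged, else 0."""
    if tax_id in PARENT:
        return tax_id
    return MERGED.get(tax_id, 0)


def node_at_rank(tax_id, rank):
    """Walk up the tree until a node with the requested rank is found."""
    current = resolve(tax_id)
    while current:
        if RANK[current] == rank:
            return current
        parent = PARENT[current]
        if parent == current:  # the root is its own parent
            break
        current = parent
    return 0
```
In this toy tree, `node_at_rank(9606, "genus")` walks one step up from the species node, and a merged taxid is resolved before the walk starts.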


### taxid2name
this module takes in a taxid and returns the associated scientific name
```
from MRItaxonomy import taxid2name
taxid2name.get_name(taxid)
```
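
For context, NCBI's taxonomy dump stores names in `names.dmp`, where one taxid can have several rows and the row whose name class is `scientific name` carries the scientific name. The sketch below parses a few lines in that layout; whether taxid2name reads the file this way is an assumption, and `scientific_names` is a hypothetical helper.
```python
# names.dmp columns: tax_id | name_txt | unique name | name class
NAMES_DMP = (
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n"
)


def scientific_names(text):
    """Map each taxid to its scientific name, ignoring other name classes."""
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("|")]
        tax_id, name, _unique, name_class = fields[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


NAMES = scientific_names(NAMES_DMP)
print(NAMES.get(9606))
```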


            

        