* Name: taxadb2
* Version: 0.12.3
* Summary: Locally query the NCBI taxonomy
* Upload time: 2025-02-19 12:10:11
* Requires Python: >=3.10
* License: MIT
* Keywords: ncbi, taxonomy

# Taxadb2

[![Documentation Status](https://readthedocs.org/projects/taxadb2/badge/?version=latest)](https://taxadb2.readthedocs.io/en/latest/?badge=latest)
[![made-with-python](https://img.shields.io/badge/made%20with-python3-blue.svg)](https://www.python.org/)
[![PyPI version](https://badge.fury.io/py/taxadb2.svg)](https://pypi.org/project/taxadb2/)
[![LICENSE](https://img.shields.io/badge/license-MIT-lightgrey.svg)](https://github.com/kullrich/taxadb2)

Taxadb2 is an application for querying the NCBI taxonomy locally. It is written in Python and accesses its database through the [peewee](http://peewee.readthedocs.io) library.

Taxadb2 is a fork of [https://github.com/HadrienG/taxadb](https://github.com/HadrienG/taxadb) that also handles the `merged.dmp` NCBI taxonomy file in order to deal with updated taxIDs. Compared with the original project:

* the built-in support for [MySQL](https://www.mysql.com) and [PostgreSQL](https://www.postgresql.org) was left unchanged
* `merged.dmp` support was added

In brief, Taxadb2:

* is a small tool to query the [NCBI](https://ncbi.nlm.nih.gov/taxonomy) taxonomy.
* is written in Python and requires Python >= 3.10.
* has built-in support for [SQLite](https://www.sqlite.org), [MySQL](https://www.mysql.com) and [PostgreSQL](https://www.postgresql.org).
* has pre-built SQLite databases available.
* has comprehensive API documentation.


## Installation

Taxadb2 requires Python >= 3.10. To install taxadb2 with SQLite support, type the following in your terminal:

    pip3 install taxadb2

If you wish to use MySQL or PostgreSQL, please refer to the full [documentation](http://taxadb2.readthedocs.io/en/latest/).

## Usage

### Querying the Database

First, make sure you have [built](#creating-the-database) the database.

Basic examples are shown below. For more complete examples, please refer to the [API documentation](http://taxadb2.readthedocs.io/en/latest/).

```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> dbname = "taxadb2/test/test_db.sqlite"
    >>> ncbi = {
    ...    'taxid': TaxID(dbtype='sqlite', dbname=dbname),
    ...    'names': SciName(dbtype='sqlite', dbname=dbname),
    ...    'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
    ... }

    >>> taxid2name = ncbi['taxid'].sci_name(2)
    >>> print(taxid2name)
    Bacteria
    >>> lineage = ncbi['taxid'].lineage_name(17)
    >>> print(lineage[:5])
    ['Methylophilus methylotrophus', 'Methylophilus', 'Methylophilaceae', 'Nitrosomonadales', 'Betaproteobacteria']
    >>> lineage = ncbi['taxid'].lineage_name(17, reverse=True)
    >>> print(lineage[:5])
    ['cellular organisms', 'Bacteria', 'Pseudomonadati', 'Pseudomonadota', 'Betaproteobacteria']

    >>> ncbi['taxid'].has_parent(17, 'Bacteria')
    True
```
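
To make the lineage and parent queries above more concrete, here is a minimal, self-contained sketch of the underlying idea: walking parent pointers from a taxID up to the root. It does not use taxadb2 or its database. The table `TOY_NODES` and the helpers `toy_lineage_name` and `toy_has_parent` are hypothetical names, the scientific names are copied from the output above, and the intermediate taxIDs are written by hand for illustration only.

```python
# Toy parent table: taxid -> (scientific name, parent taxid).
# Hand-written for illustration; real data lives in the taxadb2 database.
TOY_NODES = {
    17: ("Methylophilus methylotrophus", 16),
    16: ("Methylophilus", 32011),
    32011: ("Methylophilaceae", 32003),
    32003: ("Nitrosomonadales", 28216),
    28216: ("Betaproteobacteria", 1224),
    1224: ("Pseudomonadota", 3379134),
    3379134: ("Pseudomonadati", 2),
    2: ("Bacteria", 131567),
    131567: ("cellular organisms", 1),
}
ROOT = 1  # the walk stops when it reaches the root node


def toy_lineage_name(taxid, reverse=False):
    """Collect names from `taxid` up to (but excluding) the root."""
    names = []
    while taxid != ROOT:
        name, parent = TOY_NODES[taxid]
        names.append(name)
        taxid = parent
    return names[::-1] if reverse else names


def toy_has_parent(taxid, parent_name):
    """Return True if `parent_name` is among the ancestors of `taxid`."""
    return parent_name in toy_lineage_name(taxid)[1:]


print(toy_lineage_name(17)[:5])
print(toy_lineage_name(17, reverse=True)[:5])
print(toy_has_parent(17, "Bacteria"))
```

The sketch only mirrors the shape of the results shown above; how taxadb2 actually stores and traverses the taxonomy is described in its API documentation.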

Get the taxid from a scientific name.

```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> dbname = "taxadb2/test/test_db.sqlite"
    >>> ncbi = {
    ...    'taxid': TaxID(dbtype='sqlite', dbname=dbname),
    ...    'names': SciName(dbtype='sqlite', dbname=dbname),
    ...    'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
    ... }
    
    >>> name2taxid = ncbi['names'].taxid('Pseudomonadota')
    >>> print(name2taxid)
    1224
```

Automatic detection of `old` taxIDs imported from `merged.dmp`.


```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> dbname = "taxadb2/test/test_db.sqlite"
    >>> ncbi = {
    ...    'taxid': TaxID(dbtype='sqlite', dbname=dbname),
    ...    'names': SciName(dbtype='sqlite', dbname=dbname),
    ...    'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
    ... }

    >>> taxid2name = ncbi['taxid'].sci_name(30)
    TaxID 30 is deprecated, using 29 instead.
    >>> print(taxid2name)
    Myxococcales
```
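
The replacement of a deprecated taxID can be pictured as a simple lookup in the old-to-new pairs that `merged.dmp` provides. The sketch below is a standalone illustration, not taxadb2's implementation: `load_merged` and `resolve_taxid` are hypothetical helpers, and the sample line assumes the usual NCBI dump layout of fields separated by `<TAB>|<TAB>`, with the 30 to 29 pair taken from the example above.

```python
import io

# One line in the assumed merged.dmp layout: old_taxid <TAB>|<TAB> new_taxid <TAB>|
SAMPLE_MERGED_DMP = "30\t|\t29\t|\n"


def load_merged(handle):
    """Parse merged.dmp-style lines into an {old_taxid: new_taxid} dict."""
    merged = {}
    for line in handle:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[int(fields[0])] = int(fields[1])
    return merged


def resolve_taxid(taxid, merged):
    """Return the current taxid, reporting a replacement if one was made."""
    new_taxid = merged.get(taxid)
    if new_taxid is None:
        return taxid  # not listed as merged, keep it unchanged
    print(f"TaxID {taxid} is deprecated, using {new_taxid} instead.")
    return new_taxid


merged = load_merged(io.StringIO(SAMPLE_MERGED_DMP))
current = resolve_taxid(30, merged)
```

A taxID that is not listed in the mapping, such as 2, passes through `resolve_taxid` unchanged.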

Get the taxonomic information for accession number(s).

```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> dbname = "taxadb2/test/test_db.sqlite"
    >>> ncbi = {
    ...    'taxid': TaxID(dbtype='sqlite', dbname=dbname),
    ...    'names': SciName(dbtype='sqlite', dbname=dbname),
    ...    'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
    ... }

    >>> my_accessions = ['A01460']
    >>> taxids = ncbi['accessionid'].taxid(my_accessions)
    >>> taxids
    <generator object AccessionID.taxid at 0x103e21bd0>
    >>> for ti in taxids:
    ...     print(ti)
    ...
    ('A01460', 17)
```
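
Because the accession lookup above returns a generator, its results are produced lazily and can be iterated only once. The following sketch shows one way to collect such `(accession, taxid)` pairs into a dictionary. `toy_accession_taxids` is a hypothetical stand-in written for this example, not part of taxadb2, and skipping unknown accessions is an assumption of the stand-in rather than documented library behaviour.

```python
def toy_accession_taxids(accessions):
    """Hypothetical stand-in that lazily yields (accession, taxid) pairs."""
    known = {"A01460": 17}  # pair taken from the example above
    for accession in accessions:
        if accession in known:  # assumption: unknown accessions are skipped
            yield accession, known[accession]


pairs = toy_accession_taxids(["A01460", "UNKNOWN"])
acc2taxid = dict(pairs)  # consumes the generator
leftover = list(pairs)   # empty: a generator cannot be iterated twice

print(acc2taxid)
print(leftover)
```

If the pairs are needed more than once, storing them in a `dict` or `list` as above avoids querying again.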

You can also use a configuration file to set the database connection parameters automatically when an object is created. Either pass the `config` parameter to the class constructor:

```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> config_path = "taxadb2/test/taxadb2.cfg"
    >>> ncbi = {
    ...    'taxid': TaxID(config=config_path),
    ...    'names': SciName(config=config_path),
    ...    'accessionid': AccessionID(config=config_path)
    ... }

    >>> ncbi['taxid'].sci_name(2)
    'Bacteria'
    >>> ...
```

or set the environment variable `TAXADB2_CONFIG` to point to the configuration file:

```bash
    $ export TAXADB2_CONFIG='taxadb2/test/taxadb2.cfg'
```

```python
    >>> from taxadb2.taxid import TaxID
    >>> from taxadb2.names import SciName
    >>> from taxadb2.accessionid import AccessionID
    >>> ncbi = {
    ...    'taxid': TaxID(),
    ...    'names': SciName(),
    ...    'accessionid': AccessionID()
    ... }

    >>> ncbi['taxid'].sci_name(2)
    'Bacteria'
    >>> ...
```

Check the documentation for more information.
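
As a rough illustration of the two configuration routes described above (an explicit path, or an environment variable as a fallback), the sketch below resolves a configuration file and reads its settings with the standard `configparser` module. It is not taxadb2's own loader: `load_db_settings` is a hypothetical helper, the `[DBSETTINGS]` section and its keys are assumptions about the file layout, and the variable name follows the `export` command shown above.

```python
import configparser
import os
import tempfile

# Assumed layout of the configuration file (section and keys are illustrative).
CONFIG_TEXT = """\
[DBSETTINGS]
dbtype = sqlite
dbname = taxadb2/test/test_db.sqlite
"""


def load_db_settings(config=None, env_var="TAXADB2_CONFIG"):
    """Read settings from an explicit path, else from the path in `env_var`."""
    path = config or os.environ.get(env_var)
    if path is None:
        raise ValueError(f"no config path given and {env_var} is not set")
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    return dict(parser["DBSETTINGS"])


with tempfile.TemporaryDirectory() as tmpdir:
    cfg_path = os.path.join(tmpdir, "taxadb2.cfg")
    with open(cfg_path, "w") as handle:
        handle.write(CONFIG_TEXT)

    explicit = load_db_settings(config=cfg_path)  # route 1: explicit path
    os.environ["TAXADB2_CONFIG"] = cfg_path
    from_env = load_db_settings()                 # route 2: environment variable

print(explicit)
```

Giving the explicit argument priority over the environment variable is a design choice of this sketch; consult the documentation for the precedence taxadb2 actually applies.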

### Creating the Database

#### Download data

The following command downloads the necessary files from the [NCBI FTP](https://ftp.ncbi.nlm.nih.gov/) server into the directory `taxadb`.
```
$ taxadb2 download --outdir taxadb --type taxa
```

#### Insert data

##### SQLite

```
$ taxadb2 create --division taxa --input taxadb --dbname taxadb.sqlite
```
You can then safely remove the downloaded files:
```
$ rm -r taxadb
```

You can safely rerun the same command: `taxadb2` skips any `taxid` and `accession` entries that have already been inserted.
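
One common way to make an insert step safe to rerun is to let the database ignore rows whose key already exists. The sketch below demonstrates that general technique with the standard `sqlite3` module and an in-memory database. It is an assumption-laden illustration, not taxadb2's code (which goes through peewee): the `taxa` table, its columns and the `insert_taxa` helper are hypothetical.

```python
import sqlite3


def insert_taxa(conn, rows):
    """Insert (taxid, name) rows, skipping taxids already present.

    Returns the number of rows that were actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS taxa (taxid INTEGER PRIMARY KEY, name TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM taxa").fetchone()[0]
    # OR IGNORE silently skips rows that would violate the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO taxa (taxid, name) VALUES (?, ?)", rows
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM taxa").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
rows = [(2, "Bacteria"), (1224, "Pseudomonadota")]
first_run = insert_taxa(conn, rows)   # inserts both rows
second_run = insert_taxa(conn, rows)  # rerun: nothing new to insert
print(first_run, second_run)
```

The second call adds no rows, which is the behaviour a rerun of the create command is described as having.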

## Tests

**Note:** The tests rely on the `pytest` module, which can be installed with `pip install pytest`.

To run the tests, go to the root directory of this project (`cd /path/to/taxadb2`) and run
`pytest -v`.

This command runs the tests against an `SQLite` test database called `test_db.sqlite`, located in the `taxadb2/test`
directory.

It is also possible to run only the tests related to accessionid or taxid, as follows:
```
$ pytest -m 'taxid'
$ pytest -m 'accessionid'
```

You can also use the configuration file `taxadb2.ini`, located in the root of the distribution, as follows. This file should contain
the database connection settings:
```
$ pytest taxadb2/test --config='taxadb2.ini'
```

## License

Code is under the [MIT](LICENSE) license.

## Issues

Found a bug or have a question? Please open an [issue](https://github.com/kullrich/taxadb2/issues).

## Contributing

Thought of a new feature that you'd like us to implement? Open an [issue](https://github.com/kullrich/taxadb2/issues) or fork the repository and submit a [pull request](https://github.com/kullrich/taxadb2/pulls).

## Code of Conduct - Participation guidelines

This repository adheres to the [Contributor Covenant](http://contributor-covenant.org) code of conduct for any interactions you have within this project (see [Code of Conduct](https://github.com/kullrich/taxadb2/blob/devel/CODE_OF_CONDUCT.md)).

See also the policy against sexualized discrimination, harassment and violence for the Max Planck Society [Code-of-Conduct](https://www.mpg.de/11961177/code-of-conduct-en.pdf).

By contributing to this project, you agree to abide by its terms.

## References

https://github.com/HadrienG/taxadb


            
