taxomatcher

Name: taxomatcher
Version: 0.2
Home page: https://github.com/Lswhiteh/taxomatcher
Summary: A tool for parsing and extracting taxon synonyms.
Upload time: 2023-05-05 19:26:14
Author: Logan Whitehouse
Requires Python: >=3.6
License: MIT
Keywords: population genetics, phylogenetics, taxonomy

# Taxomatcher

Python modules to query the GBIF or NCBI Taxonomy databases for synonyms of target species names.

## Installation

Required packages:

- Biopython
- pygbif
- pandas
- tqdm

The easiest way to install is into a conda environment with pip:

```{bash}
conda create -n tm-env -c conda-forge python
conda activate tm-env
pip install taxomatcher
```

---

## Modes

### GBIF

The input CSV should contain a single column of target species names to look up (any other columns are ignored). Names can be separated by either a space or an underscore ("_"), so "Sphenodon_punctatus" is equivalent to "Sphenodon punctatus".

For example:

```{text}
Sphenodon_punctatus
Gonyosoma_prasinus
```
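The space/underscore equivalence can be illustrated with a short sketch. It assumes the CSV has no header row and that names are canonicalized to the underscore form used in taxomatcher output; `normalize_name` and `read_target_species` are hypothetical helpers, not functions exported by the package.

```python
import csv
import io
from typing import List, TextIO


def normalize_name(name: str) -> str:
    """Treat spaces and underscores as equivalent separators (hypothetical helper).

    Surrounding whitespace is dropped and runs of separators collapse to a
    single underscore, e.g. "Sphenodon punctatus" -> "Sphenodon_punctatus".
    """
    parts = name.replace("_", " ").split()
    return "_".join(parts)


def read_target_species(handle: TextIO) -> List[str]:
    """Return normalized names from the first column only, skipping empty rows.

    Assumes there is no header row.
    """
    names = []
    for row in csv.reader(handle):
        if row and row[0].strip():
            names.append(normalize_name(row[0]))
    return names


example = io.StringIO("Sphenodon_punctatus\nGonyosoma prasinus,ignored column\n")
print(read_target_species(example))
# ['Sphenodon_punctatus', 'Gonyosoma_prasinus']
```

Extra columns are read by `csv.reader` but never used, which matches the statement that other columns are ignored.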

Usage:

```{bash}
$ taxomatcher gbif -h
usage: taxomatcher gbif [-h] -i INPUT_CSV -o OUTFILE [-t THREADS]

options:
  -h, --help            show this help message and exit
  -i INPUT_CSV, --csv INPUT_CSV
                        CSV where first column is a list of target species names to look up.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
  -t THREADS, --threads THREADS
```
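The sketch below shows one way a GBIF synonym lookup can be written with pygbif. It is not taxomatcher's implementation. It assumes `species.name_backbone(name=...)` to match a name against the GBIF backbone and `species.name_usage(key=..., data="synonyms")` to list synonyms. The `-t THREADS` flag has no help text, so treating it as the number of concurrent lookups is also an assumption. The `species_api` parameter exists only so the functions can be exercised offline with a stand-in object; `gbif_synonyms` and `lookup_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def gbif_synonyms(name: str, species_api: Any = None) -> List[str]:
    """Return synonyms of `name` from the GBIF backbone (hypothetical sketch).

    `species_api` defaults to pygbif's `species` module.
    """
    if species_api is None:
        from pygbif import species as species_api  # imported lazily

    match = species_api.name_backbone(name=name.replace("_", " "))
    # If the query is itself a synonym, GBIF reports the accepted taxon's key.
    key = match.get("acceptedUsageKey") or match.get("usageKey")
    if key is None:
        return []  # no backbone match
    synonyms = species_api.name_usage(key=key, data="synonyms")
    # Only the first page of results is read; paging is not handled here.
    return [
        r["canonicalName"].replace(" ", "_")
        for r in synonyms.get("results", [])
        if r.get("canonicalName")
    ]


def lookup_all(names: List[str], threads: int = 4, species_api: Any = None) -> Dict[str, List[str]]:
    """Run lookups concurrently; the HTTP calls are I/O-bound, so threads help."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda n: gbif_synonyms(n, species_api), names)
        return dict(zip(names, results))
```

With pygbif installed and network access, `lookup_all(["Sphenodon_punctatus"], threads=8)` would query GBIF directly. Threads fit here because the work is waiting on HTTP responses rather than computation.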

### NCBI

Because Entrez results are very specific and NCBI taxonomy data are relatively sparse, a CSV intended for NCBI matching can have multiple columns: the first column is the target species and the remaining columns are previously known synonyms. All names are flattened into a single list and searched, which increases the chance of a match in the database. A single-column CSV in the GBIF format also works.

Note: the multi-Entrez script currently assumes that the final column of the CSV contains notes. I will fix this if there is demand, but the GBIF approach seems to work better across the board.

For example:

```{text}
Sphenodon_punctatus,Hatteria_punctata
Gonyosoma_prasinus,Coluber_prasinus,Elaphe_prasina,Rhadinophis_prasinus,Rhadinophis_prasina
```
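The flattening step can be sketched as follows: every name in every row (target plus known synonyms) goes into one deduplicated list, with spaces and underscores treated as equivalent. The `notes_column` switch mirrors the note above about the final column; `flatten_names` is a hypothetical helper, not the package's code.

```python
import csv
import io
from typing import List, TextIO


def flatten_names(handle: TextIO, notes_column: bool = False) -> List[str]:
    """Collect all names from all columns, deduplicated, in first-seen order.

    If `notes_column` is True, the last column of each row is skipped.
    """
    seen = set()
    names = []
    for row in csv.reader(handle):
        cells = row[:-1] if notes_column else row
        for cell in cells:
            # Normalize separators so "Hatteria punctata" == "Hatteria_punctata".
            name = "_".join(cell.replace("_", " ").split())
            if name and name not in seen:
                seen.add(name)
                names.append(name)
    return names


rows = io.StringIO(
    "Sphenodon_punctatus,Hatteria_punctata\n"
    "Gonyosoma_prasinus,Coluber_prasinus,Elaphe_prasina\n"
)
print(flatten_names(rows))
# ['Sphenodon_punctatus', 'Hatteria_punctata', 'Gonyosoma_prasinus', 'Coluber_prasinus', 'Elaphe_prasina']
```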

Usage:

```{bash}
$ taxomatcher ncbi -h
usage: taxomatcher ncbi [-h] -i INPUT_CSV -o OUTFILE -e EMAIL

options:
  -h, --help            show this help message and exit
  -i INPUT_CSV, --csv INPUT_CSV
                        CSV where first column is a list of target species names to look up.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
  -e EMAIL, --email EMAIL
```
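The `-e EMAIL` argument reflects NCBI's request that E-utilities clients identify themselves. The sketch below shows one way a Taxonomy lookup can be done with Biopython's `Bio.Entrez` (`esearch`, `efetch`, `read`). It is not taxomatcher's code. It assumes that parsed Taxonomy records expose `ScientificName` and an `OtherNames` entry holding a `Synonym` list, and it ignores other name classes. `entrez_synonyms` and `synonyms_from_record` are hypothetical names.

```python
from typing import Any, Dict, List


def synonyms_from_record(record: Dict[str, Any]) -> List[str]:
    """Extract <Synonym> names from one parsed NCBI Taxonomy record."""
    other = record.get("OtherNames", {})
    return [name.replace(" ", "_") for name in other.get("Synonym", [])]


def entrez_synonyms(name: str, email: str) -> Dict[str, List[str]]:
    """Search NCBI Taxonomy for `name` and return {scientific name: synonyms}."""
    from Bio import Entrez  # imported lazily; needs Biopython and network access

    Entrez.email = email  # NCBI asks E-utilities users to supply an email
    handle = Entrez.esearch(db="taxonomy", term=name.replace("_", " "))
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return {}
    handle = Entrez.efetch(db="taxonomy", id=",".join(ids), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return {
        rec["ScientificName"].replace(" ", "_"): synonyms_from_record(rec)
        for rec in records
    }
```

With network access, a call such as `entrez_synonyms("Sphenodon_punctatus", "you@example.org")` would query NCBI. The record parsing is kept in a separate function so it can be checked without network access.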

### Matching trait files to taxomatcher output

Once the gbif or ncbi module has produced a synonym file, you can match it against any phenotypic/trait data you have. As with the other inputs, the first column must contain the target species names, formatted exactly as they appear in the taxomatcher output (case-sensitive, with underscores as separators).

Usage:

```{bash}
$ taxomatcher trait -h
usage: taxomatcher trait [-h] -t TRAITFILE -s SPECIESFILE -o OUTFILE

options:
  -h, --help            show this help message and exit
  -t TRAITFILE, --traitfile TRAITFILE
                        CSV of trait values, first column must be species names.
  -s SPECIESFILE, --speciesfile SPECIESFILE
                        CSV of species synonyms output by the gbif or ncbi modules.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
```
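The matching idea can be sketched in a few lines. The layout of the synonym file is an assumption here: each row is taken to be `target,synonym1,synonym2,...`, and the real gbif/ncbi output may differ. Matching is exact and case-sensitive, as required above. A trait header row such as `species,svl` simply fails to match and is dropped. `build_name_index` and `match_traits` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, List, TextIO


def build_name_index(species_handle: TextIO) -> Dict[str, str]:
    """Map every name in the synonym file (target and synonyms) to its target.

    ASSUMPTION: rows look like `target,synonym1,synonym2,...`.
    """
    index: Dict[str, str] = {}
    for row in csv.reader(species_handle):
        if not row:
            continue
        target = row[0]
        for name in row:
            if name:
                index.setdefault(name, target)  # first mapping wins
    return index


def match_traits(trait_handle: TextIO, species_handle: TextIO) -> List[List[str]]:
    """Keep trait rows whose first column is a known name, relabeled to the target."""
    index = build_name_index(species_handle)
    matched = []
    for row in csv.reader(trait_handle):
        if row and row[0] in index:  # exact, case-sensitive match
            matched.append([index[row[0]]] + row[1:])
    return matched


species = io.StringIO("Sphenodon_punctatus,Hatteria_punctata\n")
traits = io.StringIO("Hatteria_punctata,0.5\nhatteria_punctata,0.7\n")
print(match_traits(traits, species))
# [['Sphenodon_punctatus', '0.5']]
```

The lowercase `hatteria_punctata` row is dropped, which illustrates why trait names must match the output's capitalization exactly.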

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Lswhiteh/taxomatcher",
    "name": "taxomatcher",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "population genetics,phylogenetics,taxonomy",
    "author": "Logan Whitehouse",
    "author_email": "lswhiteh@unc.edu",
    "download_url": "https://files.pythonhosted.org/packages/ae/37/61d793f25f8d8ebcbe5668dd44acacec90ee5336746378a2265c7a874e7c/taxomatcher-0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool for parsing and extracting taxon synonyms.",
    "version": "0.2",
    "project_urls": {
        "GitHub": "https://github.com/Lswhiteh/taxomatcher",
        "Homepage": "https://github.com/Lswhiteh/taxomatcher",
        "Issues": "https://github.com/Lswhiteh/taxomatcher/issues"
    },
    "split_keywords": [
        "population genetics",
        "phylogenetics",
        "taxonomy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ae3761d793f25f8d8ebcbe5668dd44acacec90ee5336746378a2265c7a874e7c",
                "md5": "f1b8a883b13e71d8631f128aa899dc85",
                "sha256": "faa4f915666a055d911b2c2b2d6eee0bca4952bcc49cda8f6ee1cc4d26b2d13d"
            },
            "downloads": -1,
            "filename": "taxomatcher-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "f1b8a883b13e71d8631f128aa899dc85",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 8939,
            "upload_time": "2023-05-05T19:26:14",
            "upload_time_iso_8601": "2023-05-05T19:26:14.004155Z",
            "url": "https://files.pythonhosted.org/packages/ae/37/61d793f25f8d8ebcbe5668dd44acacec90ee5336746378a2265c7a874e7c/taxomatcher-0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-05 19:26:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Lswhiteh",
    "github_project": "taxomatcher",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "taxomatcher"
}