biosynonyms

Name: biosynonyms
Version: 0.0.1
Home page: https://github.com/biopragmatics/biosynonyms
Summary: A decentralized database of synonyms for biomedical concepts and entities.
Upload time: 2024-01-30 13:51:54
Maintainer: Charles Tapley Hoyt
Author: Charles Tapley Hoyt
Requires Python: >=3.8
License: MIT
Keywords: databases, biological databases, biomedical databases
Requirements: none recorded
# Biosynonyms

A decentralized database of synonyms for biomedical entities and concepts. This
resource is meant to complement ontologies, databases, and other controlled
vocabularies that provide synonyms. The data are released under a permissive
license (CC0) so they can easily be adopted by, and contributed back to, upstream
resources.

Here's how to get the data:

```python
import biosynonyms

# Load the curated positive and negative synonyms (returned as the package's
# internal data structures)
positive_synonyms = biosynonyms.get_positive_synonyms()
negative_synonyms = biosynonyms.get_negative_synonyms()

# Build terms for named entity recognition (NER) with Gilda, using only the
# positive synonyms
gilda_terms = biosynonyms.get_gilda_terms()
```

### Synonyms

The data are also available directly as TSV files, so they can be consumed from
any programming language.

The [`positives.tsv`](src/biosynonyms/resources/positives.tsv) has the following
columns:

1. `text` the synonym text itself
2. `curie` the compact uniform resource identifier (CURIE) for a biomedical
   entity or concept, standardized using the Bioregistry
3. `name` the standard name for the concept
4. `scope` the match type, written as a CURIE from
   the [OBO in OWL (`oio`)](https://bioregistry.io/oio) controlled vocabulary,
   i.e., one of:
    - `oboInOwl:hasExactSynonym`
    - `oboInOwl:hasNarrowSynonym`
    - `oboInOwl:hasBroadSynonym`
    - `oboInOwl:hasRelatedSynonym`
    - `oboInOwl:hasSynonym` (use this if the scope is unknown)
5. `type` the synonym property type, written as a CURIE from
   the [OBO Metadata Ontology (`omo`)](https://bioregistry.io/omo) controlled vocabulary,
   e.g., one of:
    - `OMO:0003000` (abbreviation)
    - `OMO:0003001` (ambiguous synonym)
    - `OMO:0003002` (dubious synonym)
    - `OMO:0003003` (layperson synonym)
    - `OMO:0003004` (plural form)
    - ...
6. `references` a comma-delimited list of CURIEs corresponding to publications
   that use the given synonym (ideally using highly actionable identifiers from
   semantic spaces like [`pubmed`](https://bioregistry.io/pubmed),
   [`pmc`](https://bioregistry.io/pmc), [`doi`](https://bioregistry.io/doi))
7. `contributor` the ORCID identifier of the contributor
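
As a minimal sketch of consuming `positives.tsv` without the `biosynonyms` package, the following uses only Python's standard `csv` module. It assumes the file has a header row whose column names match the list above and that `references` is comma-delimited. The inline sample stands in for the real file and reuses the example rows shown in this README, with `name` and `type` left blank because that example doesn't show them; `parse_positives` is a hypothetical helper, not part of the package.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for positives.tsv: a header row plus the two example
# rows from this README (name and type left blank, since the example omits them)
SAMPLE_TSV = (
    "text\tcurie\tname\tscope\ttype\treferences\tcontributor\n"
    "PI(3,4,5)P3\tCHEBI:16618\t\toboInOwl:hasExactSynonym\t\t"
    "pubmed:29623928,pubmed:20817957\t0000-0003-4423-4370\n"
    "phosphatidylinositol (3,4,5) P3\tCHEBI:16618\t\toboInOwl:hasExactSynonym\t\t"
    "pubmed:29695532\t0000-0003-4423-4370\n"
)


def parse_positives(handle):
    """Group synonym rows by CURIE (hypothetical helper, not part of biosynonyms)."""
    synonyms_by_curie = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # references is a comma-delimited list of CURIEs; drop empty entries
        row["references"] = [
            ref.strip() for ref in row["references"].split(",") if ref.strip()
        ]
        synonyms_by_curie[row["curie"]].append(row)
    return dict(synonyms_by_curie)


# With the real file, pass e.g.
# open("src/biosynonyms/resources/positives.tsv", newline="", encoding="utf-8")
synonyms = parse_positives(io.StringIO(SAMPLE_TSV))
for row in synonyms["CHEBI:16618"]:
    print(row["text"], row["references"])
```

Splitting on the tab delimiter means commas inside the synonym text itself, as in `PI(3,4,5)P3`, need no special quoting.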

Here's an example of some rows in the synonyms table (with linkified CURIEs; the
`name` and `type` columns are omitted):

| text                            | curie                                             | scope                                                             | references                                                                                                           | contributor                                                             |
|---------------------------------|---------------------------------------------------|-------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| PI(3,4,5)P3                     | [CHEBI:16618](https://bioregistry.io/CHEBI:16618) | [oio:hasExactSynonym](https://bioregistry.io/oio:hasExactSynonym) | [pubmed:29623928](https://bioregistry.io/pubmed:29623928), [pubmed:20817957](https://bioregistry.io/pubmed:20817957) | [0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) |
| phosphatidylinositol (3,4,5) P3 | [CHEBI:16618](https://bioregistry.io/CHEBI:16618) | [oio:hasExactSynonym](https://bioregistry.io/oio:hasExactSynonym) | [pubmed:29695532](https://bioregistry.io/pubmed:29695532)                                                            | [0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) | 
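
The links in this table follow a simple pattern: a CURIE such as `CHEBI:16618` resolves at `https://bioregistry.io/CHEBI:16618`, and a bare ORCID identifier is linked through the `orcid:` prefix. A minimal sketch that regenerates the first row under that assumption; `linkify_curie` and `linkify_orcid` are hypothetical names for illustration only.

```python
BIOREGISTRY_URL = "https://bioregistry.io"


def linkify_curie(curie: str) -> str:
    """Render a CURIE as a Markdown link that resolves through the Bioregistry."""
    return f"[{curie}]({BIOREGISTRY_URL}/{curie})"


def linkify_orcid(orcid: str) -> str:
    """Render a bare ORCID identifier as a link via its orcid: CURIE."""
    return f"[{orcid}]({BIOREGISTRY_URL}/orcid:{orcid})"


# Values copied from the first example row: text, curie, scope, references, contributor
references = "pubmed:29623928,pubmed:20817957"
cells = [
    "PI(3,4,5)P3",
    linkify_curie("CHEBI:16618"),
    linkify_curie("oio:hasExactSynonym"),
    ", ".join(linkify_curie(ref) for ref in references.split(",")),
    linkify_orcid("0000-0003-4423-4370"),
]
row = "| " + " | ".join(cells) + " |"
print(row)
```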

### Incorrect Synonyms

The [`negatives.tsv`](src/biosynonyms/resources/negatives.tsv) file lists
non-trivial examples of text strings that are *not* synonyms for a given concept.
It doesn't address the same problems as context-based disambiguation; rather, it
describes issues like incorrect substring matching. It has the following columns:

1. `text` the non-synonym text itself
2. `curie` the compact uniform resource identifier (CURIE) for a biomedical
   entity or concept that the `text` **does not** refer to, standardized
   using the Bioregistry
3. `references` same as for `positives.tsv`, illustrating documents where this
   string appears
4. `contributor` the ORCID identifier of the contributor
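
One way the negatives could be used is as a post-filter on the output of a string matcher: any `(text, curie)` pair listed in `negatives.tsv` is dropped. The sketch below assumes annotations are plain `(text, curie)` tuples and that CURIEs are compared exactly as written (so `hgnc:22979` must match case for case). The sample reuses the example row from this README, and `load_negatives` and `filter_annotations` are hypothetical helpers, not part of `biosynonyms`.

```python
import csv
import io

# Hypothetical stand-in for negatives.tsv, using the example row from this README
SAMPLE_TSV = (
    "text\tcurie\treferences\tcontributor\n"
    "PI(3,4,5)P3\thgnc:22979\tpubmed:29623928,pubmed:20817957\t0000-0003-4423-4370\n"
)


def load_negatives(handle):
    """Return the set of (text, curie) pairs known NOT to be synonyms."""
    return {
        (row["text"], row["curie"])
        for row in csv.DictReader(handle, delimiter="\t")
    }


def filter_annotations(annotations, negatives):
    """Drop any (text, curie) annotation that is listed as a negative."""
    return [pair for pair in annotations if pair not in negatives]


negatives = load_negatives(io.StringIO(SAMPLE_TSV))
# Annotations a hypothetical naive matcher might produce for the same string
annotations = [("PI(3,4,5)P3", "CHEBI:16618"), ("PI(3,4,5)P3", "hgnc:22979")]
kept = filter_annotations(annotations, negatives)
print(kept)  # [('PI(3,4,5)P3', 'CHEBI:16618')]
```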

Here's an example of some rows in the negative synonyms table (with linkified
CURIEs):

| text        | curie                                           | references                                                                                                           | contributor                                                             |
|-------------|-------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| PI(3,4,5)P3 | [hgnc:22979](https://bioregistry.io/hgnc:22979) | [pubmed:29623928](https://bioregistry.io/pubmed:29623928), [pubmed:20817957](https://bioregistry.io/pubmed:20817957) | [0000-0003-4423-4370](https://bioregistry.io/orcid:0000-0003-4423-4370) |

## Known Limitations

It's hard to know which exact matches between different vocabularies can safely
be used to deduplicate synonyms. This isn't handled yet, but some partial
solutions already exist that could be adopted.
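
To make the deduplication idea concrete: if two CURIEs from different vocabularies were known to be exact matches, the synonyms attached to either could be collapsed onto one canonical CURIE. The sketch below does this with a union-find over a hand-written list of exact matches. The `ex1:`, `ex2:`, and `ex3:` CURIEs are placeholders rather than real identifiers, and deciding which exact matches are trustworthy, the open problem described above, is simply assumed away.

```python
def find(parent, curie):
    """Return the representative CURIE of curie's group (union-find, path halving)."""
    while parent.setdefault(curie, curie) != curie:
        parent[curie] = parent[parent[curie]]
        curie = parent[curie]
    return curie


def deduplicate(synonyms, exact_matches):
    """Collapse (text, curie) pairs whose CURIEs are linked by exact matches."""
    parent = {}
    for left, right in exact_matches:
        left_root, right_root = find(parent, left), find(parent, right)
        if left_root != right_root:
            # Deterministic choice: the lexicographically smaller CURIE becomes canonical
            parent[max(left_root, right_root)] = min(left_root, right_root)
    return sorted({(text, find(parent, curie)) for text, curie in synonyms})


# Placeholder CURIEs; ex1:1 and ex2:A are assumed to be exact matches
synonyms = [("foo", "ex1:1"), ("foo", "ex2:A"), ("bar", "ex2:A"), ("baz", "ex3:9")]
deduplicated = deduplicate(synonyms, [("ex1:1", "ex2:A")])
print(deduplicated)
# [('bar', 'ex1:1'), ('baz', 'ex3:9'), ('foo', 'ex1:1')]
```

Union-find is used here because exact matches can chain across several vocabularies, and it merges such chains transitively.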

## License

All data are available under the CC0 license. All code is available under the
MIT license.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/biopragmatics/biosynonyms",
    "name": "biosynonyms",
    "maintainer": "Charles Tapley Hoyt",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "cthoyt@gmail.com",
    "keywords": "databases,biological databases,biomedical databases",
    "author": "Charles Tapley Hoyt",
    "author_email": "cthoyt@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/64/a0/e2c8a0a0f1ef0eb3b3379794123aa81346f2ba85f1345de5fbcaa3af3268/biosynonyms-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A decentralized database of synonyms for biomedical concepts and entities.",
    "version": "0.0.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/biopragmatics/biosynonyms/issues",
        "Download": "https://github.com/biopragmatics/biosynonyms/releases",
        "Homepage": "https://github.com/biopragmatics/biosynonyms"
    },
    "split_keywords": [
        "databases",
        "biological databases",
        "biomedical databases"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b80737f46d67ab6834176a6e04c0ac164476e2b94e000492d1d314c1e2ec3a65",
                "md5": "75efe3b899ab4170d49b9a7de833fda1",
                "sha256": "1e18e5c320b3fbf51e5cb735b39d55c37794533d9adb00d6ec8668bd73dcb026"
            },
            "downloads": -1,
            "filename": "biosynonyms-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "75efe3b899ab4170d49b9a7de833fda1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 14924,
            "upload_time": "2024-01-30T13:51:50",
            "upload_time_iso_8601": "2024-01-30T13:51:50.287269Z",
            "url": "https://files.pythonhosted.org/packages/b8/07/37f46d67ab6834176a6e04c0ac164476e2b94e000492d1d314c1e2ec3a65/biosynonyms-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "64a0e2c8a0a0f1ef0eb3b3379794123aa81346f2ba85f1345de5fbcaa3af3268",
                "md5": "25d516a7fafbb9e98151f314efeb3be0",
                "sha256": "1760011feaa06974fc1d2b6978388543fd909c5d9a50b425825f93248a8acfee"
            },
            "downloads": -1,
            "filename": "biosynonyms-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "25d516a7fafbb9e98151f314efeb3be0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17637,
            "upload_time": "2024-01-30T13:51:54",
            "upload_time_iso_8601": "2024-01-30T13:51:54.745187Z",
            "url": "https://files.pythonhosted.org/packages/64/a0/e2c8a0a0f1ef0eb3b3379794123aa81346f2ba85f1345de5fbcaa3af3268/biosynonyms-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-30 13:51:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "biopragmatics",
    "github_project": "biosynonyms",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "biosynonyms"
}