<!--
<p align="center">
<img src="https://github.com/cthoyt/umls_downloader/raw/main/docs/source/logo.png" height="150">
</p>
-->
<h1 align="center">
UMLS Downloader
</h1>
<p align="center">
<a href="https://github.com/cthoyt/umls_downloader/actions?query=workflow%3ATests">
<img alt="Tests" src="https://github.com/cthoyt/umls_downloader/workflows/Tests/badge.svg" />
</a>
<a href='https://umls-downloader.readthedocs.io/en/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/umls-downloader/badge/?version=latest' alt='Documentation Status' />
</a>
<a href="https://pypi.org/project/umls_downloader">
<img alt="PyPI" src="https://img.shields.io/pypi/v/umls_downloader" />
</a>
<a href="https://pypi.org/project/umls_downloader">
<img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/umls_downloader" />
</a>
<a href="https://github.com/cthoyt/umls_downloader/blob/main/LICENSE">
<img alt="PyPI - License" src="https://img.shields.io/pypi/l/umls_downloader" />
</a>
<a href='https://github.com/psf/black'>
<img src='https://img.shields.io/badge/code%20style-black-000000.svg' alt='Code style: black' />
</a>
</p>
Don't worry about [UMLS Terminology Services (UTS)](https://uts.nlm.nih.gov/uts/)
licensing and distribution rules - just use
`umls_downloader` to write code that knows how to download content and use it
automatically from the following (non-exhaustive) list of resources:
- [UMLS](https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html)
- [RxNorm](https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html)
- [SemMedDB](https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemMedDB_download.html)
- [SNOMED-CT](https://www.nlm.nih.gov/healthit/snomedct/international.html)
- potentially more in the future
or any content that can be downloaded through
the [UTS ticket granting](https://documentation.uts.nlm.nih.gov/automating-downloads.html)
system. There's no centralized list of content available through the UTS so
suggestions for additional resources are welcome through
the [issue tracker](https://github.com/cthoyt/umls_downloader/issues).
Full documentation is available at [umls-downloader.readthedocs.io](https://umls-downloader.readthedocs.io).
## Installation
```bash
$ pip install umls_downloader
```
## Download A Specific Version of UMLS
```python
import os
from umls_downloader import download_umls
# Get this from https://uts.nlm.nih.gov/uts/edit-profile
api_key = ...
path = download_umls(version="2021AB", api_key=api_key)
# This is where it gets downloaded: ~/.data/bio/umls/2021AB/umls-2021AB-mrconso.zip
expected_path = os.path.join(
    os.path.expanduser("~"), ".data", "bio", "umls", "2021AB",
    "umls-2021AB-mrconso.zip",
)
assert expected_path == path.as_posix()
```
After the first download, the file is cached and is not downloaded again.
It gets stored automatically using [`pystow`](https://github.com/cthoyt/pystow)
in the `~/.data/bio/umls` directory.
A full list of functions is available in the
[documentation](https://umls-downloader.readthedocs.io).
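The download-once behavior can be pictured as an exists-check before fetching. The following is only a rough sketch: `ensure_file` and its `fetch` callback are hypothetical names made up for illustration, and the package's actual caching is delegated to `pystow` and may work differently.

```python
from pathlib import Path
from typing import Callable


def ensure_file(path: Path, fetch: Callable[[Path], None]) -> Path:
    """Return ``path``, calling ``fetch`` only when the file is missing.

    Hypothetical helper for illustration; not part of umls_downloader.
    """
    if not path.is_file():
        # Create the versioned directory before the first download
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path
```

Calling `ensure_file` twice with the same path runs `fetch` only the first time.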
## Automating Configuration of UTS Credentials
There are two ways to set the API key automatically so you don't have to
worry about getting it and passing it around in your Python code:
1. Set `UMLS_API_KEY` in the environment
2. Create `~/.config/umls.ini` and set an `api_key` key in the `[umls]` section.
```python
from umls_downloader import download_umls
# Same path as before
path = download_umls(version="2021AB")
```
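For the second option, the configuration file can be written by hand or generated with the standard library's `configparser`. This is a minimal sketch: `write_umls_config` is a hypothetical helper, and the key shown in the usage comment is a placeholder rather than a real credential.

```python
import configparser
from pathlib import Path


def write_umls_config(api_key: str, config_dir: Path) -> Path:
    """Write ``umls.ini`` with an ``[umls]`` section holding ``api_key``.

    Hypothetical helper for illustration; not part of umls_downloader.
    """
    parser = configparser.ConfigParser()
    parser["umls"] = {"api_key": api_key}
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "umls.ini"
    with path.open("w") as file:
        parser.write(file)
    return path


# Usage (placeholder key):
# write_umls_config("my-api-key", Path.home() / ".config")
```

The resulting file contains a `[umls]` header followed by a line of the form `api_key = my-api-key`.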
## Download the Latest Version
First, you'll have to
install [`bioversions`](https://github.com/cthoyt/bioversions)
with `pip install bioversions`, whose job it is to look up the latest version of
many databases. Then, you can modify the previous code slightly by omitting
the `version` keyword argument:
```python
from umls_downloader import download_umls
# Same path as before (as of November 21st, 2021)
path = download_umls()
```
## Download and open the file
The UMLS file is zipped, so reading it usually requires the following
boilerplate code:
```python
import zipfile
from umls_downloader import download_umls
path = download_umls()
with zipfile.ZipFile(path) as zip_file:
    with zip_file.open("MRCONSO.RRF", mode="r") as file:
        for line in file:
            ...
```
This exact code is wrapped by `open_umls()`, which is a context manager,
so it can more simply be written as:
```python
from umls_downloader import open_umls
with open_umls() as file:
    for line in file:
        ...
```
The `version` and `api_key` arguments also apply here.
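What goes in place of the `...` depends on the use case. As one possible sketch, each line of `MRCONSO.RRF` can be split into named fields. This assumes the file object yields `bytes` (as `zipfile` does) and that rows are pipe-delimited with a trailing `|`, with column names taken from the UMLS Reference Manual; `parse_mrconso_line` is a hypothetical helper, and the sample row is only illustrative.

```python
# Column names for MRCONSO.RRF, assumed from the UMLS Reference Manual
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_mrconso_line(line: bytes) -> dict:
    """Split one raw MRCONSO row into a dict keyed by column name."""
    fields = line.decode("utf-8").rstrip("\r\n").split("|")
    # Each row ends with a trailing "|", leaving one empty field at the end
    if fields and fields[-1] == "":
        fields = fields[:-1]
    return dict(zip(MRCONSO_COLUMNS, fields))


# Illustrative row in the MRCONSO layout
sample = (
    b"C0000005|ENG|P|L0000005|PF|S0007492|Y|A26634265||M0019694|D012711|"
    b"MSH|PEP|D012711|(131)I-Macroaggregated Albumin|0|N|256|\n"
)
row = parse_mrconso_line(sample)
print(row["CUI"], row["SAB"], row["STR"])
```

Inside the `for line in file:` loop, `parse_mrconso_line(line)` could then be used to filter by source vocabulary (`SAB`) or language (`LAT`).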
## Why not an API?
The UMLS provides an [API](https://documentation.uts.nlm.nih.gov/rest/home.html)
for access to tiny bits of data at a time. There are even two recent (last 5
years) packages, [`umls-api`](https://pypi.org/project/umls-api) and
[`connect-umls`](https://pypi.org/project/connect-umls), that provide wrappers
around it. However, API access is generally rate limited, difficult to use in
bulk, and slow. For working with UMLS (or any other database, for that matter) in
bulk, it's necessary to download full database dumps.
## 👋 Attribution
### ⚖️ License
The code in this package is licensed under the MIT License.
### 🍪 Cookiecutter
This package was created
with [@audreyfeldroy](https://github.com/audreyfeldroy)'s
[cookiecutter](https://github.com/cookiecutter/cookiecutter) package
using [@cthoyt](https://github.com/cthoyt)'s
[cookiecutter-snekpack](https://github.com/cthoyt/cookiecutter-snekpack)
template.
Raw data
{
"_id": null,
"home_page": "https://github.com/cthoyt/umls_downloader",
"name": "umls-downloader",
"maintainer": "Charles Tapley Hoyt",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "cthoyt@gmail.com",
"keywords": "snekpack,cookiecutter,UMLS,SNOMED-CT,RxNorm",
"author": "Charles Tapley Hoyt",
"author_email": "cthoyt@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a4/a1/c350d8148c7dc98a118214430317700918647a5bf581dac61abb88f82e32/umls_downloader-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Automate downloading UMLS data.",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/cthoyt/umls_downloader/issues",
"Download": "https://github.com/cthoyt/umls_downloader/releases",
"Homepage": "https://github.com/cthoyt/umls_downloader",
"Source Code": "https://github.com/cthoyt/umls_downloader"
},
"split_keywords": [
"snekpack",
"cookiecutter",
"umls",
"snomed-ct",
"rxnorm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f25bdbb9f6ade9e0105e72ee3ed4de651e48c5755ed0181a87991e1726ae0d04",
"md5": "0ef961c4f50ab059de8deee814f18bdc",
"sha256": "5a36d06562a6325e1c75fdda75b67d5a117ede88f20ecdabd6a0690b3d9e37bb"
},
"downloads": -1,
"filename": "umls_downloader-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0ef961c4f50ab059de8deee814f18bdc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 13060,
"upload_time": "2023-11-07T14:28:39",
"upload_time_iso_8601": "2023-11-07T14:28:39.015397Z",
"url": "https://files.pythonhosted.org/packages/f2/5b/dbb9f6ade9e0105e72ee3ed4de651e48c5755ed0181a87991e1726ae0d04/umls_downloader-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a4a1c350d8148c7dc98a118214430317700918647a5bf581dac61abb88f82e32",
"md5": "40d3e55acc3e06618a3ef4c85b421f85",
"sha256": "0e79ef60f785c0ca51939820ffb1fb346c72fe8fccb2dcfe98133b80e09a7b12"
},
"downloads": -1,
"filename": "umls_downloader-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "40d3e55acc3e06618a3ef4c85b421f85",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 16112,
"upload_time": "2023-11-07T14:28:47",
"upload_time_iso_8601": "2023-11-07T14:28:47.040153Z",
"url": "https://files.pythonhosted.org/packages/a4/a1/c350d8148c7dc98a118214430317700918647a5bf581dac61abb88f82e32/umls_downloader-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-07 14:28:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cthoyt",
"github_project": "umls_downloader",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "umls-downloader"
}