Name | nlp4bia |
Version | 2.1.9 |
home_page | None |
Summary | Download NLP4BIA benchmarks and load datasets in their format |
upload_time | 2025-02-20 10:26:59 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.7 |
license | MIT License Copyright (c) 2024 Alberto Becerra Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | feed, reader, tutorial |
requirements | No requirements were recorded. |
# NLP4BIA Library
This repository provides a Python library for loading, processing, and utilizing biomedical datasets curated by the NLP4BIA research group at the Barcelona Supercomputing Center (BSC). The datasets are specifically designed for natural language processing (NLP) tasks in the biomedical domain.
---
## Available Dataset Loaders
The library currently supports the following dataset loaders, which are part of public benchmarks:
### 1. **Distemist**
- **Description**: A dataset for disease mention recognition and normalization in Spanish medical texts.
- **Zenodo Repository**: [Distemist Zenodo](https://doi.org/10.5281/zenodo.7614764)
### 2. **Meddoplace**
- **Description**: A dataset for place name recognition in Spanish medical texts.
- **Zenodo Repository**: [Meddoplace Zenodo](https://doi.org/10.5281/zenodo.8403498)
### 3. **Medprocner**
- **Description**: A dataset for procedure name recognition in Spanish medical texts.
- **Zenodo Repository**: [Medprocner Zenodo](https://doi.org/10.5281/zenodo.7817667)
### 4. **Symptemist**
- **Description**: A dataset for symptom mention recognition in Spanish medical texts.
- **Zenodo Repository**: [Symptemist Zenodo](https://doi.org/10.5281/zenodo.10635215)
---
## Installation
```bash
pip install nlp4bia
```
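
To confirm that the installation succeeded, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of `nlp4bia`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="nlp4bia"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if nlp4bia is installed, otherwise None.
print(installed_version())
```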
---
## Quick Start Guide
### Example Usage
Here's how to use one of the dataset loaders, such as `DistemistLoader`:
```python
from nlp4bia.datasets.benchmark.distemist import DistemistLoader
# Initialize loader
distemist_loader = DistemistLoader(lang="es", download_if_missing=True)
# Load and preprocess data
dis_df = distemist_loader.df
print(dis_df.head())
```
Dataset folders are automatically downloaded and extracted to the `~/.nlp4bia` directory.
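
To see what has already been downloaded, the cache directory can be inspected directly. The sketch below assumes only the `~/.nlp4bia` location stated above; the helper `list_cached_datasets` is hypothetical and the folder names it returns depend on what the library has extracted on your machine.

```python
from pathlib import Path


def list_cached_datasets(cache_dir=None):
    """Return the sorted names of sub-folders found in the cache directory."""
    root = Path(cache_dir) if cache_dir is not None else Path.home() / ".nlp4bia"
    if not root.is_dir():
        # Nothing has been downloaded yet (or the cache lives elsewhere).
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


print(list_cached_datasets())
```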
### Column Descriptions
#### Dataset Columns
- **filenameid**: Unique identifier combining filename and offset information.
- **mention_class**: The class of the mention (e.g., disease or symptom).
- **span**: Text span corresponding to the mention.
- **code**: The normalized code for the mention (usually a SNOMED CT code).
- **sem_rel**: Semantic relationships associated with the mention.
- **is_abbreviation**: Indicates if the mention is an abbreviation.
- **is_composite**: Indicates if the mention is a composite term.
- **needs_context**: Indicates if the mention requires additional context.
- **extension_esp**: Additional information specific to Spanish texts.
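
As an illustration of how these columns fit together, the sketch below groups mention spans by their normalized code and picks out abbreviations. The rows are invented toy data, not real dataset content, and plain dictionaries stand in for table rows so the snippet runs without the library; the assumption that `is_abbreviation` holds a boolean-like value is mine.

```python
from collections import defaultdict

# Toy rows that reuse some of the column names above; all values are invented.
rows = [
    {"filenameid": "doc_a", "mention_class": "DISEASE", "span": "diabetes",
     "code": "CODE_1", "is_abbreviation": False},
    {"filenameid": "doc_a", "mention_class": "DISEASE", "span": "DM",
     "code": "CODE_1", "is_abbreviation": True},
    {"filenameid": "doc_b", "mention_class": "DISEASE", "span": "asma",
     "code": "CODE_2", "is_abbreviation": False},
]


def spans_by_code(mentions):
    """Collect every surface form (span) observed for each normalized code."""
    grouped = defaultdict(list)
    for mention in mentions:
        grouped[mention["code"]].append(mention["span"])
    return dict(grouped)


def abbreviations(mentions):
    """Return the spans flagged as abbreviations."""
    return [m["span"] for m in mentions if m["is_abbreviation"]]


print(spans_by_code(rows))
print(abbreviations(rows))
```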
#### Gazetteer Columns
- **code**: Normalized code for the term.
- **language**: Language of the term.
- **term**: The term itself.
- **semantic_tag**: Semantic tag associated with the term.
- **mainterm**: Indicates if the term is a primary term.
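
A gazetteer with these columns can be used to attach a preferred term to each normalized code. The following is a sketch on invented toy rows; it assumes `mainterm` is boolean-like (for example `True`/`False` or `1`/`0`), which the description above suggests but does not confirm, and the helper names are hypothetical.

```python
# Toy gazetteer rows; codes and terms are invented for illustration.
gazetteer = [
    {"code": "CODE_1", "language": "es", "term": "diabetes mellitus",
     "semantic_tag": "disorder", "mainterm": 1},
    {"code": "CODE_1", "language": "es", "term": "DM",
     "semantic_tag": "disorder", "mainterm": 0},
    {"code": "CODE_2", "language": "es", "term": "asma",
     "semantic_tag": "disorder", "mainterm": 1},
]


def main_terms(gazetteer_rows):
    """Map each code to the term flagged as its main term."""
    return {row["code"]: row["term"] for row in gazetteer_rows if row["mainterm"]}


def preferred_term(code, lookup):
    """Return the main term for a code, or None when the code is unknown."""
    return lookup.get(code)


lookup = main_terms(gazetteer)
print(preferred_term("CODE_1", lookup))
print(preferred_term("CODE_9", lookup))
```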
---
## Contributing
Contributions to expand the dataset loaders or improve existing functionality are welcome! Please open an issue or submit a pull request.
---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
## References
If you use this library or its datasets in your research, please cite the corresponding Zenodo repositories or related publications.
---
# Instructions for Maintainers
1. Update the version in `nlp4bia/__init__.py` and in `pyproject.toml`.
2. Remove the `dist` folder (`rm -rf dist`).
3. Build the package (`python -m build`).
4. Check the package (`twine check dist/*`).
5. Upload the package (`twine upload dist/*`).
6. Install the package (`pip install nlp4bia`).
Raw data
{
"_id": null,
"home_page": null,
"name": "nlp4bia",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "feed, reader, tutorial",
"author": null,
"author_email": "Alberto Becerra <alberto.becerra.tome@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ef/51/5fc8109a3eeebc3b657334fd031b2a59278926a0c18958fdc3041125cc34/nlp4bia-2.1.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Alberto Becerra Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "Download NLP4BIA benchmarks and load datasets in their format",
"version": "2.1.9",
"project_urls": null,
"split_keywords": [
"feed",
" reader",
" tutorial"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "831d1a50065ecad7b5987adb62e01d689c97665361307e0cfea539d3076a5702",
"md5": "59e00db9c33ce6a395267cdb12832b89",
"sha256": "1817d5bd8d4976eff6832178580a411535e4974bcf8ab90c922c0b0a0625c465"
},
"downloads": -1,
"filename": "nlp4bia-2.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "59e00db9c33ce6a395267cdb12832b89",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 21569,
"upload_time": "2025-02-20T10:26:57",
"upload_time_iso_8601": "2025-02-20T10:26:57.706341Z",
"url": "https://files.pythonhosted.org/packages/83/1d/1a50065ecad7b5987adb62e01d689c97665361307e0cfea539d3076a5702/nlp4bia-2.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ef515fc8109a3eeebc3b657334fd031b2a59278926a0c18958fdc3041125cc34",
"md5": "c511115692ed4998ddee91f073ad8c00",
"sha256": "c02c4ffe0adb61471025fb8037745a944453bfe17e0fc4480d3a2bacd7a2065e"
},
"downloads": -1,
"filename": "nlp4bia-2.1.9.tar.gz",
"has_sig": false,
"md5_digest": "c511115692ed4998ddee91f073ad8c00",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 16078,
"upload_time": "2025-02-20T10:26:59",
"upload_time_iso_8601": "2025-02-20T10:26:59.726719Z",
"url": "https://files.pythonhosted.org/packages/ef/51/5fc8109a3eeebc3b657334fd031b2a59278926a0c18958fdc3041125cc34/nlp4bia-2.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-20 10:26:59",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "nlp4bia"
}