| Field | Value |
| --- | --- |
| Name | runrex |
| Version | 0.5.0 |
| Summary | Library to aid in organizing, running, and debugging regular expressions against large bodies of text. |
| upload_time | 2023-03-24 18:52:54 |
| requires_python | >=3.8 |
| keywords | nlp, information extraction |
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO -->
<br />
<div>
<p>
<a href="https://github.com/kpwhri/runrex">
<img src="images/logo.png" alt="Logo">
</a>
</p>
<h3 align="center">Runrex</h3>
<p>
Library to aid in organizing, running, and debugging regular expressions against large bodies of text.
</p>
</div>
<!-- TABLE OF CONTENTS -->
## Table of Contents
* [About the Project](#about-the-project)
* [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
  * [Installation](#installation)
* [Usage](#usage)
* [Roadmap](#roadmap)
* [Contributing](#contributing)
* [License](#license)
* [Contact](#contact)
* [Acknowledgements](#acknowledgements)
## About the Project
The goal of this library is to simplify running regular expressions against large bodies of text in a variety of input formats.
<!-- GETTING STARTED -->
## Getting Started
To get a local copy up and running, follow these steps.
### Prerequisites
* Python 3.8+
* runrex package: https://github.com/kpwhri/runrex
### Installation
1. Clone the repo
   ```sh
   git clone https://github.com/kpwhri/runrex.git
   ```
2. Install requirements (`requirements-dev.txt` is for test packages)
   ```sh
   pip install -r requirements.txt -r requirements-dev.txt
   ```
3. If you wish to read text from SAS or SQL, you will need to install additional requirements. These requirements files may be of use:
   - ODBC connection: `requirements-db.txt`
   - Postgres: `requirements-psql.txt`
   - SAS: `requirements-sas.txt`
4. Run tests.
   ```sh
   export PYTHONPATH=src  # on Windows: set PYTHONPATH=src
   pytest tests
   ```
## Usage
### Example Implementations
* [Social Isolation](https://github.com/kpwhri/social-isolation-runrex)
* [Acute Pancreatitis](https://github.com/kpwhri/apanc-runrex)
* [Anaphylaxis](https://github.com/kpwhri/anaphylaxis-runrex)
* [PCOS](https://github.com/kpwhri/pcos-runrex)
### Build Customized Algorithm
* Create 4 files:
    * `patterns.py`: defines regular expressions of interest
        * See `examples/example_patterns.py` for some examples
    * `test_patterns.py`: tests for those regular expressions
        * Why? Make sure the patterns do what you think they do
    * `algorithm.py`: defines algorithm (how to use regular expressions); returns a Result
        * See `examples/example_algorithm.py` for guidance
    * `config.(py|json|yaml)`: various configurations defined in `schema.py`
        * See example in `examples/example_config.py` for basic config
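
To illustrate why the patterns and their tests are kept in separate files, here is a minimal sketch using only the standard library `re` module. It does not use runrex's own pattern or algorithm classes (see the `examples` directory for those); the pattern names, the `has_finding` helper, and the test sentences are all hypothetical.

```python
import re

# Hypothetical patterns, as they might appear in `patterns.py`.
# Plain `re` is used here; runrex's own pattern helpers are not shown.
LIVES_ALONE = re.compile(r'\blives?\s+alone\b', re.IGNORECASE)
NEGATED = re.compile(r'\b(?:no|not|denies|without)\b', re.IGNORECASE)


def has_finding(sentence: str) -> bool:
    """Return True if the sentence mentions living alone and is not negated."""
    return bool(LIVES_ALONE.search(sentence)) and not NEGATED.search(sentence)


# Hypothetical checks, as they might appear in `test_patterns.py`.
def test_lives_alone_matches():
    assert has_finding('Patient lives alone in an apartment.')


def test_negation_is_excluded():
    assert not has_finding('She does not live alone.')
```

Running `pytest` on a file like this would catch a pattern that silently stops matching (or starts over-matching) after an edit.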
## Input Data
runrex accepts a variety of input formats, but each input must at least provide a `document_id` and a `document_text`. The names of both are configurable.
### Sentence Splitting
By default, the input document text is expected to have each sentence on a separate line. If a sentence splitting scheme is desired, it will need to be supplied to the application.
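
If your text is not already one sentence per line, a pre-processing step along these lines could produce that layout before the text is handed to runrex. This is a naive sketch, not part of runrex: the `one_sentence_per_line` function and its boundary rule are assumptions, and it will wrongly split after abbreviations such as "Dr.".

```python
import re

# Naive boundary: sentence-ending punctuation, then whitespace,
# then an uppercase letter starting the next sentence.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')


def one_sentence_per_line(text: str) -> str:
    """Rewrite text so that each (naively detected) sentence is on its own line."""
    sentences = [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]
    return '\n'.join(sentences)


note = 'Lives alone. Denies pain! Follow up in 2 weeks.'
print(one_sentence_per_line(note))
```

For clinical or otherwise irregular text, a dedicated sentence splitter is likely to do better than this rule.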
### Schema/Examples
For more details, see the [example config](https://github.com/kpwhri/runrex/blob/master/examples/example_config.py)
or consult the [schema](https://github.com/kpwhri/runrex/blob/master/src/runrex/schema.py).
## Output Format
* Recommended output format is `jsonl`
    - The data can be extracted using Python:
```python
import json

with open('output.jsonl') as fh:
    for line in fh:
        data = json.loads(line)  # data is dict
```
* Output variables are configurable and can include:
    - **id**: unique id for line
    - **name**: document name
    - **algorithm**: name of algorithm with finding
    - **value**
    - **category**: name of category (usually the pattern; multiple categories contribute to an algorithm)
    - **date**
    - **extras**
    - **matches**: pattern matches
    - **text**: captured text
    - **start**: start index/offset of match
    - **end**: end index/offset of match
* Scripts to accomplish useful tasks with the output are included in the `scripts` directory.
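
As one example of working with the output, the sketch below tallies records per algorithm and category. It assumes the `algorithm` and `category` variables listed above were enabled in the configuration; the `count_findings` helper and the sample records are made up for illustration and are not taken from the `scripts` directory.

```python
import io
import json
from collections import Counter


def count_findings(lines):
    """Count output records per (algorithm, category) pair."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record.get('algorithm'), record.get('category'))] += 1
    return counts


# Made-up records for illustration; real output may carry more fields.
sample = io.StringIO(
    '{"name": "doc1", "algorithm": "example_algo", "category": "LIVES_ALONE"}\n'
    '{"name": "doc1", "algorithm": "example_algo", "category": "LIVES_ALONE"}\n'
    '{"name": "doc2", "algorithm": "example_algo", "category": "NO_SUPPORT"}\n'
)
counts = count_findings(sample)
for (algorithm, category), n in counts.most_common():
    print(algorithm, category, n)
```

To use it on real output, pass an open file handle (for example `open('output.jsonl')`) instead of the in-memory sample.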
## Versions
Uses [SEMVER](https://semver.org/).
See https://github.com/kpwhri/runrex/releases.
<!-- ROADMAP -->
## Roadmap
See the [open issues](https://github.com/kpwhri/runrex/issues) for a list of proposed features (and known issues).
<!-- CONTRIBUTING -->
## Contributing
Any contributions you make are **greatly appreciated**.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
<!-- LICENSE -->
## License
Distributed under the MIT License.
See `LICENSE` or https://kpwhri.mit-license.org for more information.
<!-- CONTACT -->
## Contact
Please use the [issue tracker](https://github.com/kpwhri/runrex/issues).
<!-- ACKNOWLEDGEMENTS -->
## Acknowledgements
<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[contributors-shield]: https://img.shields.io/github/contributors/kpwhri/runrex.svg?style=flat-square
[contributors-url]: https://github.com/kpwhri/runrex/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/kpwhri/runrex.svg?style=flat-square
[forks-url]: https://github.com/kpwhri/runrex/network/members
[stars-shield]: https://img.shields.io/github/stars/kpwhri/runrex.svg?style=flat-square
[stars-url]: https://github.com/kpwhri/runrex/stargazers
[issues-shield]: https://img.shields.io/github/issues/kpwhri/runrex.svg?style=flat-square
[issues-url]: https://github.com/kpwhri/runrex/issues
[license-shield]: https://img.shields.io/github/license/kpwhri/runrex.svg?style=flat-square
[license-url]: https://kpwhri.mit-license.org/
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555
[linkedin-url]: https://www.linkedin.com/company/kaiserpermanentewashingtonresearch
<!-- [product-screenshot]: images/screenshot.png -->
Raw data
{
"_id": null,
"home_page": "",
"name": "runrex",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "nlp,information extraction",
"author": "",
"author_email": "dcronkite <dcronkite+pypi@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a4/c7/309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01/runrex-0.5.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Library to aid in organizing, running, and debugging regular expressions against large bodies of text.",
"version": "0.5.0",
"split_keywords": [
"nlp",
"information extraction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "face70f45c98f951c15f997405bab3b4f17d22f8cad07ceb6d11528af8c23f89",
"md5": "0c6e68970a6360c533f7e8e4f4cc8e82",
"sha256": "abc02b2492699962b45efc62400a3ee35afdb52def21108059ecb1de912545a6"
},
"downloads": -1,
"filename": "runrex-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0c6e68970a6360c533f7e8e4f4cc8e82",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 39609,
"upload_time": "2023-03-24T18:52:53",
"upload_time_iso_8601": "2023-03-24T18:52:53.063357Z",
"url": "https://files.pythonhosted.org/packages/fa/ce/70f45c98f951c15f997405bab3b4f17d22f8cad07ceb6d11528af8c23f89/runrex-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a4c7309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01",
"md5": "0091b3c7b9de974823908d19293f4cd1",
"sha256": "5498b907fe89c54f545a4bdfab266a92dd24f1af01a3e3a74415c164aabffa43"
},
"downloads": -1,
"filename": "runrex-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "0091b3c7b9de974823908d19293f4cd1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 54015,
"upload_time": "2023-03-24T18:52:54",
"upload_time_iso_8601": "2023-03-24T18:52:54.743752Z",
"url": "https://files.pythonhosted.org/packages/a4/c7/309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01/runrex-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-24 18:52:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "runrex"
}