runrex


* Name: runrex
* Version: 0.5.0
* Summary: Library to aid in organizing, running, and debugging regular expressions against large bodies of text.
* Upload time: 2023-03-24 18:52:54
* Requires Python: >=3.8
* Keywords: nlp, information extraction
* Requirements: none recorded

[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]



<!-- PROJECT LOGO -->
<br />
<div>
  <p>
    <a href="https://github.com/kpwhri/runrex">
      <img src="images/logo.png" alt="Logo">
    </a>
  </p>

  <h3 align="center">Runrex</h3>

  <p>
    Library to aid in organizing, running, and debugging regular expressions against large bodies of text.
  </p>
</div>


<!-- TABLE OF CONTENTS -->
## Table of Contents

* [About the Project](#about-the-project)
* [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
  * [Installation](#installation)
* [Usage](#usage)
* [Roadmap](#roadmap)
* [Contributing](#contributing)
* [License](#license)
* [Contact](#contact)
* [Acknowledgements](#acknowledgements)



## About the Project 
The goal of this library is to simplify the deployment of regular expressions on large bodies of text, in a variety of input formats.


<!-- GETTING STARTED -->
## Getting Started

To get a local copy up and running, follow these steps.

### Prerequisites

* Python 3.8+
* runrex package: https://github.com/kpwhri/runrex

### Installation
 
1. Clone the repo
    ```sh
    git clone https://github.com/kpwhri/runrex.git
    ```
2. Install requirements (`requirements-dev` is for test packages)
    ```sh
    pip install -r requirements.txt -r requirements-dev.txt
    ```
3. If you wish to read text from SAS or SQL, you will need to install additional requirements. These additional requirements files may be of use:
    - ODBC-connection: `requirements-db.txt`
    - Postgres: `requirements-psql.txt`
    - SAS: `requirements-sas.txt`
4. Run tests.
    ```sh
    export PYTHONPATH=src  # on Windows (cmd): set PYTHONPATH=src
    pytest tests
    ```

## Usage

### Example Implementations
* [Social Isolation](https://github.com/kpwhri/social-isolation-runrex)
* [Acute Pancreatitis](https://github.com/kpwhri/apanc-runrex)
* [Anaphylaxis](https://github.com/kpwhri/anaphylaxis-runrex)
* [PCOS](https://github.com/kpwhri/pcos-runrex)

### Build Customized Algorithm

* Create 4 files:
    * `patterns.py`: defines regular expressions of interest
        * See `examples/example_patterns.py` for some examples
    * `test_patterns.py`: tests for those regular expressions
        * Why? Make sure the patterns do what you think they do
    * `algorithm.py`: defines algorithm (how to use regular expressions); returns a Result
        * See `examples/example_algorithm.py` for guidance
    * `config.(py|json|yaml)`: various configurations defined in `schema.py`
        * See example in `examples/example_config.py` for basic config  
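
As a rough illustration of the first two files, the sketch below uses only Python's standard `re` module rather than runrex's own pattern classes. None of the names in it (`LIVES_ALONE`, `NEGATION`, `find_mentions`) come from the library; consult `examples/example_patterns.py` for the actual conventions.

```python
import re

# Hypothetical pattern (not from runrex): mentions of living alone or loneliness.
LIVES_ALONE = re.compile(r'\b(lives?|living)\s+alone\b|\blonel(y|iness)\b', re.IGNORECASE)
# Hypothetical negation cue that appears before the mention in the same sentence.
NEGATION = re.compile(r'\b(no|not|denies|denied|without)\b', re.IGNORECASE)


def find_mentions(sentence):
    """Return the matched text of every non-negated mention in one sentence."""
    results = []
    for m in LIVES_ALONE.finditer(sentence):
        if NEGATION.search(sentence[:m.start()]):
            continue  # a negation cue precedes the mention, so skip it
        results.append(m.group())
    return results


# test_patterns.py-style checks (pytest collects functions named test_*).
def test_matches_positive_mention():
    assert find_mentions('Patient lives alone since 2019.') == ['lives alone']


def test_skips_negated_mention():
    assert find_mentions('Patient denies feeling lonely.') == []
```

Keeping the tests next to the patterns makes it cheap to check each new expression against a few positive and negative sentences before running it over a full corpus.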

## Input Data

runrex accepts a variety of input formats, but each input must at least provide a `document_id` and a `document_text`. The names of these fields are configurable.
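
A minimal sketch of preparing such input, assuming a CSV file is among the accepted formats (the text above does not enumerate them). The file name `corpus.csv`, the helper `write_corpus`, and the sample notes are invented for illustration; only the two column names come from the description above.

```python
import csv

# Invented sample notes: (document_id, document_text) pairs.
notes = [
    ('note-001', 'Patient lives alone.\nReports feeling well.'),
    ('note-002', 'No acute distress.'),
]


def write_corpus(path, rows):
    """Write (document_id, document_text) pairs to a CSV file with a header row."""
    with open(path, 'w', newline='', encoding='utf8') as fh:
        writer = csv.writer(fh)
        writer.writerow(['document_id', 'document_text'])
        writer.writerows(rows)


write_corpus('corpus.csv', notes)
```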

### Sentence Splitting

By default, runrex expects the input document text to have each sentence on a separate line. If you want a sentence-splitting scheme applied, you must supply it to the application.
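
One way to get text into that shape is a small pre-processing step run before the documents are handed to runrex. The sketch below is a naive regex-based splitter, not something runrex provides; `split_sentences` is a hypothetical helper, and a dedicated sentence tokenizer will handle abbreviations such as "Dr." far better.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter or digit.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9])')


def split_sentences(text):
    """Return `text` with one sentence per line (naive: abbreviations get split too)."""
    # Collapse existing whitespace so old line breaks do not yield partial sentences.
    flat = ' '.join(text.split())
    return '\n'.join(_BOUNDARY.split(flat))


print(split_sentences('Patient lives alone. Denies chest pain!  Follow up in 2 weeks.'))
```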

### Schema/Examples
For more details, see the [example config](https://github.com/kpwhri/runrex/blob/master/examples/example_config.py)
or consult the [schema](https://github.com/kpwhri/runrex/blob/master/src/runrex/schema.py).

## Output Format

* Recommended output format is `jsonl`
    - The data can be extracted using Python:
```python
import json

with open('output.jsonl', encoding='utf8') as fh:
    for line in fh:
        data = json.loads(line)  # each line is one JSON object, parsed into a dict
```

* Output variables are configurable and can include:
    - **id**: unique id for line
    - **name**: document name
    - **algorithm**: name of algorithm with finding
    - **value**
    - **category**: name of category (usually the pattern; multiple categories contribute to an algorithm)
    - **date**
    - **extras**
    - **matches**: pattern matches
    - **text**: captured text
    - **start**: start index/offset of match
    - **end**: end index/offset of match

* Scripts to accomplish useful tasks with the output are included in the `scripts` directory.
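
Building on the reading loop above, here is a minimal sketch of summarising results per algorithm. It assumes the output was configured to include the `name` and `algorithm` fields from the list above; `count_documents_by_algorithm` and the sample records are hypothetical.

```python
import json
from collections import defaultdict


def count_documents_by_algorithm(lines):
    """Map each algorithm name to the number of distinct documents with a finding."""
    docs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        data = json.loads(line)
        docs[data['algorithm']].add(data['name'])
    return {algo: len(names) for algo, names in docs.items()}


# Invented records standing in for the lines of an output file.
sample = [
    '{"name": "note-001", "algorithm": "lives_alone", "category": "ALONE"}',
    '{"name": "note-001", "algorithm": "lives_alone", "category": "LONELY"}',
    '{"name": "note-002", "algorithm": "lives_alone", "category": "ALONE"}',
]
print(count_documents_by_algorithm(sample))
```

With a real file, pass the open file handle in place of `sample`, since iterating over it yields one line at a time.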

## Versions

This project uses [semantic versioning](https://semver.org/).

See https://github.com/kpwhri/runrex/releases.

<!-- ROADMAP -->
## Roadmap

See the [open issues](https://github.com/kpwhri/runrex/issues) for a list of proposed features (and known issues).



<!-- CONTRIBUTING -->
## Contributing

Any contributions you make are **greatly appreciated**.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request


<!-- LICENSE -->
## License

Distributed under the MIT License. 

See `LICENSE` or https://kpwhri.mit-license.org for more information.



<!-- CONTACT -->
## Contact

Please use the [issue tracker](https://github.com/kpwhri/runrex/issues). 


<!-- ACKNOWLEDGEMENTS -->
## Acknowledgements



<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[contributors-shield]: https://img.shields.io/github/contributors/kpwhri/runrex.svg?style=flat-square
[contributors-url]: https://github.com/kpwhri/runrex/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/kpwhri/runrex.svg?style=flat-square
[forks-url]: https://github.com/kpwhri/runrex/network/members
[stars-shield]: https://img.shields.io/github/stars/kpwhri/runrex.svg?style=flat-square
[stars-url]: https://github.com/kpwhri/runrex/stargazers
[issues-shield]: https://img.shields.io/github/issues/kpwhri/runrex.svg?style=flat-square
[issues-url]: https://github.com/kpwhri/runrex/issues
[license-shield]: https://img.shields.io/github/license/kpwhri/runrex.svg?style=flat-square
[license-url]: https://kpwhri.mit-license.org/
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555
[linkedin-url]: https://www.linkedin.com/company/kaiserpermanentewashingtonresearch
<!-- [product-screenshot]: images/screenshot.png -->


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "runrex",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "nlp,information extraction",
    "author": "",
    "author_email": "dcronkite <dcronkite+pypi@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a4/c7/309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01/runrex-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Library to aid in organizing, running, and debugging regular expressions against large bodies of text.",
    "version": "0.5.0",
    "split_keywords": [
        "nlp",
        "information extraction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "face70f45c98f951c15f997405bab3b4f17d22f8cad07ceb6d11528af8c23f89",
                "md5": "0c6e68970a6360c533f7e8e4f4cc8e82",
                "sha256": "abc02b2492699962b45efc62400a3ee35afdb52def21108059ecb1de912545a6"
            },
            "downloads": -1,
            "filename": "runrex-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0c6e68970a6360c533f7e8e4f4cc8e82",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 39609,
            "upload_time": "2023-03-24T18:52:53",
            "upload_time_iso_8601": "2023-03-24T18:52:53.063357Z",
            "url": "https://files.pythonhosted.org/packages/fa/ce/70f45c98f951c15f997405bab3b4f17d22f8cad07ceb6d11528af8c23f89/runrex-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a4c7309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01",
                "md5": "0091b3c7b9de974823908d19293f4cd1",
                "sha256": "5498b907fe89c54f545a4bdfab266a92dd24f1af01a3e3a74415c164aabffa43"
            },
            "downloads": -1,
            "filename": "runrex-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0091b3c7b9de974823908d19293f4cd1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 54015,
            "upload_time": "2023-03-24T18:52:54",
            "upload_time_iso_8601": "2023-03-24T18:52:54.743752Z",
            "url": "https://files.pythonhosted.org/packages/a4/c7/309a1e180ba0a7d5090a7e36b58023ced8372df8635cee67b9ff230e9c01/runrex-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-24 18:52:54",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "runrex"
}
        