ai-data-preprocessing-queue


* Name: ai-data-preprocessing-queue
* Version: 1.5.0
* Home page: https://github.com/SamhammerAG/ai_data_preprocessing_queue
* Summary: Can be used to pre process data before ai processing
* Upload time: 2024-09-16 11:30:33
* Author: Samhammer AG
* Requires Python: >=3.12
* License: MIT
* Keywords: text processing, ai

# ai-data-preprocessing-queue
[![Maintainability][codeclimate-image]][codeclimate-url]
[![Coverage Status][coveralls-image]][coveralls-url]
[![Known Vulnerabilities][snyk-image]][snyk-url]

## What it does
This tool is intended for preparing data for further processing.
It contains different text processing steps that can be enabled or disabled dynamically.


## Installation
`pip install ai-data-preprocessing-queue`

## How to use
```python
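from ai_data_preprocessing_queue import Pipeline

# Optional dictionary used to cache preprocessing data between calls
state = {}

# Keys select the preprocessors; values carry their additional data as strings
pre_processor_dict = {
    'to_lower': None,                  # needs no additional data
    'spellcheck': 'test\r\ntesting',   # correctly spelled words, one per line
}

# Create the pipeline once and reuse it for every input text
pipeline = Pipeline(pre_processor_dict)
value = pipeline.consume('Input text', state)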
```

`state` is optional and can be used to cache preprocessing data between pipeline calls.

The preprocessors that the pipeline should run are passed as the keys of a dictionary.
Some preprocessors require additional data to function.
This data must be converted to a string and assigned as the value of the corresponding preprocessor key. Preprocessors without additional data get `None`, as in the example above.

The dictionary is then passed to the pipeline through its constructor.

Note: A `Pipeline` needs to be instantiated only once and can then be reused.
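
To make the dictionary-driven design more concrete, here is a minimal, self-contained sketch of how such a pipeline could dispatch its steps. It does not use the package itself: `MiniPipeline`, `STEPS`, and the `drop_words` step are hypothetical names invented for this illustration, and running the steps in dictionary insertion order is an assumption, not documented behavior.

```python
from typing import Callable, Dict, Optional

# Hypothetical step signature: (text, data, state) -> text.
Step = Callable[[str, Optional[str], dict], str]


def to_lower(text: str, data: Optional[str], state: dict) -> str:
    # Needs no additional data, so `data` is ignored.
    return text.lower()


def drop_words(text: str, data: Optional[str], state: dict) -> str:
    # Hypothetical step (not part of the package): `data` holds
    # newline-separated words; the parsed set is cached in `state`.
    if "drop_words" not in state:
        state["drop_words"] = set((data or "").split())
    blocked = state["drop_words"]
    return " ".join(word for word in text.split() if word not in blocked)


STEPS: Dict[str, Step] = {"to_lower": to_lower, "drop_words": drop_words}


class MiniPipeline:
    def __init__(self, config: Dict[str, Optional[str]]) -> None:
        unknown = [name for name in config if name not in STEPS]
        if unknown:
            raise ValueError(f"unknown preprocessors: {unknown}")
        self._config = dict(config)

    def consume(self, text: str, state: Optional[dict] = None) -> str:
        state = {} if state is None else state
        # Assumption: steps run in dictionary insertion order.
        for name, data in self._config.items():
            text = STEPS[name](text, data, state)
        return text


pipeline = MiniPipeline({"to_lower": None, "drop_words": "the\r\na"})
state: dict = {}
first = pipeline.consume("The quick fox", state)
second = pipeline.consume("A lazy dog", state)  # reuses the cached word set
```

The second call shows why reusing one pipeline instance and one `state` dictionary can pay off: the word list is parsed only once.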

## Existing preprocessors

### To Lower Case
Name: to_lower 

Required additional data: -

Converts the text to lower case characters.

### Remove Numbers
Name: remove_numbers

Required additional data: -

Removes all numbers from the text.

### Remove Punctuation
Name: remove_punctuation

Required additional data: -

Removes all special characters from the text.

### Text only
Name: text_only

Required additional data: -

Removes all special characters and numbers from the text.
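
The four data-free preprocessors above can be approximated with the standard library. The following is only a sketch of the described behavior, not the package's implementation. In particular, treating "special characters" as ASCII punctuation (`string.punctuation`) is an assumption.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def to_lower(text: str) -> str:
    return text.lower()


def remove_numbers(text: str) -> str:
    # Deletes every digit; surrounding whitespace is left untouched.
    return re.sub(r"\d", "", text)


def remove_punctuation(text: str) -> str:
    return text.translate(_PUNCTUATION_TABLE)


def text_only(text: str) -> str:
    # Combination of the two removals above.
    return remove_numbers(remove_punctuation(text))


sample = "Call 555-1234, NOW!"
lowered = to_lower(sample)               # "call 555-1234, now!"
no_numbers = remove_numbers(sample)      # "Call -, NOW!"
no_punct = remove_punctuation(sample)    # "Call 5551234 NOW"
only_text = text_only(sample)            # "Call  NOW" (two spaces remain)
```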

### Spellcheck (Levenshtein)
Name: spellcheck

Required additional data: A string containing words separated by newlines, e.g. "word1\r\nword2"

Takes a list of correctly spelled words. Words in the given text that are close to a word from this list are replaced with the listed word.
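
As an illustration of the idea, the sketch below computes the Levenshtein (edit) distance with the classic dynamic-programming recurrence and replaces a word when its nearest vocabulary entry is within a threshold. The function names and the `max_distance=1` threshold are assumptions made for this example. The package does not document how "close" is defined.

```python
def levenshtein(a: str, b: str) -> int:
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def spellcheck(text: str, data: str, max_distance: int = 1) -> str:
    # `data` uses the documented format: one correct word per line.
    vocabulary = [word for word in data.splitlines() if word]
    corrected = []
    for word in text.split():
        best = min(vocabulary, key=lambda v: levenshtein(word, v), default=word)
        close_enough = levenshtein(word, best) <= max_distance
        corrected.append(best if close_enough else word)
    return " ".join(corrected)


result = spellcheck("tesst the testin", "test\r\ntesting")
```

Here `tesst` and `testin` are each one edit away from a listed word and get replaced, while `the` is too far from both entries and stays unchanged.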

### Regex replacement
Name: regex_replacement

Required additional data: CSV data in string form with the following line format: `<pattern>,<replacement>,<order>`
  - pattern: a regex pattern to search for within the text
  - replacement: the word/text with which any match should be replaced
  - order: the order in which the regex entries are applied (lowest number is applied first!)

This preprocessor searches the text for matches of the given patterns and replaces each match with the specified replacement.
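
The ordering rule matters when one pattern can match inside the result of another. The sketch below parses the documented CSV format with the standard `csv` module and applies the rules in ascending order. The function name and the sample rules are made up for this example, and the package's own parsing may differ.

```python
import csv
import re


def regex_replacement(text: str, data: str) -> str:
    # Each CSV line is <pattern>,<replacement>,<order>.
    rows = [row for row in csv.reader(data.splitlines()) if row]
    # Lowest order number is applied first.
    rules = sorted(rows, key=lambda row: int(row[2]))
    for pattern, replacement, _order in rules:
        text = re.sub(pattern, replacement, text)
    return text


# Listed out of order on purpose; the date rule (order 1) must run
# before the generic number rule (order 2).
data = "\n".join([
    r"\d+,NUMBER,2",
    r"\d{4}-\d{2}-\d{2},DATE,1",
])
result = regex_replacement("Invoice 42 from 2024-09-16", data)
```

If the number rule ran first, the date would already be broken into `NUMBER-NUMBER-NUMBER` and the date pattern could no longer match.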

### Token Replacement
Name: token_replacement

Required additional data: CSV data in string form with the following line format: `<text>,<replacement>,<order>`
  - text: one or multiple words to search for within the text
  - replacement: the word/text with which any match should be replaced
  - order: the order in which the entries are applied (largest number is applied first!)

With this preprocessor you can replace specific words and abbreviations within the text with specified tokens. Abbreviations ending with a dot can also be replaced. Other special characters are not supported, though.
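
A possible way to picture this step: entries are sorted by descending order number and matched as whole tokens, so a multi-word entry with a high number wins over a shorter entry it contains. The sketch below is an assumption-laden illustration (function name, sample entries, case-sensitive matching, and the boundary handling are all invented here), not the package's implementation.

```python
import csv
import re


def token_replacement(text: str, data: str) -> str:
    # Each CSV line is <text>,<replacement>,<order>.
    rows = [row for row in csv.reader(data.splitlines()) if row]
    # Largest order number is applied first.
    rules = sorted(rows, key=lambda row: int(row[2]), reverse=True)
    for token, replacement, _order in rules:
        # Match the token only when it is not glued to other word
        # characters; this also works for tokens ending with a dot.
        pattern = r"(?<!\w)" + re.escape(token) + r"(?!\w)"
        # A function replacement avoids backslash interpretation.
        text = re.sub(pattern, lambda match: replacement, text)
    return text


data = "\n".join([
    "York,NAME,5",
    "approx.,approximately,1",
    "New York,CITY,10",
])
result = token_replacement("approx. 3 days in New York", data)
```

Because `New York` carries the largest order number, it is replaced before the shorter `York` entry gets a chance to match.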

## How to start developing

### With VS Code

Just install VS Code with the Dev Containers extension. All required extensions and configurations are prepared automatically.

### With PyCharm

* Install the latest PyCharm version
* Install PyCharm plugin BlackConnect
* Install PyCharm plugin Mypy
* Configure the Python interpreter/venv
* `pip install -r requirements-dev.txt`
* `pip install black[d]`
* Ctrl+Alt+S => Tools => BlackConnect => Check "Trigger when saving changed files"
* Ctrl+Alt+S => Tools => BlackConnect => Check "Trigger on code reformat"
* Ctrl+Alt+S => Tools => BlackConnect => Click "Load from pyproject.toml" (ensure line length is 120)
* Ctrl+Alt+S => Tools => BlackConnect => Configure the path to blackd.exe in the "local instance" config (e.g. C:\Python310\Scripts\blackd.exe)
* Ctrl+Alt+S => Tools => Actions on save => Check "Reformat code"
* Restart PyCharm

## How to publish
* Update the version in setup.py and commit your change
* Create a tag with the same version number
* Let GitHub do the rest

[codeclimate-image]:https://api.codeclimate.com/v1/badges/bcde3599d064f687803f/maintainability
[codeclimate-url]:https://codeclimate.com/github/SamhammerAG/ai-data-preprocessing-queue/maintainability
[coveralls-image]:https://coveralls.io/repos/github/SamhammerAG/ai-data-preprocessing-queue/badge.svg?branch=master
[coveralls-url]:https://coveralls.io/github/SamhammerAG/ai-data-preprocessing-queue?branch=master
[snyk-image]:https://snyk.io/test/github/SamhammerAG/ai-data-preprocessing-queue/badge.svg
[snyk-url]:https://snyk.io/test/github/SamhammerAG/ai-data-preprocessing-queue

            
