pippi-lang


Name: pippi-lang
Version: 0.0.2
home_page: https://github.com/szymonrucinski/pippi-lang
Summary: A simple package to create elegant nlp pipelines using sklearn.
upload_time: 2023-02-10 23:56:59
maintainer:
docs_url: None
author: Szymon Ruciński
requires_python:
license:
keywords: python, stream, sockets
VCS:
bugtrack_url:
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.
            
# Text cleaning Pipeline 

[![Build package](https://github.com/szymonrucinski/pippi-lang/actions/workflows/build-pkg.yml/badge.svg)](https://github.com/szymonrucinski/pippi-lang/actions/workflows/build-pkg.yml) [![Check style](https://github.com/szymonrucinski/pippi-lang/actions/workflows/check-style.yml/badge.svg)](https://github.com/szymonrucinski/pippi-lang/actions/workflows/check-style.yml)[![Run Tests](https://github.com/szymonrucinski/pippi-lang/actions/workflows/run-tests.yml/badge.svg)](https://github.com/szymonrucinski/pippi-lang/actions/workflows/run-tests.yml)
___
## Description
This package provides a pipeline for preprocessing text data for sentiment analysis. It includes steps for removing stop words, stripping HTML tags, changing letter case, and removing punctuation.
*Future versions will include text transformations such as word embedding and word vectorization.*
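
To make the four cleaning steps concrete, here is a minimal standard-library sketch of what each one does to a single string. It is not pippi's implementation: the function names, the tiny stop-word list, and the regex-based tag stripping are all assumptions chosen for illustration.

``` python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to"}


def remove_html_tags(text: str) -> str:
    """Replace anything that looks like an HTML tag with a space (naive regex, not a parser)."""
    return re.sub(r"<[^>]+>", " ", text)


def remove_stop_words(text: str) -> str:
    """Keep only whitespace-separated tokens that are not stop words."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)


def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


def transform_case(text: str, case: str = "lower") -> str:
    """Upper- or lowercase the whole string."""
    return text.upper() if case == "upper" else text.lower()


raw = "<p>The movie was great, and the cast is superb!</p>"
cleaned = transform_case(remove_punctuation(remove_stop_words(remove_html_tags(raw))))
print(cleaned)  # movie great cast superb
```

In this sketch the tags are stripped first, so that a token such as `<p>The` is reduced to `The` before the stop-word check sees it.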

### Example
Data pipelines are a key component of data science projects: they automate the process of cleaning, transforming, and analyzing data. The following is a simple example of how to create a pipeline for text data using custom transformers and the sklearn `Pipeline` class.

``` python
from pippi import (
    TransformLettersSize,
    RemoveStopWords,
    Lemmatize,  # imported in the original example; not used in this pipeline
    RemovePunctuation,
    RemoveHTMLTags,
)
from sklearn.pipeline import Pipeline
import pandas as pd

# Illustrative input: one text column and one label column.
df = pd.DataFrame(
    {
        "review": ["<br />The movie was great!"],
        "sentiment": ["positive"],
    }
)

# Steps run in the order listed; each one operates on the named columns.
pipeline = Pipeline(
    steps=[
        ("remove_stop_words", RemoveStopWords(columns=["review", "sentiment"])),
        ("remove_html_tags", RemoveHTMLTags(columns=df.columns.to_list())),
        ("uppercase_letters", TransformLettersSize(columns=["sentiment"], case_transform="upper")),
        ("remove_punctuation", RemovePunctuation(columns=["review"])),
    ]
)
output = pipeline.fit_transform(df)
df = pd.DataFrame(output, columns=["review", "sentiment"])
```
Pipeline Visualization:

``` markdown
[RemoveStopWords] -> [RemoveHTMLTags] -> [TransformLettersSize] -> [RemovePunctuation]
```
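
The source of pippi's transformers is not shown here, but a column-wise transformer that plugs into an sklearn `Pipeline` is commonly written along the following lines. This is a sketch under assumptions: the class names `StripColumns` and `LowercaseColumns` are hypothetical and only mirror the `columns=[...]` argument style used above.

``` python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class StripColumns(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: trims surrounding whitespace in the given string columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; returning self lets the step be chained.
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's DataFrame untouched
        for col in self.columns:
            X[col] = X[col].str.strip()
        return X


class LowercaseColumns(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: lowercases the given string columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            X[col] = X[col].str.lower()
        return X


df = pd.DataFrame({"review": ["  Great Movie ", "BAD plot"]})

# Each step receives the output of the previous one, left to right.
pipeline = Pipeline(
    steps=[
        ("strip", StripColumns(columns=["review"])),
        ("lowercase", LowercaseColumns(columns=["review"])),
    ]
)
cleaned = pipeline.fit_transform(df)
print(cleaned["review"].tolist())  # ['great movie', 'bad plot']
```

Storing the constructor argument unchanged in `__init__` and doing all work in `transform` follows the scikit-learn estimator conventions, which is what allows `Pipeline` to clone and chain the steps.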


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/szymonrucinski/pippi-lang",
    "name": "pippi-lang",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,stream,sockets",
    "author": "Szymon Ruci\u0144ski",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/d9/d5/13e4af263ad2bd7b9f5d5e88ec0fcb9256176c3b8976b52f229007008ad9/pippi-lang-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A simple package to create elegant nlp pipelines using sklearn.",
    "version": "0.0.2",
    "split_keywords": [
        "python",
        "stream",
        "sockets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "279b916682ccd746db633150d8454d62e50d4d2096464afd52ab2535d94d3e87",
                "md5": "60f5a0cb1d08db5922d3c0e4c24076cb",
                "sha256": "a3f510d06413b43a8a8a5ec7d1702ce608593bbe8c355bfd018565352ca5e79b"
            },
            "downloads": -1,
            "filename": "pippi_lang-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "60f5a0cb1d08db5922d3c0e4c24076cb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9292,
            "upload_time": "2023-02-10T23:56:57",
            "upload_time_iso_8601": "2023-02-10T23:56:57.012647Z",
            "url": "https://files.pythonhosted.org/packages/27/9b/916682ccd746db633150d8454d62e50d4d2096464afd52ab2535d94d3e87/pippi_lang-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d9d513e4af263ad2bd7b9f5d5e88ec0fcb9256176c3b8976b52f229007008ad9",
                "md5": "9cb87a17e16817d7a8234f3bebd4d5f3",
                "sha256": "01548a042fad6770b8551647e6eead9ed85a520074afb96d758a6ba7b2529343"
            },
            "downloads": -1,
            "filename": "pippi-lang-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "9cb87a17e16817d7a8234f3bebd4d5f3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6300,
            "upload_time": "2023-02-10T23:56:59",
            "upload_time_iso_8601": "2023-02-10T23:56:59.731157Z",
            "url": "https://files.pythonhosted.org/packages/d9/d5/13e4af263ad2bd7b9f5d5e88ec0fcb9256176c3b8976b52f229007008ad9/pippi-lang-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-10 23:56:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "szymonrucinski",
    "github_project": "pippi-lang",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pippi-lang"
}
        