molda


* Name: molda
* Version: 0.1.2
* Home page: https://github.com/SigmoidAI/molda
* Summary: Molda is a scikit-learn-inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and with NumPy arrays.
* Upload time: 2023-03-16 18:45:13
* Author: SigmoidAI - Stojoc Vladimir, Smocvin Denis, Butucea Andrei, Sclifos Tudor
* License: MIT
* Keywords: ml, machine learning, natural language processing, python
* Requirements: No requirements were recorded.

# Molda

Molda is a scikit-learn-inspired Python library for text vectorization of corpora. It is designed to work in scikit-learn pipelines and with NumPy arrays.

The current version supports six algorithms, each provided as a class:

* TTestVectorizer
* TficfVectorizer
* ObservedExpectedVectorizer
* LTUVectorizer
* Gref94Vectorizer
* ATCVectorizer

These classes are based on scikit-learn's `CountVectorizer`.

Instantiate the vectorizer with the parameters you need, fit it, and then apply the transformation. For example:

```python
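import numpy as np

from Tficf import TficfVectorizer

# Toy corpus: one document per array element.
corpus = np.array([
    "Even though I enjoyed watching that, This is bullshit",
    "I really enjoyed watching that",
    "I resent watching this video"
])

# One class label per document.
y = [1, 0, 1]

v = TficfVectorizer()
v.fit(corpus, y)               # fit on the documents and their labels
v.transform(['Hello, there'])  # vectorize unseen text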
```
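To illustrate why `fit` takes the labels `y` as well as the corpus, here is a minimal pure-Python sketch of one common formulation of TF-ICF (term frequency times inverse category frequency). It is not taken from Molda's source: the formula `log(n_classes / n_classes_containing_term)`, the helper names `tokenize`, `fit_icf` and `transform`, and the toy corpus are all assumptions made for illustration, and the weights Molda actually computes may differ.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def fit_icf(corpus, labels):
    """Return {term: log(n_classes / number of classes whose documents contain the term)}."""
    classes_with_term = defaultdict(set)
    for text, label in zip(corpus, labels):
        for tok in set(tokenize(text)):
            classes_with_term[tok].add(label)
    n_classes = len(set(labels))
    return {t: math.log(n_classes / len(cs)) for t, cs in classes_with_term.items()}


def transform(texts, icf):
    """Multiply each known term's count by its ICF; terms unseen during fit are dropped."""
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append({t: c * icf[t] for t, c in counts.items() if t in icf})
    return rows


# Illustrative data, not from the Molda README.
corpus = [
    "I really enjoyed watching that",
    "I resent watching this video",
    "I enjoyed this a lot",
]
labels = [0, 1, 0]

icf = fit_icf(corpus, labels)
weights = transform(["I enjoyed watching"], icf)
print(weights[0])
```

Under this formulation, a term that occurs in every class ("watching") gets weight zero, while a term confined to a single class ("enjoyed") gets a positive weight, which is why the labels are needed at fit time.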

You can also include the vectorizer in a scikit-learn pipeline, as in the following example:

```python
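from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# corpus and y are the arrays defined in the previous example.
pipe = Pipeline([
    ('vectorizer', TficfVectorizer()),
    ('scaler', StandardScaler(with_mean=False)),
    ('estimator', SVC())
])
pipe.fit(corpus, y)
pipe.score(corpus, y)
pipe.predict(['This is wonderful'])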
```
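The pipeline above passes `with_mean=False` to `StandardScaler`. scikit-learn's `StandardScaler` refuses to center sparse input, because subtracting the column mean would turn almost every zero into a non-zero value. The following pure-Python sketch illustrates that effect on a toy count matrix. It assumes that Molda's vectorizers return sparse matrices like `CountVectorizer` does; the matrix and the helper names are hypothetical.

```python
from statistics import pstdev

# Toy document-term counts (rows = documents, columns = terms); mostly zeros.
counts = [
    [2, 0, 0],
    [0, 0, 4],
    [0, 1, 0],
    [2, 0, 0],
]


def column_stds(matrix):
    # Population standard deviation per column; use 1.0 for constant columns.
    return [pstdev(col) or 1.0 for col in zip(*matrix)]


def scale_only(matrix):
    # Divide by the column std without subtracting the mean: zeros stay zero.
    stds = column_stds(matrix)
    return [[v / s for v, s in zip(row, stds)] for row in matrix]


def center_and_scale(matrix):
    # Subtract the column mean first: former zeros become non-zero.
    stds = column_stds(matrix)
    means = [sum(col) / len(col) for col in zip(*matrix)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


def count_zeros(matrix):
    return sum(v == 0 for row in matrix for v in row)


zeros_before = count_zeros(counts)
zeros_scaled = count_zeros(scale_only(counts))
zeros_centered = count_zeros(center_and_scale(counts))
print(zeros_before, zeros_scaled, zeros_centered)
```

Scaling alone preserves every zero entry, so a sparse representation stays compact, whereas centering leaves no zeros at all in this example.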

Molda also works with data loaded from a pandas DataFrame, once the relevant columns are converted to NumPy arrays:

```python
import pandas as pd

# Load the labeled comments and drop rows with missing values.
df = pd.read_csv('../irony-labeled.csv')
df = df.dropna()

corpus_ = df['comment_text'].to_numpy()
y_ = df['label'].to_numpy()

v = TficfVectorizer()
v.fit(corpus_, y_)
v.transform(['Hello, there', 'Goodbye'])
```

With love from Sigmoid.

We welcome feedback. Please send your impressions to vladimir.stojoc@gmail.com.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/SigmoidAI/molda",
    "name": "molda",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "ml,machine learning,natural language processing,python",
    "author": "SigmoidAI - Stojoc Vladimir, Smocvin Denis, Butucea Andrei, Sclifos Tudor",
    "author_email": "vladimir.stojoc@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Molda is a sci-kit learn inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and numpy arrays.",
    "version": "0.1.2",
    "split_keywords": [
        "ml",
        "machine learning",
        "natural language processing",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c4ce5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e",
                "md5": "d374007bf5384aebd5c67d956c124b76",
                "sha256": "7617192ab291e3db475d11532e2163163a85928a19e64c2c6217469a7542162f"
            },
            "downloads": -1,
            "filename": "molda-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d374007bf5384aebd5c67d956c124b76",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 23174,
            "upload_time": "2023-03-16T18:45:11",
            "upload_time_iso_8601": "2023-03-16T18:45:11.285893Z",
            "url": "https://files.pythonhosted.org/packages/c4/ce/5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e/molda-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "59ec8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e",
                "md5": "aaba176e8926593bc1663f9a4019d190",
                "sha256": "b95a25dea0cb813ac33b3afdf8fd3de7ad334000c1de2abb586577babf3029f3"
            },
            "downloads": -1,
            "filename": "molda-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "aaba176e8926593bc1663f9a4019d190",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15164,
            "upload_time": "2023-03-16T18:45:13",
            "upload_time_iso_8601": "2023-03-16T18:45:13.558021Z",
            "url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-16 18:45:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "SigmoidAI",
    "github_project": "molda",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "molda"
}
        