molda

Name	molda
Version	0.1.2
home_page	https://github.com/SigmoidAI/molda
Summary	Molda is a sci-kit learn inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and numpy arrays.
upload_time	2023-03-16 18:45:13
maintainer
docs_url	None
author	SigmoidAI - Stojoc Vladimir, Smocvin Denis, Butucea Andrei, Sclifos Tudor
requires_python
license	MIT
keywords	ml machine learning natural language processing python
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            
# Molda

Molda is a scikit-learn-inspired Python library for vectorizing text corpora. Its vectorizers are designed to work in scikit-learn pipelines and with NumPy arrays.

The current version implements the following vectorizer classes:

* TTestVectorizer
* TficfVectorizer
* ObservedExpectedVectorizer
* LTUVectorizer
* Gref94Vectorizer
* ATCVectorizer

These classes are built on scikit-learn's `CountVectorizer`.
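To make the idea of a class-aware layer on top of `CountVectorizer` concrete, here is a minimal sketch of one common TF-ICF (term frequency-inverse class frequency) formulation: raw term counts multiplied by icf(t) = log(|C| / cf(t)), where |C| is the number of classes and cf(t) is the number of classes whose documents contain term t. The function `tficf_weights` is hypothetical and only illustrates the general technique; molda's `TficfVectorizer` may use a different formula, smoothing, or normalization.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def tficf_weights(corpus, y):
    """Hypothetical TF-ICF sketch (not molda's implementation):
    count(d, t) * log(n_classes / cf(t))."""
    cv = CountVectorizer()
    counts = cv.fit_transform(corpus)  # sparse matrix, shape (n_docs, n_terms)
    y = np.asarray(y)
    classes = np.unique(y)
    # Row c is True where term t occurs in at least one document of class c.
    in_class = np.vstack([
        np.asarray(counts[np.flatnonzero(y == c)].sum(axis=0)).ravel() > 0
        for c in classes
    ])
    cf = in_class.sum(axis=0)  # number of classes containing each term (>= 1)
    icf = np.log(len(classes) / cf)  # 0 for terms present in every class
    # Dense output keeps the example readable; real code would stay sparse.
    return cv, counts.toarray() * icf
```

With two classes, a term that appears in both gets weight 0, while a term confined to one class keeps its count scaled by log 2.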

To use a vectorizer, instantiate it with the parameters you need, fit it on a corpus together with its labels, and then transform new text. For example:

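```python
import numpy as np

# Import path as given in the original example; adjust it to where
# TficfVectorizer lives in your installed copy of molda.
from Tficf import TficfVectorizer

corpus = np.array([
    "Even though I enjoyed watching that, This is bullshit",
    "I really enjoyed watching that",
    "I resent watching this video"
])

# One class label per document; fit() takes the labels as well as the text.
y = [1, 0, 1]

v = TficfVectorizer()
v.fit(corpus, y)
v.transform(['Hello, there'])
```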

The vectorizers can also be used as a step in a scikit-learn `Pipeline`:

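```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Reuses corpus and y from the previous example.
pipe = Pipeline([
    ('vectorizer', TficfVectorizer()),
    # with_mean=False: StandardScaler cannot center sparse input,
    # in case the vectorizer returns a sparse matrix.
    ('scaler', StandardScaler(with_mean=False)),
    ('estimator', SVC())
])
pipe.fit(corpus, y)
pipe.score(corpus, y)  # accuracy on the training data itself
pipe.predict(['This is wonderful'])
```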
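A vectorizer can sit in a `Pipeline` because it follows scikit-learn's transformer contract: `fit(X, y)` returning `self`, plus `transform(X)`. `Pipeline.fit(X, y)` forwards `y` to every step, which is what lets a supervised vectorizer use the labels. The sketch below shows that contract with a hypothetical `ClassSpecificCountVectorizer` (it keeps only terms that occur in a single class) standing in for a molda class; it is an illustration of the interface, not how molda is implemented.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class ClassSpecificCountVectorizer(BaseEstimator, TransformerMixin):
    """Hypothetical supervised vectorizer (not part of molda): keeps only
    the terms that occur in documents of exactly one class."""

    def fit(self, X, y):
        # Pipeline.fit(X, y) forwards y here, so this step can use the labels.
        self.cv_ = CountVectorizer().fit(X)
        counts = self.cv_.transform(X)
        y = np.asarray(y)
        # Row c is True where a term appears in at least one document of class c.
        in_class = np.vstack([
            np.asarray(counts[np.flatnonzero(y == c)].sum(axis=0)).ravel() > 0
            for c in np.unique(y)
        ])
        self.keep_ = np.flatnonzero(in_class.sum(axis=0) == 1)
        return self

    def transform(self, X):
        # Only X is needed at transform time; the output stays sparse.
        return self.cv_.transform(X)[:, self.keep_]


corpus = [
    "Even though I enjoyed watching that, This is bullshit",
    "I really enjoyed watching that",
    "I resent watching this video",
]
y = [1, 0, 1]

pipe = Pipeline([
    ('vectorizer', ClassSpecificCountVectorizer()),
    ('scaler', StandardScaler(with_mean=False)),  # sparse-safe scaling
    ('estimator', SVC()),
])
pipe.fit(corpus, y)
prediction = pipe.predict(['This is wonderful'])
```

`TransformerMixin` supplies `fit_transform(X, y)`, which the pipeline calls on every step except the last.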

Molda also works with data loaded through pandas; extract the text and label columns as NumPy arrays first:
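```python
import pandas as pd

# The CSV is expected to have a 'comment_text' column (the documents)
# and a 'label' column (the class of each document).
df = pd.read_csv('../irony-labeled.csv')
df = df.dropna()  # drops every row with a missing value in any column

corpus_ = df['comment_text'].to_numpy()
y_ = df['label'].to_numpy()

v = TficfVectorizer()
v.fit(corpus_, y_)
v.transform(['Hello, there', 'Goodbye'])
```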
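If the CSV is not at hand, the same preparation can be checked on a small in-memory DataFrame. The rows below are made up and only reuse the column names from the example. Using `dropna(subset=...)` limits the missing-value filter to the two columns the vectorizer needs, whereas a bare `df.dropna()` discards a row if any column is missing.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as the CSV in the example.
df = pd.DataFrame({
    "comment_text": ["What a great idea", None, "Sure, that will work", "Thanks for the help"],
    "label": [0, 1, 1, np.nan],
})

# Only drop rows where the text or the label is missing.
df = df.dropna(subset=["comment_text", "label"])

corpus_ = df["comment_text"].to_numpy()
# A label column containing NaN is loaded as float; cast back to int.
y_ = df["label"].astype(int).to_numpy()
```

`corpus_` and `y_` then have the same form as the arrays passed to `fit` in the examples.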

With love from Sigmoid.

We welcome feedback; please send your impressions to vladimir.stojoc@gmail.com.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SigmoidAI/molda",
    "name": "molda",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "ml,machine learning,natural language processing,python",
    "author": "SigmoidAI - Stojoc Vladimir, Smocvin Denis, Butucea Andrei, Sclifos Tudor",
    "author_email": "vladimir.stojoc@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
    "platform": null,
    "description": "\n# Molda\n\nMolda is a sci-kit learn inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and numpy arrays.\n\nThe current version supports many algorithms denoted by the following classes:\n\n* TTestVectorizer\n* TficfVectorizer\n* ObservedExpectedVectorizer\n* LTUVectorizer\n* Gref94Vectorizer\n* ATCVectorizer\n\nThese classes are based on the sci-kit learn's CountVectorizer.\n\nYou need to instantiate the vectorizer with the parameters you need, fit and apply the transformations. Here is an example:\n\n```python\nfrom Tficf import TficfVectorizer\n\ncorpus = np.array([\n    \"Even though I enjoyed watching that, This is bullshit\",\n    \"I really enjoyed watching that\",\n    \"I resent watching this video\"\n])\n\ny = [1, 0, 1]\n\nv = TficfVectorizer()\nv.fit(corpus, y)\nv.transform(['Hello, there'])\n```\n\nAlso, you can include the vectorizer in a pipeline, like in the following example:\n\n```python\npipe = Pipeline([\n            ('vectorizer', TficfVectorizer()),\n            ('scaler', StandardScaler(with_mean=False)),\n            ('estimator', SVC())\n        ])\npipe.fit(corpus, y)\npipe.score(corpus, y)\npipe.predict(['This is wonderful'])\n```\n\nMolda works with Pandas DataFrames too:\n```python\ndf = pd.read_csv('../irony-labeled.csv')\ndf = df.dropna()\n\ncorpus_ = df['comment_text'].to_numpy()\ny_ = df['label'].to_numpy()\n\nv = TficfVectorizer()\nv.fit(corpus_, y_)\nv.transform(['Hello, there', 'Goodbye'])\n```\n\nWith love from Sigmoid.\n\nWe are open for feedback. Please send your impression to vladimir.stojoc@gmail.com\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Molda is a sci-kit learn inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and numpy arrays.",
    "version": "0.1.2",
    "split_keywords": [
        "ml",
        "machine learning",
        "natural language processing",
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c4ce5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e",
                "md5": "d374007bf5384aebd5c67d956c124b76",
                "sha256": "7617192ab291e3db475d11532e2163163a85928a19e64c2c6217469a7542162f"
            },
            "downloads": -1,
            "filename": "molda-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d374007bf5384aebd5c67d956c124b76",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 23174,
            "upload_time": "2023-03-16T18:45:11",
            "upload_time_iso_8601": "2023-03-16T18:45:11.285893Z",
            "url": "https://files.pythonhosted.org/packages/c4/ce/5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e/molda-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "59ec8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e",
                "md5": "aaba176e8926593bc1663f9a4019d190",
                "sha256": "b95a25dea0cb813ac33b3afdf8fd3de7ad334000c1de2abb586577babf3029f3"
            },
            "downloads": -1,
            "filename": "molda-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "aaba176e8926593bc1663f9a4019d190",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15164,
            "upload_time": "2023-03-16T18:45:13",
            "upload_time_iso_8601": "2023-03-16T18:45:13.558021Z",
            "url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-16 18:45:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "SigmoidAI",
    "github_project": "molda",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "molda"
}
