# Molda
Molda is a scikit-learn-inspired Python library for vectorizing text corpora. It is designed to work with scikit-learn pipelines and NumPy arrays.
The current version supports several vectorization algorithms, each exposed as one of the following classes:
* TTestVectorizer
* TficfVectorizer
* ObservedExpectedVectorizer
* LTUVectorizer
* Gref94Vectorizer
* ATCVectorizer
These classes are based on scikit-learn's `CountVectorizer`.
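For readers unfamiliar with `CountVectorizer`, the following is a simplified pure-Python sketch of the counting step it performs with its default settings (lowercasing, tokens of two or more word characters, alphabetically ordered vocabulary). The helper names `tokenize`, `fit_vocabulary`, and `count_rows` are hypothetical and are not part of Molda or scikit-learn. How each Molda class reweights these counts is not shown here.

```python
import re
from collections import Counter

# Mirrors CountVectorizer's default token pattern: runs of 2+ word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(corpus):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_rows(corpus, vocabulary):
    """Build one row of token counts per document; unknown tokens are skipped."""
    rows = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows


corpus = [
    "I really enjoyed watching that",
    "I resent watching this video",
]
vocabulary = fit_vocabulary(corpus)
print(vocabulary)
print(count_rows(corpus, vocabulary))
# A document made only of unseen tokens maps to an all-zero row.
print(count_rows(["Hello, there"], vocabulary))
```

Note that the single-character token "I" is dropped by the default pattern, and that text containing only words outside the fitted vocabulary yields a row of zeros in this counting step.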
To use a vectorizer, instantiate it with the parameters you need, fit it, and then apply the transformation. Here is an example:
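```python
import numpy as np
from Tficf import TficfVectorizer

# A small labeled corpus: one document per entry, one class label per document.
corpus = np.array([
    "Even though I enjoyed watching that, This is bullshit",
    "I really enjoyed watching that",
    "I resent watching this video"
])
y = [1, 0, 1]

# Fit the vectorizer on the labeled corpus, then vectorize unseen text.
v = TficfVectorizer()
v.fit(corpus, y)
v.transform(['Hello, there'])
```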
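You can also include the vectorizer in a scikit-learn pipeline, as in the following example:
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Reuses `corpus`, `y`, and `TficfVectorizer` from the previous example.
pipe = Pipeline([
    ('vectorizer', TficfVectorizer()),
    # with_mean=False skips centering, which StandardScaler cannot do on sparse matrices
    ('scaler', StandardScaler(with_mean=False)),
    ('estimator', SVC())
])
pipe.fit(corpus, y)
pipe.score(corpus, y)
pipe.predict(['This is wonderful'])
```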
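Molda also works with data loaded from pandas DataFrames:
```python
import pandas as pd

# Load the labeled data and drop rows with missing values.
df = pd.read_csv('../irony-labeled.csv')
df = df.dropna()

# Convert the text and label columns to NumPy arrays.
corpus_ = df['comment_text'].to_numpy()
y_ = df['label'].to_numpy()

v = TficfVectorizer()
v.fit(corpus_, y_)
v.transform(['Hello, there', 'Goodbye'])
```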
With love from Sigmoid.
We welcome feedback. Please send your impressions to vladimir.stojoc@gmail.com
Raw data
{
"_id": null,
"home_page": "https://github.com/SigmoidAI/molda",
"name": "molda",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "ml,machine learning,natural language processing,python",
"author": "SigmoidAI - Stojoc Vladimir, Smocvin Denis, Butucea Andrei, Sclifos Tudor",
"author_email": "vladimir.stojoc@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Molda is a sci-kit learn inspired Python library for text vectorization of corpora. It is adapted to work in pipelines and numpy arrays.",
"version": "0.1.2",
"split_keywords": [
"ml",
"machine learning",
"natural language processing",
"python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c4ce5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e",
"md5": "d374007bf5384aebd5c67d956c124b76",
"sha256": "7617192ab291e3db475d11532e2163163a85928a19e64c2c6217469a7542162f"
},
"downloads": -1,
"filename": "molda-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d374007bf5384aebd5c67d956c124b76",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 23174,
"upload_time": "2023-03-16T18:45:11",
"upload_time_iso_8601": "2023-03-16T18:45:11.285893Z",
"url": "https://files.pythonhosted.org/packages/c4/ce/5b7a5be136f5268e8a341a4755f5cc5816416d9dee11fd75b82f2bfe060e/molda-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "59ec8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e",
"md5": "aaba176e8926593bc1663f9a4019d190",
"sha256": "b95a25dea0cb813ac33b3afdf8fd3de7ad334000c1de2abb586577babf3029f3"
},
"downloads": -1,
"filename": "molda-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "aaba176e8926593bc1663f9a4019d190",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15164,
"upload_time": "2023-03-16T18:45:13",
"upload_time_iso_8601": "2023-03-16T18:45:13.558021Z",
"url": "https://files.pythonhosted.org/packages/59/ec/8c8e27218d861450917cfc81afbdc0521091be34da3ac0e5d2ee6e15a09e/molda-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-16 18:45:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "SigmoidAI",
"github_project": "molda",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "molda"
}