# Textmentations
Textmentations is a Python library for augmenting Korean text.
It is inspired by [albumentations](https://github.com/albumentations-team/albumentations) and uses albumentations as a dependency.
## Installation
```
pip install textmentations
```
## A simple example
Textmentations provides text augmentation techniques built on [TextTransform](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/core/transforms_interface.py#L19),
which inherits from the albumentations [BasicTransform](https://github.com/albumentations-team/albumentations/blob/1.4.14/albumentations/core/transforms_interface.py#L48).
This lets textmentations reuse the existing functionality of albumentations.
```python
import textmentations as T

# "Yesterday I went to a restaurant. I was very thirsty. First I drank a glass of water. Then I enjoyed some sweet and sour pork."
text = "어제 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다."

# The four EDA operations, and a pipeline that applies them in sequence
rd = T.RandomDeletion(deletion_prob=0.1, min_words_per_sentence=0.8)
ri = T.RandomInsertion(insertion_prob=0.2, n_times=1)
rs = T.RandomSwap(alpha=1)
sr = T.SynonymReplacement(replacement_prob=0.2)
eda = T.Compose([rd, ri, rs, sr])

# Augmentation is random: the outputs below are samples, and your results will differ.
print(rd(text=text)["text"])
# 식당에 갔다. 목이 너무 말랐다. 먼저 물 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(ri(text=text)["text"])
# 어제 최근 식당에 갔다. 목이 너무 말랐다. 먼저 물 한 잔을 마셨다 음료수. 그리고 탕수육을 맛있게 먹었다.

print(rs(text=text)["text"])
# 어제 갔다 식당에. 목이 너무 말랐다. 물 먼저 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다..

print(sr(text=text)["text"])
# 과거 식당에 갔다. 목이 너무 말랐다. 먼저 소주 한 잔을 마셨다. 그리고 탕수육을 맛있게 먹었다.

print(eda(text=text)["text"])
# 식당에 어제 과거 갔다. 너무 말랐다. 먼저 상수 한 잔을 마셨다 맹물. 그리고 맛있게 먹었다.
```
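The example's call convention (each transform is called with `text=` and returns a dict whose `"text"` entry holds the result, and `Compose` chains transforms) can be imitated in plain Python to see what two of the EDA word operations do. The sketch below is a hypothetical, stdlib-only illustration of random deletion and random swap as described in the EDA paper. The `Sketch*` classes, the `rng` argument, and the application probability `p` (modeled loosely on albumentations' convention) are assumptions, not the textmentations API, and unlike the library the sketch splits on whitespace and ignores sentence boundaries.

```python
import random
from typing import Dict, List, Optional, Sequence


class SketchTransform:
    """Hypothetical base class: applies `apply` to whitespace-split words with probability p."""

    def __init__(self, p: float = 1.0) -> None:
        self.p = p

    def apply(self, words: List[str], rng: random.Random) -> List[str]:
        raise NotImplementedError

    def __call__(self, *, text: str, rng: Optional[random.Random] = None) -> Dict[str, str]:
        rng = rng if rng is not None else random.Random()
        if rng.random() >= self.p:
            return {"text": text}  # not applied this time: return the input unchanged
        return {"text": " ".join(self.apply(text.split(), rng))}


class SketchRandomDeletion(SketchTransform):
    """EDA-style random deletion: drop each word independently with probability deletion_prob."""

    def __init__(self, deletion_prob: float = 0.1, p: float = 1.0) -> None:
        super().__init__(p)
        self.deletion_prob = deletion_prob

    def apply(self, words: List[str], rng: random.Random) -> List[str]:
        kept = [w for w in words if rng.random() >= self.deletion_prob]
        # Never return an empty text: keep one random word instead
        if not kept and words:
            kept = [rng.choice(words)]
        return kept


class SketchRandomSwap(SketchTransform):
    """EDA-style random swap: swap the words at two distinct random positions, n_swaps times."""

    def __init__(self, n_swaps: int = 1, p: float = 1.0) -> None:
        super().__init__(p)
        self.n_swaps = n_swaps

    def apply(self, words: List[str], rng: random.Random) -> List[str]:
        words = list(words)  # work on a copy
        if len(words) < 2:
            return words
        for _ in range(self.n_swaps):
            i, j = rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
        return words


class SketchCompose:
    """Runs the given transforms in order, feeding each one's output text to the next."""

    def __init__(self, transforms: Sequence[SketchTransform]) -> None:
        self.transforms = list(transforms)

    def __call__(self, *, text: str, rng: Optional[random.Random] = None) -> Dict[str, str]:
        rng = rng if rng is not None else random.Random()
        for transform in self.transforms:
            text = transform(text=text, rng=rng)["text"]
        return {"text": text}


if __name__ == "__main__":
    text = "어제 식당에 갔다. 목이 너무 말랐다."
    pipeline = SketchCompose([SketchRandomDeletion(deletion_prob=0.1), SketchRandomSwap(n_swaps=1)])
    print(pipeline(text=text, rng=random.Random(0))["text"])
```

Passing an explicit `random.Random` instance makes a run reproducible, which is convenient when comparing augmented samples.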
## List of augmentations
- [AEDA](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L13)
- [BackTranslation](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/generation/transforms.py#L21)
- [ContextualInsertion](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/generation/transforms.py#L67)
- [ContextualReplacement](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/generation/transforms.py#L128)
- [IterativeMaskFilling](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/generation/transforms.py#L193)
- [RandomDeletion](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L105)
- [RandomDeletionSentence](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L177)
- [RandomInsertion](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L262)
- [RandomSwap](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L312)
- [RandomSwapSentence](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L371)
- [SynonymReplacement](https://github.com/Jaesu26/textmentations/blob/v1.4.0/textmentations/augmentations/modification/transforms.py#L411)
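AEDA, the first entry above, augments text by inserting punctuation marks at random positions instead of changing the words themselves. A minimal stdlib sketch following the description in the AEDA paper (marks drawn from `. ; ? : ! ,`, with the number of insertions chosen between 1 and roughly one third of the word count) could look like this. `aeda_sketch` and its `insertion_ratio` parameter are hypothetical names; the textmentations `AEDA` transform's parameters and defaults may differ.

```python
import random
from typing import Optional, Sequence

PUNCTUATIONS = (".", ";", "?", ":", "!", ",")  # the mark set described in the AEDA paper


def aeda_sketch(
    text: str,
    insertion_ratio: float = 0.3,
    punctuations: Sequence[str] = PUNCTUATIONS,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical AEDA sketch: insert random punctuation marks before randomly chosen words."""
    rng = rng if rng is not None else random.Random()
    words = text.split()
    if not words:
        return text
    # Number of marks: at least 1, at most about insertion_ratio * len(words) (capped at len(words))
    upper = min(len(words), max(1, int(insertion_ratio * len(words))))
    n_insert = rng.randint(1, upper)
    positions = set(rng.sample(range(len(words)), n_insert))
    out = []
    for i, word in enumerate(words):
        if i in positions:
            out.append(rng.choice(punctuations))  # the mark goes before word i
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(aeda_sketch("어제 식당에 갔다 목이 너무 말랐다", rng=random.Random(0)))
```

Because AEDA only adds marks, every original word survives in its original order, which is the property that distinguishes it from the EDA operations.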
## References
- [AEDA: An Easier Data Augmentation Technique for Text Classification](https://arxiv.org/pdf/2108.13230)
- [Conditional BERT Contextual Augmentation](https://arxiv.org/pdf/1812.06705)
- [Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations](https://arxiv.org/pdf/1805.06201)
- [EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/pdf/1901.11196)
- [Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling](https://arxiv.org/pdf/2401.01830)
- [Korean Stopwords](https://www.ranks.nl/stopwords/korean)
- [Korean WordNet](http://wordnet.kaist.ac.kr/)
- [albumentations](https://github.com/albumentations-team/albumentations)
- [kykim/albert-kor-base](https://huggingface.co/kykim/albert-kor-base)

## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/Jaesu26/textmentations",
  "name": "textmentations",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.8",
  "maintainer_email": null,
  "keywords": "text augmentation, text classification",
  "author": "Jaesu Han",
  "author_email": "gkswotn9753@gmail.com",
  "download_url": "https://files.pythonhosted.org/packages/2b/94/eb13efb0b226fab8d120c69ed4560b8d39100f30e7e49283e339d46d3a1d/textmentations-1.4.0.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": "MIT",
  "summary": "A Python library for augmenting Korean text.",
  "version": "1.4.0",
  "project_urls": {
    "Homepage": "https://github.com/Jaesu26/textmentations"
  },
  "split_keywords": [
    "text augmentation",
    " text classification"
  ],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "11f31953d24f57ebaf7997b65e81df8fbca4bfa6540bae88b8a6680d595f2749",
        "md5": "a13a8d39b558ce80d8cadf886168fa24",
        "sha256": "0cde4484974ac184c88cc96cb17f304625c7357ff6bbb4d73c3cf341b156e11d"
      },
      "downloads": -1,
      "filename": "textmentations-1.4.0-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "a13a8d39b558ce80d8cadf886168fa24",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.8",
      "size": 49847440,
      "upload_time": "2024-11-05T04:10:41",
      "upload_time_iso_8601": "2024-11-05T04:10:41.632414Z",
      "url": "https://files.pythonhosted.org/packages/11/f3/1953d24f57ebaf7997b65e81df8fbca4bfa6540bae88b8a6680d595f2749/textmentations-1.4.0-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "2b94eb13efb0b226fab8d120c69ed4560b8d39100f30e7e49283e339d46d3a1d",
        "md5": "932501f77344f0a0d755ea5fa3166b7a",
        "sha256": "77b1816077c08cc2956f698d8a9825804f09b369a89e4ab50ca36e27839930b1"
      },
      "downloads": -1,
      "filename": "textmentations-1.4.0.tar.gz",
      "has_sig": false,
      "md5_digest": "932501f77344f0a0d755ea5fa3166b7a",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.8",
      "size": 49848031,
      "upload_time": "2024-11-05T04:10:50",
      "upload_time_iso_8601": "2024-11-05T04:10:50.859231Z",
      "url": "https://files.pythonhosted.org/packages/2b/94/eb13efb0b226fab8d120c69ed4560b8d39100f30e7e49283e339d46d3a1d/textmentations-1.4.0.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2024-11-05 04:10:50",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "Jaesu26",
  "github_project": "textmentations",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": false,
  "requirements": [],
  "lcname": "textmentations"
}
```