bnaug


- Name: bnaug
- Version: 1.1.2
- Home page: https://github.com/sagorbrur/bnaug
- Summary: bnaug is a text augmentation tool for Bangla text.
- Upload time: 2023-08-30 16:29:07
- Author: Sagor Sarker
- Requires Python: >=3.7
- License: MIT
- Requirements: none recorded

# bnaug (Bangla Text Augmentation)
__bnaug__ is a text augmentation tool for Bangla text.

## Installation
```
pip install bnaug
```
- Dependencies
    - PyTorch >= 1.7.0
    
## Demo Notebook
- [bnaug demo](https://github.com/sagorbrur/bnaug/blob/main/notebook/bnaug_demo.ipynb)

## Necessary Model Links
- [word2vec](https://huggingface.co/sagorsarker/bangla_word2vec/resolve/main/bangla_word2vec_gen4.zip)
- [glove vector](https://huggingface.co/sagorsarker/bangla-glove-vectors/resolve/main/bn_glove.300d.zip)
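
Both links point to zip archives, while the examples below load files from a local `msc/` directory. The following is a minimal sketch of fetching and unpacking them with the standard library only. The helper names `download_archive` and `extract_archive` are hypothetical and not part of bnaug, and the layout of files inside each archive is not documented here, so check the extracted names before passing a path to bnaug.

```python
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the model links above.
WORD2VEC_URL = "https://huggingface.co/sagorsarker/bangla_word2vec/resolve/main/bangla_word2vec_gen4.zip"
GLOVE_URL = "https://huggingface.co/sagorsarker/bangla-glove-vectors/resolve/main/bn_glove.300d.zip"


def download_archive(url, dest_dir="msc"):
    """Download a zip archive into dest_dir and return its local path.

    Hypothetical helper; `msc` is assumed only because the README
    examples read their model files from that directory.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / url.rsplit("/", 1)[-1]
    if not zip_path.exists():  # skip the download if the file is already there
        urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract_archive(zip_path, dest_dir="msc"):
    """Extract a zip archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `extract_archive(download_archive(GLOVE_URL))` would download the GloVe archive once and return the list of extracted file names, which shows the path to pass as `vector_path`.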

## Sentence Augmentation
### Token Replacement
- Mask generation based augmentation

    ```py
    from bnaug.sentence import TokenReplacement

    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    output = tokr.masking_based(text, sen_n=5)
    print(output)
    ```

- Word2Vec based augmentation

    ```py
    from bnaug.sentence import TokenReplacement

    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    model = "msc/bangla_word2vec/bnwiki_word2vec.model"
    output = tokr.word2vec_based(text, model=model, sen_n=5, word_n=5)
    print(output)
    ```

- Glove based augmentation

    ```py
    from bnaug.sentence import TokenReplacement

    tokr = TokenReplacement()
    text = "আমি ঢাকায় বাস করি।"
    vector = "msc/bn_glove.300d.txt"
    output = tokr.glove_based(text, vector_path=vector, sen_n=5, word_n=5)
    print(output)
    ```
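
To illustrate the idea behind the embedding-based variants, here is a minimal sketch of token replacement by nearest neighbours in a vector space. It is not bnaug's implementation: the functions `cosine`, `nearest_words`, and `replace_tokens` are hypothetical, the 2-d vectors are invented toy data, and the meaning given to `sen_n` (number of sentences) and `word_n` (neighbours per word) is an assumption based on the parameter names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, vectors, word_n):
    """Return up to word_n vocabulary words most similar to `word`."""
    if word not in vectors:
        return []
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:word_n]]


def replace_tokens(text, vectors, sen_n=5, word_n=5):
    """Build up to sen_n new sentences by swapping one token for a neighbour."""
    tokens = text.split()
    sentences = []
    for i, token in enumerate(tokens):
        for neighbour in nearest_words(token, vectors, word_n):
            candidate = " ".join(tokens[:i] + [neighbour] + tokens[i + 1:])
            if candidate not in sentences:
                sentences.append(candidate)
            if len(sentences) == sen_n:
                return sentences
    return sentences


text = "আমি ঢাকায় বাস করি।"
dhaka = text.split()[1]  # second token of the example sentence
khulna, sylhet = "খুলনায়", "সিলেটে"

# Toy 2-d vectors, invented for illustration only.
toy_vectors = {
    dhaka: [1.0, 0.0],
    khulna: [0.9, 0.1],
    sylhet: [0.8, 0.3],
    text.split()[3]: [0.0, 1.0],
}
print(replace_tokens(text, toy_vectors, sen_n=2, word_n=2))
```

With real Word2Vec or GloVe vectors the lookup table would hold hundreds of dimensions per word, but the replacement loop stays the same.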

### Back Translation
Back-translation-based augmentation first translates a Bangla sentence into English and then translates the English back into Bangla.

```py
from bnaug.sentence import BackTranslation

bt = BackTranslation()
text = "বাংলা ভাষা আন্দোলন তদানীন্তন পূর্ব পাকিস্তানে সংঘটিত একটি সাংস্কৃতিক ও রাজনৈতিক আন্দোলন। "
output = bt.get_augmented_sentences(text)
print(output)
```
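
The round trip described above can be sketched with pluggable translators. This is a minimal illustration rather than bnaug's implementation: `back_translate`, `augment`, and the lookup-table "translators" are hypothetical stand-ins for whatever translation models are actually used.

```python
def back_translate(text, to_english, to_bangla):
    """Translate Bangla -> English -> Bangla; return the pivot and the result."""
    english = to_english(text)
    return english, to_bangla(english)


def augment(text, to_english, to_bangla):
    """Return the round-trip sentence only if it differs from the input."""
    _, round_trip = back_translate(text, to_english, to_bangla)
    return [round_trip] if round_trip.strip() != text.strip() else []


# Toy lookup tables standing in for real translation models (assumption).
source = "আমি ঢাকায় বাস করি।"
paraphrase = "আমি ঢাকায় থাকি।"
bn_to_en = {source: "I live in Dhaka."}
en_to_bn = {"I live in Dhaka.": paraphrase}


def toy_to_english(sentence):
    return bn_to_en[sentence]


def toy_to_bangla(sentence):
    return en_to_bn[sentence]


print(augment(source, toy_to_english, toy_to_bangla))
```

Filtering out round trips that reproduce the input avoids adding exact duplicates to the augmented data.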

### Text Generation
- Paraphrase generation

```py
from bnaug.sentence import TextGeneration

tg = TextGeneration()
text = "বিমানটি যখন মাটিতে নামার জন্য এয়ারপোর্টের কাছাকাছি আসছে, তখন ল্যান্ডিং গিয়ারের খোপের ঢাকনাটি খুলে যায়।"
output = tg.parapharse_generation(text)
print(output)
```

### Random Augmentation
- Randomly remove parts of a sentence to generate new sentences

    At present, it removes words, stopwords, punctuation, and numbers to generate new sentences.

    ```py
    from bnaug.sentence import RandomAugmentation

    raug = RandomAugmentation()
    sentence = "আমি ১০০ বাকি দিলাম"
    output = raug.random_remove(sentence)
    print(output)
    ```

    Or apply each removal function individually:

    ```py
    from bnaug import randaug

    text = "১০০ বাকি দিলাম"
    output = randaug.remove_digits(text)
    print(output)

    text = "১০০! বাকি দিলাম?"
    output = randaug.remove_punctuations(text)
    print(output)

    text = "আমি ১০০ বাকি দিলাম"
    output = randaug.remove_stopwords(text)
    print(output)

    text = "আমি ১০০ বাকি দিলাম"
    output = randaug.remove_random_word(text)
    print(output)

    text = "আমি ১০০ বাকি দিলাম"
    output = randaug.remove_random_char(text)
    print(output)
    ```
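
For readers curious how such removals can work, here is a minimal standard-library sketch. It is not bnaug's code: the names `drop_digits`, `drop_punctuation`, and `drop_random_word` are hypothetical, and the punctuation set (ASCII punctuation plus the Bangla danda) is an assumption.

```python
import random
import string

BANGLA_DIGITS = "০১২৩৪৫৬৭৮৯"
# ASCII punctuation plus the Bangla danda (assumed set, not bnaug's).
PUNCTUATION = string.punctuation + "।"


def drop_digits(text):
    """Drop ASCII and Bangla digits, then collapse leftover whitespace."""
    kept = "".join(ch for ch in text
                   if ch not in BANGLA_DIGITS and ch not in string.digits)
    return " ".join(kept.split())


def drop_punctuation(text):
    """Drop punctuation characters, then collapse leftover whitespace."""
    kept = "".join(ch for ch in text if ch not in PUNCTUATION)
    return " ".join(kept.split())


def drop_random_word(text, rng=random):
    """Remove one randomly chosen word; leave one-word inputs unchanged."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


sentence = "আমি ১০০ বাকি দিলাম"
print(drop_digits(sentence))
print(drop_punctuation("১০০! বাকি দিলাম?"))
print(drop_random_word(sentence, random.Random(0)))  # seeded for repeatability
```

Passing a seeded `random.Random` instance makes the random removal reproducible, which helps when the augmented data has to be regenerated.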

## Inspired by
- [nlpaug](https://github.com/makcedward/nlpaug)
- [amitness blog post](https://amitness.com/2020/05/data-augmentation-for-nlp/)

            
