naganlp


Name: naganlp
Version: 0.1.1
Home page: https://github.com/AgnivaMaiti/naga-nlp
Summary: A Natural Language Processing toolkit for the Nagamese language.
Upload time: 2025-08-17 23:13:46
Maintainer: None
Docs URL: None
Author: Agniva Maiti
Requires Python: >=3.8
License: MIT License Copyright (c) 2025 Your Name Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: nlp, nagamese, natural-language-processing, machine-translation, pos-tagging
Requirements: numpy>=1.21.0, datasets>=2.0.0, transformers>=4.30.0, scikit-learn>=1.0.0, torch>=2.0.0, pandas>=1.3.0, sentencepiece>=0.1.99, nltk>=3.8.1, pytest>=7.0.0, pytest-cov>=4.0.0, pytest-mock>=3.10.0, black>=23.0.0, flake8>=6.0.0, mypy>=1.0.0, isort>=5.12.0
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# NagaNLP: Natural Language Processing for Nagamese

[![PyPI](https://img.shields.io/pypi/v/naganlp)](https://pypi.org/project/naganlp/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/pypi/pyversions/naganlp)](https://pypi.org/project/naganlp/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A Natural Language Processing toolkit for the Nagamese language, with models for part-of-speech tagging and machine translation, developed by Agniva Maiti (4th Year BTech, KIIT-DU, Bhubaneswar, Odisha).

## Features

- **Part-of-Speech Tagging**: Fine-tuned BERT model for POS tagging, with a lightweight NLTK-based alternative
- **Neural Machine Translation**: Seq2Seq model for Nagamese-to-English translation
- **Subword Tokenization**: Support for handling out-of-vocabulary words
- **Word Alignment**: Tools for parallel corpus alignment
- **Easy Integration**: Simple Python API for all functionality
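As a rough illustration of how subword tokenization deals with out-of-vocabulary words, the sketch below implements greedy longest-match-first splitting in the style of BERT's WordPiece: an unseen word is broken into known pieces instead of collapsing to a single unknown token. The toy vocabulary, the `##` continuation prefix and the `[UNK]` token are assumptions made for this example; the tokenizer NagaNLP actually ships with its models may behave differently.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first subword split of a single word.

    Pieces after the first carry a '##' prefix, as in BERT's WordPiece.
    If any part of the word cannot be matched, the whole word maps to `unk`.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for this example only.
vocab = {"moi", "school", "te", "jai", "ja", "##te", "##bo", "##i"}

print(wordpiece_tokenize("school", vocab))    # ['school']
print(wordpiece_tokenize("schoolte", vocab))  # ['school', '##te']
print(wordpiece_tokenize("jabo", vocab))      # ['ja', '##bo']
print(wordpiece_tokenize("xyz", vocab))       # ['[UNK]']
```

Matching the longest piece first keeps frequent whole words intact and only falls back to smaller fragments when it has to.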

## Installation

```bash
pip install naganlp
```

## Quick Start

### Part-of-Speech Tagging

#### Transformer-based Tagger (Recommended for production)
This uses a fine-tuned transformer model for high accuracy.

```python
from naganlp import PosTagger

# Initialize the tagger (automatically downloads the model on first use)
tagger = PosTagger("agnivamaiti/naganlp-pos-tagger")  # Default model

# Tag a Nagamese sentence
result = tagger.tag("moi school te jai")
print(result)
# Output: [{'entity_group': 'PRON', 'word': 'moi', ...}]
```
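The transformer tagger returns a list of dictionaries, while the NLTK-based tagger below returns `(word, tag)` tuples. If you want to compare or swap the two, a small adapter can normalise the first format into the second. This sketch assumes each dictionary has the `word` and `entity_group` keys shown in the sample output above; `to_word_tag_pairs` and `tag_agreement` are hypothetical helpers, not part of NagaNLP, and the sample input is made up to match that shape.

```python
from typing import Any, Dict, List, Tuple


def to_word_tag_pairs(entities: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Convert pipeline-style tagger output into (word, tag) tuples.

    Only the 'word' and 'entity_group' keys are read; anything else
    (for example a confidence score) is ignored.
    """
    return [(item["word"], item["entity_group"]) for item in entities]


def tag_agreement(a: List[Tuple[str, str]], b: List[Tuple[str, str]]) -> float:
    """Fraction of positions where two taggings of the same tokens agree."""
    if len(a) != len(b):
        # Aggregated transformer output can merge or split tokens differently.
        raise ValueError("taggings must cover the same tokens")
    if not a:
        return 1.0
    return sum(x[1] == y[1] for x, y in zip(a, b)) / len(a)


# Made-up input in the same shape as the sample output of PosTagger.tag()
sample = [
    {"entity_group": "PRON", "word": "moi", "score": 0.99},
    {"entity_group": "NOUN", "word": "school", "score": 0.97},
]
pairs = to_word_tag_pairs(sample)
print(pairs)  # [('moi', 'PRON'), ('school', 'NOUN')]
print(tag_agreement(pairs, [("moi", "PRON"), ("school", "VERB")]))  # 0.5
```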

#### NLTK-based Tagger (Lightweight: faster but less accurate)
This is a good option for development or when resources are limited.

```python
from naganlp import NltkPosTagger
from naganlp.nltk_tagger import train_and_save_nltk_tagger

# First, train and save the model (only needed once)
train_and_save_nltk_tagger("path/to/your/conll/file.conll", "naga_pos_model.pkl")

# Then load and use the trained model
tagger = NltkPosTagger("naga_pos_model.pkl")

# Tag a list of pre-tokenized words
result = tagger.predict(["moi", "school", "te", "jai"])
print(result)
# Output: [('moi', 'PRON'), ('school', 'NOUN'), ('te', 'ADP'), ('jai', 'VERB')]
```
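The README does not say which NLTK tagger `train_and_save_nltk_tagger` builds. As a rough picture of how such lightweight taggers work, the sketch below implements, in plain Python, a most-frequent-tag (unigram) tagger that backs off to a default tag for unseen words; NLTK's own `UnigramTagger` with a `DefaultTagger` backoff follows the same idea. The `UnigramBackoffTagger` class and the `NOUN` default are assumptions for illustration, and it is trained only on the example sentence from this README.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

TaggedSentence = Sequence[Tuple[str, str]]


class UnigramBackoffTagger:
    """Most-frequent-tag tagger that falls back to a fixed tag for unseen words."""

    def __init__(self, default_tag: str = "NOUN") -> None:
        self.default_tag = default_tag
        self.lookup: Dict[str, str] = {}

    def train(self, sentences: Sequence[TaggedSentence]) -> None:
        # Count how often each (lower-cased) word carries each tag.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence in sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += 1
        # Keep only the most frequent tag per word.
        self.lookup = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def predict(self, words: List[str]) -> List[Tuple[str, str]]:
        # Unseen words back off to the default tag.
        return [(w, self.lookup.get(w.lower(), self.default_tag)) for w in words]


tagger = UnigramBackoffTagger(default_tag="NOUN")
tagger.train([[("moi", "PRON"), ("school", "NOUN"), ("te", "ADP"), ("jai", "VERB")]])
print(tagger.predict(["Moi", "kitab", "te", "jai"]))
# [('Moi', 'PRON'), ('kitab', 'NOUN'), ('te', 'ADP'), ('jai', 'VERB')]
```

A lookup table like this is cheap to train and fast to apply, which is why it suits development use, but it ignores context entirely, so ambiguous words always receive the same tag.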

### Translation

```python
from naganlp import Translator

# Initialize the translator
translator = Translator()

# Translate from Nagamese to English
translation = translator.translate("moi school te jai")
print(translation)
# Output: "I go to school"
```
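A Seq2Seq translator produces its English output one token at a time. The README does not document NagaNLP's decoding strategy, so the sketch below shows plain greedy decoding in isolation: a hypothetical `step_fn` stands in for the decoder and returns a score for each candidate next token given the tokens generated so far. The `<sos>`/`<eos>` markers and the toy score table are invented purely to make the loop runnable.

```python
from typing import Callable, Dict, List

StepFn = Callable[[List[str]], Dict[str, float]]


def greedy_decode(step_fn: StepFn, bos: str = "<sos>", eos: str = "<eos>", max_len: int = 50) -> List[str]:
    """Repeatedly append the highest-scoring next token until EOS or max_len."""
    output: List[str] = [bos]
    for _ in range(max_len):
        scores = step_fn(output)
        next_token = max(scores, key=scores.get)
        if next_token == eos:
            break
        output.append(next_token)
    return output[1:]  # drop the start-of-sequence marker


# Toy next-token scores keyed on the previous token (invented for this example).
TOY_NEXT = {
    "<sos>": {"I": 0.9, "go": 0.1},
    "I": {"go": 0.8, "school": 0.2},
    "go": {"to": 0.7, "<eos>": 0.3},
    "to": {"school": 0.95, "go": 0.05},
    "school": {"<eos>": 0.99, "to": 0.01},
}


def toy_step(prefix: List[str]) -> Dict[str, float]:
    # A real decoder would also condition on the encoded Nagamese sentence.
    return TOY_NEXT[prefix[-1]]


print(" ".join(greedy_decode(toy_step)))  # I go to school
```

Greedy decoding is the simplest choice; beam search keeps several partial translations alive instead of committing to one token per step, at extra cost.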

## Documentation

### Data Requirements

- For POS tagging: a CoNLL-formatted file with token and POS-tag columns
- For translation: a parallel corpus in CSV format with `nagamese` and `english` columns
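Before training, it can help to parse the tagging file yourself and check that it splits into sensible sentences. The README only says the file has token and POS-tag columns, so this sketch assumes whitespace-separated columns with the token first and the tag last, blank lines between sentences, and `#` comment lines ignored. Adjust `token_col` and `tag_col` if your file differs (CoNLL-U, for example, keeps the word form in the second column and the universal POS tag in the fourth). `read_conll` is a hypothetical helper, not part of NagaNLP.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str], token_col: int = 0, tag_col: int = -1) -> List[Sentence]:
    """Group (token, tag) pairs into sentences separated by blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment / metadata line (note: also skips a literal "#" token)
        columns = line.split()
        if len(columns) < 2:
            raise ValueError(f"line {line_no}: expected at least 2 columns, got {line!r}")
        current.append((columns[token_col], columns[tag_col]))
    if current:
        sentences.append(current)
    return sentences


sample = "moi\tPRON\nschool\tNOUN\nte\tADP\njai\tVERB\n"
print(read_conll(sample.splitlines()))
# [[('moi', 'PRON'), ('school', 'NOUN'), ('te', 'ADP'), ('jai', 'VERB')]]

# For a real file (the path is a placeholder):
# with open("path/to/train.conll", encoding="utf-8") as f:
#     sentences = read_conll(f)
```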

### Model Training

#### POS Tagger Training

```bash
python main.py train-tagger --conll-file path/to/train.conll --hub-id your-username/naganlp-pos-tagger
```

#### NMT Model Training

```bash
python main.py train-translator --data-file path/to/parallel_corpus.csv --hub-id your-username/naganlp-nmt
```
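The `--data-file` passed to `train-translator` is the parallel CSV described under Data Requirements. Assuming a standard comma-separated file with a header row, the sketch below writes sentence pairs in that shape with Python's `csv` module and checks an existing file for the required columns and empty cells before a long training run. `write_parallel_csv` and `validate_parallel_csv` are hypothetical helper names, not part of NagaNLP.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

REQUIRED_COLUMNS = ("nagamese", "english")


def write_parallel_csv(pairs: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write (nagamese, english) pairs under the expected header."""
    writer = csv.DictWriter(out, fieldnames=list(REQUIRED_COLUMNS))
    writer.writeheader()
    for naga, eng in pairs:
        writer.writerow({"nagamese": naga, "english": eng})


def validate_parallel_csv(src: TextIO) -> List[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems: List[str] = []
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for col in REQUIRED_COLUMNS:
            # Short rows yield None for absent fields, so treat None as empty.
            if not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: empty '{col}' cell")
    return problems


# Round-trip through an in-memory buffer; for a real file, open it with newline="".
buf = io.StringIO()
write_parallel_csv([("moi school te jai", "I go to school")], buf)
buf.seek(0)
print(validate_parallel_csv(buf))  # []
print(validate_parallel_csv(io.StringIO("nagamese,english\nmoi school te jai,\n")))
# ["row 2: empty 'english' cell"]
```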

### Advanced Usage

#### Custom Model Paths

```python
from naganlp import PosTagger, Translator

# Load custom models from local paths
custom_tagger = PosTagger(model_name_or_path="path/to/custom/model")
custom_translator = Translator(
    model_path="path/to/translator.pt",
    vocabs_path="path/to/vocabs.pkl",
)
```

## Contributing

Contributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCT.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact

- Agniva Maiti
- Email: agnivamaiti.official@gmail.com
- GitHub: [@AgnivaMaiti](https://github.com/AgnivaMaiti)
- LinkedIn: [Agniva Maiti](https://linkedin.com/in/agniva-maiti)

## Acknowledgments
- KIIT University for the support and resources
- All contributors and users of this library

## Citation

If you use NagaNLP in your research, please cite:

```bibtex
@software{naganlp2023,
  title={NagaNLP: Natural Language Processing Toolkit for Nagamese},
  author={Maiti, Agniva},
  year={2023},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/AgnivaMaiti/naga-nlp}}
}
```

## Support

For questions and support, please open an issue on our [GitHub repository](https://github.com/AgnivaMaiti/naga-nlp/issues).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/AgnivaMaiti/naga-nlp",
    "name": "naganlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "nlp, nagamese, natural-language-processing, machine-translation, pos-tagging",
    "author": "Agniva Maiti",
    "author_email": "Agniva Maiti <agnivamaiti.official@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/03/7a/686dbd988708ef8c82e328ca9ed41483106e3b10826969bb5cf7417947a0/naganlp-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\r\n        \r\n        Copyright (c) 2025 Your Name\r\n        \r\n        Permission is hereby granted, free of charge, to any person obtaining a copy\r\n        of this software and associated documentation files (the \"Software\"), to deal\r\n        in the Software without restriction, including without limitation the rights\r\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n        copies of the Software, and to permit persons to whom the Software is\r\n        furnished to do so, subject to the following conditions:\r\n        \r\n        The above copyright notice and this permission notice shall be included in all\r\n        copies or substantial portions of the Software.\r\n        \r\n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n        SOFTWARE.\r\n        ",
    "summary": "A Natural Language Processing toolkit for the Nagamese language.",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/AgnivaMaiti/naga-nlp/issues",
        "Documentation": "https://github.com/AgnivaMaiti/naga-nlp#readme",
        "Homepage": "https://github.com/AgnivaMaiti/naga-nlp",
        "Repository": "https://github.com/AgnivaMaiti/naga-nlp"
    },
    "split_keywords": [
        "nlp",
        " nagamese",
        " natural-language-processing",
        " machine-translation",
        " pos-tagging"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ea7b608fe0db484473a958be9acbc951757530d638816e89c05af6881f65da51",
                "md5": "b522ec21e987a3382bae038f2b3e010d",
                "sha256": "6a3b07a2a1e8355bef73fe4963d052e1f345bf6bf901605cf5020b3ea5ae822e"
            },
            "downloads": -1,
            "filename": "naganlp-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b522ec21e987a3382bae038f2b3e010d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 169244,
            "upload_time": "2025-08-17T23:13:43",
            "upload_time_iso_8601": "2025-08-17T23:13:43.584725Z",
            "url": "https://files.pythonhosted.org/packages/ea/7b/608fe0db484473a958be9acbc951757530d638816e89c05af6881f65da51/naganlp-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "037a686dbd988708ef8c82e328ca9ed41483106e3b10826969bb5cf7417947a0",
                "md5": "62a430b36e3ba1cc59f434f8a256534a",
                "sha256": "81eb638004a759e113b5a7d281f9fce88107822adead29a69ff603f3cd1f5a94"
            },
            "downloads": -1,
            "filename": "naganlp-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "62a430b36e3ba1cc59f434f8a256534a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 506891,
            "upload_time": "2025-08-17T23:13:46",
            "upload_time_iso_8601": "2025-08-17T23:13:46.281043Z",
            "url": "https://files.pythonhosted.org/packages/03/7a/686dbd988708ef8c82e328ca9ed41483106e3b10826969bb5cf7417947a0/naganlp-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-17 23:13:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AgnivaMaiti",
    "github_project": "naga-nlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "datasets",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    ">=",
                    "4.30.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "sentencepiece",
            "specs": [
                [
                    ">=",
                    "0.1.99"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    ">=",
                    "3.8.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "pytest-mock",
            "specs": [
                [
                    ">=",
                    "3.10.0"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "23.0.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "isort",
            "specs": [
                [
                    ">=",
                    "5.12.0"
                ]
            ]
        }
    ],
    "lcname": "naganlp"
}
        