# Bilingual | দ্বিভাষিক
<div align="center">

**High-quality Bangla + English NLP toolkit for production use**

**প্রোডাকশন ব্যবহারের জন্য উচ্চমানের বাংলা + ইংরেজি NLP টুলকিট**

[License: Apache 2.0](LICENSE) | [Python 3.8+](https://www.python.org/downloads/) | [Code style: black](https://github.com/psf/black)

[English](#english) | [বাংলা](#বাংলা)

</div>

---
## English
### Overview
**bilingual** is a Python package providing production-ready tools for Bangla and English natural language processing. It focuses on:
- 🌍 **Bilingual Support**: Equal treatment for Bangla and English
- 👶 **Child-Friendly Content**: Special focus on educational and age-appropriate material
- 🚀 **Production Ready**: Easy installation, comprehensive docs, robust testing
- 🔧 **Flexible**: From tokenization to translation, generation to classification
- 📚 **Well-Documented**: Full documentation in both English and Bangla
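
Treating both languages equally usually starts with knowing which script each part of the input is written in, since real input often mixes the two (like the English/Bangla story prompt in the usage example). The sketch below is not part of the `bilingual` API. `char_script`, `token_script` and `script_runs` are hypothetical helpers. They assume that "Bangla" means the Unicode Bengali block (U+0980 to U+09FF) and "English" means ASCII Latin letters. The helpers tag each whitespace-separated token by majority script and merge neighbouring tokens that share a script.

```python
from typing import List, Tuple

# Assumption: Bangla = Unicode "Bengali" block (U+0980..U+09FF),
# English = ASCII Latin letters; everything else is "other".
_BENGALI_BLOCK = range(0x0980, 0x0A00)


def char_script(ch: str) -> str:
    """Classify one character as 'bn', 'en', or 'other'."""
    if ord(ch) in _BENGALI_BLOCK:
        return "bn"
    if ch.isascii() and ch.isalpha():
        return "en"
    return "other"


def token_script(token: str) -> str:
    """Majority script of a token; ties go to 'bn'."""
    bn = sum(1 for ch in token if char_script(ch) == "bn")
    en = sum(1 for ch in token if char_script(ch) == "en")
    if bn == 0 and en == 0:
        return "other"  # e.g. "/", a lone danda "।", ASCII digits
    return "bn" if bn >= en else "en"


def script_runs(text: str) -> List[Tuple[str, str]]:
    """Group whitespace-separated tokens into (script, text) runs.

    Tokens without Bengali or Latin letters are dropped, and each run
    is re-joined with single spaces.
    """
    runs: List[Tuple[str, str]] = []
    for token in text.split():
        script = token_script(token)
        if script == "other":
            continue
        if runs and runs[-1][0] == script:
            # Same script as the previous token: extend the current run.
            runs[-1] = (script, runs[-1][1] + " " + token)
        else:
            runs.append((script, token))
    return runs


if __name__ == "__main__":
    prompt = "A short story about a brave rabbit / সাহসী খরগোশের একটি ছোট গল্প"
    for script, chunk in script_runs(prompt):
        print(script, chunk)
```

Voting per token keeps a token such as `ভালোবাসি।` (Bengali letters plus a danda, which sits outside the Bengali block) in the Bangla run. The trade-off is that a code-mixed word inside a single token is not split further.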
### Features
- **Text Normalization**: Unicode normalization, punctuation handling, script cleaning
- **Tokenization**: Shared SentencePiece tokenizer optimized for Bangla + English
- **Language Models**: Bilingual pretrained and fine-tuned models for generation
- **Translation**: Bangla ↔ English translation assistance
- **Classification**: Readability scoring, age-level detection, safety filtering
- **Utilities**: Dataset tools, evaluation metrics, preprocessing pipelines
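
Unicode normalization matters for Bangla because the same visible character can be spelled with different code-point sequences. For example, য় can be the single code point U+09DF, or য (U+09AF) followed by the nukta sign (U+09BC). The following minimal sketch uses only Python's standard `unicodedata` and `re` modules. `normalize_bn_sketch` is a hypothetical name, and the sketch is not necessarily what `bb.normalize_text` does internally.

```python
import re
import unicodedata

# Any run of Unicode whitespace. ZWNJ/ZWJ (U+200C/U+200D) are format
# characters, not whitespace, so they survive; they affect Bangla rendering.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_bn_sketch(text: str) -> str:
    """Hypothetical normalizer: NFC plus whitespace cleanup."""
    # NFC maps canonically equivalent sequences to a single form.
    # U+09DF is a composition exclusion, so both spellings of য় become
    # U+09AF U+09BC, while a split vowel sign U+09C7 + U+09BE is composed
    # into the single code point U+09CB (ো).
    text = unicodedata.normalize("NFC", text)
    # Collapse spaces, tabs and newlines to one space and trim the ends.
    return _WHITESPACE_RUN.sub(" ", text).strip()


if __name__ == "__main__":
    a = "\u09df"        # য় as one code point
    b = "\u09af\u09bc"  # য + nukta
    print(a == b, normalize_bn_sketch(a) == normalize_bn_sketch(b))  # False True
```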
### Quick Start
#### Installation
```bash
pip install bilingual
```
For development:
```bash
git clone https://github.com/kothagpt/bilingual.git
cd bilingual
pip install -e ".[dev]"
```
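
After installing, you can confirm which version Python sees by using the standard-library `importlib.metadata` module. This module is available from Python 3.8, which matches the package's `>=3.8` requirement. `installed_version` is a hypothetical helper. `"bilingual"` is the distribution name listed in the package metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "bilingual") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"bilingual {v}" if v else "bilingual is not installed in this environment")
```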
#### Basic Usage
```python
from bilingual import bilingual_api as bb
# Load tokenizer
tokenizer = bb.load_tokenizer("bilingual-tokenizer")
# Normalize text
text_bn = bb.normalize_text("আমি স্কুলে যাচ্ছি।", lang="bn")
text_en = bb.normalize_text("I am going to school.", lang="en")
# Generate text
prompt = "A short story about a brave rabbit / সাহসী খরগোশের একটি ছোট গল্প"
story = bb.generate(prompt, model_name="bilingual-small-lm", max_tokens=150)
print(story)
# Translate
translation = bb.translate("আমি বই পড়তে ভালোবাসি।", src="bn", tgt="en")
print(translation)  # e.g. "I love to read books." (exact wording depends on the model)
# Check readability
level = bb.readability_check(text_bn, lang="bn")
print(f"Reading level: {level}")
```
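
This README does not document what `bb.readability_check` returns. Readability scoring often builds on simple surface signals. The hypothetical `readability_stats` helper below illustrates two of them: it counts sentences (split on the Bangla danda `।` as well as `.`, `!` and `?`) and whitespace-separated words. It is a rough heuristic for illustration only, not the package's scoring method.

```python
import re
from dataclasses import dataclass

# Sentence terminators: danda (U+0964) plus '.', '!', '?'.
# Naive: abbreviations ("Dr.") and decimals ("3.5") will also split.
_SENTENCE_END = re.compile(r"[\u0964.!?]+")


@dataclass
class ReadabilityStats:
    sentences: int
    words: int
    avg_words_per_sentence: float


def readability_stats(text: str) -> ReadabilityStats:
    """Hypothetical helper: count sentences and whitespace-separated words."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    words = sum(len(s.split()) for s in sentences)
    # Guard against empty input instead of dividing by zero.
    avg = words / len(sentences) if sentences else 0.0
    return ReadabilityStats(len(sentences), words, avg)


if __name__ == "__main__":
    print(readability_stats("আমি স্কুলে যাচ্ছি। আমি বই পড়তে ভালোবাসি।"))
    # ReadabilityStats(sentences=2, words=7, avg_words_per_sentence=3.5)
```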
#### CLI Usage
```bash
# Tokenize text
bilingual tokenize --lang bn --text "আমি ভাত খাই।"
# Generate text
bilingual generate --model bilingual-small-lm --prompt "Once upon a time..." --max-tokens 100
# Translate
bilingual translate --src bn --tgt en --text "আমি তোমাকে ভালোবাসি।"
# Evaluate model
bilingual evaluate --dataset data/test.jsonl --model bilingual-small-lm
```
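
This README does not specify the schema of `data/test.jsonl`. Assuming the usual JSON Lines convention (one JSON object per line), the sketch below shows how such a dataset can be read and scored. The `src`/`tgt` field names and the exact-match metric are placeholders, so this is not necessarily what `bilingual evaluate` computes. Translation quality is more commonly reported with metrics such as BLEU or chrF.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one JSON object per non-blank line (JSON Lines format)."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Hypothetical record layout: {"src": "...", "tgt": "..."}
    sample = [
        '{"src": "আমি ভাত খাই।", "tgt": "I eat rice."}',
        "",
        '{"src": "আমি বই পড়তে ভালোবাসি।", "tgt": "I love to read books."}',
    ]
    references = [r["tgt"] for r in read_jsonl(sample)]
    predictions = ["I eat rice.", "I like reading books."]  # stand-in model outputs
    print(exact_match(predictions, references))  # 0.5
```

To read from disk, pass an open file: `with open("data/test.jsonl", encoding="utf-8") as f: records = list(read_jsonl(f))`.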
### Project Structure
```
bilingual/
├── bilingual/ # Main package
│ ├── __init__.py
│ ├── api.py # High-level API
│ ├── tokenizer.py # Tokenization utilities
│ ├── normalize.py # Text normalization
│ ├── models/ # Model implementations
│ │ ├── loader.py
│ │ ├── lm.py
│ │ └── translate.py
│ ├── evaluation.py # Evaluation metrics
│ ├── data_utils.py # Dataset utilities
│ └── cli.py # Command-line interface
├── scripts/ # Training and data scripts
├── tests/ # Test suite
├── docs/ # Documentation
│ ├── en/ # English docs
│ └── bn/ # Bangla docs
├── datasets/ # Dataset storage
└── models/ # Model storage
```
### Documentation
- 📖 [Full Documentation](docs/en/README.md)
- 🚀 [Quick Start Guide](docs/en/quickstart.md)
- 🔧 [API Reference](docs/en/api.md)
- 🤝 [Contributing Guide](CONTRIBUTING.md)
- 🗺️ [Roadmap](ROADMAP.md)
### Development
```bash
# Run tests
pytest tests/
# Format code
black bilingual/ tests/
# Type checking
mypy bilingual/
# Lint
flake8 bilingual/
```
### Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
Areas where we need help:
- 📊 Dataset collection and curation
- 🤖 Model training and fine-tuning
- 📝 Documentation and translation
- 🧪 Testing and quality assurance
- 🐛 Bug fixes and improvements
### License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
### Citation
If you use this package in your research, please cite:
```bibtex
@software{bilingual2025,
title = {Bilingual: High-quality Bangla and English NLP Toolkit},
author = {Bilingual Project Contributors},
year = {2025},
url = {https://github.com/kothagpt/bilingual}
}
```
### Acknowledgments
This project is built with support from the open-source community and aims to advance Bangla language technology for everyone.

---
## বাংলা
### সংক্ষিপ্ত বিবরণ
**bilingual** হল একটি Python প্যাকেজ যা বাংলা এবং ইংরেজি প্রাকৃতিক ভাষা প্রক্রিয়াকরণের জন্য প্রোডাকশন-রেডি টুল প্রদান করে। এটি ফোকাস করে:
- 🌍 **দ্বিভাষিক সমর্থন**: বাংলা এবং ইংরেজির জন্য সমান আচরণ
- 👶 **শিশু-বান্ধব কন্টেন্ট**: শিক্ষামূলক এবং বয়স-উপযুক্ত উপাদানের উপর বিশেষ ফোকাস
- 🚀 **প্রোডাকশন রেডি**: সহজ ইনস্টলেশন, ব্যাপক ডক্স, শক্তিশালী টেস্টিং
- 🔧 **নমনীয়**: টোকেনাইজেশন থেকে অনুবাদ, জেনারেশন থেকে শ্রেণীবিভাগ
- 📚 **ভালভাবে ডকুমেন্টেড**: ইংরেজি এবং বাংলা উভয় ভাষায় সম্পূর্ণ ডকুমেন্টেশন
### বৈশিষ্ট্য
- **টেক্সট নরমালাইজেশন**: ইউনিকোড নরমালাইজেশন, বিরামচিহ্ন হ্যান্ডলিং, স্ক্রিপ্ট পরিষ্কার করা
- **টোকেনাইজেশন**: বাংলা + ইংরেজির জন্য অপ্টিমাইজড শেয়ারড SentencePiece টোকেনাইজার
- **ভাষা মডেল**: জেনারেশনের জন্য দ্বিভাষিক প্রিট্রেইনড এবং ফাইন-টিউনড মডেল
- **অনুবাদ**: বাংলা ↔ ইংরেজি অনুবাদ সহায়তা
- **শ্রেণীবিভাগ**: পঠনযোগ্যতা স্কোরিং, বয়স-স্তর সনাক্তকরণ, নিরাপত্তা ফিল্টারিং
- **ইউটিলিটি**: ডেটাসেট টুল, মূল্যায়ন মেট্রিক্স, প্রিপ্রসেসিং পাইপলাইন
### দ্রুত শুরু
#### ইনস্টলেশন
```bash
pip install bilingual
```
ডেভেলপমেন্টের জন্য:
```bash
git clone https://github.com/kothagpt/bilingual.git
cd bilingual
pip install -e ".[dev]"
```
#### মৌলিক ব্যবহার
```python
from bilingual import bilingual_api as bb
# টোকেনাইজার লোড করুন
tokenizer = bb.load_tokenizer("bilingual-tokenizer")
# টেক্সট নরমালাইজ করুন
text_bn = bb.normalize_text("আমি স্কুলে যাচ্ছি।", lang="bn")
text_en = bb.normalize_text("I am going to school.", lang="en")
# টেক্সট জেনারেট করুন
prompt = "A short story about a brave rabbit / সাহসী খরগোশের একটি ছোট গল্প"
story = bb.generate(prompt, model_name="bilingual-small-lm", max_tokens=150)
print(story)
# অনুবাদ করুন
translation = bb.translate("আমি বই পড়তে ভালোবাসি।", src="bn", tgt="en")
print(translation)  # যেমন: "I love to read books."
# পঠনযোগ্যতা চেক করুন
level = bb.readability_check(text_bn, lang="bn")
print(f"Reading level: {level}")
```
#### CLI ব্যবহার
```bash
# টেক্সট টোকেনাইজ করুন
bilingual tokenize --lang bn --text "আমি ভাত খাই।"
# টেক্সট জেনারেট করুন
bilingual generate --model bilingual-small-lm --prompt "Once upon a time..." --max-tokens 100
# অনুবাদ করুন
bilingual translate --src bn --tgt en --text "আমি তোমাকে ভালোবাসি।"
# মডেল মূল্যায়ন করুন
bilingual evaluate --dataset data/test.jsonl --model bilingual-small-lm
```
### ডকুমেন্টেশন
- 📖 [সম্পূর্ণ ডকুমেন্টেশন](docs/bn/README.md)
- 🚀 [দ্রুত শুরু গাইড](docs/bn/quickstart.md)
- 🔧 [API রেফারেন্স](docs/bn/api.md)
- 🤝 [অবদান গাইড](CONTRIBUTING.md)
- 🗺️ [রোডম্যাপ](ROADMAP.md)
### অবদান রাখা
আমরা অবদান স্বাগত জানাই! বিস্তারিত জানার জন্য অনুগ্রহ করে আমাদের [অবদান গাইড](CONTRIBUTING.md) দেখুন।
যেসব ক্ষেত্রে আমাদের সাহায্য প্রয়োজন:
- 📊 ডেটাসেট সংগ্রহ এবং কিউরেশন
- 🤖 মডেল ট্রেনিং এবং ফাইন-টিউনিং
- 📝 ডকুমেন্টেশন এবং অনুবাদ
- 🧪 টেস্টিং এবং কোয়ালিটি অ্যাসিউরেন্স
- 🐛 বাগ ফিক্স এবং উন্নতি
### লাইসেন্স
এই প্রকল্পটি Apache License 2.0 এর অধীনে লাইসেন্সপ্রাপ্ত - বিস্তারিত জানার জন্য [LICENSE](LICENSE) ফাইল দেখুন।
### স্বীকৃতি
এই প্রকল্পটি ওপেন-সোর্স কমিউনিটির সমর্থনে তৈরি এবং সবার জন্য বাংলা ভাষা প্রযুক্তি এগিয়ে নিয়ে যাওয়ার লক্ষ্যে কাজ করে।

---

Raw data
{
"_id": null,
"home_page": null,
"name": "bilingual",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "KhulnaSoft Ltd <info@khulnasoft.com>, Md Sulaiman <dev.sulaiman@icloud.com>",
"keywords": "nlp, bangla, bengali, bilingual, multilingual, tokenization, translation, language-model, ai, machine-learning",
"author": null,
"author_email": "KhulnaSoft Ltd <info@khulnasoft.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "High-quality Bangla and English NLP toolkit for production use",
"version": "1.0.0",
"project_urls": {
"Bug Reports": "https://github.com/kothagpt/bilingual/issues",
"Changelog": "https://github.com/kothagpt/bilingual/releases",
"Documentation": "https://bilingual.readthedocs.io",
"Homepage": "https://github.com/kothagpt/bilingual",
"Issues": "https://github.com/kothagpt/bilingual/issues",
"Repository": "https://github.com/kothagpt/bilingual",
"Source Code": "https://github.com/kothagpt/bilingual"
},
"split_keywords": [
"nlp",
" bangla",
" bengali",
" bilingual",
" multilingual",
" tokenization",
" translation",
" language-model",
" ai",
" machine-learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "23ae21cc772e2d66252ca829bce0e2f1ed7c3f0ed8e88fed700b55dd55bcb162",
"md5": "82af4582263c9813bc3ed9fb248a47e3",
"sha256": "8136ed21120f5b0b781984efa157c3bac860f538ac47978d3ec5b363551aa209"
},
"downloads": -1,
"filename": "bilingual-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "82af4582263c9813bc3ed9fb248a47e3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 121373,
"upload_time": "2025-11-15T17:12:12",
"upload_time_iso_8601": "2025-11-15T17:12:12.436707Z",
"url": "https://files.pythonhosted.org/packages/23/ae/21cc772e2d66252ca829bce0e2f1ed7c3f0ed8e88fed700b55dd55bcb162/bilingual-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-15 17:12:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kothagpt",
"github_project": "bilingual",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.20.0"
]
]
},
{
"name": "sentencepiece",
"specs": [
[
">=",
"0.1.96"
]
]
},
{
"name": "regex",
"specs": [
[
">=",
"2021.0.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
">=",
"4.62.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.26.0"
]
]
},
{
"name": "fastapi",
"specs": [
[
">=",
"0.100.0"
]
]
},
{
"name": "uvicorn",
"specs": [
[
">=",
"0.23.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "torch",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "transformers",
"specs": [
[
">=",
"4.44.0"
]
]
},
{
"name": "accelerate",
"specs": [
[
">=",
"0.20.0"
]
]
},
{
"name": "peft",
"specs": [
[
">=",
"0.5.0"
]
]
},
{
"name": "bitsandbytes",
"specs": [
[
">=",
"0.41.0"
]
]
},
{
"name": "datasets",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "tensorboard",
"specs": [
[
">=",
"2.13.0"
]
]
},
{
"name": "onnx",
"specs": [
[
">=",
"1.14.0"
]
]
},
{
"name": "onnxruntime",
"specs": [
[
">=",
"1.15.0"
]
]
},
{
"name": "optimum",
"specs": [
[
">=",
"1.12.0"
]
]
},
{
"name": "huggingface_hub",
"specs": [
[
">=",
"0.25.0"
]
]
},
{
"name": "gradio",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.11.0"
]
]
},
{
"name": "lxml",
"specs": [
[
">=",
"4.9.0"
]
]
},
{
"name": "fake-useragent",
"specs": [
[
">=",
"1.2.0"
]
]
},
{
"name": "wikiextractor",
"specs": [
[
">=",
"3.0.6"
]
]
},
{
"name": "indic-nlp-library",
"specs": [
[
">=",
"0.92"
]
]
},
{
"name": "nltk",
"specs": [
[
">=",
"3.8.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.7.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.3.0"
]
]
},
{
"name": "typer",
"specs": [
[
">=",
"0.9.0"
]
]
},
{
"name": "rich",
"specs": [
[
">=",
"13.0.0"
]
]
},
{
"name": "prometheus-client",
"specs": [
[
">=",
"0.17.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "black",
"specs": [
[
">=",
"22.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.10.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"0.950"
]
]
}
],
"lcname": "bilingual"
}