
# NLPashto – NLP Toolkit for Pashto



[PyPI](https://pypi.org/project/nlpashto/)
[Google Scholar citations](https://scholar.google.com.pk/scholar?cites=14346943839777454210)

NLPashto is a Python suite for Pashto Natural Language Processing. It provides tools for fundamental text processing tasks, such as text cleaning, tokenization, and chunking (word segmentation). Additionally, it includes state-of-the-art models for POS tagging and sentiment analysis (specifically offensive language detection).
## Prerequisites
To use NLPashto, you will need:
- Python 3.8+
## Installing NLPashto
Install NLPashto from PyPI:
```bash
pip install nlpashto
```
## Basic Usage
### Text Cleaning
This module contains basic text cleaning utilities:
```python
from nlpashto import Cleaner
cleaner = Cleaner()
noisy_txt = "په ژوند کی علم 📚🖖 , 🖊 او پيسي 💵. 💸💲 دواړه حاصل کړه پوهان به دی علم ته درناوی ولري اوناپوهان به دي پیسو ته... https://t.co/xIiEXFg"
cleaned_text = cleaner.clean(noisy_txt)
print(cleaned_text)
# Output: په ژوند کی علم , او پيسي دواړه حاصل کړه پوهان به دی علم ته درناوی ولري او ناپوهان به دي پیسو ته
```
Parameters of the `clean` method:
- `text` (str or list): Input noisy text to clean.
- `split_into_sentences` (bool): Split text into sentences.
- `remove_emojis` (bool): Remove emojis.
- `normalize_nums` (bool): Normalize Arabic numerals (1, 2, 3, ...) to Pashto numerals (۱، ۲، ۳، ...).
- `remove_puncs` (bool): Remove punctuation.
- `remove_special_chars` (bool): Remove special characters.
- `special_chars` (list): List of special characters to keep.
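For illustration, the numeral normalization step can be sketched independently of the library with a simple translation table (this is not NLPashto's internal implementation):

```python
# Map Western Arabic digits (0-9) to the Extended Arabic-Indic digits
# used in Pashto (U+06F0 .. U+06F9).
ARABIC_TO_PASHTO = str.maketrans("0123456789", "۰۱۲۳۴۵۶۷۸۹")

def normalize_nums(text: str) -> str:
    """Replace 0-9 with their Pashto numeral equivalents."""
    return text.translate(ARABIC_TO_PASHTO)

print(normalize_nums("کال 2023"))
# Output: کال ۲۰۲۳
```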
### Tokenization (Space Correction)
This module corrects space omission and insertion errors. It removes extra spaces and inserts necessary ones:
```python
from nlpashto import Tokenizer
tokenizer = Tokenizer()
noisy_txt = 'جلال اباد ښار کې هره ورځ لس ګونه کسانپهډلهییزهتوګهدنشهيي توکو کارولو ته ا د ا م ه و رک وي'
tokenized_text = tokenizer.tokenize(noisy_txt)
print(tokenized_text)
# Output: [['جلال', 'اباد', 'ښار', 'کې', 'هره', 'ورځ', 'لسګونه', 'کسان', 'په', 'ډله', 'ییزه', 'توګه', 'د', 'نشه', 'يي', 'توکو', 'کارولو', 'ته', 'ادامه', 'ورکوي']]
```
### Chunking (Word Segmentation)
To retrieve full compound words instead of space-delimited tokens, use the Segmenter:
```python
from nlpashto import Segmenter
segmenter = Segmenter()
segmented_text = segmenter.segment(tokenized_text)
print(segmented_text)
# Output: [['جلال اباد', 'ښار', 'کې', 'هره', 'ورځ', 'لسګونه', 'کسان', 'په', 'ډله ییزه', 'توګه', 'د', 'نشه يي', 'توکو', 'کارولو', 'ته', 'ادامه', 'ورکوي']]
```
Specify batch size for multiple sentences:
```python
segmenter = Segmenter(batch_size=32) # Default is 16
```
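The batching itself is straightforward; as an illustrative sketch (not the Segmenter's internal code), a list of sentences might be split into fixed-size batches like this:

```python
def batched(items, batch_size=16):
    """Yield successive fixed-size batches from a list of sentences."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

sentences = [f"sentence {i}" for i in range(40)]
batches = list(batched(sentences, batch_size=16))
print([len(b) for b in batches])
# Output: [16, 16, 8]
```

Larger batches trade memory for throughput when segmenting many sentences at once.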
### Part-of-speech (POS) Tagging
For a detailed explanation of the POS tagger, refer to the [POS tagging paper](https://www.researchsquare.com/article/rs-2712906/v1):
```python
from nlpashto import POSTagger
pos_tagger = POSTagger()
pos_tagged = pos_tagger.tag(segmented_text)
print(pos_tagged)
# Output: [[('جلال اباد', 'NNP'), ('ښار', 'NNM'), ('کې', 'PT'), ('هره', 'JJ'), ('ورځ', 'NNF'), ...]]
```
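The tagger returns a list of `(token, tag)` pairs per sentence, which is easy to post-process. For example, collecting the noun tokens (tags beginning with `NN`, as in the output above):

```python
# Post-processing sketch over the tagger's [[(token, tag), ...]] output;
# the tagged data here is the example shown above, truncated.
tagged = [[('جلال اباد', 'NNP'), ('ښار', 'NNM'), ('کې', 'PT'),
           ('هره', 'JJ'), ('ورځ', 'NNF')]]

nouns = [tok for sent in tagged for tok, tag in sent if tag.startswith('NN')]
print(nouns)
# Output: ['جلال اباد', 'ښار', 'ورځ']
```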
### Sentiment Analysis (Offensive Language Detection)
Detect offensive language using a fine-tuned PsBERT model:
```python
from nlpashto import POLD
sentiment_analysis = POLD()
# Offensive example
offensive_text = 'مړه یو کس وی صرف ځان شرموی او یو ستا غوندے جاهل وی چې قوم او ملت شرموی'
sentiment = sentiment_analysis.predict(offensive_text)
print(sentiment)
# Output: 1
# Normal example
normal_text = 'تاسو رښتیا وایئ خور 🙏'
sentiment = sentiment_analysis.predict(normal_text)
print(sentiment)
# Output: 0
```
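The predictions are plain integers; a small hypothetical helper (not part of the NLPashto API) can map them to readable labels, with the `0`/`1` meaning taken from the examples above:

```python
# Assumed label mapping, inferred from the README examples:
# 1 = offensive, 0 = normal.
LABELS = {0: "normal", 1: "offensive"}

def label_of(prediction: int) -> str:
    """Turn a POLD integer prediction into a human-readable label."""
    return LABELS[prediction]

print(label_of(1))
# Output: offensive
```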
## Other Resources
### Pretrained Models
- **BERT (WordPiece Level):** [ijazulhaq/bert-base-pashto](https://huggingface.co/ijazulhaq/bert-base-pashto)
- **BERT (Character Level):** [ijazulhaq/bert-base-pashto-c](https://huggingface.co/ijazulhaq/bert-base-pashto-c)
- **Static Word Embeddings:** Available on [Kaggle](https://www.kaggle.com/datasets/drijaz/pashto-we): Word2Vec, fastText, GloVe
### Datasets and Examples
- Sample datasets: [Kaggle](https://www.kaggle.com/drijaz/)
- Jupyter Notebooks: [Kaggle](https://www.kaggle.com/drijaz/)
## Citations
**[NLPashto: NLP Toolkit for Low-resource Pashto Language](https://dx.doi.org/10.14569/IJACSA.2023.01406142)**
_I. Haq, W. Qiu, J. Guo, and P. Tang, "NLPashto: NLP Toolkit for Low-resource Pashto Language," International Journal of Advanced Computer Science and Applications, vol. 14, no. 6, pp. 1345-1352, 2023._
- **BibTeX**
```bibtex
@article{haq2023nlpashto,
title={NLPashto: NLP Toolkit for Low-resource Pashto Language},
author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
journal={International Journal of Advanced Computer Science and Applications},
issn={2156-5570},
volume={14},
number={6},
pages={1345-1352},
year={2023},
doi={10.14569/IJACSA.2023.01406142}
}
```
**[Correction of Whitespace and Word Segmentation in Noisy Pashto Text using CRF](https://doi.org/10.1016/j.specom.2023.102970)**
_I. Haq, W. Qiu, J. Guo, and P. Tang, "Correction of whitespace and word segmentation in noisy Pashto text using CRF," Speech Communication, vol. 153, p. 102970, 2023._
- **BibTeX**
```bibtex
@article{HAQ2023102970,
title={Correction of whitespace and word segmentation in noisy Pashto text using CRF},
journal={Speech Communication},
issn={1872-7182},
volume={153},
pages={102970},
year={2023},
doi={10.1016/j.specom.2023.102970},
author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang}
}
```
**[POS Tagging of Low-resource Pashto Language: Annotated Corpus and Bert-based Model](https://doi.org/10.21203/rs.3.rs-2712906/v1)**
_I. Haq, W. Qiu, J. Guo, and P. Tang, "POS Tagging of Low-resource Pashto Language: Annotated Corpus and BERT-based Model," Preprint, 2023._
- **BibTeX**
```bibtex
@article{haq2023pashto,
title={POS Tagging of Low-resource Pashto Language: Annotated Corpus and Bert-based Model},
author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
journal={Preprint},
year={2023},
doi={10.21203/rs.3.rs-2712906/v1}
}
```
**[Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT](https://doi.org/10.7717/peerj-cs.1617)**
_I. Haq, W. Qiu, J. Guo, and P. Tang, "Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT," PeerJ Computer Science, vol. 9, p. e1617, 2023._
- **BibTeX**
```bibtex
@article{haq2023pold,
title={Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT},
author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
journal={PeerJ Computer Science},
issn={2376-5992},
volume={9},
pages={e1617},
year={2023},
doi={10.7717/peerj-cs.1617}
}
```
**[Social Media’s Dark Secrets: A Propagation, Lexical and Psycholinguistic Oriented Deep Learning Approach for Fake News Proliferation](https://doi.org/10.1016/j.eswa.2024.124650)**
_K. Ahmed, M. A. Khan, I. Haq, A. Al Mazroa, M. Syam, N. Innab, et al., "Social media’s dark secrets: A propagation, lexical and psycholinguistic oriented deep learning approach for fake news proliferation," Expert Systems with Applications, vol. 255, p. 124650, 2024._
- **BibTeX**
```bibtex
@article{AHMED2024124650,
title={Social media’s dark secrets: A propagation, lexical and psycholinguistic oriented deep learning approach for fake news proliferation},
author={Kanwal Ahmed and Muhammad Asghar Khan and Ijazul Haq and Alanoud Al Mazroa and Syam M.S. and Nisreen Innab and Masoud Alajmi and Hend Khalid Alkahtani},
journal={Expert Systems with Applications},
volume={255},
pages={124650},
year={2024},
issn={0957-4174},
doi={10.1016/j.eswa.2024.124650}
}
```
## Contact
- Website: [https://ijaz.me/](https://ijaz.me/)
- LinkedIn: [https://www.linkedin.com/in/drijazulhaq/](https://www.linkedin.com/in/drijazulhaq/)