nlpashto

Name	nlpashto
Version	0.0.25
Summary	Pashto Natural Language Processing Toolkit
upload_time	2025-02-01 11:17:44
requires_python	>=3.8
keywords	nlp pashto nlpashto pashto nlp nlp toolkit word segmentation tokenization pos tagging sentiment analysis text classification token classification sequence tagging text cleaning

![Pashto Word Cloud](https://github.com/dr-ijaz/nlpashto/blob/main/wc.gif)

# NLPashto – NLP Toolkit for Pashto

![GitHub](https://img.shields.io/github/license/dr-ijaz/nlpashto)
![GitHub contributors](https://img.shields.io/github/contributors/dr-ijaz/nlpashto) 
![Code size](https://img.shields.io/github/languages/code-size/dr-ijaz/nlpashto)
[![PyPI](https://img.shields.io/pypi/v/nlpashto)](https://pypi.org/project/nlpashto/)
[![Citation](https://img.shields.io/badge/citation-10-green)](https://scholar.google.com.pk/scholar?cites=14346943839777454210)

NLPashto is a Python suite for Pashto Natural Language Processing. It provides tools for fundamental text processing tasks, such as text cleaning, tokenization, and chunking (word segmentation). Additionally, it includes state-of-the-art models for POS tagging and sentiment analysis (specifically offensive language detection).

## Prerequisites

To use NLPashto, you will need:

- Python 3.8+

## Installing NLPashto

Install NLPashto from PyPI:

```bash
pip install nlpashto
```

## Basic Usage

### Text Cleaning

The `Cleaner` class provides basic text-cleaning utilities. In the example below, it removes emojis, a URL, and stray punctuation from noisy text:

```python
from nlpashto import Cleaner

cleaner = Cleaner()
noisy_txt = "په ژوند کی علم 📚🖖 , 🖊  او پيسي 💵.  💸💲 دواړه حاصل کړه پوهان به دی علم ته درناوی ولري اوناپوهان به دي پیسو ته... https://t.co/xIiEXFg"

cleaned_text = cleaner.clean(noisy_txt)
print(cleaned_text)
# Output: په ژوند کی علم , او پيسي دواړه حاصل کړه پوهان به دی علم ته درناوی ولري او ناپوهان به دي پیسو ته
```
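To make the kind of cleanup shown above concrete, here is a minimal regex-based sketch that strips URLs and common emoji blocks, then collapses whitespace. `rough_clean` is a hypothetical helper, not how `Cleaner.clean` is implemented, and its emoji ranges are an approximation that will miss some symbols.

```python
import re

# Hypothetical helper for illustration; not part of NLPashto.
URL_RE = re.compile(r"https?://\S+")
# Approximate emoji coverage: pictographs and emoji (U+1F300-U+1FAFF)
# plus miscellaneous symbols and dingbats (U+2600-U+27BF).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
SPACE_RE = re.compile(r"\s+")


def rough_clean(text: str) -> str:
    """Remove URLs and emojis, then squeeze runs of whitespace to one space."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(rough_clean("په ژوند کی علم 📚🖖 https://t.co/xIiEXFg"))  # په ژوند کی علم
```

Pashto letters live in the U+06xx block, so they are untouched by the emoji ranges.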

Parameters of the `clean` method:

- `text` (str or list): Input noisy text to clean.
- `split_into_sentences` (bool): Split text into sentences.
- `remove_emojis` (bool): Remove emojis.
- `normalize_nums` (bool): Normalize Western Arabic numerals (1, 2, 3, ...) to Pashto numerals (۱، ۲، ۳، ...).
- `remove_puncs` (bool): Remove punctuation.
- `remove_special_chars` (bool): Remove special characters.
- `special_chars` (list): List of special characters to keep.
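The Pashto digits produced by `normalize_nums` are the Extended Arabic-Indic digits U+06F0 to U+06F9. A hypothetical stand-alone equivalent of that mapping (not the library's own code) fits in a single `str.translate` table:

```python
# Hypothetical stand-alone helper; not NLPashto's implementation.
# Extended Arabic-Indic digits ۰ to ۹ occupy U+06F0..U+06F9 in order.
WESTERN_DIGITS = "0123456789"
PASHTO_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(WESTERN_DIGITS, PASHTO_DIGITS)


def to_pashto_digits(text: str) -> str:
    """Replace every Western Arabic digit with its Pashto counterpart."""
    return text.translate(DIGIT_TABLE)


print(to_pashto_digits("1, 2, 3"))  # ۱, ۲, ۳
```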

### Tokenization (Space Correction)

The `Tokenizer` corrects whitespace errors: it removes spurious spaces that split a word (space insertion) and inserts missing spaces between words (space omission). It returns a nested list of tokens:

```python
from nlpashto import Tokenizer

tokenizer = Tokenizer()
noisy_txt = 'جلال اباد ښار کې هره ورځ لس ګونه کسانپهډلهییزهتوګهدنشهيي توکو کارولو ته ا د ا م ه و رک وي'

tokenized_text = tokenizer.tokenize(noisy_txt)
print(tokenized_text)
# Output: [['جلال', 'اباد', 'ښار', 'کې', 'هره', 'ورځ', 'لسګونه', 'کسان', 'په', 'ډله', 'ییزه', 'توګه', 'د', 'نشه', 'يي', 'توکو', 'کارولو', 'ته', 'ادامه', 'ورکوي']]
```
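The cited Speech Communication paper describes a CRF-based approach to this task. For intuition only, a much cruder lexicon-based baseline can address both error types: dropping every space undoes insertion errors, and forward maximum matching against a word list restores omitted boundaries. The `respace` function and its toy lexicon are hypothetical; on real text it needs a large lexicon and still makes greedy mistakes that a statistical model avoids.

```python
from typing import List, Set


def respace(noisy: str, lexicon: Set[str]) -> List[str]:
    """Hypothetical lexicon-based space correction via forward maximum matching."""
    # Removing all whitespace undoes space-insertion errors such as "ا د ا م ه".
    chars = "".join(noisy.split())
    longest = max(map(len, lexicon), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(chars):
        # Take the longest lexicon entry that starts at position i.
        for size in range(min(longest, len(chars) - i), 0, -1):
            if chars[i:i + size] in lexicon:
                break
        else:
            size = 1  # unknown character: emit it as a one-character token
        tokens.append(chars[i:i + size])
        i += size
    return tokens


# Toy lexicon built from words in the Tokenizer example above.
LEXICON = {"ته", "ادامه", "ورکوي", "کسان", "په", "ډله", "ییزه", "توګه", "د", "نشه", "يي"}
print(respace("ته ا د ا م ه و رک وي", LEXICON))  # ['ته', 'ادامه', 'ورکوي']
```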

### Chunking (Word Segmentation)

To get full compound words instead of space-delimited tokens, pass the tokenizer output to the `Segmenter`:

```python
from nlpashto import Segmenter

segmenter = Segmenter()
segmented_text = segmenter.segment(tokenized_text)  # tokenized_text comes from the Tokenizer example
print(segmented_text)
# Output: [['جلال اباد', 'ښار', 'کې', 'هره', 'ورځ', 'لسګونه', 'کسان', 'په', 'ډله ییزه', 'توګه', 'د', 'نشه يي', 'توکو', 'کارولو', 'ته', 'ادامه', 'ورکوي']]
```
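Conceptually, chunking can be framed as sequence labeling over tokens: each token either begins a new word or continues the previous one. The sketch below assumes B/I labels of that kind (a common convention; the tag set NLPashto uses internally is not documented here) and shows how such labels turn tokenizer output into compound words like `'جلال اباد'`. `merge_chunks` is a hypothetical helper.

```python
from typing import List


def merge_chunks(tokens: List[str], labels: List[str]) -> List[str]:
    """Join tokens into words: 'B' starts a new word, 'I' continues the current one."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    words: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "I" and words:
            words[-1] += " " + token  # keep the internal space, as in the output above
        else:
            words.append(token)
    return words


print(merge_chunks(["جلال", "اباد", "ښار"], ["B", "I", "B"]))  # ['جلال اباد', 'ښار']
```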

When segmenting many sentences, you can set the batch size:

```python
segmenter = Segmenter(batch_size=32)  # Default is 16
```
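`batch_size` sets how many sentences are processed together. Assuming the segmenter splits its input into consecutive fixed-size batches (the usual meaning of this parameter; the internals are not documented here), 40 sentences with the default of 16 would be handled in three batches. A hypothetical `iter_batches` helper shows the arithmetic:

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 16) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


sentences = [f"sentence {n}" for n in range(40)]
sizes = [len(batch) for batch in iter_batches(sentences)]
print(sizes, math.ceil(len(sentences) / 16))  # [16, 16, 8] 3
```

Larger batches usually mean fewer model calls at the cost of more memory per call.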

### Part-of-speech (POS) Tagging

The `POSTagger` assigns part-of-speech tags to the segmenter's output. For a detailed description of the tagger, see the [POS tagging paper](https://www.researchsquare.com/article/rs-2712906/v1).

```python
from nlpashto import POSTagger

pos_tagger = POSTagger()
pos_tagged = pos_tagger.tag(segmented_text)
print(pos_tagged)
# Output: [[('جلال اباد', 'NNP'), ('ښار', 'NNM'), ('کې', 'PT'), ('هره', 'JJ'), ('ورځ', 'NNF'), ...]]
```
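The tagger returns one list of `(word, tag)` pairs per sentence, which is easy to post-process. The hypothetical helpers below count tags and extract words whose tag starts with a given prefix. They assume, based only on the tag names in the output above, that tags beginning with `NN` are noun tags; check the POS tagging paper for the full tag set.

```python
from collections import Counter
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]


def words_with_tag_prefix(tagged: List[TaggedSentence], prefix: str) -> List[str]:
    """Return words, in order, whose tag starts with prefix."""
    return [word for sentence in tagged for word, tag in sentence if tag.startswith(prefix)]


def tag_counts(tagged: List[TaggedSentence]) -> Counter:
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sentence in tagged for _, tag in sentence)


# First five pairs from the example output above.
sample = [[("جلال اباد", "NNP"), ("ښار", "NNM"), ("کې", "PT"), ("هره", "JJ"), ("ورځ", "NNF")]]
print(words_with_tag_prefix(sample, "NN"))  # ['جلال اباد', 'ښار', 'ورځ']
print(tag_counts(sample)["NNP"])  # 1
```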

### Sentiment Analysis (Offensive Language Detection)

`POLD` (Pashto Offensive Language Detection) uses a fine-tuned PsBERT model to classify text as offensive (`1`) or normal (`0`):

```python
from nlpashto import POLD

sentiment_analysis = POLD()

# Offensive example
offensive_text = 'مړه یو کس وی صرف ځان شرموی او یو ستا غوندے جاهل وی چې قوم او ملت شرموی'
sentiment = sentiment_analysis.predict(offensive_text)
print(sentiment)
# Output: 1

# Normal example
normal_text = 'تاسو رښتیا وایئ خور 🙏'
sentiment = sentiment_analysis.predict(normal_text)
print(sentiment)
# Output: 0
```
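Because `predict` returns `1` for offensive and `0` for normal text in the example above, it can be plugged into ordinary filtering code. This sketch takes the predictor as a parameter so it runs without downloading the model. `label_texts`, `keep_normal`, and the keyword-based `fake_predict` stand-in are hypothetical; with NLPashto installed you would pass `sentiment_analysis.predict` instead.

```python
from typing import Callable, Iterable, List, Tuple

# Label names follow the example above: 1 = offensive, 0 = normal.
LABELS = {0: "normal", 1: "offensive"}


def label_texts(texts: Iterable[str], predict: Callable[[str], int]) -> List[Tuple[str, str]]:
    """Pair each text with a readable label from a 0/1 predictor."""
    return [(text, LABELS[int(predict(text))]) for text in texts]


def keep_normal(texts: Iterable[str], predict: Callable[[str], int]) -> List[str]:
    """Drop texts the predictor flags as offensive."""
    return [text for text, label in label_texts(texts, predict) if label == "normal"]


def fake_predict(text: str) -> int:
    """Stand-in predictor for illustration only; not a real classifier."""
    return 1 if "جاهل" in text else 0


comments = ["تاسو رښتیا وایئ خور", "ستا غوندے جاهل"]
print(keep_normal(comments, fake_predict))  # ['تاسو رښتیا وایئ خور']
```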

## Other Resources

### Pretrained Models

- **BERT (WordPiece Level):** [ijazulhaq/bert-base-pashto](https://huggingface.co/ijazulhaq/bert-base-pashto)
- **BERT (Character Level):** [ijazulhaq/bert-base-pashto-c](https://huggingface.co/ijazulhaq/bert-base-pashto-c)
- **Static Word Embeddings:** Available on [Kaggle](https://www.kaggle.com/datasets/drijaz/pashto-we): Word2Vec, fastText, GloVe
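The BERT checkpoints above are hosted on the Hugging Face Hub, so they should load with the standard `transformers` Auto classes. This is a sketch assuming `transformers` and PyTorch are installed and that both repositories ship tokenizer files; `load_pashto_bert` is a hypothetical helper that returns the base encoder only, without any task-specific head.

```python
from typing import Any, Tuple

# Hub repository IDs from the links above.
MODEL_IDS = {
    "wordpiece": "ijazulhaq/bert-base-pashto",
    "character": "ijazulhaq/bert-base-pashto-c",
}


def load_pashto_bert(kind: str = "wordpiece") -> Tuple[Any, Any]:
    """Download the tokenizer and base encoder for one of the Pashto BERT models."""
    model_id = MODEL_IDS[kind]  # raises KeyError for an unknown kind
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


# Example usage (needs network access, transformers and torch):
# tokenizer, model = load_pashto_bert("wordpiece")
# inputs = tokenizer("تاسو رښتیا وایئ خور", return_tensors="pt")
# hidden = model(**inputs).last_hidden_state  # shape: (1, sequence_length, hidden_size)
```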

### Datasets and Examples

- Sample datasets: [Kaggle](https://www.kaggle.com/drijaz/)
- Jupyter Notebooks: [Kaggle](https://www.kaggle.com/drijaz/)

## Citations

**[NLPashto: NLP Toolkit for Low-resource Pashto Language](https://dx.doi.org/10.14569/IJACSA.2023.01406142)**

_I. Haq, W. Qiu, J. Guo, and P. Tang, "NLPashto: NLP Toolkit for Low-resource Pashto Language," International Journal of Advanced Computer Science and Applications, vol. 14, no. 6, pp. 1345-1352, 2023._

- **BibTeX**

  ```bibtex
  @article{haq2023nlpashto,
    title={NLPashto: NLP Toolkit for Low-resource Pashto Language},
    author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
    journal={International Journal of Advanced Computer Science and Applications},
    issn={2156-5570},
    volume={14},
    number={6},
    pages={1345-1352},
    year={2023},
    doi={10.14569/IJACSA.2023.01406142}
  }
  ```

**[Correction of Whitespace and Word Segmentation in Noisy Pashto Text using CRF](https://doi.org/10.1016/j.specom.2023.102970)**

_I. Haq, W. Qiu, J. Guo, and P. Tang, "Correction of whitespace and word segmentation in noisy Pashto text using CRF," Speech Communication, vol. 153, p. 102970, 2023._

- **BibTeX**

  ```bibtex
  @article{HAQ2023102970,
    title={Correction of whitespace and word segmentation in noisy Pashto text using CRF},
    journal={Speech Communication},
    issn={1872-7182},
    volume={153},
    pages={102970},
    year={2023},
    doi={10.1016/j.specom.2023.102970},
    author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang}
  }
  ```

**[POS Tagging of Low-resource Pashto Language: Annotated Corpus and Bert-based Model](https://doi.org/10.21203/rs.3.rs-2712906/v1)**

_I. Haq, W. Qiu, J. Guo, and P. Tang, "POS Tagging of Low-resource Pashto Language: Annotated Corpus and BERT-based Model," Preprint, 2023._

- **BibTeX**

  ```bibtex
  @article{haq2023pashto,
    title={POS Tagging of Low-resource Pashto Language: Annotated Corpus and Bert-based Model},
    author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
    journal={Preprint},
    year={2023},
    doi={10.21203/rs.3.rs-2712906/v1}
  }
  ```

**[Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT](https://doi.org/10.7717/peerj-cs.1617)**

_I. Haq, W. Qiu, J. Guo, and P. Tang, "Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT," PeerJ Computer Science, vol. 9, p. e1617, 2023._

- **BibTeX**

  ```bibtex
  @article{haq2023pold,
    title={Pashto Offensive Language Detection: A Benchmark Dataset and Monolingual Pashto BERT},
    author={Ijazul Haq and Weidong Qiu and Jie Guo and Peng Tang},
    journal={PeerJ Computer Science},
    issn={2376-5992},
    volume={9},
    pages={e1617},
    year={2023},
    doi={10.7717/peerj-cs.1617}
  }
  ```

**[Social Media’s Dark Secrets: A Propagation, Lexical and Psycholinguistic Oriented Deep Learning Approach for Fake News Proliferation](https://doi.org/10.1016/j.eswa.2024.124650)**

_K. Ahmed, M. A. Khan, I. Haq, A. Al Mazroa, M. Syam, N. Innab, et al., "Social media’s dark secrets: A propagation, lexical and psycholinguistic oriented deep learning approach for fake news proliferation," Expert Systems with Applications, vol. 255, p. 124650, 2024._

- **BibTeX**

  ```bibtex
  @article{AHMED2024124650,
    title={Social media’s dark secrets: A propagation, lexical and psycholinguistic oriented deep learning approach for fake news proliferation},
    author={Kanwal Ahmed and Muhammad Asghar Khan and Ijazul Haq and Alanoud Al Mazroa and Syam M.S. and Nisreen Innab and Masoud Alajmi and Hend Khalid Alkahtani},
    journal={Expert Systems with Applications},
    volume={255},
    pages={124650},
    year={2024},
    issn={0957-4174},
    doi={10.1016/j.eswa.2024.124650}
  }
  ```
  
## Contact
- Website: [https://ijaz.me/](https://ijaz.me/)
- LinkedIn: [https://www.linkedin.com/in/drijazulhaq/](https://www.linkedin.com/in/drijazulhaq/)
