LughaatNLP


- Name: LughaatNLP
- Version: 1.0.6
- Home page: https://github.com/MuhammadNoman76/LughaatNLP
- Summary: A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, stopword removal, text-to-speech, speech-to-text, and summarization.
- Upload time: 2024-05-19 06:31:13
- Maintainer: None
- Docs URL: None
- Author: Muhammad Noman
- Requires Python: None
- License: MIT
- Keywords: urdu nlp natural language processing text processing tokenization stemming lemmatization stopwords morphological-analysis nlp urdu lughaatnlp urdunlp urdunlp urduhack stanza natural-language-processing text-processing language-processing preprocessing stemming lemmatization tokenization stopwords
- Requirements: No requirements were recorded.

# LughaatNLP

Introducing LughaatNLP, the all-in-one Urdu language toolkit designed for NLP tasks in Pakistan. It offers essential features like tokenization, lemmatization, stop word removal, POS tagging, NER, normalization, summarization, and even text-to-speech and speech-to-text capabilities, all crafted exclusively for Urdu. Unlock the power of Urdu NLP effortlessly with LughaatNLP!

<p align="center">
  <img src="https://i.imgur.com/6lKyQlo.png" alt="Alt Text" width="500" height="500">
</p>

## Documentation
Explore the full potential of LughaatNLP through its detailed [documentation](https://lughaatnlp.blogspot.com/2024/04/mastering-urdu-text-processing.html), which includes practical usage examples, installation guides, and API references.

## Google Colab Notebook
Get hands-on with LughaatNLP using the interactive [Google Colab Notebook](https://colab.research.google.com/drive/1lLaUBiFq61-B7GyQ0wdNg9FyLCraAcDU?usp=sharing) provided in the documentation. This notebook lets you experiment with the toolkit's functionalities directly in your browser.

## PyPI Package
Install LughaatNLP effortlessly via its [PyPI page](https://pypi.org/project/LughaatNLP) to integrate Urdu NLP capabilities into your Python projects seamlessly.

## YouTube Tutorial Series
Accelerate your understanding of LughaatNLP's features with the dedicated [YouTube tutorial playlist](https://www.youtube.com/playlist?list=PL4tcmUwDtJEIHZhAZ3XP9U6ZJzaS4RFbd), offering step-by-step guidance on utilizing the toolkit for various Urdu NLP tasks.

## Blogs and Articles
Gain insights into advanced techniques and best practices for mastering Urdu text processing through informative articles like:

**GeeksforGeeks:**
- [LughaatNLP: A Powerful Urdu Language Preprocessing Library](https://www.geeksforgeeks.org/lughaatnlp-a-powerful-urdu-language-preprocessing-library/)

**Medium:**
- [Introducing LughaatNLP: A Powerful Urdu Language Preprocessing Library](https://medium.com/@muhammadnomanshafiq76/introducing-lughaatnlp-a-powerful-urdu-language-preprocessing-library-488af74d3dde)

**LughaatNLP Blog:**
- [Mastering Urdu Text Processing](https://lughaatnlp.blogspot.com/2024/04/mastering-urdu-text-processing.html)



## Features

- **Tokenization**: Breaks down Urdu text into individual tokens, considering the intricacies of Urdu script and language structure.
- **Lemmatization**: Converts inflected words into their base or dictionary form, aiding in text analysis and comprehension.
- **Stop Word Removal**: Eliminates common Urdu stop words to focus on meaningful content during text processing.
- **Normalization**: Standardizes text by removing diacritics, normalizing character variations, and handling common orthographic variations in Urdu.
- **Stemming**: Reduces words to their root form, improving text analysis and comprehension in Urdu.
- **Spell Checker**: Identifies and corrects misspelled words in Urdu text, enhancing text quality and readability.
- **Part of Speech Extraction**: Tags words with their grammatical categories, enabling advanced syntactic analysis.
- **Named Entity Recognition (NER)**: Identify and extract names of entities like persons, organizations, or locations.
- **Text Summarization**: Generates concise summaries of Urdu text, facilitating quick understanding of lengthy content.
- **Text-to-Speech Conversion**: Converts Urdu text into spoken audio, enabling accessibility and language learning applications.
- **Speech-to-Text Conversion**: Transcribes spoken Urdu into written text, facilitating voice-based interactions and data entry.

## Installation

You can install the `LughaatNLP` library from PyPI using pip:

```bash
pip install lughaatNLP
```

Alternatively, you can manually install it by downloading and unzipping the provided `LughaatNLP.rar` file and installing the wheel file using pip:

```bash
pip install path_to_wheel_file/LughaatNLP-1.0.2-py3-none-any.whl
```

## Required Packages

The LughaatNLP library requires the following packages:

- `python-Levenshtein`
- `tensorflow`
- `numpy`
- `scikit-learn`
- `scipy`
- `gtts`
- `SpeechRecognition`
- `pydub`


You can install these packages using pip:

```bash
pip install python-Levenshtein tensorflow numpy scikit-learn scipy gtts SpeechRecognition pydub
```
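Before importing the library, it can help to confirm that these dependencies are importable. The sketch below is not part of LughaatNLP: `missing_modules` is a hypothetical helper, and the list uses the import names these PyPI packages expose (for example, `python-Levenshtein` is imported as `Levenshtein` and `scikit-learn` as `sklearn`).

```python
from importlib.util import find_spec

# Import names exposed by the required PyPI packages.
REQUIRED_MODULES = [
    "Levenshtein",         # python-Levenshtein
    "tensorflow",
    "numpy",
    "sklearn",             # scikit-learn
    "scipy",
    "gtts",
    "speech_recognition",  # SpeechRecognition
    "pydub",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```

Any name printed here can be installed with the pip command above.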

## Usage

After installing the library, you can import the necessary functions or classes in your Python script:

```python
# Import the classes provided by the package
from LughaatNLP import LughaatNLP
from LughaatNLP import POS_urdu
from LughaatNLP import NER_Urdu
from LughaatNLP import TextSummarization
from LughaatNLP import UrduSpeech

# Create one instance of each class
urdu_text_processing = LughaatNLP()
ner_urdu = NER_Urdu()
pos_tagger = POS_urdu()
summarizer = TextSummarization()
speech_urdu = UrduSpeech()
```

## Functions

# Normalization

### 1. `normalize_characters(text)`

This function normalizes the Urdu characters in the given text by mapping incorrect Urdu characters to their correct forms. The same Urdu letter can be encoded by more than one Unicode code point, and this function maps such variants to a single standard form.

**Example:**

```python
text = "آپ کیسے ہیں؟"
normalized_text = urdu_text_processing.normalize_characters(text)
print(normalized_text)  # Output: اپ کیسے ہیں؟
```
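The general idea of character mapping can be sketched with `str.translate`. This is a minimal illustration, not the library's implementation: the three-entry table below is an assumption chosen for the example (Arabic code points that resemble Urdu letters), and the library's own mapping table is not shown here and may differ.

```python
# Illustrative mapping only: Arabic code points often typed in place of Urdu letters.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh  -> Farsi/Urdu yeh
    "\u0643": "\u06A9",  # Arabic kaf  -> keheh
    "\u0647": "\u06C1",  # Arabic heh  -> heh goal
})


def map_characters(text):
    """Replace every character found in CHAR_MAP; leave all others untouched."""
    return text.translate(CHAR_MAP)


print(map_characters("\u0643\u064A") == "\u06A9\u06CC")  # True
```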

### 2. `normalize_combine_characters(text)`

This function simplifies Urdu characters by combining certain combinations into their correct single forms. In Urdu writing, some characters are made up of multiple parts like ligatures or diacritics. This function finds these combinations in the text and changes them to their single character forms. It ensures consistency and accuracy in how Urdu text is represented.

**Example:**

```python
text = "اُردو"
normalized_text = urdu_text_processing.normalize_combine_characters(text)
print(normalized_text)  # Output: اُردو
```
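Combining a base letter and a following mark into one code point is what Unicode NFC normalization does for canonically equivalent sequences. The sketch below uses Python's standard `unicodedata` module to show the effect; it is an assumption that this resembles what `normalize_combine_characters` does internally, and `compose_characters` is a hypothetical name.

```python
import unicodedata


def compose_characters(text):
    """Compose base letter + combining mark sequences into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


decomposed = "\u0627\u0653"  # alef followed by combining maddah above
composed = compose_characters(decomposed)
print(len(decomposed), len(composed))  # 2 1
```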

### 3. `normalize(text)`

This function performs all-in-one normalization on the Urdu text, including character normalization, diacritic removal, punctuation handling, digit conversion, and special character preservation.

**Example:**

```python
text = "آپ کیسے ہیں؟ میں ۲۳ سال کا ہوں۔"
normalized_text = urdu_text_processing.normalize(text)
print("Normalize all at once together of Urdu: ", normalized_text)  # Output: اپ کیسے ہیں ؟ میں 23 سال کا ہوں ۔
```

### 4. `remove_diacritics(text)`

This function removes diacritics (zabar, zer, pesh) from the Urdu text.

**Example:**

```python
text = "کِتَاب"
diacritics_removed = urdu_text_processing.remove_diacritics(text)
print("Remove all Diacritic (Zabar - Zer - Pesh): ", diacritics_removed)  # Output: کتاب
```
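For readers curious how diacritic removal can work, here is a minimal regex-based sketch. It is not the library's code: the set of marks (U+064B to U+0652 plus the superscript alef U+0670) is an assumption, and `strip_diacritics` is a hypothetical name.

```python
import re

# Assumed set of marks: tanween, zabar, pesh, zer, shadda, sukun (U+064B-U+0652)
# and superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Delete the diacritic marks listed above and keep every other character."""
    return DIACRITICS.sub("", text)


# kaf + zer + teh + zabar + alef + beh  ->  kaf + teh + alef + beh
word = "\u06A9\u0650\u062A\u064E\u0627\u0628"
print(strip_diacritics(word) == "\u06A9\u062A\u0627\u0628")  # True
```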

### 5. `punctuations_space(text)`

This function removes the spaces before punctuation marks in the Urdu text (numbers are excluded). As the example shows, a single space is kept after a punctuation mark that is followed by more text.

**Example:**

```python
text = "کیا آپ کھانا کھانا چاہتے ہیں ؟ میں کھانا کھاؤں گا  ۔"
punctuated_text = urdu_text_processing.punctuations_space(text)
print(punctuated_text)  # Output: کیا آپ کھانا کھانا چاہتے ہیں؟ میں کھانا کھاؤں گا۔
```

### 6. `replace_digits(text)`

This function replaces English digits with Urdu digits.

**Example:**

```python
text = "میں 23 سال کا ہوں۔"
english_digits = urdu_text_processing.replace_digits(text)
print("Replace All maths numbers with Urdu number eg(2 1 3 1 -> ۲ ۱ ۳ ۱): ", english_digits)  # Output: میں ۲۳ سال کا ہوں۔
```
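Digit replacement is a one-to-one character mapping, which `str.translate` handles directly. The sketch below assumes the target digits are the Extended Arabic-Indic digits (U+06F0 to U+06F9); `to_urdu_digits` is a hypothetical name, not a library function.

```python
# Build the target digits U+06F0..U+06F9 programmatically to avoid typing them by hand.
URDU_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ASCII_TO_URDU = str.maketrans("0123456789", URDU_DIGITS)


def to_urdu_digits(text):
    """Replace each ASCII digit with the corresponding Urdu digit."""
    return text.translate(ASCII_TO_URDU)


print(to_urdu_digits("23") == "\u06F2\u06F3")  # True
```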

### 7. `remove_numbers_urdu(text)`

This function removes Urdu numbers from the Urdu text.

**Example:**

```python
text = "میں  22 ۲۳ سال کا ہوں۔"
no_urdu_numbers = urdu_text_processing.remove_numbers_urdu(text)
print("Remove Urdu numbers from text: ", no_urdu_numbers)  # Output: میں 22 سال کا ہوں۔
```

### 8. `remove_numbers_english(text)`

This function removes English numbers from the Urdu text.

**Example:**

```python
text = "میں ۲۳ 23 سال کا ہوں۔"
no_english_numbers = urdu_text_processing.remove_numbers_english(text)
print("Remove English numbers from text: ", no_english_numbers)  # Output: میں ۲۳ سال کا ہوں۔
```

### 9. `remove_whitespace(text)`

This function removes extra whitespaces from the Urdu text.

**Example:**

```python
text = "میں   گھر   جا   رہا   ہوں۔"
cleaned_text = urdu_text_processing.remove_whitespace(text)
print("Remove All extra space between words", cleaned_text)  # Output: میں گھر جا رہا ہوں۔
```
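Collapsing extra whitespace is commonly done with a single regular expression. The following is a generic sketch of that technique under the assumption that runs of any whitespace should become one space; `collapse_whitespace` is a hypothetical name.

```python
import re


def collapse_whitespace(text):
    """Replace each run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


print(collapse_whitespace("a   b \n c "))  # a b c
```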

### 10. `preserve_special_characters(text)`

This function adds spaces around special characters in the Urdu text to facilitate tokenization.

**Example:**

```python
text = "میں@پاکستان_سے_ہوں۔"
preserved_text = urdu_text_processing.preserve_special_characters(text)
print("make a space between every special character and word so tokenize easily", preserved_text)  # Output: میں @ پاکستان _ سے _ ہوں ۔
```

### 11. `remove_numbers(text)`

This function removes both Urdu and English numbers from the Urdu text.

**Example:**

```python
text = "میں ۲۳ سال کا ہوں اور میری عمر 23 ہے۔"
number_removed = urdu_text_processing.remove_numbers(text)
print("Remove All numbers whether they are Urdu or English: ", number_removed)  # Output: میں سال کا ہوں اور میری عمر ہے۔
```
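A minimal sketch of number removal, assuming that "Urdu numbers" means the digits U+06F0 to U+06F9 and "English numbers" means ASCII digits. It also collapses the double spaces left behind, which matches the single spacing in the output shown above. `strip_numbers` is a hypothetical name.

```python
import re

# ASCII digits and Extended Arabic-Indic digits (assumed to be the Urdu digit range).
NUMBERS = re.compile("[0-9\u06F0-\u06F9]+")


def strip_numbers(text):
    """Remove digit runs, then squeeze the leftover double spaces."""
    without_numbers = NUMBERS.sub("", text)
    return re.sub(r" {2,}", " ", without_numbers).strip()


print(strip_numbers("a 23 b \u06F2\u06F3 c"))  # a b c
```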

### 12. `remove_english(text)`

This function removes English characters from the Urdu text.

**Example:**

```python
text = "I am learning Urdu."
urdu_only = urdu_text_processing.remove_english(text)
print("Remove All English characters from text: ", urdu_only)  # Output:  ام لرننگ اردو
```

### 13. `pure_urdu(text)`

This function removes all non-Urdu characters and numbers from the text, leaving only Urdu characters and special characters used in Urdu.

**Example:**

```python
text = "I ?  # & am learning Urdu. میں اردو سیکھ رہا ہوں۔ 123"
pure_urdu_text = urdu_text_processing.pure_urdu(text)
print(pure_urdu_text)  # Output: میں اردو سیکھ رہا ہوں۔
```

### 14. `just_urdu(text)`

This function removes all non-Urdu characters, numbers, and special characters, leaving only pure Urdu text. Even the special characters used in Urdu are removed.

**Example:**

```python
text = "I am learning Urdu. میں اردو سیکھ رہا ہوں۔ 123"
just_urdu_text = urdu_text_processing.just_urdu(text)
print(just_urdu_text)  # Output: میں اردو سیکھ رہا ہوں 
```

### 15. `remove_urls(text)`

This function removes URLs from the Urdu text.

**Example:**

```python
text = "میں https://www.example.com پر گیا۔"
no_urls = urdu_text_processing.remove_urls(text)
print("Remove All URLs", no_urls)  # Output: میں  پر گیا۔
```
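URL removal is usually a pattern substitution. The sketch below assumes a URL is anything starting with `http://`, `https://`, or `www.` and running to the next whitespace; the library's actual pattern is not shown in this document. Like the example above, it leaves the surrounding spaces in place.

```python
import re

# Assumed pattern: scheme or "www." prefix followed by non-whitespace characters.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def strip_urls(text):
    """Delete URLs but keep the whitespace around them."""
    return URL_PATTERN.sub("", text)


print(repr(strip_urls("go https://www.example.com now")))  # 'go  now'
```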

### 16. `remove_special_characters(text)`

This function removes all special characters from the Urdu text.

**Example:**

```python
text = "میں@پاکستان_سے_ہوں۔"
no_special_chars = urdu_text_processing.remove_special_characters(text)
print("Remove All Special characters", no_special_chars)  # Output: میں پاکستان سے ہوں
```

### 17. `remove_special_characters_exceptUrdu(text)`

This function removes all special characters from the Urdu text, except for those commonly used in the Urdu language (e.g., ؟, ۔, ،).

**Example:**

```python
text = "میں@پاکستان??_سے_ہوں؟"
urdu_special_chars = urdu_text_processing.remove_special_characters_exceptUrdu(text)
print("Remove All Special characters except those which are used in Urdu language eg( ؟ ۔ ، ): ", urdu_special_chars)  # Output: میں پاکستان سے ہوں؟
```
# Stop Word Removal

### 1. `remove_stopwords(text)`

This function removes stopwords from the Urdu text.

**Example:**

```python
text = "میں اس کتاب کو پڑھنا چاہتا ہوں۔"
filtered_text = urdu_text_processing.remove_stopwords(text)
print("Remove Stop words:", filtered_text)  # Output: کتاب پڑھنا چاہتا ہوں۔
```
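Conceptually, stop word removal filters tokens against a word list. The sketch below illustrates this with whitespace tokenization; the three-word stop list is a hypothetical sample chosen to fit the sentence above, not the library's list, and `drop_stopwords` is not a library function.

```python
def drop_stopwords(text, stopwords):
    """Keep only the whitespace-separated tokens that are not in `stopwords`."""
    kept = [token for token in text.split() if token not in stopwords]
    return " ".join(kept)


# Hypothetical three-word stop list for illustration only.
SAMPLE_STOPWORDS = {"میں", "اس", "کو"}
print(drop_stopwords("میں اس کتاب کو پڑھنا چاہتا ہوں۔", SAMPLE_STOPWORDS))
```

A set is used for the stop list so that each membership test takes constant time.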

# Tokenization

Tokenization involves breaking down Urdu text into individual tokens, considering the intricacies of Urdu script and language structure.

### 1. `urdu_tokenize(text)`

This function tokenizes the Urdu text into individual tokens (words, numbers, and punctuations).

**Example:**

```python
text = "میں پاکستان سے ہوں۔"
tokens = urdu_text_processing.urdu_tokenize(text)
print("Tokenization for Urdu language:", tokens)  # Output: ['میں', 'پاکستان', 'سے', 'ہوں۔']
```
# Lemmatization and Stemming

Lemmatization involves converting inflected words into their base or dictionary form, while stemming reduces words to their root form.

### 1. `lemmatize_sentence(sentence)`

This function performs lemmatization on the Urdu sentence, replacing words with their base or dictionary form.

**Example:**

```python
sentence = "میں کتابیں پڑھتا ہوں۔"
lemmatized_sentence = urdu_text_processing.lemmatize_sentence(sentence)
print("lemmatize the words ", lemmatized_sentence)  # Output: میں کتاب پڑھنا ہوں۔
```

### 2. `urdu_stemmer(sentence)`

This function performs stemming on the Urdu sentence, reducing words to their root or stem form.

**Example:**

```python
sentence = "میں کتابیں پڑھتا ہوں۔"
stemmed_sentence = urdu_text_processing.urdu_stemmer(sentence)
print("Urdu Stemming ", stemmed_sentence)  # Output: میں کتاب پڑھ ہوں۔
```
# Spell Checker

Spell checking involves identifying and correcting misspelled words in Urdu text.

### 1. `corrected_sentence_spelling(input_word, threshold)`

This function takes an input sentence and a similarity threshold as arguments and returns the corrected sentence with potentially misspelled words replaced by the most similar words from the vocabulary.

**Example:**

```python
spell_checker = LughaatNLP()
sentence = 'سسب سےا بڑاا ملکا ہے'
corrected_sentence = spell_checker.corrected_sentence_spelling(sentence, 60)
print("This correct spelling of sentence itself", corrected_sentence)
```

### 2. `most_similar_word(input_word, threshold)`

This function takes an input word and a similarity threshold as arguments and returns the most similar word from the vocabulary based on the Levenshtein distance.

**Example:**

```python
spell_checker = LughaatNLP()
word = 'پاکستاان'
most_similar = spell_checker.most_similar_word(word, 70)
print("This will return the most similar single word in string", most_similar)
```

### 3. `get_similar_words_percentage(input_word, threshold)`

This function takes an input word and a similarity threshold as arguments and returns a list of tuples containing similar words and their corresponding similarity percentages.

**Example:**

```python
spell_checker = LughaatNLP()
word = 'پاکستاان'
similar_words_with_percentage = spell_checker.get_similar_words_percentage(word, 70)
print("This will return the most similar words in list with percentage", similar_words_with_percentage)
```

### 4. `get_similar_words(input_word, threshold)`

This function takes an input word and a similarity threshold as arguments and returns a list of similar words from the vocabulary based on the Levenshtein distance.

**Example:**

```python
spell_checker = LughaatNLP()
word = 'پاکستاان'
similar_words = spell_checker.get_similar_words(word, 70)
print("This will return the most similar words in list only you can access word using index", similar_words)
```

These functions leverage the Levenshtein distance algorithm to calculate the similarity between the input word or sentence and the words in the vocabulary. The `threshold` parameter is used to filter out words with a similarity percentage below the specified threshold.

Note: These examples assume that `spell_checker` is an instance of the `LughaatNLP` class, as created in each example, and that the `python-Levenshtein` package is installed for calculating the edit distance.
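To make the threshold idea concrete, here is a pure-Python sketch of Levenshtein distance and a similarity percentage. The percentage formula (one minus distance divided by the longer length) is an assumption for illustration; the library and `python-Levenshtein` may compute similarity differently. All function names here are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance computed with two rolling rows of the dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_percentage(a, b):
    """Assumed formula: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def similar_words(word, vocabulary, threshold):
    """Return (candidate, percentage) pairs at or above the threshold, best first."""
    scored = [(candidate, similarity_percentage(word, candidate)) for candidate in vocabulary]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


print(levenshtein("kitten", "sitting"))  # 3
```

With this formula, a threshold of 70 keeps a candidate only if at most 30% of the longer word's characters need editing.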

# Part of Speech

The `pos_tags_urdu` function is used for part-of-speech tagging in Urdu text. It takes an Urdu sentence as input and returns a list of dictionaries where each word is paired with its assigned part-of-speech tag, such as nouns (`NN`), verbs (`VB`), adjectives (`ADJ`), etc.

### 1. `pos_tagger.pos_tags_urdu(sentence)`

The example output demonstrates how words like "میرے" (`G`, genitive) and "تعلیم" (`NN`, noun) are tagged based on their grammatical roles within the sentence.

**Example:**

```python
from LughaatNLP import POS_urdu

pos_tagger = POS_urdu()

sentence = "میرے والدین نے میری تعلیم اور تربیت میں بہت محنت کی تاکہ میں اپنی زندگی میں کامیاب ہو سکوں۔"

predicted_pos_tags = pos_tagger.pos_tags_urdu(sentence)

print(predicted_pos_tags)
```
## output 
```python
# output => [{'Word': 'میرے', 'POS_Tag': 'G'}, {'Word': 'والدین', 'POS_Tag': 'NN'},{'Word': 'نے', 'POS_Tag': 'P'},{'Word': 'میری', 'POS_Tag': 'G'},{'Word': 'تعلیم', 'POS_Tag': 'NN'},{'Word': 'اور', 'POS_Tag': 'CC'},{'Word': 'تربیت', 'POS_Tag': 'NN'},{'Word': 'میں', 'POS_Tag': 'P'},{'Word': 'بہت', 'POS_Tag': 'ADV'},{'Word': 'محنت', 'POS_Tag': 'NN'},{'Word': 'کی', 'POS_Tag': 'VB'},{'Word': 'تاکہ', 'POS_Tag': 'SC'},{'Word': 'میں', 'POS_Tag': 'P'},{'Word': 'اپنی', 'POS_Tag': 'GR'},{'Word': 'زندگی', 'POS_Tag': 'NN'},{'Word': 'میں', 'POS_Tag': 'P'},{'Word': 'کامیاب', 'POS_Tag': 'ADJ'},{'Word': 'ہو', 'POS_Tag': 'VB'},{'Word': 'سکوں', 'POS_Tag': 'NN'},{'Word': '۔', 'POS_Tag': 'SM'}]
```
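Because the result is a list of dictionaries with `Word` and `POS_Tag` keys, it can be filtered with ordinary list comprehensions. The helper below is a hypothetical convenience function, not part of the library; the sample data repeats the first three entries of the output above.

```python
def words_with_tag(tagged, tag):
    """Return the words whose POS_Tag equals `tag`, in sentence order."""
    return [entry["Word"] for entry in tagged if entry["POS_Tag"] == tag]


# First three entries of the documented output.
sample = [
    {"Word": "میرے", "POS_Tag": "G"},
    {"Word": "والدین", "POS_Tag": "NN"},
    {"Word": "نے", "POS_Tag": "P"},
]
print(words_with_tag(sample, "NN"))
```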
# Named Entity Recognition

The `ner_tags_urdu` function performs named entity recognition on Urdu text, assigning named entity tags (such as `U-LOCATION` for locations) to identified entities in the input sentence. It outputs a dictionary where words are mapped to their corresponding named entity tags, facilitating tasks like information extraction and text analysis specific to Urdu language. 

### 1. `ner_urdu.ner_tags_urdu(sentence)`

The example output illustrates how entities like "پاکستان" are recognized as locations (U-LOCATION) within the provided sentence.

**Example:**

```python
from LughaatNLP import NER_Urdu

ner_urdu = NER_Urdu()

sentence = "اس کتاب میں پاکستان کی تاریخ بیان کی گئی ہے۔"

word_tag_dict = ner_urdu.ner_tags_urdu(sentence)

print(word_tag_dict)
```
## output 
```python
# output => {'اس': 'O', 'کتاب': 'O', 'میں': 'O', 'پاکستان': 'U-LOCATION', 'کی': 'O', 'تاریخ': 'O', 'بیان': 'O', 'گئی': 'O', 'ہے': 'O', '۔': 'O'}
```
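Since the result is a plain dictionary, the recognized entities can be pulled out by dropping every word tagged `O`. The helper below is a hypothetical sketch that relies only on the output format shown above. Note that a dictionary holds each word once, so a word repeated in the sentence (such as "کی" here) appears as a single key.

```python
def named_entities(word_tags):
    """Keep only the words whose tag is not 'O' (outside any entity)."""
    return {word: tag for word, tag in word_tags.items() if tag != "O"}


# A few entries taken from the documented output.
sample = {"اس": "O", "کتاب": "O", "پاکستان": "U-LOCATION"}
print(named_entities(sample))
```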

# Text Summarization

The `TextSummarization` class offers automatic summarization of Urdu text by leveraging natural language processing techniques. It uses graph-based representation and scoring methods like BM25 and TF-IDF to identify and select the most important sentences for summarization. This class is designed to generate concise and informative summaries from unstructured Urdu text, making it valuable for applications requiring efficient text summarization capabilities.

### 1. `summarize(text, ratio)`

This Python program demonstrates the use of the `TextSummarization` class from the `LughaatNLP` module to summarize a given Urdu text based on a specified ratio. The input text is a news article discussing the integration of digital technology into governance and electoral processes in China. The program initializes a `TextSummarization` instance, sets a summary ratio of 1% (adjustable to other percentages), and then generates a summary using the `summarize` method. 

**Example:**

```python
from LughaatNLP import TextSummarization


text = '''
بھی حال ہی میں تیسرے انٹرنیشنل ڈیموکریسی فورم کا انعقاد بیجنگ میں کیا گیا ، جس میں 70 سے زائد ممالک اور خطوں کے تقریباً 300 عہدیداروں اور ماہرین نے اس بات پر تبادلہ خیال کیا کہ جمہوریت کو جدید ڈیجیٹل دور کی ضروریات اور گلوبل ساؤتھ کی مضبوطی کے مطابق کیسے ہونا چاہئے۔
وسیع تناظر میں آج، ڈیجیٹل ٹیکنالوجی چین کی جمہوری گورننس کے ساتھ گہرائی سے جڑی ہوئی ہے۔ اس وقت ووٹنگ کے دوران ڈیجیٹل ٹیکنالوجی کا استعمال، حکومتی پالیسیوں کو سمجھنے، مشاورت میں حصہ لینے اور حکومتی سرگرمیوں کی نگرانی جیسے عمل کو آسان بنا رہا ہے.چین کے سرکاری انتخابی طریقہ کار میں ڈیجیٹل ٹیکنالوجی کا انضمام ووٹر رجسٹریشن، امیدواروں کی معلومات کی فراہمی کے ساتھ ساتھ ووٹنگ اور گنتی کے عمل جیسے کاموں کو آسان بنارہا ہے۔
یہ انضمام حکومتی اداروں، رائے دہندگان اور امیدواروں کے درمیان تعامل کو بھی بڑھاتا ہے، حتمی طور پر اس بات کو یقینی بناتا ہے کہ جمہوری انتخابات عوام کی ضروریات کے مطابق زیادہ جوابدہ ہوں۔
مزید برآں، ڈیجیٹل ٹیکنالوجی حکومت کو عوامی تشویش کے مسائل کو حل کرنے کے لئے کاروباری اداروں، ماہرین اور سول مندوبین کے درمیان آن لائن تبادلہ خیال کا اہتمام کرنے کے قابل بنا رہی ہے، جس سے مشاورت زیادہ موثر ہوتی جا رہی ہے.ڈیجیٹل ٹیک فیصلہ سازی اور انتظام کے لئے چینلز کو بھی وسعت دیتی ہے۔
لوگ حکومت کی جانب سے شروع کیے گئے مختلف آن لائن پلیٹ فارمز جیسے ڈیجیٹل رائے عامہ کے چینلز اور آن لائن سروے کے ذریعے اپنی رائے کا اظہار کرسکتے ہیں اور مشاورت میں مشغول ہوسکتے ہیں۔ نتیجتاً ، سوشل گورننس کے لئے فیڈ بیک میکانزم نے درجہ بندی کی سطحوں کو کم کردیا ہے ، جس سے ردعمل اور کارکردگی میں تیزی سے اضافہ ہوا ہے۔
یہاں اس حقیقت کو بھی تسلیم کرنا پڑے گا کہ، ڈیجیٹل ٹیکنالوجی کا اثر محض کارکردگی کے فوائد سے کہیں زیادہ ہے. یہ سوشل میڈیا پلیٹ فارمز اور آن لائن رپورٹنگ سسٹم کی بدولت سرکاری کارروائیوں میں شفافیت اور احتساب کو فروغ دے رہی ہے۔ عوامی جانچ پڑتال کی نگرانی میں انتظامی عمل کا معائنہ بھی شامل ہے ، جو اس بات کو یقینی بناتا ہے کہ سرکاری سرگرمیاں اخلاقی معیارات پر مکمل عمل پیرا ہوں اور عوام کو جوابدہ ہوں۔
یوں چین میں ، ڈیجیٹل ٹیکنالوجی صرف ایک آلہ نہیں ہے۔ یہ ترقی کے لئے ایک محرک ہے جو شہریوں کو جمہوری عمل کو تشکیل دینے کے لئے بااختیار بناتا ہے. جدید پلیٹ فارمز اور شفاف حکمرانی کے طریقوں نے لوگوں کے لئے اپنے جمہوری حقوق کو براہ راست استعمال کرنے کے لئے زیادہ آسان اور موثر بنا دیا ہے۔
ڈیجیٹل ٹیکنالوجی سے ہٹ کر اگر تیسرے انٹرنیشنل ڈیموکریسی فورم کی مزید بات کی جائے تو فورم کے شرکاء نے جہاں تکنیکی ترقی کی وکالت کی وہاں یہ بھی واضح کیا کہ کوئی ایک طرز جمہوریت یا جمہوری ماڈل واحد انتخاب نہیں ہونا چاہئے۔مغربی ریاستوں کو ایسی سیاسی شکلوں کو قبول کرنا چاہیے جو ان کی اپنی سیاسی شکلوں سے مختلف ہیں۔ساتھ ساتھ یہ ضروری ہے کہ دنیا کو ایک ایسی کمیونٹی کے نقطہ نظر سے دیکھا جائے جس کی انسانیت کے لئے مشترکہ تقدیر ہو۔
اس ضمن میں چین کی جانب سے تجویز کردہ بین الاقوامی اقدامات بشمول بیلٹ اینڈ روڈ انیشی ایٹو، گلوبل ڈیولپمنٹ انیشی ایٹو، گلوبل سیکیورٹی انیشی ایٹو اور گلوبل سولائزیشن انیشی ایٹو مسابقت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں اور عالمی چیلنجز کا چینی دانش کے تحت موئثر حل پیش کرتے ہیں۔

'''

summarizer = TextSummarization()
ratio = 0.01  # Summary ratio (e.g., keep 1% of the sentences or adjust according to your preference: 10%, 20%, 40%, etc. (1% to 100%))
summary = summarizer.summarize(text, ratio)
print(summary)
```
## output 
```python
#output =>
گلوبل سیکیورٹی انیشی ایٹو اور گلوبل سولائزیشن انیشی ایٹو مسابقت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں اور عالمی چیلنجز کا چینی دانش کے تحت موئثر حل پیش کرتے ہیں ۔
```
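The following is a simplified sketch of ratio-based extractive summarization, included to show how a ratio can translate into a number of selected sentences. It is not the library's algorithm: it uses a plain TF-IDF-style score rather than the graph-based and BM25 scoring mentioned above, it assumes at least one sentence is always kept, and `split_sentences` and `summarize_sketch` are hypothetical names.

```python
import math
import re
from collections import Counter

# Sentence enders: Urdu full stop (U+06D4), '.', '?', Arabic question mark (U+061F).
SENTENCE = re.compile("[^\u06D4.?\u061F]+[\u06D4.?\u061F]?")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation mark."""
    return [part.strip() for part in SENTENCE.findall(text) if part.strip()]


def summarize_sketch(text, ratio):
    """Keep the top-scoring `ratio` share of sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    tokenized = [sentence.split() for sentence in sentences]
    total = len(sentences)
    # Document frequency: the number of sentences in which each token occurs.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))

    def score(tokens):
        counts = Counter(tokens)
        return sum(n * math.log(1 + total / doc_freq[t]) for t, n in counts.items())

    keep = max(1, round(ratio * total))  # assumption: never return an empty summary
    best = sorted(range(total), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(best))


print(summarize_sketch("apple banana. apple. cherry date elderberry fig.", 0.01))
```

Because scores are summed rather than averaged, this sketch favors longer sentences; a production summarizer would normally normalize for length.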

### 2. `summarize_unstructured_text(text, ratio)`

The `summarize_unstructured_text` function of the `TextSummarization` class summarizes messy or unorganized Urdu text, that is, text without sentence structure, by automatically breaking it into sentences before summarizing. It helps users quickly get a condensed summary of the important information in raw text that does not follow typical sentence structures, such as the unpunctuated text in the example below.

**Example:**

```python
from LughaatNLP import TextSummarization


# Example usage

# Text without the sentence Structure

text = "اس ضمن میں چین کی جانب سے تجویز کردہ بین الاقوامی اقدامات بشمول بیلٹ اینڈ روڈ انیشی ایٹو گلوبل ڈیولپمنٹ انیشی ایٹو گلوبل سیکیورٹی انیشی ایٹو اور گلوبل سولائزیشن انیشی ایٹو مسابقت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں اور عالمی چیلنجز کا چینی دانش کے تحت موئثر حل پیش کرتے ہیں اس وقت ووٹنگ کے دوران ڈیجیٹل ٹیکنالوجی کا استعمال حکومتی پالیسیوں کو سمجھنے مشاورت میں حصہ لینے اور حکومتی سرگرمیوں کی نگرانی جیسے عمل کو آسان بنا رہا ہےچین کے سرکاری انتخابی طریقہ کار میں ڈیجیٹل ٹیکنالوجی کا انضمام ووٹر رجسٹریشن امیدواروں کی معلومات کی فراہمی کے ساتھ ساتھ ووٹنگ اور گنتی کے عمل جیسے کاموں کو آسان بنارہا ہے مزید برآں ڈیجیٹل ٹیکنالوجی حکومت کو عوامی تشویش کے مسائل کو حل کرنے کے لئے کاروباری اداروں ماہرین اور سول مندوبین کے درمیان آن لائن تبادلہ خیال کا اہتمام کرنے کے قابل بنا رہی ہے جس سے مشاورت زیادہ موثر ہوتی جا رہی ہےڈیجیٹل ٹیک فیصلہ سازی اور انتظام کے لئے چینلز کو بھی وسعت دیتی ہے بھی حال ہی میں تیسرے انٹرنیشنل ڈیموکریسی فورم کا انعقاد بیجنگ میں کیا گیا جس میں 70 سے زائد ممالک اور خطوں کے تقریباً 300 عہدیداروں اور ماہرین نے اس بات پر تبادلہ خیال کیا کہ جمہوریت کو جدید ڈیجیٹل دور کی ضروریات اور گلوبل ساؤتھ کی مضبوطی کے مطابق کیسے ہونا چاہئے یہ ترقی کے لئے ایک محرک ہے جو شہریوں کو جمہوری عمل کو تشکیل دینے کے لئے بااختیار بناتا ہے جدید پلیٹ فارمز اور شفاف حکمرانی کے طریقوں نے لوگوں کے لئے اپنے جمہوری حقوق کو براہ راست استعمال کرنے کے لئے زیادہ آسان اور موثر بنا دیا ہے ڈیجیٹل ٹیکنالوجی سے ہٹ کر اگر تیسرے انٹرنیشنل ڈیموکریسی فورم کی مزید بات کی جائے تو فورم کے شرکاء نے جہاں تکنیکی ترقی کی وکالت کی وہاں یہ بھی واضح کیا کہ کوئی ایک طرز جمہوریت یا جمہوری ماڈل واحد انتخاب نہیں ہونا چاہئے یہاں اس حقیقت کو بھی تسلیم کرنا پڑے گا کہ ڈیجیٹل ٹیکنالوجی کا اثر محض کارکردگی کے فوائد سے کہیں زیادہ ہے یہ سوشل میڈیا پلیٹ فارمز اور آن لائن رپورٹنگ سسٹم کی بدولت سرکاری کارروائیوں میں شفافیت اور احتساب کو فروغ دے رہی ہے لوگ حکومت کی جانب سے شروع کیے گئے مختلف آن لائن پلیٹ فارمز جیسے ڈیجیٹل رائے عامہ کے چینلز اور آن لائن سروے کے ذریعے اپنی رائے کا اظہار کرسکتے ہیں 
اور مشاورت میں مشغول ہوسکتے ہیں عوامی جانچ پڑتال کی نگرانی میں انتظامی عمل کا معائنہ بھی شامل ہے جو اس بات کو یقینی بناتا ہے کہ سرکاری سرگرمیاں اخلاقی معیارات پر مکمل عمل پیرا ہوں اور عوام کو جوابدہ ہوں یہ انضمام حکومتی اداروں رائے دہندگان اور امیدواروں کے درمیان تعامل کو بھی بڑھاتا ہے حتمی طور پر اس بات کو یقینی بناتا ہے کہ جمہوری انتخابات عوام کی ضروریات کے مطابق زیادہ جوابدہ ہوں نتیجتاً سوشل گورننس کے لئے فیڈ بیک میکانزم نے درجہ بندی کی سطحوں کو کم کردیا ہے جس سے ردعمل اور کارکردگی میں تیزی سے اضافہ ہوا ہے مغربی ریاستوں کو ایسی سیاسی شکلوں کو قبول کرنا چاہیے جو ان کی اپنی سیاسی شکلوں سے مختلف ہیں ساتھ ساتھ یہ ضروری ہے کہ دنیا کو ایک ایسی کمیونٹی کے نقطہ نظر سے دیکھا جائے جس کی انسانیت کے لئے مشترکہ تقدیر ہو وسیع تناظر میں آج ڈیجیٹل ٹیکنالوجی چین کی جمہوری گورننس کے ساتھ گہرائی سے جڑی ہوئی ہے"


summarizer = TextSummarization()
ratio = 0.01  # Summary ratio (e.g., keep 1% of the sentences or adjust according to your preference: 10%, 20%, 40%, etc. (1% to 100%))
summary = summarizer.summarize_unstructured_text(text, ratio)
print(summary)
```
## output 
```python

# output => جانب سے تجویز کردہ بین الاقوامی اقدامات بشمول بیلٹ اینڈ روڈ انیشی ایٹو گلوبل ڈیولپمنٹ انیشی ایٹو گلوبل سیکیورٹی انیشی ایٹو اور گلوبل سولائزیشن انیشی ایٹو مسابقت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں۔
```
# Text-to-Speech & Speech-to-Text

The `UrduSpeech` class provides functionality to convert Urdu text to speech and vice versa using the Google Text-to-Speech (gTTS) API and the SpeechRecognition library. It supports converting Urdu text to audio files (MP3, WAV, or OGG formats) and recognizing Urdu speech from audio files (MP3 or WAV formats). Before using it, ensure that `ffmpeg` is installed on your computer, as it is required for audio file manipulation.

## Installation Requirements

Before using the `UrduSpeech` class, ensure that `ffmpeg` is installed on your computer and accessible from the command line. `ffmpeg` is needed to convert audio files between formats, which the speech recognition tasks require. If `ffmpeg` is not installed, install it and add its location to your `PATH` environment variable.
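One way to confirm that `ffmpeg` is reachable from the command line is to look it up on the `PATH` with Python's standard `shutil.which`. This check is a sketch and is not part of the library; `ffmpeg_available` is a hypothetical name.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before using the speech features.")
```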

### 1. `text_to_speech(text, output_file_name, format)`

This function converts Urdu text to speech and saves the generated speech as an audio file in the specified format.

**Parameters:**
- `text`: The Urdu text to convert to speech.
- `output_file_folder`: The path of the folder in which to save the audio file. In Example 1 below it is omitted and the files are saved in the current directory.
- `output_file_name`: The name of the output audio file (without file extension).
- `format`: The format for the output audio file (`'wav'`, `'mp3'`, or `'ogg'`).

**Example Usage  1:**
```python
from LughaatNLP import  UrduSpeech

converter = UrduSpeech()

text = 'جانب سے تجویز کردہ بین الاقوامی اقدامات بشمول بیلٹ اینڈ روڈ انیشی ایٹو، گلوبل ڈیولپمنٹ انیشی ایٹو، گلوبل سیکیورٹی انیشی ایٹو اور گلوبل سولائزیشن انیشی ایٹو مسابقت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں'

converter.text_to_speech(text, output_file_name='output', format='wav')
converter.text_to_speech(text, output_file_name='output', format='mp3')
converter.text_to_speech(text, output_file_name='output', format='ogg')
```

## Output 
```python
# output => (files are saved in the current directory)
Speech saved to output.wav
Speech saved to output.mp3
Speech saved to output.ogg
```
You can also specify an output folder path:

**Example Usage  2:**
```python
output_folder = r'C:\Users\YourName\Desktop\Testing\your-path'

converter.text_to_speech(text, output_folder, output_file_name='output', format='wav')
converter.text_to_speech(text, output_folder, output_file_name='output', format='mp3')
converter.text_to_speech(text, output_folder, output_file_name='output', format='ogg')
```

## Output 
```python
# output =>
Speech saved to C:\Users\YourName\Desktop\Testing\your-path\output.wav
Speech saved to C:\Users\YourName\Desktop\Testing\your-path\output.mp3
Speech saved to C:\Users\YourName\Desktop\Testing\your-path\output.ogg
```

### 2. `speech_to_text(audio_file, format='mp3')`

This function transcribes Urdu speech from an audio file in the specified format and returns the recognized text.

**Parameters:**
- `audio_file`: The path to the audio file.
- `format`: The format of the input audio file (`'wav'`, `'mp3'`, or `'ogg'`).

**Example Usage  1:**
```python
from LughaatNLP import  UrduSpeech

converter = UrduSpeech()

audio_file = "output.mp3"
text = converter.speech_to_text(audio_file, format='mp3')
print(text)
```

## Output 
```python
# output =>
جانب سے تجویز کردہ بین الاقوامی اقدامات بشمول بیلٹن روڈ انیشی ایٹو گلوبل ڈیولپمنٹ انیشی ایٹو گلوبل سکیورٹی انیشی ایٹو اور گلوبل سولائیزیشن انیشی ایٹو مساوت اور بالادستی کے نقطہ نظر سے یکسر جداگانہ نقطہ نظر فراہم کرتے ہیں
```

## Future Work

The future roadmap for LughaatNLP includes the following features:

- Urdu language translator
- Urdu chatbot models
- Text-to-speech and speech-to-text capabilities for Urdu
- Urdu text summarization

Implementing these features requires resources such as servers and GPUs for training. Therefore, Muhammad Noman is collecting funds to support the development and maintenance of this library.

## Contributing

This library was created by Muhammad Noman, a student at Iqra University. You can reach him via email at muhammadnomanshafiq76@gmail.com or connect with him on [LinkedIn](https://www.linkedin.com/in/muhammad-noman76/).

If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the [GitHub repository](https://github.com/MuhammadNoman76/LughaatNLP).

## License

This project is licensed under the [MIT License](https://drive.google.com/file/d/1YcxoYrL3atF7U7vYKK_KFiKvDF-YiywT/view?usp=drive_link).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/MuhammadNoman76/LughaatNLP",
    "name": "LughaatNLP",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "urdu nlp natural language processing text processing tokenization stemming lemmatization stopwords morphological-analysis nlp urdu LughaatNLP urdunlp UrduNLP urduhack stanza natural-language-processing text-processing language-processing preprocessing stemming lemmatization tokenization stopwords",
    "author": "Muhammad Noman",
    "author_email": "muhammadnomanshafiq76@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/41/1d/a26dcc1e065b2e721ef008d7c1e02d98536b21e335ac95a7e7b97c9952ab/LughaatNLP-1.0.6.tar.gz",
    "platform": null,
    "description": "# LughaatNLP\r\n\r\nIntroducing LughaatNLP, the all-in-one Urdu language toolkit designed for NLP tasks in Pakistan. It offers essential features like tokenization, lemmatization, stop word removal, POS tagging, NER, normalization, summarization, and even text-to-speech and speech-to-text capabilities, all crafted exclusively for Urdu. Unlock the power of Urdu NLP effortlessly with LughaatNLP!\r\n\r\n<p align=\"center\">\r\n  <img src=\"https://i.imgur.com/6lKyQlo.png\" alt=\"Alt Text\" width=\"500\" height=\"500\">\r\n</p>\r\n\r\n## Documentation\r\nExplore the full potential of LughaatNLP through its detailed [documentation](https://lughaatnlp.blogspot.com/2024/04/mastering-urdu-text-processing.html), which includes practical usage examples, installation guides, and API references.\r\n\r\n## Google Colab Notebook\r\nGet hands-on with LughaatNLP using the interactive [Google Colab Notebook](https://colab.research.google.com/drive/1lLaUBiFq61-B7GyQ0wdNg9FyLCraAcDU?usp=sharing) provided in the documentation. 
This notebook lets you experiment with the toolkit's functionalities directly in your browser.\r\n\r\n## PyPI Package\r\nInstall LughaatNLP effortlessly via its [PyPI page](https://pypi.org/project/LughaatNLP) to integrate Urdu NLP capabilities into your Python projects seamlessly.\r\n\r\n## YouTube Tutorial Series\r\nAccelerate your understanding of LughaatNLP's features with the dedicated [YouTube tutorial playlist](https://www.youtube.com/playlist?list=PL4tcmUwDtJEIHZhAZ3XP9U6ZJzaS4RFbd), offering step-by-step guidance on utilizing the toolkit for various Urdu NLP tasks.\r\n\r\n## Blogs and Articles\r\nGain insights into advanced techniques and best practices for mastering Urdu text processing through informative articles like:\r\n\r\n **Geeksforgeeks Link** :\r\n- [LughaatNLP: A Powerful Urdu Language Preprocessing Library](https://www.geeksforgeeks.org/lughaatnlp-a-powerful-urdu-language-preprocessing-library/)\r\n\r\n\r\n **Medium Link** :\r\n- [Introducing LughaatNLP: A Powerful Urdu Language Preprocessing Library](https://medium.com/@muhammadnomanshafiq76/introducing-lughaatnlp-a-powerful-urdu-language-preprocessing-library-488af74d3dde)\r\n\r\n **LughaatNLP Blog** :\r\n- [Mastering Urdu Text Processing](https://lughaatnlp.blogspot.com/2024/04/mastering-urdu-text-processing.html)\r\n\r\n\r\n\r\n## Features\r\n\r\n- **Tokenization**: Breaks down Urdu text into individual tokens, considering the intricacies of Urdu script and language structure.\r\n- **Lemmatization**: Converts inflected words into their base or dictionary form, aiding in text analysis and comprehension.\r\n- **Stop Word Removal**: Eliminates common Urdu stop words to focus on meaningful content during text processing.\r\n- **Normalization**: Standardizes text by removing diacritics, normalizing character variations, and handling common orthographic variations in Urdu.\r\n- **Stemming**: Reduces words to their root form, improving text analysis and comprehension in Urdu.\r\n- **Spell 
Checker**: Identifies and corrects misspelled words in Urdu text, enhancing text quality and readability.\r\n- **Part of Speech Extraction**: Tags words with their grammatical categories, enabling advanced syntactic analysis.\r\n- **Named Entity Recognition (NER)**: Identify and extract names of entities like persons, organizations, or locations.\r\n- **Text Summarization**: Generates concise summaries of Urdu text, facilitating quick understanding of lengthy content.\r\n- **Text-to-Speech Conversion**: Converts Urdu text into spoken audio, enabling accessibility and language learning applications.\r\n- **Speech-to-Text Conversion**: Transcribes spoken Urdu into written text, facilitating voice-based interactions and data entry.\r\n\r\n## Installation\r\n\r\nYou can install the `LughaatUrdu` library from PyPI using pip:\r\n\r\n```bash\r\npip install lughaatNLP\r\n```\r\n\r\nAlternatively, you can manually install it by downloading and unzipping the provided `LughaatNLP.rar` file and installing the wheel file using pip:\r\n\r\n```bash\r\npip install path_to_wheel_file/LughaatNLP-1.0.2-py3-none-any.whl\r\n```\r\n\r\n## Required Packages\r\n\r\nThe LughaatNLP library requires the following packages:\r\n\r\n- `python-Levenshtein`\r\n- `tensorflow`\r\n- `numpy`\r\n- `scikit-learn`\r\n- `scipy`\r\n- `gtts`\r\n- `SpeechRecognition`\r\n- `Pydub`\r\n\r\n\r\nYou can install these packages using pip:\r\n\r\n```bash\r\npip install python-Levenshtein tensorflow numpy scikit-learn scipy gtts SpeechRecognition pydub\r\n```\r\n\r\n## Usage\r\n\r\nAfter installing the library, you can import the necessary functions or classes in your Python script:\r\n\r\n```python\r\n#importing Pakages\r\nfrom LughaatNLP import LughaatNLP \r\nfrom LughaatNLP import POS_urdu from LughaatNLP import NER_Urdu\r\n         from LughaatNLP import TextSummarization\r\n         from LughaatNLP import  UrduSpeech\r\n\r\n# Instance Calling \r\nurdu_text_processing = LughaatNLP() \r\nner_urdu = 
NER_Urdu()\r\npos_tagger = POS_urdu()\r\nspeech_urdu = TextSummarization()\r\nspeech_urdu = UrduSpeech()\r\n```\r\n\r\n## Functions\r\n\r\n# Normalization\r\n\r\n### 1. `normalize_characters(text)`\r\n\r\nThis function normalizes the Urdu characters in the given text by mapping incorrect Urdu characters to their correct forms. Sometimes, single Unicode characters representing Urdu may be written in multiple forms, and this function normalizes them accordingly.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0622\u067e \u06a9\u06cc\u0633\u06d2 \u06c1\u06cc\u06ba\u061f\"\r\nnormalized_text = urdu_text_processing.normalize_characters(text)\r\nprint(normalized_text)  # Output: \u0627\u067e \u06a9\u06cc\u0633\u06d2 \u06c1\u06cc\u06ba\u061f\r\n```\r\n\r\n### 2. `normalize_combine_characters(text)`\r\n\r\nThis function simplifies Urdu characters by combining certain combinations into their correct single forms. In Urdu writing, some characters are made up of multiple parts like ligatures or diacritics. This function finds these combinations in the text and changes them to their single character forms. It ensures consistency and accuracy in how Urdu text is represented.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0627\u064f\u0631\u062f\u0648\"\r\nnormalized_text = urdu_text_processing.normalize_combine_characters(text)\r\nprint(normalized_text)  # Output: \u0627\u064f\u0631\u062f\u0648\r\n```\r\n\r\n### 3. 
`normalize(text)`\r\n\r\nThis function performs all-in-one normalization on the Urdu text, including character normalization, diacritic removal, punctuation handling, digit conversion, and special character preservation.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0622\u067e \u06a9\u06cc\u0633\u06d2 \u06c1\u06cc\u06ba\u061f \u0645\u06cc\u06ba \u06f2\u06f3 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nnormalized_text = urdu_text_processing.normalize(text)\r\nprint(\"Normalize all at once together of Urdu: \", normalized_text)  # Output: \u0627\u067e \u06a9\u06cc\u0633\u06d2 \u06c1\u06cc\u06ba \u061f \u0645\u06cc\u06ba 23 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba \u06d4\r\n```\r\n\r\n### 4. `remove_diacritics(text)`\r\n\r\nThis function removes diacritics (zabar, zer, pesh) from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u06a9\u0650\u062a\u064e\u0627\u0628\"\r\ndiacritics_removed = urdu_text_processing.remove_diacritics(text)\r\nprint(\"Remove all Diacritic (Zabar - Zer - Pesh): \", diacritics_removed)  # Output: \u06a9\u062a\u0627\u0628\r\n```\r\n\r\n### 5. `punctuations_space(text)`\r\n\r\nThis function remove spaces after punctuations (excluding numbers) and removes spaces before punctuations in the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u06a9\u06cc\u0627 \u0622\u067e \u06a9\u06be\u0627\u0646\u0627 \u06a9\u06be\u0627\u0646\u0627 \u0686\u0627\u06c1\u062a\u06d2 \u06c1\u06cc\u06ba \u061f \u0645\u06cc\u06ba \u06a9\u06be\u0627\u0646\u0627 \u06a9\u06be\u0627\u0624\u06ba \u06af\u0627  \u06d4\"\r\npunctuated_text = urdu_text_processing.punctuations_space(text)\r\nprint(punctuated_text)  # Output: \u06a9\u06cc\u0627 \u0622\u067e \u06a9\u06be\u0627\u0646\u0627 \u06a9\u06be\u0627\u0646\u0627 \u0686\u0627\u06c1\u062a\u06d2 \u06c1\u06cc\u06ba\u061f \u0645\u06cc\u06ba \u06a9\u06be\u0627\u0646\u0627 \u06a9\u06be\u0627\u0624\u06ba \u06af\u0627\u06d4\r\n```\r\n\r\n### 6. 
`replace_digits(text)`\r\n\r\nThis function replaces English digits with Urdu digits.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba 23 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nenglish_digits = urdu_text_processing.replace_digits(text)\r\nprint(\"Replace All maths numbers with Urdu number eg(2 1 3 1 -> \u06f2 \u06f1 \u06f3 \u06f1): \", english_digits)  # Output: \u0645\u06cc\u06ba \u06f2\u06f3 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 7. `remove_numbers_urdu(text)`\r\n\r\nThis function removes Urdu numbers from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba  22 \u06f2\u06f3 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nno_urdu_numbers = urdu_text_processing.remove_numbers_urdu(text)\r\nprint(\"Remove Urdu numbers from text: \", no_urdu_numbers)  # Output: \u0645\u06cc\u06ba 22 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 8. `remove_numbers_english(text)`\r\n\r\nThis function removes English numbers from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba \u06f2\u06f3 23 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nno_english_numbers = urdu_text_processing.remove_numbers_english(text)\r\nprint(\"Remove English numbers from text: \", no_english_numbers)  # Output: \u0645\u06cc\u06ba \u06f2\u06f3 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 9. `remove_whitespace(text)`\r\n\r\nThis function removes extra whitespaces from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba   \u06af\u06be\u0631   \u062c\u0627   \u0631\u06c1\u0627   \u06c1\u0648\u06ba\u06d4\"\r\ncleaned_text = urdu_text_processing.remove_whitespace(text)\r\nprint(\"Remove All extra space between words\", cleaned_text)  # Output: \u0645\u06cc\u06ba \u06af\u06be\u0631 \u062c\u0627 \u0631\u06c1\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 10. 
`preserve_special_characters(text)`\r\n\r\nThis function adds spaces around special characters in the Urdu text to facilitate tokenization.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba@\u067e\u0627\u06a9\u0633\u062a\u0627\u0646_\u0633\u06d2_\u06c1\u0648\u06ba\u06d4\"\r\npreserved_text = urdu_text_processing.preserve_special_characters(text)\r\nprint(\"make a space between every special character and word so tokenize easily\", preserved_text)  # Output: \u0645\u06cc\u06ba @ \u067e\u0627\u06a9\u0633\u062a\u0627\u0646 _ \u0633\u06d2 _ \u06c1\u0648\u06ba \u06d4\r\n```\r\n\r\n### 11. `remove_numbers(text)`\r\n\r\nThis function removes both Urdu and English numbers from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba \u06f2\u06f3 \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba \u0627\u0648\u0631 \u0645\u06cc\u0631\u06cc \u0639\u0645\u0631 23 \u06c1\u06d2\u06d4\"\r\nnumber_removed = urdu_text_processing.remove_numbers(text)\r\nprint(\"Remove All numbers whether they are Urdu or English: \", number_removed)  # Output: \u0645\u06cc\u06ba \u0633\u0627\u0644 \u06a9\u0627 \u06c1\u0648\u06ba \u0627\u0648\u0631 \u0645\u06cc\u0631\u06cc \u0639\u0645\u0631 \u06c1\u06d2\u06d4\r\n```\r\n\r\n### 12. `remove_english(text)`\r\n\r\nThis function removes English characters from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"I am learning Urdu.\"\r\nurdu_only = urdu_text_processing.remove_english(text)\r\nprint(\"Remove All English characters from text: \", urdu_only)  # Output:  \u0627\u0645 \u0644\u0631\u0646\u0646\u06af \u0627\u0631\u062f\u0648\r\n```\r\n\r\n### 13. `pure_urdu(text)`\r\n\r\nThis function removes all non-Urdu characters and numbers from the text, leaving only Urdu characters and special characters used in Urdu.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"I ?  # & am learning Urdu. 
\u0645\u06cc\u06ba \u0627\u0631\u062f\u0648 \u0633\u06cc\u06a9\u06be \u0631\u06c1\u0627 \u06c1\u0648\u06ba\u06d4 123\"\r\npure_urdu_text = urdu_text_processing.pure_urdu(text)\r\nprint(pure_urdu_text)  # Output: \u0645\u06cc\u06ba \u0627\u0631\u062f\u0648 \u0633\u06cc\u06a9\u06be \u0631\u06c1\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 14. `just_urdu(text)`\r\n\r\nThis function removes all non-Urdu characters, numbers, and special characters, just leaving only pure Urdu text even not special character used in urdu.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"I am learning Urdu. \u0645\u06cc\u06ba \u0627\u0631\u062f\u0648 \u0633\u06cc\u06a9\u06be \u0631\u06c1\u0627 \u06c1\u0648\u06ba\u06d4 123\"\r\njust_urdu_text = urdu_text_processing.just_urdu(text)\r\nprint(just_urdu_text)  # Output: \u0645\u06cc\u06ba \u0627\u0631\u062f\u0648 \u0633\u06cc\u06a9\u06be \u0631\u06c1\u0627 \u06c1\u0648\u06ba \r\n```\r\n\r\n### 15. `remove_urls(text)`\r\n\r\nThis function removes URLs from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba https://www.example.com \u067e\u0631 \u06af\u06cc\u0627\u06d4\"\r\nno_urls = urdu_text_processing.remove_urls(text)\r\nprint(\"Remove All URLs\", no_urls)  # Output: \u0645\u06cc\u06ba  \u067e\u0631 \u06af\u06cc\u0627\u06d4\r\n```\r\n\r\n### 16. `remove_special_characters(text)`\r\n\r\nThis function removes all special characters from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba@\u067e\u0627\u06a9\u0633\u062a\u0627\u0646_\u0633\u06d2_\u06c1\u0648\u06ba\u06d4\"\r\nno_special_chars = urdu_text_processing.remove_special_characters(text)\r\nprint(\"Remove All Special characters\", no_special_chars)  # Output: \u0645\u06cc\u06ba \u067e\u0627\u06a9\u0633\u062a\u0627\u0646 \u0633\u06d2 \u06c1\u0648\u06ba\r\n```\r\n\r\n### 17. 
`remove_special_characters_exceptUrdu(text)`\r\n\r\nThis function removes all special characters from the Urdu text, except for those commonly used in the Urdu language (e.g., \u061f, \u06d4, \u060c).\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba@\u067e\u0627\u06a9\u0633\u062a\u0627\u0646??_\u0633\u06d2_\u06c1\u0648\u06ba\u061f\"\r\nurdu_special_chars = urdu_text_processing.remove_special_characters_exceptUrdu(text)\r\nprint(\"Remove All Special characters except those which are used in Urdu language eg( \u061f \u06d4 \u060c ): \", urdu_special_chars)  # Output: \u0645\u06cc\u06ba \u067e\u0627\u06a9\u0633\u062a\u0627\u0646 \u0633\u06d2 \u06c1\u0648\u06ba\u061f\r\n```\r\n# Stop Words Removing\r\n\r\n### 1. `remove_stopwords(text)`\r\n\r\nThis function removes stopwords from the Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba \u0627\u0633 \u06a9\u062a\u0627\u0628 \u06a9\u0648 \u067e\u0691\u06be\u0646\u0627 \u0686\u0627\u06c1\u062a\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nfiltered_text = urdu_text_processing.remove_stopwords(text)\r\nprint(\"Remove Stop words:\", filtered_text)  # Output: \u06a9\u062a\u0627\u0628 \u067e\u0691\u06be\u0646\u0627 \u0686\u0627\u06c1\u062a\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n# Tokenization\r\n\r\nTokenization involves breaking down Urdu text into individual tokens, considering the intricacies of Urdu script and language structure.\r\n\r\n### 1. 
`urdu_tokenize(text)`\r\n\r\nThis function tokenizes the Urdu text into individual tokens (words, numbers, and punctuations).\r\n\r\n**Example:**\r\n\r\n```python\r\ntext = \"\u0645\u06cc\u06ba \u067e\u0627\u06a9\u0633\u062a\u0627\u0646 \u0633\u06d2 \u06c1\u0648\u06ba\u06d4\"\r\ntokens = urdu_text_processing.urdu_tokenize(text)\r\nprint(\"Tokenization for Urdu language:\", tokens)  # Output: ['\u0645\u06cc\u06ba', '\u067e\u0627\u06a9\u0633\u062a\u0627\u0646', '\u0633\u06d2', '\u06c1\u0648\u06ba\u06d4']\r\n```\r\n# Lemmatization and Stemming\r\n\r\nLemmatization involves converting inflected words into their base or dictionary form, while stemming reduces words to their root form.\r\n\r\n### 1. `lemmatize_sentence(sentence)`\r\n\r\nThis function performs lemmatization on the Urdu sentence, replacing words with their base or dictionary form.\r\n\r\n**Example:**\r\n\r\n```python\r\nsentence = \"\u0645\u06cc\u06ba \u06a9\u062a\u0627\u0628\u06cc\u06ba \u067e\u0691\u06be\u062a\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nlemmatized_sentence = urdu_text_processing.lemmatize_sentence(sentence)\r\nprint(\"lemmatize the words \", lemmatized_sentence)  # Output: \u0645\u06cc\u06ba \u06a9\u062a\u0627\u0628 \u067e\u0691\u06be\u0646\u0627 \u06c1\u0648\u06ba\u06d4\r\n```\r\n\r\n### 2. `urdu_stemmer(sentence)`\r\n\r\nThis function performs stemming on the Urdu sentence, reducing words to their root or stem form.\r\n\r\n**Example:**\r\n\r\n```python\r\nsentence = \"\u0645\u06cc\u06ba \u06a9\u062a\u0627\u0628\u06cc\u06ba \u067e\u0691\u06be\u062a\u0627 \u06c1\u0648\u06ba\u06d4\"\r\nstemmed_sentence = urdu_text_processing.urdu_stemmer(sentence)\r\nprint(\"Urdu Stemming \", stemmed_sentence)  # Output: \u0645\u06cc\u06ba \u06a9\u062a\u0627\u0628 \u067e\u0691\u06be \u06c1\u0648\u06ba\u06d4\r\n```\r\n# Spell Checker\r\n\r\nSpell checking involves identifying and correcting misspelled words in Urdu text.\r\n\r\n### 1. 
`corrected_sentence_spelling(input_word, threshold)`\r\n\r\nThis function takes an input sentence and a similarity threshold as arguments and returns the corrected sentence with potentially misspelled words replaced by the most similar words from the vocabulary.\r\n\r\n**Example:**\r\n\r\n```python\r\nspell_checker = LughaatNLP()\r\nsentence = '\u0633\u0633\u0628 \u0633\u06d2\u0627 \u0628\u0691\u0627\u0627 \u0645\u0644\u06a9\u0627 \u06c1\u06d2'\r\ncorrected_sentence = spell_checker.corrected_sentence_spelling(sentence, 60)\r\nprint(\"This correct spelling of sentence itself\", corrected_sentence)\r\n```\r\n\r\n### 2. `most_similar_word(input_word, threshold)`\r\n\r\nThis function takes an input word and a similarity threshold as arguments and returns the most similar word from the vocabulary based on the Levenshtein distance.\r\n\r\n**Example:**\r\n\r\n```python\r\nspell_checker = LughaatNLP()\r\nword = '\u067e\u0627\u06a9\u0633\u062a\u0627\u0627\u0646'\r\nmost_similar = spell_checker.most_similar_word(word, 70)\r\nprint(\"This will return the most similar single word in string\", most_similar)\r\n```\r\n\r\n### 3. `get_similar_words_percentage(input_word, threshold)`\r\n\r\nThis function takes an input word and a similarity threshold as arguments and returns a list of tuples containing similar words and their corresponding similarity percentages.\r\n\r\n**Example:**\r\n\r\n```python\r\nspell_checker = LughaatNLP()\r\nword = '\u067e\u0627\u06a9\u0633\u062a\u0627\u0627\u0646'\r\nsimilar_words_with_percentage = spell_checker.get_similar_words_percentage(word, 70)\r\nprint(\"This will return the most similar words in list with percentage\", similar_words_with_percentage)\r\n```\r\n\r\n### 4. 
`get_similar_words(input_word, threshold)`\r\n\r\nThis function takes an input word and a similarity threshold as arguments and returns a list of similar words from the vocabulary based on the Levenshtein distance.\r\n\r\n**Example:**\r\n\r\n```python\r\nspell_checker = LughaatNLP()\r\nword = '\u067e\u0627\u06a9\u0633\u062a\u0627\u0627\u0646'\r\nsimilar_words = spell_checker.get_similar_words(word, 70)\r\nprint(\"This will return the most similar words in list only you can access word using index\", similar_words)\r\n```\r\n\r\nThese functions leverage the Levenshtein distance algorithm to calculate the similarity between the input word or sentence and the words in the vocabulary. The `threshold` parameter is used to filter out words with a similarity percentage below the specified threshold.\r\n\r\nNote: These examples assume that you have an instance of the `UrduTextNormalizer` class named `spell_checker` and have imported the `Levenshtein` module for calculating the edit distance.\r\n\r\n# Part of Speech\r\n\r\nThe `pos_tags_urdu` function is used for part-of-speech tagging in Urdu text. It takes an Urdu sentence as input and returns a list of dictionaries where each word is paired with its assigned part-of-speech tag, such as nouns (`NN`), verbs (`VB`), adjectives (`ADJ`), etc.\r\n\r\n### 1. 
`pos_tagger.pos_tags_urdu (sentence)`\r\n\r\nThe example output demonstrates how words like \"\u0645\u06cc\u0631\u06d2\" (`G` for postposition) and \"\u062a\u0639\u0644\u06cc\u0645\" (`NN` for noun) are tagged based on their grammatical roles within the sentence.\r\n\r\n**Example:**\r\n\r\n```python\r\nfrom LughaatNLP import POS_urdu\r\n\r\npos_tagger = POS_urdu()\r\n\r\nsentence = \"\u0645\u06cc\u0631\u06d2 \u0648\u0627\u0644\u062f\u06cc\u0646 \u0646\u06d2 \u0645\u06cc\u0631\u06cc \u062a\u0639\u0644\u06cc\u0645 \u0627\u0648\u0631 \u062a\u0631\u0628\u06cc\u062a \u0645\u06cc\u06ba \u0628\u06c1\u062a \u0645\u062d\u0646\u062a \u06a9\u06cc \u062a\u0627\u06a9\u06c1 \u0645\u06cc\u06ba \u0627\u067e\u0646\u06cc \u0632\u0646\u062f\u06af\u06cc \u0645\u06cc\u06ba \u06a9\u0627\u0645\u06cc\u0627\u0628 \u06c1\u0648 \u0633\u06a9\u0648\u06ba\u06d4\"\r\n\r\npos_tagger = POS_urdu()\r\npredicted_pos_tags = pos_tagger.pos_tags_urdu (sentence)\r\n\r\nprint(predicted_pos_tags)\r\n\r\n\r\n```\r\n## output \r\n```python\r\n# output => [{'Word': '\u0645\u06cc\u0631\u06d2', 'POS_Tag': 'G'}, {'Word': '\u0648\u0627\u0644\u062f\u06cc\u0646', 'POS_Tag': 'NN'},{'Word': '\u0646\u06d2', 'POS_Tag': 'P'},{'Word': '\u0645\u06cc\u0631\u06cc', 'POS_Tag': 'G'},{'Word': '\u062a\u0639\u0644\u06cc\u0645', 'POS_Tag': 'NN'},{'Word': '\u0627\u0648\u0631', 'POS_Tag': 'CC'},{'Word': '\u062a\u0631\u0628\u06cc\u062a', 'POS_Tag': 'NN'},{'Word': '\u0645\u06cc\u06ba', 'POS_Tag': 'P'},{'Word': '\u0628\u06c1\u062a', 'POS_Tag': 'ADV'},{'Word': '\u0645\u062d\u0646\u062a', 'POS_Tag': 'NN'},{'Word': '\u06a9\u06cc', 'POS_Tag': 'VB'},{'Word': '\u062a\u0627\u06a9\u06c1', 'POS_Tag': 'SC'},{'Word': '\u0645\u06cc\u06ba', 'POS_Tag': 'P'},{'Word': '\u0627\u067e\u0646\u06cc', 'POS_Tag': 'GR'},{'Word': '\u0632\u0646\u062f\u06af\u06cc', 'POS_Tag': 'NN'},{'Word': '\u0645\u06cc\u06ba', 'POS_Tag': 'P'},{'Word': '\u06a9\u0627\u0645\u06cc\u0627\u0628', 'POS_Tag': 'ADJ'},{'Word': '\u06c1\u0648', 'POS_Tag': 'VB'},{'Word': 
'\u0633\u06a9\u0648\u06ba', 'POS_Tag': 'NN'},{'Word': '\u06d4', 'POS_Tag': 'SM'}]\r\n```\r\n# Name Entity Relationships\r\n\r\nThe `ner_tags_urdu` function performs named entity recognition on Urdu text, assigning named entity tags (such as `U-LOCATION` for locations) to identified entities in the input sentence. It outputs a dictionary where words are mapped to their corresponding named entity tags, facilitating tasks like information extraction and text analysis specific to Urdu language. \r\n\r\n### 1. `ner_urdu.ner_tags_urdu (sentence)`\r\n\r\nThe example output illustrates how entities like \"\u067e\u0627\u06a9\u0633\u062a\u0627\u0646\" are recognized as locations (U-LOCATION) within the provided sentence.\r\n\r\n**Example:**\r\n\r\n```python\r\nfrom LughaatNLP import NER_Urdu\r\n\r\nner_urdu = NER_Urdu()\r\n\r\nsentence = \"\u0627\u0633 \u06a9\u062a\u0627\u0628 \u0645\u06cc\u06ba \u067e\u0627\u06a9\u0633\u062a\u0627\u0646 \u06a9\u06cc \u062a\u0627\u0631\u06cc\u062e \u0628\u06cc\u0627\u0646 \u06a9\u06cc \u06af\u0626\u06cc \u06c1\u06d2\u06d4\"\r\n\r\nword_tag_dict= ner_urdu.ner_tags_urdu (sentence)\r\n\r\nprint(word_tag_dict)\r\n\r\n\r\n```\r\n## output \r\n```python\r\n# output => {'\u0627\u0633': 'O', '\u06a9\u062a\u0627\u0628': 'O', '\u0645\u06cc\u06ba': 'O', '\u067e\u0627\u06a9\u0633\u062a\u0627\u0646': 'U-LOCATION', '\u06a9\u06cc': 'O', '\u062a\u0627\u0631\u06cc\u062e': 'O', '\u0628\u06cc\u0627\u0646': 'O', '\u06af\u0626\u06cc': 'O', '\u06c1\u06d2': 'O', '\u06d4': 'O'}\r\n```\r\n\r\n# Text Summarization\r\n\r\nThe `TextSummarization` class offers automatic summarization of Urdu text by leveraging natural language processing techniques. It uses graph-based representation and scoring methods like BM25 and TF-IDF to identify and select the most important sentences for summarization. 
This class is designed to generate concise and informative summaries from unstructured Urdu text, making it valuable for applications requiring efficient text summarization capabilities.\r\n\r\n### 1. `summarize(text, ratio)`\r\n\r\nThis Python program demonstrates the use of the `TextSummarization` class from the `LughaatNLP` module to summarize a given Urdu text based on a specified ratio. The input text is a news article discussing the integration of digital technology into governance and electoral processes in China. The program initializes a `TextSummarization` instance, sets a summary ratio of 1% (adjustable to other percentages), and then generates a summary using the `summarize` method. \r\n\r\n**Example:**\r\n\r\n```python\r\nfrom LughaatNLP import TextSummarization\r\n\r\n\r\ntext = '''\r\n\u0628\u06be\u06cc \u062d\u0627\u0644 \u06c1\u06cc \u0645\u06cc\u06ba \u062a\u06cc\u0633\u0631\u06d2 \u0627\u0646\u0679\u0631\u0646\u06cc\u0634\u0646\u0644 \u0688\u06cc\u0645\u0648\u06a9\u0631\u06cc\u0633\u06cc \u0641\u0648\u0631\u0645 \u06a9\u0627 \u0627\u0646\u0639\u0642\u0627\u062f \u0628\u06cc\u062c\u0646\u06af \u0645\u06cc\u06ba \u06a9\u06cc\u0627 \u06af\u06cc\u0627 \u060c \u062c\u0633 \u0645\u06cc\u06ba 70 \u0633\u06d2 \u0632\u0627\u0626\u062f \u0645\u0645\u0627\u0644\u06a9 \u0627\u0648\u0631 \u062e\u0637\u0648\u06ba \u06a9\u06d2 \u062a\u0642\u0631\u06cc\u0628\u0627\u064b 300 \u0639\u06c1\u062f\u06cc\u062f\u0627\u0631\u0648\u06ba \u0627\u0648\u0631 \u0645\u0627\u06c1\u0631\u06cc\u0646 \u0646\u06d2 \u0627\u0633 \u0628\u0627\u062a \u067e\u0631 \u062a\u0628\u0627\u062f\u0644\u06c1 \u062e\u06cc\u0627\u0644 \u06a9\u06cc\u0627 \u06a9\u06c1 \u062c\u0645\u06c1\u0648\u0631\u06cc\u062a \u06a9\u0648 \u062c\u062f\u06cc\u062f \u0688\u06cc\u062c\u06cc\u0679\u0644 \u062f\u0648\u0631 \u06a9\u06cc \u0636\u0631\u0648\u0631\u06cc\u0627\u062a \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0627\u0624\u062a\u06be \u06a9\u06cc \u0645\u0636\u0628\u0648\u0637\u06cc \u06a9\u06d2 
\u0645\u0637\u0627\u0628\u0642 \u06a9\u06cc\u0633\u06d2 \u06c1\u0648\u0646\u0627 \u0686\u0627\u06c1\u0626\u06d2\u06d4\r\n\u0648\u0633\u06cc\u0639 \u062a\u0646\u0627\u0638\u0631 \u0645\u06cc\u06ba \u0622\u062c\u060c \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u0686\u06cc\u0646 \u06a9\u06cc \u062c\u0645\u06c1\u0648\u0631\u06cc \u06af\u0648\u0631\u0646\u0646\u0633 \u06a9\u06d2 \u0633\u0627\u062a\u06be \u06af\u06c1\u0631\u0627\u0626\u06cc \u0633\u06d2 \u062c\u0691\u06cc \u06c1\u0648\u0626\u06cc \u06c1\u06d2\u06d4 \u0627\u0633 \u0648\u0642\u062a \u0648\u0648\u0679\u0646\u06af \u06a9\u06d2 \u062f\u0648\u0631\u0627\u0646 \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u0633\u062a\u0639\u0645\u0627\u0644\u060c \u062d\u06a9\u0648\u0645\u062a\u06cc \u067e\u0627\u0644\u06cc\u0633\u06cc\u0648\u06ba \u06a9\u0648 \u0633\u0645\u062c\u06be\u0646\u06d2\u060c \u0645\u0634\u0627\u0648\u0631\u062a \u0645\u06cc\u06ba \u062d\u0635\u06c1 \u0644\u06cc\u0646\u06d2 \u0627\u0648\u0631 \u062d\u06a9\u0648\u0645\u062a\u06cc \u0633\u0631\u06af\u0631\u0645\u06cc\u0648\u06ba \u06a9\u06cc \u0646\u06af\u0631\u0627\u0646\u06cc \u062c\u06cc\u0633\u06d2 \u0639\u0645\u0644 \u06a9\u0648 \u0622\u0633\u0627\u0646 \u0628\u0646\u0627 \u0631\u06c1\u0627 \u06c1\u06d2.\u0686\u06cc\u0646 \u06a9\u06d2 \u0633\u0631\u06a9\u0627\u0631\u06cc \u0627\u0646\u062a\u062e\u0627\u0628\u06cc \u0637\u0631\u06cc\u0642\u06c1 \u06a9\u0627\u0631 \u0645\u06cc\u06ba \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u0646\u0636\u0645\u0627\u0645 \u0648\u0648\u0679\u0631 \u0631\u062c\u0633\u0679\u0631\u06cc\u0634\u0646\u060c \u0627\u0645\u06cc\u062f\u0648\u0627\u0631\u0648\u06ba \u06a9\u06cc \u0645\u0639\u0644\u0648\u0645\u0627\u062a \u06a9\u06cc \u0641\u0631\u0627\u06c1\u0645\u06cc \u06a9\u06d2 \u0633\u0627\u062a\u06be \u0633\u0627\u062a\u06be 
\u0648\u0648\u0679\u0646\u06af \u0627\u0648\u0631 \u06af\u0646\u062a\u06cc \u06a9\u06d2 \u0639\u0645\u0644 \u062c\u06cc\u0633\u06d2 \u06a9\u0627\u0645\u0648\u06ba \u06a9\u0648 \u0622\u0633\u0627\u0646 \u0628\u0646\u0627\u0631\u06c1\u0627 \u06c1\u06d2\u06d4\r\n\u06cc\u06c1 \u0627\u0646\u0636\u0645\u0627\u0645 \u062d\u06a9\u0648\u0645\u062a\u06cc \u0627\u062f\u0627\u0631\u0648\u06ba\u060c \u0631\u0627\u0626\u06d2 \u062f\u06c1\u0646\u062f\u06af\u0627\u0646 \u0627\u0648\u0631 \u0627\u0645\u06cc\u062f\u0648\u0627\u0631\u0648\u06ba \u06a9\u06d2 \u062f\u0631\u0645\u06cc\u0627\u0646 \u062a\u0639\u0627\u0645\u0644 \u06a9\u0648 \u0628\u06be\u06cc \u0628\u0691\u06be\u0627\u062a\u0627 \u06c1\u06d2\u060c \u062d\u062a\u0645\u06cc \u0637\u0648\u0631 \u067e\u0631 \u0627\u0633 \u0628\u0627\u062a \u06a9\u0648 \u06cc\u0642\u06cc\u0646\u06cc \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2 \u06a9\u06c1 \u062c\u0645\u06c1\u0648\u0631\u06cc \u0627\u0646\u062a\u062e\u0627\u0628\u0627\u062a \u0639\u0648\u0627\u0645 \u06a9\u06cc \u0636\u0631\u0648\u0631\u06cc\u0627\u062a \u06a9\u06d2 \u0645\u0637\u0627\u0628\u0642 \u0632\u06cc\u0627\u062f\u06c1 \u062c\u0648\u0627\u0628\u062f\u06c1 \u06c1\u0648\u06ba\u06d4\r\n\u0645\u0632\u06cc\u062f \u0628\u0631\u0622\u06ba\u060c \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u062d\u06a9\u0648\u0645\u062a \u06a9\u0648 \u0639\u0648\u0627\u0645\u06cc \u062a\u0634\u0648\u06cc\u0634 \u06a9\u06d2 \u0645\u0633\u0627\u0626\u0644 \u06a9\u0648 \u062d\u0644 \u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u06a9\u0627\u0631\u0648\u0628\u0627\u0631\u06cc \u0627\u062f\u0627\u0631\u0648\u06ba\u060c \u0645\u0627\u06c1\u0631\u06cc\u0646 \u0627\u0648\u0631 \u0633\u0648\u0644 \u0645\u0646\u062f\u0648\u0628\u06cc\u0646 \u06a9\u06d2 \u062f\u0631\u0645\u06cc\u0627\u0646 \u0622\u0646 \u0644\u0627\u0626\u0646 \u062a\u0628\u0627\u062f\u0644\u06c1 \u062e\u06cc\u0627\u0644 \u06a9\u0627 \u0627\u06c1\u062a\u0645\u0627\u0645 
\u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0642\u0627\u0628\u0644 \u0628\u0646\u0627 \u0631\u06c1\u06cc \u06c1\u06d2\u060c \u062c\u0633 \u0633\u06d2 \u0645\u0634\u0627\u0648\u0631\u062a \u0632\u06cc\u0627\u062f\u06c1 \u0645\u0648\u062b\u0631 \u06c1\u0648\u062a\u06cc \u062c\u0627 \u0631\u06c1\u06cc \u06c1\u06d2.\u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9 \u0641\u06cc\u0635\u0644\u06c1 \u0633\u0627\u0632\u06cc \u0627\u0648\u0631 \u0627\u0646\u062a\u0638\u0627\u0645 \u06a9\u06d2 \u0644\u0626\u06d2 \u0686\u06cc\u0646\u0644\u0632 \u06a9\u0648 \u0628\u06be\u06cc \u0648\u0633\u0639\u062a \u062f\u06cc\u062a\u06cc \u06c1\u06d2\u06d4\r\n\u0644\u0648\u06af \u062d\u06a9\u0648\u0645\u062a \u06a9\u06cc \u062c\u0627\u0646\u0628 \u0633\u06d2 \u0634\u0631\u0648\u0639 \u06a9\u06cc\u06d2 \u06af\u0626\u06d2 \u0645\u062e\u062a\u0644\u0641 \u0622\u0646 \u0644\u0627\u0626\u0646 \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u062c\u06cc\u0633\u06d2 \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0631\u0627\u0626\u06d2 \u0639\u0627\u0645\u06c1 \u06a9\u06d2 \u0686\u06cc\u0646\u0644\u0632 \u0627\u0648\u0631 \u0622\u0646 \u0644\u0627\u0626\u0646 \u0633\u0631\u0648\u06d2 \u06a9\u06d2 \u0630\u0631\u06cc\u0639\u06d2 \u0627\u067e\u0646\u06cc \u0631\u0627\u0626\u06d2 \u06a9\u0627 \u0627\u0638\u06c1\u0627\u0631 \u06a9\u0631\u0633\u06a9\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0648\u0631 \u0645\u0634\u0627\u0648\u0631\u062a \u0645\u06cc\u06ba \u0645\u0634\u063a\u0648\u0644 \u06c1\u0648\u0633\u06a9\u062a\u06d2 \u06c1\u06cc\u06ba\u06d4 \u0646\u062a\u06cc\u062c\u062a\u0627\u064b \u060c \u0633\u0648\u0634\u0644 \u06af\u0648\u0631\u0646\u0646\u0633 \u06a9\u06d2 \u0644\u0626\u06d2 \u0641\u06cc\u0688 \u0628\u06cc\u06a9 \u0645\u06cc\u06a9\u0627\u0646\u0632\u0645 \u0646\u06d2 \u062f\u0631\u062c\u06c1 \u0628\u0646\u062f\u06cc \u06a9\u06cc \u0633\u0637\u062d\u0648\u06ba \u06a9\u0648 \u06a9\u0645 \u06a9\u0631\u062f\u06cc\u0627 \u06c1\u06d2 \u060c \u062c\u0633 \u0633\u06d2 \u0631\u062f\u0639\u0645\u0644 
\u0627\u0648\u0631 \u06a9\u0627\u0631\u06a9\u0631\u062f\u06af\u06cc \u0645\u06cc\u06ba \u062a\u06cc\u0632\u06cc \u0633\u06d2 \u0627\u0636\u0627\u0641\u06c1 \u06c1\u0648\u0627 \u06c1\u06d2\u06d4\r\n\u06cc\u06c1\u0627\u06ba \u0627\u0633 \u062d\u0642\u06cc\u0642\u062a \u06a9\u0648 \u0628\u06be\u06cc \u062a\u0633\u0644\u06cc\u0645 \u06a9\u0631\u0646\u0627 \u067e\u0691\u06d2 \u06af\u0627 \u06a9\u06c1\u060c \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u062b\u0631 \u0645\u062d\u0636 \u06a9\u0627\u0631\u06a9\u0631\u062f\u06af\u06cc \u06a9\u06d2 \u0641\u0648\u0627\u0626\u062f \u0633\u06d2 \u06a9\u06c1\u06cc\u06ba \u0632\u06cc\u0627\u062f\u06c1 \u06c1\u06d2. \u06cc\u06c1 \u0633\u0648\u0634\u0644 \u0645\u06cc\u0688\u06cc\u0627 \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u0627\u0648\u0631 \u0622\u0646 \u0644\u0627\u0626\u0646 \u0631\u067e\u0648\u0631\u0679\u0646\u06af \u0633\u0633\u0679\u0645 \u06a9\u06cc \u0628\u062f\u0648\u0644\u062a \u0633\u0631\u06a9\u0627\u0631\u06cc \u06a9\u0627\u0631\u0631\u0648\u0627\u0626\u06cc\u0648\u06ba \u0645\u06cc\u06ba \u0634\u0641\u0627\u0641\u06cc\u062a \u0627\u0648\u0631 \u0627\u062d\u062a\u0633\u0627\u0628 \u06a9\u0648 \u0641\u0631\u0648\u063a \u062f\u06d2 \u0631\u06c1\u06cc \u06c1\u06d2\u06d4 \u0639\u0648\u0627\u0645\u06cc \u062c\u0627\u0646\u0686 \u067e\u0691\u062a\u0627\u0644 \u06a9\u06cc \u0646\u06af\u0631\u0627\u0646\u06cc \u0645\u06cc\u06ba \u0627\u0646\u062a\u0638\u0627\u0645\u06cc \u0639\u0645\u0644 \u06a9\u0627 \u0645\u0639\u0627\u0626\u0646\u06c1 \u0628\u06be\u06cc \u0634\u0627\u0645\u0644 \u06c1\u06d2 \u060c \u062c\u0648 \u0627\u0633 \u0628\u0627\u062a \u06a9\u0648 \u06cc\u0642\u06cc\u0646\u06cc \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2 \u06a9\u06c1 \u0633\u0631\u06a9\u0627\u0631\u06cc \u0633\u0631\u06af\u0631\u0645\u06cc\u0627\u06ba \u0627\u062e\u0644\u0627\u0642\u06cc \u0645\u0639\u06cc\u0627\u0631\u0627\u062a \u067e\u0631 \u0645\u06a9\u0645\u0644 
\u0639\u0645\u0644 \u067e\u06cc\u0631\u0627 \u06c1\u0648\u06ba \u0627\u0648\u0631 \u0639\u0648\u0627\u0645 \u06a9\u0648 \u062c\u0648\u0627\u0628\u062f\u06c1 \u06c1\u0648\u06ba\u06d4\r\n\u06cc\u0648\u06ba \u0686\u06cc\u0646 \u0645\u06cc\u06ba \u060c \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u0635\u0631\u0641 \u0627\u06cc\u06a9 \u0622\u0644\u06c1 \u0646\u06c1\u06cc\u06ba \u06c1\u06d2\u06d4 \u06cc\u06c1 \u062a\u0631\u0642\u06cc \u06a9\u06d2 \u0644\u0626\u06d2 \u0627\u06cc\u06a9 \u0645\u062d\u0631\u06a9 \u06c1\u06d2 \u062c\u0648 \u0634\u06c1\u0631\u06cc\u0648\u06ba \u06a9\u0648 \u062c\u0645\u06c1\u0648\u0631\u06cc \u0639\u0645\u0644 \u06a9\u0648 \u062a\u0634\u06a9\u06cc\u0644 \u062f\u06cc\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u0628\u0627\u0627\u062e\u062a\u06cc\u0627\u0631 \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2. \u062c\u062f\u06cc\u062f \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u0627\u0648\u0631 \u0634\u0641\u0627\u0641 \u062d\u06a9\u0645\u0631\u0627\u0646\u06cc \u06a9\u06d2 \u0637\u0631\u06cc\u0642\u0648\u06ba \u0646\u06d2 \u0644\u0648\u06af\u0648\u06ba \u06a9\u06d2 \u0644\u0626\u06d2 \u0627\u067e\u0646\u06d2 \u062c\u0645\u06c1\u0648\u0631\u06cc \u062d\u0642\u0648\u0642 \u06a9\u0648 \u0628\u0631\u0627\u06c1 \u0631\u0627\u0633\u062a \u0627\u0633\u062a\u0639\u0645\u0627\u0644 \u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u0632\u06cc\u0627\u062f\u06c1 \u0622\u0633\u0627\u0646 \u0627\u0648\u0631 \u0645\u0648\u062b\u0631 \u0628\u0646\u0627 \u062f\u06cc\u0627 \u06c1\u06d2\u06d4\r\n\u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u0633\u06d2 \u06c1\u0679 \u06a9\u0631 \u0627\u06af\u0631 \u062a\u06cc\u0633\u0631\u06d2 \u0627\u0646\u0679\u0631\u0646\u06cc\u0634\u0646\u0644 \u0688\u06cc\u0645\u0648\u06a9\u0631\u06cc\u0633\u06cc \u0641\u0648\u0631\u0645 \u06a9\u06cc \u0645\u0632\u06cc\u062f \u0628\u0627\u062a \u06a9\u06cc \u062c\u0627\u0626\u06d2 
\u062a\u0648 \u0641\u0648\u0631\u0645 \u06a9\u06d2 \u0634\u0631\u06a9\u0627\u0621 \u0646\u06d2 \u062c\u06c1\u0627\u06ba \u062a\u06a9\u0646\u06cc\u06a9\u06cc \u062a\u0631\u0642\u06cc \u06a9\u06cc \u0648\u06a9\u0627\u0644\u062a \u06a9\u06cc \u0648\u06c1\u0627\u06ba \u06cc\u06c1 \u0628\u06be\u06cc \u0648\u0627\u0636\u062d \u06a9\u06cc\u0627 \u06a9\u06c1 \u06a9\u0648\u0626\u06cc \u0627\u06cc\u06a9 \u0637\u0631\u0632 \u062c\u0645\u06c1\u0648\u0631\u06cc\u062a \u06cc\u0627 \u062c\u0645\u06c1\u0648\u0631\u06cc \u0645\u0627\u0688\u0644 \u0648\u0627\u062d\u062f \u0627\u0646\u062a\u062e\u0627\u0628 \u0646\u06c1\u06cc\u06ba \u06c1\u0648\u0646\u0627 \u0686\u0627\u06c1\u0626\u06d2\u06d4\u0645\u063a\u0631\u0628\u06cc \u0631\u06cc\u0627\u0633\u062a\u0648\u06ba \u06a9\u0648 \u0627\u06cc\u0633\u06cc \u0633\u06cc\u0627\u0633\u06cc \u0634\u06a9\u0644\u0648\u06ba \u06a9\u0648 \u0642\u0628\u0648\u0644 \u06a9\u0631\u0646\u0627 \u0686\u0627\u06c1\u06cc\u06d2 \u062c\u0648 \u0627\u0646 \u06a9\u06cc \u0627\u067e\u0646\u06cc \u0633\u06cc\u0627\u0633\u06cc \u0634\u06a9\u0644\u0648\u06ba \u0633\u06d2 \u0645\u062e\u062a\u0644\u0641 \u06c1\u06cc\u06ba\u06d4\u0633\u0627\u062a\u06be \u0633\u0627\u062a\u06be \u06cc\u06c1 \u0636\u0631\u0648\u0631\u06cc \u06c1\u06d2 \u06a9\u06c1 \u062f\u0646\u06cc\u0627 \u06a9\u0648 \u0627\u06cc\u06a9 \u0627\u06cc\u0633\u06cc \u06a9\u0645\u06cc\u0648\u0646\u0679\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u062f\u06cc\u06a9\u06be\u0627 \u062c\u0627\u0626\u06d2 \u062c\u0633 \u06a9\u06cc \u0627\u0646\u0633\u0627\u0646\u06cc\u062a \u06a9\u06d2 \u0644\u0626\u06d2 \u0645\u0634\u062a\u0631\u06a9\u06c1 \u062a\u0642\u062f\u06cc\u0631 \u06c1\u0648\u06d4\r\n\u0627\u0633 \u0636\u0645\u0646 \u0645\u06cc\u06ba \u0686\u06cc\u0646 \u06a9\u06cc \u062c\u0627\u0646\u0628 \u0633\u06d2 \u062a\u062c\u0648\u06cc\u0632 \u06a9\u0631\u062f\u06c1 \u0628\u06cc\u0646 \u0627\u0644\u0627\u0642\u0648\u0627\u0645\u06cc \u0627\u0642\u062f\u0627\u0645\u0627\u062a 
\u0628\u0634\u0645\u0648\u0644 \u0628\u06cc\u0644\u0679 \u0627\u06cc\u0646\u0688 \u0631\u0648\u0688 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648\u060c \u06af\u0644\u0648\u0628\u0644 \u0688\u06cc\u0648\u0644\u067e\u0645\u0646\u0679 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648\u060c \u06af\u0644\u0648\u0628\u0644 \u0633\u06cc\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0628\u0642\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0648\u0631 \u0639\u0627\u0644\u0645\u06cc \u0686\u06cc\u0644\u0646\u062c\u0632 \u06a9\u0627 \u0686\u06cc\u0646\u06cc \u062f\u0627\u0646\u0634 \u06a9\u06d2 \u062a\u062d\u062a \u0645\u0648\u0626\u062b\u0631 \u062d\u0644 \u067e\u06cc\u0634 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba\u06d4\r\n\r\n'''\r\n\r\nsummarizer = TextSummarization()\r\nratio = 0.01  # Summary ratio (e.g., keep 1% of the sentences or adjust according to your preference: 10%, 20%, 40%, etc. 
(1% to 100%))\r\nsummary = summarizer.summarize(text, ratio)\r\nprint(summary)\r\n```\r\n## Output\r\n```python\r\n# output =>\r\n\u06af\u0644\u0648\u0628\u0644 \u0633\u06cc\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0628\u0642\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0648\u0631 \u0639\u0627\u0644\u0645\u06cc \u0686\u06cc\u0644\u0646\u062c\u0632 \u06a9\u0627 \u0686\u06cc\u0646\u06cc \u062f\u0627\u0646\u0634 \u06a9\u06d2 \u062a\u062d\u062a \u0645\u0648\u0626\u062b\u0631 \u062d\u0644 \u067e\u06cc\u0634 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba \u06d4\r\n```\r\n\r\n### 2. `summarize_unstructured_text(text, ratio)`\r\n\r\nThe `summarize_unstructured_text` method of the `TextSummarization` class summarizes Urdu text that lacks sentence structure by first splitting it into sentences automatically. It gives users a condensed summary of the important information in raw text that may not follow typical sentence structures. 
It's useful for efficiently analyzing and extracting key details from various forms of unstructured Urdu text.\r\n\r\n**Example:**\r\n\r\n```python\r\nfrom LughaatNLP import TextSummarization\r\n\r\n\r\n# Example usage\r\n\r\n# Text without the sentence Structure\r\n\r\ntext = \"\u0627\u0633 \u0636\u0645\u0646 \u0645\u06cc\u06ba \u0686\u06cc\u0646 \u06a9\u06cc \u062c\u0627\u0646\u0628 \u0633\u06d2 \u062a\u062c\u0648\u06cc\u0632 \u06a9\u0631\u062f\u06c1 \u0628\u06cc\u0646 \u0627\u0644\u0627\u0642\u0648\u0627\u0645\u06cc \u0627\u0642\u062f\u0627\u0645\u0627\u062a \u0628\u0634\u0645\u0648\u0644 \u0628\u06cc\u0644\u0679 \u0627\u06cc\u0646\u0688 \u0631\u0648\u0688 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0688\u06cc\u0648\u0644\u067e\u0645\u0646\u0679 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0633\u06cc\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0628\u0642\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0648\u0631 \u0639\u0627\u0644\u0645\u06cc \u0686\u06cc\u0644\u0646\u062c\u0632 \u06a9\u0627 \u0686\u06cc\u0646\u06cc \u062f\u0627\u0646\u0634 \u06a9\u06d2 \u062a\u062d\u062a \u0645\u0648\u0626\u062b\u0631 \u062d\u0644 \u067e\u06cc\u0634 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0633 \u0648\u0642\u062a \u0648\u0648\u0679\u0646\u06af \u06a9\u06d2 \u062f\u0648\u0631\u0627\u0646 \u0688\u06cc\u062c\u06cc\u0679\u0644 
\u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u0633\u062a\u0639\u0645\u0627\u0644 \u062d\u06a9\u0648\u0645\u062a\u06cc \u067e\u0627\u0644\u06cc\u0633\u06cc\u0648\u06ba \u06a9\u0648 \u0633\u0645\u062c\u06be\u0646\u06d2 \u0645\u0634\u0627\u0648\u0631\u062a \u0645\u06cc\u06ba \u062d\u0635\u06c1 \u0644\u06cc\u0646\u06d2 \u0627\u0648\u0631 \u062d\u06a9\u0648\u0645\u062a\u06cc \u0633\u0631\u06af\u0631\u0645\u06cc\u0648\u06ba \u06a9\u06cc \u0646\u06af\u0631\u0627\u0646\u06cc \u062c\u06cc\u0633\u06d2 \u0639\u0645\u0644 \u06a9\u0648 \u0622\u0633\u0627\u0646 \u0628\u0646\u0627 \u0631\u06c1\u0627 \u06c1\u06d2\u0686\u06cc\u0646 \u06a9\u06d2 \u0633\u0631\u06a9\u0627\u0631\u06cc \u0627\u0646\u062a\u062e\u0627\u0628\u06cc \u0637\u0631\u06cc\u0642\u06c1 \u06a9\u0627\u0631 \u0645\u06cc\u06ba \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u0646\u0636\u0645\u0627\u0645 \u0648\u0648\u0679\u0631 \u0631\u062c\u0633\u0679\u0631\u06cc\u0634\u0646 \u0627\u0645\u06cc\u062f\u0648\u0627\u0631\u0648\u06ba \u06a9\u06cc \u0645\u0639\u0644\u0648\u0645\u0627\u062a \u06a9\u06cc \u0641\u0631\u0627\u06c1\u0645\u06cc \u06a9\u06d2 \u0633\u0627\u062a\u06be \u0633\u0627\u062a\u06be \u0648\u0648\u0679\u0646\u06af \u0627\u0648\u0631 \u06af\u0646\u062a\u06cc \u06a9\u06d2 \u0639\u0645\u0644 \u062c\u06cc\u0633\u06d2 \u06a9\u0627\u0645\u0648\u06ba \u06a9\u0648 \u0622\u0633\u0627\u0646 \u0628\u0646\u0627\u0631\u06c1\u0627 \u06c1\u06d2 \u0645\u0632\u06cc\u062f \u0628\u0631\u0622\u06ba \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u062d\u06a9\u0648\u0645\u062a \u06a9\u0648 \u0639\u0648\u0627\u0645\u06cc \u062a\u0634\u0648\u06cc\u0634 \u06a9\u06d2 \u0645\u0633\u0627\u0626\u0644 \u06a9\u0648 \u062d\u0644 \u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u06a9\u0627\u0631\u0648\u0628\u0627\u0631\u06cc \u0627\u062f\u0627\u0631\u0648\u06ba 
\u0645\u0627\u06c1\u0631\u06cc\u0646 \u0627\u0648\u0631 \u0633\u0648\u0644 \u0645\u0646\u062f\u0648\u0628\u06cc\u0646 \u06a9\u06d2 \u062f\u0631\u0645\u06cc\u0627\u0646 \u0622\u0646 \u0644\u0627\u0626\u0646 \u062a\u0628\u0627\u062f\u0644\u06c1 \u062e\u06cc\u0627\u0644 \u06a9\u0627 \u0627\u06c1\u062a\u0645\u0627\u0645 \u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0642\u0627\u0628\u0644 \u0628\u0646\u0627 \u0631\u06c1\u06cc \u06c1\u06d2 \u062c\u0633 \u0633\u06d2 \u0645\u0634\u0627\u0648\u0631\u062a \u0632\u06cc\u0627\u062f\u06c1 \u0645\u0648\u062b\u0631 \u06c1\u0648\u062a\u06cc \u062c\u0627 \u0631\u06c1\u06cc \u06c1\u06d2\u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9 \u0641\u06cc\u0635\u0644\u06c1 \u0633\u0627\u0632\u06cc \u0627\u0648\u0631 \u0627\u0646\u062a\u0638\u0627\u0645 \u06a9\u06d2 \u0644\u0626\u06d2 \u0686\u06cc\u0646\u0644\u0632 \u06a9\u0648 \u0628\u06be\u06cc \u0648\u0633\u0639\u062a \u062f\u06cc\u062a\u06cc \u06c1\u06d2 \u0628\u06be\u06cc \u062d\u0627\u0644 \u06c1\u06cc \u0645\u06cc\u06ba \u062a\u06cc\u0633\u0631\u06d2 \u0627\u0646\u0679\u0631\u0646\u06cc\u0634\u0646\u0644 \u0688\u06cc\u0645\u0648\u06a9\u0631\u06cc\u0633\u06cc \u0641\u0648\u0631\u0645 \u06a9\u0627 \u0627\u0646\u0639\u0642\u0627\u062f \u0628\u06cc\u062c\u0646\u06af \u0645\u06cc\u06ba \u06a9\u06cc\u0627 \u06af\u06cc\u0627 \u062c\u0633 \u0645\u06cc\u06ba 70 \u0633\u06d2 \u0632\u0627\u0626\u062f \u0645\u0645\u0627\u0644\u06a9 \u0627\u0648\u0631 \u062e\u0637\u0648\u06ba \u06a9\u06d2 \u062a\u0642\u0631\u06cc\u0628\u0627\u064b 300 \u0639\u06c1\u062f\u06cc\u062f\u0627\u0631\u0648\u06ba \u0627\u0648\u0631 \u0645\u0627\u06c1\u0631\u06cc\u0646 \u0646\u06d2 \u0627\u0633 \u0628\u0627\u062a \u067e\u0631 \u062a\u0628\u0627\u062f\u0644\u06c1 \u062e\u06cc\u0627\u0644 \u06a9\u06cc\u0627 \u06a9\u06c1 \u062c\u0645\u06c1\u0648\u0631\u06cc\u062a \u06a9\u0648 \u062c\u062f\u06cc\u062f \u0688\u06cc\u062c\u06cc\u0679\u0644 \u062f\u0648\u0631 \u06a9\u06cc \u0636\u0631\u0648\u0631\u06cc\u0627\u062a 
\u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0627\u0624\u062a\u06be \u06a9\u06cc \u0645\u0636\u0628\u0648\u0637\u06cc \u06a9\u06d2 \u0645\u0637\u0627\u0628\u0642 \u06a9\u06cc\u0633\u06d2 \u06c1\u0648\u0646\u0627 \u0686\u0627\u06c1\u0626\u06d2 \u06cc\u06c1 \u062a\u0631\u0642\u06cc \u06a9\u06d2 \u0644\u0626\u06d2 \u0627\u06cc\u06a9 \u0645\u062d\u0631\u06a9 \u06c1\u06d2 \u062c\u0648 \u0634\u06c1\u0631\u06cc\u0648\u06ba \u06a9\u0648 \u062c\u0645\u06c1\u0648\u0631\u06cc \u0639\u0645\u0644 \u06a9\u0648 \u062a\u0634\u06a9\u06cc\u0644 \u062f\u06cc\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u0628\u0627\u0627\u062e\u062a\u06cc\u0627\u0631 \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2 \u062c\u062f\u06cc\u062f \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u0627\u0648\u0631 \u0634\u0641\u0627\u0641 \u062d\u06a9\u0645\u0631\u0627\u0646\u06cc \u06a9\u06d2 \u0637\u0631\u06cc\u0642\u0648\u06ba \u0646\u06d2 \u0644\u0648\u06af\u0648\u06ba \u06a9\u06d2 \u0644\u0626\u06d2 \u0627\u067e\u0646\u06d2 \u062c\u0645\u06c1\u0648\u0631\u06cc \u062d\u0642\u0648\u0642 \u06a9\u0648 \u0628\u0631\u0627\u06c1 \u0631\u0627\u0633\u062a \u0627\u0633\u062a\u0639\u0645\u0627\u0644 \u06a9\u0631\u0646\u06d2 \u06a9\u06d2 \u0644\u0626\u06d2 \u0632\u06cc\u0627\u062f\u06c1 \u0622\u0633\u0627\u0646 \u0627\u0648\u0631 \u0645\u0648\u062b\u0631 \u0628\u0646\u0627 \u062f\u06cc\u0627 \u06c1\u06d2 \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u0633\u06d2 \u06c1\u0679 \u06a9\u0631 \u0627\u06af\u0631 \u062a\u06cc\u0633\u0631\u06d2 \u0627\u0646\u0679\u0631\u0646\u06cc\u0634\u0646\u0644 \u0688\u06cc\u0645\u0648\u06a9\u0631\u06cc\u0633\u06cc \u0641\u0648\u0631\u0645 \u06a9\u06cc \u0645\u0632\u06cc\u062f \u0628\u0627\u062a \u06a9\u06cc \u062c\u0627\u0626\u06d2 \u062a\u0648 \u0641\u0648\u0631\u0645 \u06a9\u06d2 \u0634\u0631\u06a9\u0627\u0621 \u0646\u06d2 \u062c\u06c1\u0627\u06ba \u062a\u06a9\u0646\u06cc\u06a9\u06cc \u062a\u0631\u0642\u06cc \u06a9\u06cc 
\u0648\u06a9\u0627\u0644\u062a \u06a9\u06cc \u0648\u06c1\u0627\u06ba \u06cc\u06c1 \u0628\u06be\u06cc \u0648\u0627\u0636\u062d \u06a9\u06cc\u0627 \u06a9\u06c1 \u06a9\u0648\u0626\u06cc \u0627\u06cc\u06a9 \u0637\u0631\u0632 \u062c\u0645\u06c1\u0648\u0631\u06cc\u062a \u06cc\u0627 \u062c\u0645\u06c1\u0648\u0631\u06cc \u0645\u0627\u0688\u0644 \u0648\u0627\u062d\u062f \u0627\u0646\u062a\u062e\u0627\u0628 \u0646\u06c1\u06cc\u06ba \u06c1\u0648\u0646\u0627 \u0686\u0627\u06c1\u0626\u06d2 \u06cc\u06c1\u0627\u06ba \u0627\u0633 \u062d\u0642\u06cc\u0642\u062a \u06a9\u0648 \u0628\u06be\u06cc \u062a\u0633\u0644\u06cc\u0645 \u06a9\u0631\u0646\u0627 \u067e\u0691\u06d2 \u06af\u0627 \u06a9\u06c1 \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u06a9\u0627 \u0627\u062b\u0631 \u0645\u062d\u0636 \u06a9\u0627\u0631\u06a9\u0631\u062f\u06af\u06cc \u06a9\u06d2 \u0641\u0648\u0627\u0626\u062f \u0633\u06d2 \u06a9\u06c1\u06cc\u06ba \u0632\u06cc\u0627\u062f\u06c1 \u06c1\u06d2 \u06cc\u06c1 \u0633\u0648\u0634\u0644 \u0645\u06cc\u0688\u06cc\u0627 \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u0627\u0648\u0631 \u0622\u0646 \u0644\u0627\u0626\u0646 \u0631\u067e\u0648\u0631\u0679\u0646\u06af \u0633\u0633\u0679\u0645 \u06a9\u06cc \u0628\u062f\u0648\u0644\u062a \u0633\u0631\u06a9\u0627\u0631\u06cc \u06a9\u0627\u0631\u0631\u0648\u0627\u0626\u06cc\u0648\u06ba \u0645\u06cc\u06ba \u0634\u0641\u0627\u0641\u06cc\u062a \u0627\u0648\u0631 \u0627\u062d\u062a\u0633\u0627\u0628 \u06a9\u0648 \u0641\u0631\u0648\u063a \u062f\u06d2 \u0631\u06c1\u06cc \u06c1\u06d2 \u0644\u0648\u06af \u062d\u06a9\u0648\u0645\u062a \u06a9\u06cc \u062c\u0627\u0646\u0628 \u0633\u06d2 \u0634\u0631\u0648\u0639 \u06a9\u06cc\u06d2 \u06af\u0626\u06d2 \u0645\u062e\u062a\u0644\u0641 \u0622\u0646 \u0644\u0627\u0626\u0646 \u067e\u0644\u06cc\u0679 \u0641\u0627\u0631\u0645\u0632 \u062c\u06cc\u0633\u06d2 \u0688\u06cc\u062c\u06cc\u0679\u0644 \u0631\u0627\u0626\u06d2 \u0639\u0627\u0645\u06c1 
\u06a9\u06d2 \u0686\u06cc\u0646\u0644\u0632 \u0627\u0648\u0631 \u0622\u0646 \u0644\u0627\u0626\u0646 \u0633\u0631\u0648\u06d2 \u06a9\u06d2 \u0630\u0631\u06cc\u0639\u06d2 \u0627\u067e\u0646\u06cc \u0631\u0627\u0626\u06d2 \u06a9\u0627 \u0627\u0638\u06c1\u0627\u0631 \u06a9\u0631\u0633\u06a9\u062a\u06d2 \u06c1\u06cc\u06ba \u0627\u0648\u0631 \u0645\u0634\u0627\u0648\u0631\u062a \u0645\u06cc\u06ba \u0645\u0634\u063a\u0648\u0644 \u06c1\u0648\u0633\u06a9\u062a\u06d2 \u06c1\u06cc\u06ba \u0639\u0648\u0627\u0645\u06cc \u062c\u0627\u0646\u0686 \u067e\u0691\u062a\u0627\u0644 \u06a9\u06cc \u0646\u06af\u0631\u0627\u0646\u06cc \u0645\u06cc\u06ba \u0627\u0646\u062a\u0638\u0627\u0645\u06cc \u0639\u0645\u0644 \u06a9\u0627 \u0645\u0639\u0627\u0626\u0646\u06c1 \u0628\u06be\u06cc \u0634\u0627\u0645\u0644 \u06c1\u06d2 \u062c\u0648 \u0627\u0633 \u0628\u0627\u062a \u06a9\u0648 \u06cc\u0642\u06cc\u0646\u06cc \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2 \u06a9\u06c1 \u0633\u0631\u06a9\u0627\u0631\u06cc \u0633\u0631\u06af\u0631\u0645\u06cc\u0627\u06ba \u0627\u062e\u0644\u0627\u0642\u06cc \u0645\u0639\u06cc\u0627\u0631\u0627\u062a \u067e\u0631 \u0645\u06a9\u0645\u0644 \u0639\u0645\u0644 \u067e\u06cc\u0631\u0627 \u06c1\u0648\u06ba \u0627\u0648\u0631 \u0639\u0648\u0627\u0645 \u06a9\u0648 \u062c\u0648\u0627\u0628\u062f\u06c1 \u06c1\u0648\u06ba \u06cc\u06c1 \u0627\u0646\u0636\u0645\u0627\u0645 \u062d\u06a9\u0648\u0645\u062a\u06cc \u0627\u062f\u0627\u0631\u0648\u06ba \u0631\u0627\u0626\u06d2 \u062f\u06c1\u0646\u062f\u06af\u0627\u0646 \u0627\u0648\u0631 \u0627\u0645\u06cc\u062f\u0648\u0627\u0631\u0648\u06ba \u06a9\u06d2 \u062f\u0631\u0645\u06cc\u0627\u0646 \u062a\u0639\u0627\u0645\u0644 \u06a9\u0648 \u0628\u06be\u06cc \u0628\u0691\u06be\u0627\u062a\u0627 \u06c1\u06d2 \u062d\u062a\u0645\u06cc \u0637\u0648\u0631 \u067e\u0631 \u0627\u0633 \u0628\u0627\u062a \u06a9\u0648 \u06cc\u0642\u06cc\u0646\u06cc \u0628\u0646\u0627\u062a\u0627 \u06c1\u06d2 \u06a9\u06c1 \u062c\u0645\u06c1\u0648\u0631\u06cc 
\u0627\u0646\u062a\u062e\u0627\u0628\u0627\u062a \u0639\u0648\u0627\u0645 \u06a9\u06cc \u0636\u0631\u0648\u0631\u06cc\u0627\u062a \u06a9\u06d2 \u0645\u0637\u0627\u0628\u0642 \u0632\u06cc\u0627\u062f\u06c1 \u062c\u0648\u0627\u0628\u062f\u06c1 \u06c1\u0648\u06ba \u0646\u062a\u06cc\u062c\u062a\u0627\u064b \u0633\u0648\u0634\u0644 \u06af\u0648\u0631\u0646\u0646\u0633 \u06a9\u06d2 \u0644\u0626\u06d2 \u0641\u06cc\u0688 \u0628\u06cc\u06a9 \u0645\u06cc\u06a9\u0627\u0646\u0632\u0645 \u0646\u06d2 \u062f\u0631\u062c\u06c1 \u0628\u0646\u062f\u06cc \u06a9\u06cc \u0633\u0637\u062d\u0648\u06ba \u06a9\u0648 \u06a9\u0645 \u06a9\u0631\u062f\u06cc\u0627 \u06c1\u06d2 \u062c\u0633 \u0633\u06d2 \u0631\u062f\u0639\u0645\u0644 \u0627\u0648\u0631 \u06a9\u0627\u0631\u06a9\u0631\u062f\u06af\u06cc \u0645\u06cc\u06ba \u062a\u06cc\u0632\u06cc \u0633\u06d2 \u0627\u0636\u0627\u0641\u06c1 \u06c1\u0648\u0627 \u06c1\u06d2 \u0645\u063a\u0631\u0628\u06cc \u0631\u06cc\u0627\u0633\u062a\u0648\u06ba \u06a9\u0648 \u0627\u06cc\u0633\u06cc \u0633\u06cc\u0627\u0633\u06cc \u0634\u06a9\u0644\u0648\u06ba \u06a9\u0648 \u0642\u0628\u0648\u0644 \u06a9\u0631\u0646\u0627 \u0686\u0627\u06c1\u06cc\u06d2 \u062c\u0648 \u0627\u0646 \u06a9\u06cc \u0627\u067e\u0646\u06cc \u0633\u06cc\u0627\u0633\u06cc \u0634\u06a9\u0644\u0648\u06ba \u0633\u06d2 \u0645\u062e\u062a\u0644\u0641 \u06c1\u06cc\u06ba \u0633\u0627\u062a\u06be \u0633\u0627\u062a\u06be \u06cc\u06c1 \u0636\u0631\u0648\u0631\u06cc \u06c1\u06d2 \u06a9\u06c1 \u062f\u0646\u06cc\u0627 \u06a9\u0648 \u0627\u06cc\u06a9 \u0627\u06cc\u0633\u06cc \u06a9\u0645\u06cc\u0648\u0646\u0679\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u062f\u06cc\u06a9\u06be\u0627 \u062c\u0627\u0626\u06d2 \u062c\u0633 \u06a9\u06cc \u0627\u0646\u0633\u0627\u0646\u06cc\u062a \u06a9\u06d2 \u0644\u0626\u06d2 \u0645\u0634\u062a\u0631\u06a9\u06c1 \u062a\u0642\u062f\u06cc\u0631 \u06c1\u0648 \u0648\u0633\u06cc\u0639 \u062a\u0646\u0627\u0638\u0631 \u0645\u06cc\u06ba \u0622\u062c 
\u0688\u06cc\u062c\u06cc\u0679\u0644 \u0679\u06cc\u06a9\u0646\u0627\u0644\u0648\u062c\u06cc \u0686\u06cc\u0646 \u06a9\u06cc \u062c\u0645\u06c1\u0648\u0631\u06cc \u06af\u0648\u0631\u0646\u0646\u0633 \u06a9\u06d2 \u0633\u0627\u062a\u06be \u06af\u06c1\u0631\u0627\u0626\u06cc \u0633\u06d2 \u062c\u0691\u06cc \u06c1\u0648\u0626\u06cc \u06c1\u06d2\"\r\n\r\n\r\nsummarizer = TextSummarization()\r\nratio = 0.01  # Summary ratio (e.g., keep 1% of the sentences or adjust according to your preference: 10%, 20%, 40%, etc. (1% to 100%))\r\nsummary = summarizer.summarize_unstructured_text(text, ratio)\r\nprint(summary)\r\n```\r\n## Output\r\n```python\r\n\r\n# output => \u062c\u0627\u0646\u0628 \u0633\u06d2 \u062a\u062c\u0648\u06cc\u0632 \u06a9\u0631\u062f\u06c1 \u0628\u06cc\u0646 \u0627\u0644\u0627\u0642\u0648\u0627\u0645\u06cc \u0627\u0642\u062f\u0627\u0645\u0627\u062a \u0628\u0634\u0645\u0648\u0644 \u0628\u06cc\u0644\u0679 \u0627\u06cc\u0646\u0688 \u0631\u0648\u0688 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0688\u06cc\u0648\u0644\u067e\u0645\u0646\u0679 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0633\u06cc\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0628\u0642\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba\u06d4\r\n```\r\n\r\n### Text-to-Speech & Speech-to-Text\r\n\r\nThe `UrduSpeech` class converts Urdu text to speech and Urdu speech to text using the Google 
Text-to-Speech (gTTS) API and the SpeechRecognition library. It converts Urdu text to audio files (MP3, WAV, or OGG formats) and recognizes Urdu speech from audio files (MP3 or WAV formats).\r\n\r\n## Installation Requirements\r\n\r\nBefore using `UrduSpeech`, make sure `ffmpeg` is installed on your computer and accessible from the command line. `ffmpeg` converts audio files between formats, which speech recognition requires. If `ffmpeg` is not installed, install it and add it to your `PATH` environment variable.\r\n\r\n### 1. `text_to_speech(text, output_file_name, format)`\r\n\r\nThis function converts Urdu text to speech and saves the generated speech as an audio file in the specified format.\r\n\r\n**Parameters:**\r\n- `text`: The Urdu text to convert to speech.\r\n- `output_file_folder`: The folder where the audio file is saved. When omitted, as in Example Usage 1, the file is saved in the current directory.\r\n- `output_file_name`: The name of the output audio file (without file extension).\r\n- `format`: The format for the output audio file (`'wav'`, `'mp3'`, or `'ogg'`).\r\n\r\n**Example Usage 1:**\r\n```python\r\nfrom LughaatNLP import UrduSpeech\r\n\r\nconverter = UrduSpeech()\r\n\r\ntext = '\u062c\u0627\u0646\u0628 \u0633\u06d2 \u062a\u062c\u0648\u06cc\u0632 \u06a9\u0631\u062f\u06c1 \u0628\u06cc\u0646 \u0627\u0644\u0627\u0642\u0648\u0627\u0645\u06cc \u0627\u0642\u062f\u0627\u0645\u0627\u062a \u0628\u0634\u0645\u0648\u0644 \u0628\u06cc\u0644\u0679 \u0627\u06cc\u0646\u0688 \u0631\u0648\u0688 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648\u060c \u06af\u0644\u0648\u0628\u0644 \u0688\u06cc\u0648\u0644\u067e\u0645\u0646\u0679 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648\u060c \u06af\u0644\u0648\u0628\u0644 \u0633\u06cc\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc 
\u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0628\u0642\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba'\r\n\r\nconverter.text_to_speech(text, output_file_name='output', format='wav')\r\nconverter.text_to_speech(text, output_file_name='output', format='mp3')\r\nconverter.text_to_speech(text, output_file_name='output', format='ogg')\r\n```\r\n\r\n## Output\r\n```python\r\n# output =>\r\n# (saved in the current directory)\r\nSpeech saved to output.wav\r\nSpeech saved to output.mp3\r\nSpeech saved to output.ogg\r\n```\r\nYou can also specify an output folder path:\r\n\r\n**Example Usage 2:**\r\n```python\r\noutput_folder = r'C:\\Users\\YourName\\Desktop\\Testing\\your-path'\r\n\r\nconverter.text_to_speech(text, output_folder, output_file_name='output', format='wav')\r\nconverter.text_to_speech(text, output_folder, output_file_name='output', format='mp3')\r\nconverter.text_to_speech(text, output_folder, output_file_name='output', format='ogg')\r\n```\r\n\r\n## Output\r\n```python\r\n# output =>\r\nSpeech saved to C:\\Users\\YourName\\Desktop\\Testing\\your-path\\output.wav\r\nSpeech saved to C:\\Users\\YourName\\Desktop\\Testing\\your-path\\output.mp3\r\nSpeech saved to C:\\Users\\YourName\\Desktop\\Testing\\your-path\\output.ogg\r\n```\r\n\r\n### 2. 
`speech_to_text(audio_file, format='mp3')`\r\n\r\nThis function recognizes Urdu speech in an audio file of the specified format and returns the recognized text.\r\n\r\n**Parameters:**\r\n- `audio_file`: The path to the audio file.\r\n- `format`: The format of the input audio file (`'mp3'` or `'wav'`).\r\n\r\n**Example Usage:**\r\n```python\r\nfrom LughaatNLP import UrduSpeech\r\n\r\nconverter = UrduSpeech()\r\n\r\naudio_file = \"output.mp3\"\r\ntext = converter.speech_to_text(audio_file, format='mp3')\r\nprint(text)\r\n```\r\n\r\n## Output\r\n```python\r\n# output =>\r\n\u062c\u0627\u0646\u0628 \u0633\u06d2 \u062a\u062c\u0648\u06cc\u0632 \u06a9\u0631\u062f\u06c1 \u0628\u06cc\u0646 \u0627\u0644\u0627\u0642\u0648\u0627\u0645\u06cc \u0627\u0642\u062f\u0627\u0645\u0627\u062a \u0628\u0634\u0645\u0648\u0644 \u0628\u06cc\u0644\u0679\u0646 \u0631\u0648\u0688 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0688\u06cc\u0648\u0644\u067e\u0645\u0646\u0679 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u06af\u0644\u0648\u0628\u0644 \u0633\u06a9\u06cc\u0648\u0631\u0679\u06cc \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0627\u0648\u0631 \u06af\u0644\u0648\u0628\u0644 \u0633\u0648\u0644\u0627\u0626\u06cc\u0632\u06cc\u0634\u0646 \u0627\u0646\u06cc\u0634\u06cc \u0627\u06cc\u0679\u0648 \u0645\u0633\u0627\u0648\u062a \u0627\u0648\u0631 \u0628\u0627\u0644\u0627\u062f\u0633\u062a\u06cc \u06a9\u06d2 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0633\u06d2 \u06cc\u06a9\u0633\u0631 \u062c\u062f\u0627\u06af\u0627\u0646\u06c1 \u0646\u0642\u0637\u06c1 \u0646\u0638\u0631 \u0641\u0631\u0627\u06c1\u0645 \u06a9\u0631\u062a\u06d2 \u06c1\u06cc\u06ba\r\n```\r\n\r\n## Future Work\r\n\r\nThe future roadmap for LughaatNLP includes the following features:\r\n\r\n- Urdu language translator\r\n- Urdu chatbot models\r\n- Text-to-speech and speech-to-text capabilities for Urdu\r\n- Urdu text summarization\r\n\r\nTo implement these features, 
resources such as servers and GPUs for training are required. Therefore, Muhammad Noman is collecting funds to support the development and maintenance of this library.\r\n\r\n## Contributing\r\n\r\nThis library was created by Muhammad Noman, a student at Iqra University. You can reach him via email at muhammadnomanshafiq76@gmail.com or connect with him on [LinkedIn](https://www.linkedin.com/in/muhammad-noman76/).\r\n\r\nIf you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the [GitHub repository](https://github.com/MuhammadNoman76/LughaatNLP).\r\n\r\n## License\r\n\r\nThis project is licensed under the [MIT License](https://drive.google.com/file/d/1YcxoYrL3atF7U7vYKK_KFiKvDF-YiywT/view?usp=drive_link).\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python package for natural language processing tasks for the Urdu language, including normalization, part-of-speech (POS) tagging, named entity recognition (NER), stemming, lemmatization, tokenization, and stopword removal , text-to-speech , speech-to-text, summarization.",
    "version": "1.0.6",
    "project_urls": {
        "Geeksforgeeks": "https://www.geeksforgeeks.org/lughaatnlp-a-powerful-urdu-language-preprocessing-library/",
        "Google Colab": "https://colab.research.google.com/drive/1lLaUBiFq61-B7GyQ0wdNg9FyLCraAcDU?usp=sharing",
        "Homepage": "https://github.com/MuhammadNoman76/LughaatNLP",
        "Issue Tracker": "https://github.com/MuhammadNoman76/LughaatNLP/issues",
        "LinkedIn": "https://www.linkedin.com/in/muhammad-noman76",
        "LughaatNLP Blog": "https://lughaatnlp.blogspot.com/2024/04/mastering-urdu-text-processing.html",
        "Medium": "https://medium.com/@muhammadnomanshafiq76/introducing-lughaatnlp-a-powerful-urdu-language-preprocessing-library-488af74d3dde",
        "Source": "https://github.com/MuhammadNoman76/LughaatNLP",
        "YouTube Channel": "https://www.youtube.com/playlist?list=PL4tcmUwDtJEIHZhAZ3XP9U6ZJzaS4RFbd"
    },
    "split_keywords": [
        "urdu",
        "nlp",
        "natural",
        "language",
        "processing",
        "text",
        "processing",
        "tokenization",
        "stemming",
        "lemmatization",
        "stopwords",
        "morphological-analysis",
        "nlp",
        "urdu",
        "lughaatnlp",
        "urdunlp",
        "urdunlp",
        "urduhack",
        "stanza",
        "natural-language-processing",
        "text-processing",
        "language-processing",
        "preprocessing",
        "stemming",
        "lemmatization",
        "tokenization",
        "stopwords"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d0bf97f513569f3b8021ff7e2c8dce9a4f46ec298a22c0bc76df8a70328fb574",
                "md5": "534c4430ca7ec2cabfcd76679e0fa8fa",
                "sha256": "481fe395c8f917b9a399265d6410863016feb871634c116e1bf3d00e93800b51"
            },
            "downloads": -1,
            "filename": "LughaatNLP-1.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "534c4430ca7ec2cabfcd76679e0fa8fa",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 69804452,
            "upload_time": "2024-05-19T06:09:52",
            "upload_time_iso_8601": "2024-05-19T06:09:52.125903Z",
            "url": "https://files.pythonhosted.org/packages/d0/bf/97f513569f3b8021ff7e2c8dce9a4f46ec298a22c0bc76df8a70328fb574/LughaatNLP-1.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "411da26dcc1e065b2e721ef008d7c1e02d98536b21e335ac95a7e7b97c9952ab",
                "md5": "5007e1854b09d31a9dd1cd8c048afd39",
                "sha256": "6ba5c89f0e6a374777d61fa85bd359db08aac634b04752461fe76b63c39be386"
            },
            "downloads": -1,
            "filename": "LughaatNLP-1.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "5007e1854b09d31a9dd1cd8c048afd39",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 69794671,
            "upload_time": "2024-05-19T06:31:13",
            "upload_time_iso_8601": "2024-05-19T06:31:13.563241Z",
            "url": "https://files.pythonhosted.org/packages/41/1d/a26dcc1e065b2e721ef008d7c1e02d98536b21e335ac95a7e7b97c9952ab/LughaatNLP-1.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-19 06:31:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MuhammadNoman76",
    "github_project": "LughaatNLP",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "lughaatnlp"
}
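
The `urls` entries in the metadata above record a `sha256` digest for each release file. As a minimal sketch of how a downloaded file could be checked against that digest, the snippet below uses only Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_expected` and the local wheel path are assumptions made for illustration; they are not part of LughaatNLP or of PyPI tooling.

```python
import hashlib
from pathlib import Path

# Digest copied from the "sha256" field of the wheel entry in the metadata above.
EXPECTED_WHEEL_SHA256 = "481fe395c8f917b9a399265d6410863016feb871634c116e1bf3d00e93800b51"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the whole (roughly 70 MB) file at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True when the file's SHA-256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    # Hypothetical local path: assumes the wheel was already downloaded here.
    wheel = Path("LughaatNLP-1.0.6-py3-none-any.whl")
    if wheel.exists():
        print("digest ok:", matches_expected(wheel, EXPECTED_WHEEL_SHA256))
    else:
        print("wheel not found; download it first")
```

The same check applies to the source distribution by substituting the `sha256` value listed for `LughaatNLP-1.0.6.tar.gz`.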
        