# bnUnicodeNormalizer
A Bangla Unicode normalization toolkit for word-level normalization
# install
```bash
pip install bnunicodenormalizer
```
# usage
**initialization and cleaning**
```python
# import
from bnunicodenormalizer import Normalizer
from pprint import pprint
# initialize
bnorm=Normalizer()
# normalize
word = 'াটোবাকো'
result=bnorm(word)
print(f"Non-norm:{word}; Norm:{result['normalized']}")
print("--------------------------------------------------")
pprint(result)
```
> output
```
Non-norm:াটোবাকো; Norm:টোবাকো
--------------------------------------------------
{'given': 'াটোবাকো',
'normalized': 'টোবাকো',
'ops': [{'after': 'টোবাকো',
'before': 'াটোবাকো',
'operation': 'InvalidUnicode'}]}
```
**a call to the normalizer returns a dictionary in the following format:**
* ```given``` = the provided text
* ```normalized``` = the normalized text (None if the text length becomes 0 during normalization)
* ```ops``` = list of operations (dictionaries) that were executed on the given text to create the normalized text
* each dictionary in ops has:
* ```operation```: the name of the operation / problem in given text
* ```before``` : what the text looked like before the specific operation
* ```after``` : what the text looks like after the specific operation
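
For example, a minimal sketch (reusing the same ```Normalizer``` call as above) that traces each operation recorded in the result:

```python
from bnunicodenormalizer import Normalizer

bnorm = Normalizer()
result = bnorm('াটোবাকো')
# 'normalized' is None when the cleaned text becomes empty
if result['normalized'] is not None:
    for op in result['ops']:
        # each op records its name and the text before/after it ran
        print(f"{op['operation']}: {op['before']} -> {op['after']}")
```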
**allowing English text**
```python
# initialize without english (default)
norm=Normalizer()
print("without english:",norm("ASD123")["normalized"])
# --> returns None
norm=Normalizer(allow_english=True)
print("with english:",norm("ASD123")["normalized"])
```
> output
```
without english: None
with english: ASD123
```
# Initialization: Bangla Normalizer
```python
'''
initialize a normalizer
args:
allow_english : allow English letters, numbers and punctuation [default:False]
keep_legacy_symbols : legacy symbols will be considered as valid unicodes [default:False]
'৺':Isshar
'৻':Ganda
'ঀ':Anji (not '৭')
'ঌ':li
'ৡ':dirgho li
'ঽ':Avagraha
'ৠ':Vocalic Rr (not 'ঋ')
'৲':rupi
'৴':currency numerator 1
'৵':currency numerator 2
'৶':currency numerator 3
'৷':currency numerator 4
'৸':currency numerator one less than the denominator
'৹':Currency Denominator Sixteen
legacy_maps : a dictionary for changing legacy symbols into a more commonly used unicode
a default legacy map is included in the language class as well,
legacy_maps={'ঀ':'৭',
'ঌ':'৯',
'ৡ':'৯',
'৵':'৯',
'৻':'ৎ',
'ৠ':'ঋ',
'ঽ':'ই'}
pass-
* legacy_maps=None; for keeping the legacy symbols as they are
* legacy_maps="default"; for using the default legacy map
* legacy_maps=custom dictionary (type: dict); which will map your desired legacy symbols to any symbols you want
* the keys in the custom dict must belong to the legacy symbols
* the values in the custom dict must belong to either vowels, consonants, numbers or diacritics
vowels = ['অ', 'আ', 'ই', 'ঈ', 'উ', 'ঊ', 'ঋ', 'এ', 'ঐ', 'ও', 'ঔ']
consonants = ['ক', 'খ', 'গ', 'ঘ', 'ঙ', 'চ', 'ছ','জ', 'ঝ', 'ঞ',
'ট', 'ঠ', 'ড', 'ঢ', 'ণ', 'ত', 'থ', 'দ', 'ধ', 'ন',
'প', 'ফ', 'ব', 'ভ', 'ম', 'য', 'র', 'ল', 'শ', 'ষ',
'স', 'হ','ড়', 'ঢ়', 'য়','ৎ']
numbers = ['০', '১', '২', '৩', '৪', '৫', '৬', '৭', '৮', '৯']
vowel_diacritics = ['া', 'ি', 'ী', 'ু', 'ূ', 'ৃ', 'ে', 'ৈ', 'ো', 'ৌ']
consonant_diacritics = ['ঁ', 'ং', 'ঃ']
> for example you may want to map 'ঽ':Avagraha as 'হ' based on visual similarity
(default:'ই')
** legacy conditions: keep_legacy_symbols and legacy_maps operate as follows
case-1) keep_legacy_symbols=True and legacy_maps=None
: all legacy symbols will be considered valid unicodes. None of them will be changed
case-2) keep_legacy_symbols=True and legacy_maps=valid dictionary example:{'ঀ':'ক'}
: all legacy symbols will be considered valid unicodes. Only 'ঀ' will be changed to 'ক'; others will be untouched
case-3) keep_legacy_symbols=False and legacy_maps=None
: all legacy symbols will be removed
case-4) keep_legacy_symbols=False and legacy_maps=valid dictionary example:{'ঽ':'ই','ৠ':'ঋ'}
: 'ঽ' will be changed to 'ই' and 'ৠ' will be changed to 'ঋ'. All other legacy symbols will be removed
'''
```
```python
my_legacy_maps={'ঌ':'ই',
'ৡ':'ই',
'৵':'ই',
'ৠ':'ই',
'ঽ':'ই'}
text="৺,৻,ঀ,ঌ,ৡ,ঽ,ৠ,৲,৴,৵,৶,৷,৸,৹"
# case 1
norm=Normalizer(keep_legacy_symbols=True,legacy_maps=None)
print("case-1 normalized text: ",norm(text)["normalized"])
# case 2
norm=Normalizer(keep_legacy_symbols=True,legacy_maps=my_legacy_maps)
print("case-2 normalized text: ",norm(text)["normalized"])
# case 2-default
norm=Normalizer(keep_legacy_symbols=True)
print("case-2 default normalized text: ",norm(text)["normalized"])
# case 3
norm=Normalizer(keep_legacy_symbols=False,legacy_maps=None)
print("case-3 normalized text: ",norm(text)["normalized"])
# case 4
norm=Normalizer(keep_legacy_symbols=False,legacy_maps=my_legacy_maps)
print("case-4 normalized text: ",norm(text)["normalized"])
# case 4-default
norm=Normalizer(keep_legacy_symbols=False)
print("case-4 default normalized text: ",norm(text)["normalized"])
```
> output
```
case-1 normalized text: ৺,৻,ঀ,ঌ,ৡ,ঽ,ৠ,৲,৴,৵,৶,৷,৸,৹
case-2 normalized text: ৺,৻,ঀ,ই,ই,ই,ই,৲,৴,ই,৶,৷,৸,৹
case-2 default normalized text: ৺,৻,ঀ,ঌ,ৡ,ঽ,ৠ,৲,৴,৵,৶,৷,৸,৹
case-3 normalized text: ,,,,,,,,,,,,,
case-4 normalized text: ,,,ই,ই,ই,ই,,,ই,,,,
case-4 default normalized text: ,,,,,,,,,,,,,
```
# Operations
* base operations available for all Indic languages:
```python
self.word_level_ops={"LegacySymbols" :self.mapLegacySymbols,
"BrokenDiacritics" :self.fixBrokenDiacritics}
self.decomp_level_ops={"BrokenNukta" :self.fixBrokenNukta,
"InvalidUnicode" :self.cleanInvalidUnicodes,
"InvalidConnector" :self.cleanInvalidConnector,
"FixDiacritics" :self.cleanDiacritics,
"VowelDiacriticAfterVowel" :self.cleanVowelDiacriticComingAfterVowel}
```
* extensions for Bangla
```python
self.decomp_level_ops["ToAndHosontoNormalize"] = self.normalizeToandHosonto
# invalid folas
self.decomp_level_ops["NormalizeConjunctsDiacritics"] = self.cleanInvalidConjunctDiacritics
# complex root cleanup
self.decomp_level_ops["ComplexRootNormalization"] = self.convertComplexRoots
```
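
These operation names appear as the ```operation``` field in a result's ```ops``` list, so you can check which rule fired for a given word. A small sketch (the exact op names printed may vary by input):

```python
from bnunicodenormalizer import Normalizer

bnorm = Normalizer()
# 'উত্স' is the to+hosonto example shown below; it normalizes to 'উৎস'
res = bnorm('উত্স')
print(res['normalized'])
# list which operations were applied, e.g. ['ToAndHosontoNormalize']
print([op['operation'] for op in res['ops']])
```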
# Normalization Problem Examples
**In all examples (a) is the non-normalized form and (b) is the normalized form**
* Broken diacritics:
```
# Example-1:
(a)'আরো'==(b)'আরো' -> False
(a) breaks as:['আ', 'র', 'ে', 'া']
(b) breaks as:['আ', 'র', 'ো']
# Example-2:
(a)পৌঁছে==(b)পৌঁছে -> False
(a) breaks as:['প', 'ে', 'ৗ', 'ঁ', 'ছ', 'ে']
(b) breaks as:['প', 'ৌ', 'ঁ', 'ছ', 'ে']
# Example-3:
(a)সংস্কৄতি==(b)সংস্কৃতি -> False
(a) breaks as:['স', 'ং', 'স', '্', 'ক', 'ৄ', 'ত', 'ি']
(b) breaks as:['স', 'ং', 'স', '্', 'ক', 'ৃ', 'ত', 'ি']
```
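
The "breaks as" lists are simply the sequences of Unicode code points in each string, which ```list()``` reproduces. A quick sketch for Example-1, with the code points written as escapes so the two visually identical forms stay distinct:

```python
from bnunicodenormalizer import Normalizer

word_a = '\u0986\u09b0\u09c7\u09be'  # 'আরো' typed with broken diacritics: ে + া
word_b = '\u0986\u09b0\u09cb'        # 'আরো' with the composed diacritic ো
print(word_a == word_b)              # False, despite looking identical
print(list(word_a))                  # ['আ', 'র', 'ে', 'া']
print(list(word_b))                  # ['আ', 'র', 'ো']
# the normalizer composes the broken pair, so the results should match
print(Normalizer()(word_a)['normalized'] == word_b)  # expected: True
```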
* Nukta Normalization:
```
Example-1:
(a)কেন্দ্রীয়==(b)কেন্দ্রীয় -> False
(a) breaks as:['ক', 'ে', 'ন', '্', 'দ', '্', 'র', 'ী', 'য', '়']
(b) breaks as:['ক', 'ে', 'ন', '্', 'দ', '্', 'র', 'ী', 'য়']
Example-2:
(a)রযে়ছে==(b)রয়েছে -> False
(a) breaks as:['র', 'য', 'ে', '়', 'ছ', 'ে']
(b) breaks as:['র', 'য়', 'ে', 'ছ', 'ে']
Example-3:
(a)জ়ন্য==(b)জন্য -> False
(a) breaks as:['জ', '়', 'ন', '্', 'য']
(b) breaks as:['জ', 'ন', '্', 'য']
```
* Invalid hosonto
```
# Example-1:
(a)দুই্টি==(b)দুইটি-->False
(a) breaks as ['দ', 'ু', 'ই', '্', 'ট', 'ি']
(b) breaks as ['দ', 'ু', 'ই', 'ট', 'ি']
# Example-2:
(a)এ্তে==(b)এতে-->False
(a) breaks as ['এ', '্', 'ত', 'ে']
(b) breaks as ['এ', 'ত', 'ে']
# Example-3:
(a)নেট্ওয়ার্ক==(b)নেটওয়ার্ক-->False
(a) breaks as ['ন', 'ে', 'ট', '্', 'ও', 'য়', 'া', 'র', '্', 'ক']
(b) breaks as ['ন', 'ে', 'ট', 'ও', 'য়', 'া', 'র', '্', 'ক']
# Example-4:
(a)এস্আই==(b)এসআই-->False
(a) breaks as ['এ', 'স', '্', 'আ', 'ই']
(b) breaks as ['এ', 'স', 'আ', 'ই']
# Example-5:
(a)'চু্ক্তি'==(b)'চুক্তি' -> False
(a) breaks as:['চ', 'ু', '্', 'ক', '্', 'ত', 'ি']
(b) breaks as:['চ', 'ু','ক', '্', 'ত', 'ি']
# Example-6:
(a)'যু্ক্ত'==(b)'যুক্ত' -> False
(a) breaks as:['য', 'ু', '্', 'ক', '্', 'ত']
(b) breaks as:['য', 'ু', 'ক', '্', 'ত']
# Example-7:
(a)'কিছু্ই'==(b)'কিছুই' -> False
(a) breaks as:['ক', 'ি', 'ছ', 'ু', '্', 'ই']
(b) breaks as:['ক', 'ি', 'ছ', 'ু','ই']
```
* To+hosonto:
```
# Example-1:
(a)বুত্পত্তি==(b)বুৎপত্তি-->False
(a) breaks as ['ব', 'ু', 'ত', '্', 'প', 'ত', '্', 'ত', 'ি']
(b) breaks as ['ব', 'ু', 'ৎ', 'প', 'ত', '্', 'ত', 'ি']
# Example-2:
(a)উত্স==(b)উৎস-->False
(a) breaks as ['উ', 'ত', '্', 'স']
(b) breaks as ['উ', 'ৎ', 'স']
```
* Unwanted doubles (consecutive doubles):
```
# Example-1:
(a)'যুুদ্ধ'==(b)'যুদ্ধ' -> False
(a) breaks as:['য', 'ু', 'ু', 'দ', '্', 'ধ']
(b) breaks as:['য', 'ু', 'দ', '্', 'ধ']
# Example-2:
(a)'দুুই'==(b)'দুই' -> False
(a) breaks as:['দ', 'ু', 'ু', 'ই']
(b) breaks as:['দ', 'ু', 'ই']
# Example-3:
(a)'প্রকৃৃতির'==(b)'প্রকৃতির' -> False
(a) breaks as:['প', '্', 'র', 'ক', 'ৃ', 'ৃ', 'ত', 'ি', 'র']
(b) breaks as:['প', '্', 'র', 'ক', 'ৃ', 'ত', 'ি', 'র']
# Example-4:
(a)আমাকোা==(b)'আমাকো'-> False
(a) breaks as:['আ', 'ম', 'া', 'ক', 'ে', 'া', 'া']
(b) breaks as:['আ', 'ম', 'া', 'ক', 'ো']
```
* Vowels and modifiers followed by vowel diacritics:
```
# Example-1:
(a)উুলু==(b)উলু-->False
(a) breaks as ['উ', 'ু', 'ল', 'ু']
(b) breaks as ['উ', 'ল', 'ু']
# Example-2:
(a)আর্কিওোলজি==(b)আর্কিওলজি-->False
(a) breaks as ['আ', 'র', '্', 'ক', 'ি', 'ও', 'ো', 'ল', 'জ', 'ি']
(b) breaks as ['আ', 'র', '্', 'ক', 'ি', 'ও', 'ল', 'জ', 'ি']
# Example-3:
(a)একএে==(b)একত্রে-->False
(a) breaks as ['এ', 'ক', 'এ', 'ে']
(b) breaks as ['এ', 'ক', 'ত', '্', 'র', 'ে']
```
* Repeated folas:
```
# Example-1:
(a)গ্র্রামকে==(b)গ্রামকে-->False
(a) breaks as ['গ', '্', 'র', '্', 'র', 'া', 'ম', 'ক', 'ে']
(b) breaks as ['গ', '্', 'র', 'া', 'ম', 'ক', 'ে']
```
## IMPORTANT NOTE
**The normalization is purely based on how Bangla text is used in ```Bangladesh``` (bn:bd). It does not necessarily cover every variation of textual content found in other regions**
# unit testing
* clone the repository
* change working directory to ```tests```
* run: ```python3 -m unittest test_normalizer.py```
# Issue Reporting
* when reporting an issue, please provide the following information:
    * the invalid text
    * the expected valid text
    * why the output is expected
* clone the repository
* add a test case in **tests/test_normalizer.py** after **line no:91**
```python
# Dummy Non-Bangla,Numbers and Space cases/ Invalid start end cases
# english
self.assertEqual(norm('ASD1234')["normalized"],None)
self.assertEqual(ennorm('ASD1234')["normalized"],'ASD1234')
# random
self.assertEqual(norm('িত')["normalized"],'ত')
self.assertEqual(norm('সং্যুক্তি')["normalized"],"সংযুক্তি")
# Ending
self.assertEqual(norm("অজানা্")["normalized"],"অজানা")
#--------------------------------------------- insert your assertions here----------------------------------------
'''
### case: give a comment about your case
## (a) invalid text==(b) valid text <---- an example of your case
self.assertEqual(norm(invalid text)["normalized"],expected output)
or
self.assertEqual(ennorm(invalid text)["normalized"],expected output) <----- for including english text
'''
# your case goes here-
```
* run the unit tests
* make sure your added test case fails with the current code (i.e., it demonstrates the issue)
# Indic Base Normalizer
* to use the Indic language normalizer for 'devanagari', 'gujarati', 'odiya', 'tamil', 'panjabi', 'malayalam', 'sylhetinagri':
```python
from bnunicodenormalizer import IndicNormalizer
norm=IndicNormalizer('devanagari')
```
* initialization
```python
'''
initialize a normalizer
args:
language : language identifier from 'devanagari', 'gujarati', 'odiya', 'tamil', 'panjabi', 'malayalam','sylhetinagri'
allow_english : allow English letters, numbers and punctuation [default:False]
'''
```
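
A hedged usage sketch: the call interface mirrors the Bangla ```Normalizer```, returning the same ```given```/```normalized```/```ops``` dictionary (the Devanagari input word here is only an illustration):

```python
from bnunicodenormalizer import IndicNormalizer

norm = IndicNormalizer('devanagari')
result = norm('हिन्दी')  # hypothetical input word
print(result['normalized'])
```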
# ABOUT US
* Authors: [Bengali.AI](https://bengali.ai/) in association with the OCR Team, [APSIS Solutions Limited](https://apsissolutions.com/)
* **Cite Bengali.AI multipurpose grapheme dataset paper**
```bibtex
@inproceedings{alam2021large,
title={A large multi-target dataset of common bengali handwritten graphemes},
author={Alam, Samiul and Reasat, Tahsin and Sushmit, Asif Shahriyar and Siddique, Sadi Mohammad and Rahman, Fuad and Hasan, Mahady and Humayun, Ahmed Imtiaz},
booktitle={International Conference on Document Analysis and Recognition},
pages={383--398},
year={2021},
organization={Springer}
}
```
Change Log
===========
0.0.5 (9/03/2022)
-------------------
- added details for execution map
- checkop typo correction
0.0.6 (9/03/2022)
-------------------
- broken diacritics op addition
0.0.7 (11/03/2022)
-------------------
- assamese replacement
- word op and unicode op mapping
- modifier list modification
- doc string for call and initialization
- verbosity removal
- typo correction for operation
- unit test updates
- 'এ' replacement correction
- NonGylphUnicodes
- Legacy symbols option
- legacy mapper added
- added bn:bd declaration
0.0.8 (14/03/2022)
-------------------
- MultipleConsonantDiacritics handling change
- to+hosonto correction
- invalid hosonto correction
0.0.9 (15/04/2022)
-------------------
- base normalizer
- language class
- bangla extension
- complex root normalization
0.0.10 (15/04/2022)
-------------------
- added conjuncts
- exception for english words
0.0.11 (15/04/2022)
-------------------
- fixed no space char issue for bangla
0.0.12 (26/04/2022)
-------------------
- fixed consonants orders
0.0.13 (26/04/2022)
-------------------
- fixed non char followed by diacritics
0.0.14 (01/05/2022)
-------------------
- word based normalization
- encoding fix
0.0.15 (02/05/2022)
-------------------
- import correction
0.0.16 (02/05/2022)
-------------------
- local variable issue
0.0.17 (17/05/2022)
-------------------
- nukta mod break
0.0.18 (08/06/2022)
-------------------
- no space chars fix
0.0.19 (15/06/2022)
-------------------
- no space chars further fix
- base_bangla_compose to avoid false op flags
- added foreign conjuncts
0.0.20 (01/08/2022)
-------------------
- এ্যা replacement correction
0.0.21 (01/08/2022)
-------------------
- "য","ব" + hosonto combination correction
- added 'ব্ল্য' in conjuncts
0.0.22 (22/08/2022)
-------------------
- \u200d combination limiting
0.0.23 (23/08/2022)
-------------------
- \u200d condition change
0.0.24 (26/08/2022)
-------------------
- \u200d error handling
0.0.25 (10/09/22)
-------------------
- removed unnecessary operations: fixRefOrder,fixOrdersForCC
- added conjuncts: 'র্ন্ত','ঠ্য','ভ্ল'
0.1.0 (20/10/22)
-------------------
- added indic parser
- fixed language class
0.1.1 (21/10/22)
-------------------
- added nukta and diacritic maps for indics
- cleaned conjuncts for now
- fixed issues with no-space and connector
0.1.2 (10/12/22)
-------------------
- allow halant ending for indic language except bangla
0.1.3 (10/12/22)
-------------------
- broken char break cases for halant
0.1.4 (01/01/23)
-------------------
- added sylhetinagri
0.1.5 (01/01/23)
-------------------
- cleaned panjabi double quotes in diac map
0.1.6 (15/04/23)
-------------------
- added bangla punctuations