# Nepali Tokenizers
[![LICENSE](https://img.shields.io/badge/license-Apache--2.0-blue)](./LICENSE) [![Build and Release](https://github.com/basnetsoyuj/nepali-tokenizers/actions/workflows/build.yml/badge.svg)](https://github.com/basnetsoyuj/nepali-tokenizers/actions/workflows/build.yml)
This package provides access to pre-trained __WordPiece__ and __SentencePiece__ (Unigram) tokenizers for the Nepali language, trained using HuggingFace's `tokenizers` library. It is a small, focused Python package tailored specifically to Nepali, with a default configuration for the normalizer, pre-tokenizer, post-processor, and decoder.
It also enables further customization by exposing HuggingFace's `Tokenizer` pipeline, allowing users to adapt the tokenizers to their requirements.
## Installation
You can install `nepalitokenizers` using pip:
```sh
pip install nepalitokenizers
```
## Usage
After installing the package, you can use the tokenizers in your Python code:
### WordPiece Tokenizer
```python
from nepalitokenizers import WordPiece
text = "हाम्रा सबै क्रियाकलापहरु भोलिवादी छन् । मेरो पानीजहाज वाम माछाले भरिपूर्ण छ । इन्जिनियरहरुले गएको हप्ता राजधानीमा त्यस्तै बहस गरे ।"
tokenizer_wp = WordPiece()
tokens = tokenizer_wp.encode(text)
print(tokens.ids)
print(tokens.tokens)
print(tokenizer_wp.decode(tokens.ids))
```
**Output**
```
[1, 11366, 8625, 14157, 8423, 13344, 9143, 8425, 1496, 9505, 22406, 11693, 12679, 8340, 27445, 1430, 1496, 13890, 9008, 9605, 13591, 14547, 9957, 12507, 8700, 1496, 2]
['[CLS]', 'हाम्रा', 'सबै', 'क्रियाकलाप', '##हरु', 'भोलि', '##वादी', 'छन्', '।', 'मेरो', 'पानीजहाज', 'वाम', 'माछा', '##ले', 'भरिपूर्ण', 'छ', '।', 'इन्जिनियर', '##हरुले', 'गएको', 'हप्ता', 'राजधानीमा', 'त्यस्तै', 'बहस', 'गरे', '।', '[SEP]']
हाम्रा सबै क्रियाकलापहरु भोलिवादी छन् । मेरो पानीजहाज वाम माछाले भरिपूर्ण छ । इन्जिनियरहरुले गएको हप्ता राजधानीमा त्यस्तै बहस गरे ।
```
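Since the returned tokenizer delegates to HuggingFace's `Tokenizer`, batch encoding should also work out of the box. A minimal sketch, assuming the standard `encode_batch` method of the underlying `Tokenizer` (the sentences are only illustrative):
```python
from nepalitokenizers import WordPiece

tokenizer_wp = WordPiece()

# encode several sentences at once; returns one Encoding per input
encodings = tokenizer_wp.encode_batch([
    "नेपाल दक्षिण एसियामा अवस्थित छ ।",
    "काठमाडौँ नेपालको राजधानी हो ।",
])
for enc in encodings:
    print(enc.tokens)
```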
### SentencePiece (Unigram) Tokenizer
```python
from nepalitokenizers import SentencePiece
text = "कोभिड महामारीको पिडाबाट मुक्त नहुँदै मानव समाजलाई यतिबेला युद्धको विध्वंसकारी क्षतिको चिन्ताले चिन्तित बनाएको छ ।"
tokenizer_sp = SentencePiece()
tokens = tokenizer_sp.encode(text)
print(tokens.ids)
print(tokens.tokens)
print(tokenizer_sp.decode(tokens.ids))
```
**Output**
```
[7, 9, 3241, 483, 12081, 9, 11079, 23, 2567, 11254, 1002, 789, 20, 3334, 2161, 9, 23517, 2711, 1115, 9, 1718, 12, 5941, 781, 19, 8, 1, 0]
['▁', 'को', 'भि', 'ड', '▁महामारी', 'को', '▁पिडा', 'बाट', '▁मुक्त', '▁नहुँदै', '▁मानव', '▁समाज', 'लाई', '▁यतिबेला', '▁युद्ध', 'को', '▁विध्वंस', 'कारी', '▁क्षति', 'को', '▁चिन्ता', 'ले', '▁चिन्तित', '▁बनाएको', '▁छ', '▁।', '<sep>', '<cls>']
कोभिड महामारीको पिडाबाट मुक्त नहुँदै मानव समाजलाई यतिबेला युद्धको विध्वंसकारी क्षतिको चिन्ताले चिन्तित बनाएको छ ।
```
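The value returned by `encode` is a standard HuggingFace `Encoding`, so token-to-character offsets should also be available. A small sketch, assuming the usual `Encoding.offsets` attribute (the sentence is illustrative):
```python
from nepalitokenizers import SentencePiece

tokenizer_sp = SentencePiece()

text = "नेपाल हिमालयको काखमा अवस्थित छ ।"
tokens = tokenizer_sp.encode(text)

# map each token back to its (start, end) character span in the input;
# special tokens such as <cls> and <sep> map to empty spans
for token, (start, end) in zip(tokens.tokens, tokens.offsets):
    print(token, "->", text[start:end])
```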
## Configuration & Customization
Each tokenizer class ships with a standard default configuration for the normalizer, pre-tokenizer, post-processor, and decoder. For details, see the training scripts in the [`train/`](train/) directory.
The package enables further customization by providing direct access to HuggingFace's tokenizer pipeline, so you can treat `nepalitokenizers` tokenizer instances as HuggingFace `Tokenizer` objects. For example:
```python
from nepalitokenizers import WordPiece
# importing from the HuggingFace tokenizers package
from tokenizers.processors import TemplateProcessing
text = "हाम्रो मातृभूमि नेपाल हो"
tokenizer_wp = WordPiece()
# use the default post-processor
tokens = tokenizer_wp.encode(text)
print(tokens.tokens)
# swap the post-processor so that no special tokens are added;
# tokenizer_wp can be treated as a HuggingFace Tokenizer object
tokenizer_wp.post_processor = TemplateProcessing()
tokens = tokenizer_wp.encode(text)
print(tokens.tokens)
```
**Output**
```
['[CLS]', 'हाम्रो', 'मातृ', '##भूमि', 'नेपाल', 'हो', '[SEP]']
['हाम्रो', 'मातृ', '##भूमि', 'नेपाल', 'हो']
```
To learn more about further customizations that can be performed, visit [HuggingFace's Tokenizer Documentation](https://huggingface.co/docs/tokenizers/api/tokenizer).
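For instance, truncation and padding can be enabled through the same delegated interface. A minimal sketch, assuming the standard `enable_truncation` / `enable_padding` methods and a `[PAD]` token in the WordPiece vocabulary (the length of 32 is arbitrary):
```python
from nepalitokenizers import WordPiece

tokenizer_wp = WordPiece()

# assumes "[PAD]" is present in the vocabulary; look up its id first
pad_id = tokenizer_wp.token_to_id("[PAD]")

# these calls are forwarded to the underlying HuggingFace Tokenizer
tokenizer_wp.enable_truncation(max_length=32)
tokenizer_wp.enable_padding(pad_id=pad_id, pad_token="[PAD]", length=32)

tokens = tokenizer_wp.encode("हाम्रो मातृभूमि नेपाल हो")
print(len(tokens.ids))  # 32 after padding
```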
> **Note**: The delegation to HuggingFace's `Tokenizer` pipeline is implemented with the following generic wrapper class, because `tokenizers.Tokenizer` is not an acceptable base type for inheritance.
> It is a useful trick I use for solving similar issues:
> ```python
> class Delegate:
>     """
>     A generic wrapper class that delegates attributes and method calls
>     to the specified self.delegate instance.
>     """
>
>     @property
>     def _items(self):
>         return dir(self.delegate)
>
>     def __getattr__(self, name):
>         if name in self._items:
>             return getattr(self.delegate, name)
>         raise AttributeError(
>             f"'{self.__class__.__name__}' object has no attribute '{name}'")
>
>     def __setattr__(self, name, value):
>         if name == "delegate" or name not in self._items:
>             super().__setattr__(name, value)
>         else:
>             setattr(self.delegate, name, value)
>
>     def __dir__(self):
>         return dir(type(self)) + list(self.__dict__.keys()) + self._items
> ```
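> For example, a wrapper only needs to assign `self.delegate`; every other attribute access and method call is forwarded. A hypothetical sketch using the `Delegate` class above (`MyTokenizer` and the file path are illustrative, not part of this package):
>
> ```python
> from tokenizers import Tokenizer
>
> class MyTokenizer(Delegate):
>     def __init__(self, path):
>         # everything not defined here is forwarded to the wrapped Tokenizer
>         self.delegate = Tokenizer.from_file(path)
>
> tok = MyTokenizer("tokenizer.json")  # hypothetical tokenizer file
> print(tok.encode("नेपाल").tokens)    # delegated to Tokenizer.encode
> ```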
## Training
The Python scripts used to train the tokenizers are available in the [`train/`](train/) directory. You can also use them to train your own tokenizers on a custom text corpus.
These tokenizers were trained on two datasets:
#### 1. The Nepali Subset of the [OSCAR](https://oscar-project.github.io/documentation/versions/oscar-2301/) dataset
You can download it and export it to plain-text files using the following code:
```python
import datasets
from tqdm.auto import tqdm
import os
dataset = datasets.load_dataset(
    'oscar', 'unshuffled_deduplicated_ne',
    split='train'
)
os.mkdir('data')
# collect samples and write them out as text files of 10,000 lines each
batch = []
counter = 0
for sample in tqdm(dataset):
    sample = sample['text'].replace('\n', ' ')
    batch.append(sample)
    if len(batch) == 10_000:
        with open(f'data/ne_{counter}.txt', 'w', encoding='utf-8') as f:
            f.write('\n'.join(batch))
        batch = []
        counter += 1
# write the final partial batch, if any
if batch:
    with open(f'data/ne_{counter}.txt', 'w', encoding='utf-8') as f:
        f.write('\n'.join(batch))
```
#### 2. A Large Scale Nepali Text Corpus by Rabindra Lamsal (2020)
To download the dataset, follow the instructions provided in this link: [A Large Scale Nepali Text Corpus](https://dx.doi.org/10.21227/jxrd-d245).
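The exact settings used for this package live in the scripts under [`train/`](train/). As a rough sketch, a WordPiece tokenizer could be trained on the exported text files with HuggingFace's `tokenizers` library roughly as follows (the vocabulary size, special tokens, and output filename are illustrative, not necessarily the package's settings):
```python
from pathlib import Path
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# plain-text files produced by the OSCAR export above
files = [str(p) for p in Path("data").glob("ne_*.txt")]

tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.WordPieceTrainer(
    vocab_size=30_000,  # illustrative value
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(files, trainer)
tokenizer.save("nepali-wordpiece.json")  # hypothetical output path
```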
## License
This package is licensed under the Apache 2.0 License, which is consistent with the license used by HuggingFace's `tokenizers` library. Please see the [`LICENSE`](LICENSE) file for more details.