ts-tokenizer

Name: ts-tokenizer
Version: 0.1.12
Home page: https://github.com/tanerim/ts_tokenizer
Summary: TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed for Turkish text.
Upload time: 2024-10-18 06:48:00
Author: Taner Sezer
Requires Python: >=3.9
Requirements: click, tqdm, setuptools
# TS Tokenizer

**TS Tokenizer** is a hybrid (lexicon-based and rule-based) tokenizer designed for Turkish text.
It combines lexicon lookups with rules to split text into tokens, and it leverages regular expressions to handle non-standard text elements such as dates, percentages, URLs, and punctuation marks.


### Key Features:
- **Hybrid Approach**: Combines lexicon-based and rule-based methods for tokenization.
- **Handling of Special Tokens**: Recognizes special tokens such as mentions, hashtags, emails, URLs, numbers, smileys, and emoticons.
- **Highly Configurable**: Provides multiple output formats to suit different NLP processing needs, including plain tokens, tagged tokens, and token-tag pairs in list or line formats.

Whether you are working on natural language processing (NLP), information retrieval, or text mining for Turkish, **TS Tokenizer** offers
a versatile and reliable solution for tokenization.
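To illustrate the rule-based side, special token types such as dates, URLs, mentions, and hashtags can be matched with regular expressions. The patterns below are a simplified sketch of the idea, not the library's actual rules:

```python
import re

# Simplified, illustrative patterns -- the library's actual rules
# are more comprehensive than this sketch.
PATTERNS = [
    ("Date", re.compile(r"^\d{2}\.\d{2}\.\d{4}$")),
    ("Prefix_URL", re.compile(r"^https?://\S+$")),
    ("Mention", re.compile(r"^@\w+$")),
    ("HashTag", re.compile(r"^#\w+$")),
    ("Number", re.compile(r"^\d+$")),
]

def rule_tag(token):
    """Return the first matching rule-based tag, or None to fall
    through to a lexicon-based check."""
    for tag, pattern in PATTERNS:
        if pattern.match(token):
            return tag
    return None

print(rule_tag("31.10.1975"))    # Date
print(rule_tag("#tstokenizer"))  # HashTag
```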


# Installation

You can install the ts-tokenizer package using pip. Ensure you have Python 3.9 or higher installed on your system.

    pip install ts-tokenizer

## Command line tool
Basic usage returns the tokenized output of a given text file:

    $ ts-tokenizer input.txt

## CLI Arguments

The -o parameter takes one of four arguments.

The first two, "tokenized" and "tagged", return one word per line.
"tokenized" is the default value and does not need to be declared explicitly.

Consider the following input (the corrupted characters are intentional; the tokenizer repairs them on output):

input_text = "Queen , 31.10.1975 tarihinde çıkardıðı A Night at the Opera albümüyle dünya müziðini deðiåÿtirdi ."

    $ ts-tokenizer input.txt

    Queen
    ,
    31.10.1975
    tarihinde
    çıkardığı
    A
    Night
    at
    the
    Opera
    albümüyle
    dünya
    müziğini
    değiştirdi
    .

Note that the tags are not part-of-speech tags; they describe the form of the given string.
    
    $ ts-tokenizer -o tagged input.txt

    Queen	English_Word
    ,	Punc
    31.10.1975	Date
    tarihinde	Valid_Word
    çıkardığı	Valid_Word
    A	OOV
    Night	English_Word
    at	Valid_Word
    the	English_Word
    Opera	Valid_Word
    albümüyle	Valid_Word
    dünya	Valid_Word
    müziğini	Valid_Word
    değiştirdi	Valid_Word
    .	Punc


The other two arguments are "lines" and "tagged_lines".
The "lines" argument reads the input file line by line and returns a list of tokens for each line.

    $ ts-tokenizer -o lines input.txt

    ['Queen', ',', '31.10.1975', 'tarihinde', 'çıkardığı', 'A', 'Night', 'at', 'the', 'Opera', 'albümüyle', 'dünya', 'müziğini', 'değiştirdi', '.']

The "tagged_lines" argument reads the input file line by line and returns a list of (token, tag) tuples for each line.


    $ ts-tokenizer -o tagged_lines input.txt

    [('Queen', 'English_Word'), (',', 'Punc'), ('31.10.1975', 'Date'), ('tarihinde', 'Valid_Word'), ('çıkardığı', 'Valid_Word'), ('A', 'OOV'), ('Night', 'English_Word'), ('at', 'Valid_Word'), ('the', 'English_Word'), ('Opera', 'Valid_Word'), ('albümüyle', 'Valid_Word'),('dünya', 'Valid_Word'), ('müziğini', 'Valid_Word'), ('değiştirdi', 'Valid_Word'), ('.', 'Punc')]
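Because the "lines" and "tagged_lines" formats print Python literals, their output can be parsed back in a downstream script with `ast.literal_eval`. A minimal sketch, assuming one captured line of `-o tagged_lines` output:

```python
import ast

# One line of `-o tagged_lines` output, captured from a file or a pipe
raw = "[('Queen', 'English_Word'), (',', 'Punc'), ('31.10.1975', 'Date')]"

pairs = ast.literal_eval(raw)  # safely evaluate the literal list of tuples
for token, tag in pairs:
    print(f"{token}\t{tag}")
```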


The tokenizer is designed to take advantage of multiple cores; the default number of workers is [Total Number of Cores - 1].
The -j parameter sets the number of parallel workers.

    $ ts-tokenizer -j 2 -o tagged input_file

## Using CLI Arguments with pipelines

ts-tokenizer can also be used in a Bash pipeline.

The following sample returns token frequencies for the given file:

    $ ts-tokenizer input.txt | sort | uniq -c | sort -n

For case-insensitive output, tr is employed in the sample below:

    $ ts-tokenizer input.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -n

The sample below returns the number of tokens per tag in the given text:

    $ ts-tokenizer -o tagged input.txt | cut -f3 | sort | uniq -c
      1 Hyphen_In
      1 Inner_Punc
      2 FMP
      8 ISP
      8 Num_Char_Seq
     12 Number
     24 Apostrophe
     25 OOV
     69 FSP
    515 Valid_Word

To find tokens with a specific tag, the following command can be used:

    $ ts-tokenizer -o tagged input.txt | cut -f2,3 | grep "Num_Char_Seq"
    40'ar	Num_Char_Seq
    2.	Num_Char_Seq
    24.	Num_Char_Seq
    Num_Char_Seq
    16'sı	Num_Char_Seq
    8.	Num_Char_Seq
    20'şer	Num_Char_Seq
    40'ar	Num_Char_Seq

By combining the sort and uniq commands, the frequency of words with the target tag can be found:

    $ ts-tokenizer -o tagged Test_Text.txt | cut -f2,3 | grep "Num_Char_Seq" | sort | uniq -c | sort -n
      1 16'sı	Num_Char_Seq
      1 20'şer	Num_Char_Seq
      1 2.	Num_Char_Seq
      1 8.	Num_Char_Seq
      2 24.	Num_Char_Seq
      2 40'ar	Num_Char_Seq



--help displays the usage information:
    
    $ ts-tokenizer --help

    usage: main.py [-h] [-o {tokenized,lines,tagged,tagged_lines}] [-w] [-v] [-j JOBS] filename

    positional arguments:
      filename              Name of the file to process
    
    options:
      -h, --help            show this help message and exit
      -o {tokenized,lines,tagged,tagged_lines}, --output {tokenized,lines,tagged,tagged_lines}
                            Specify the output format
      -v, --verbose         Enable verbose mode
      -j JOBS, --jobs JOBS  Number of parallel workers
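This interface can be reproduced with a short `argparse` setup. The snippet below is a sketch mirroring the usage string above, not the package's actual `main.py`:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("filename", help="Name of the file to process")
parser.add_argument("-o", "--output", default="tokenized",
                    choices=["tokenized", "lines", "tagged", "tagged_lines"],
                    help="Specify the output format")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="Enable verbose mode")
parser.add_argument("-j", "--jobs", type=int,
                    help="Number of parallel workers")

# Parse a sample command line instead of sys.argv for demonstration
args = parser.parse_args(["-o", "tagged", "-j", "2", "input.txt"])
print(args.output, args.jobs, args.filename)  # tagged 2 input.txt
```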


## Classes

## CharFix

This class provides methods for repairing corrupted text.

### CharFix Class

```python
from ts_tokenizer.char_fix import CharFix
```

### Fix Characters

```python
line = "Parça ve bütün iliåÿkisi her zaman iåÿlevsel deðildir."
print(CharFix.fix(line))
# Parça ve bütün ilişkisi her zaman işlevsel değildir.
```
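Under the hood, this kind of repair can be done by mapping known mojibake sequences back to the intended Turkish characters. The table below is a tiny illustrative subset, not CharFix's actual mapping:

```python
# A tiny illustrative subset of mojibake fixes for Turkish text;
# CharFix's real table covers many more corrupted sequences.
FIXES = {
    "åÿ": "ş",   # multi-character sequences first, so they are
    "ð": "ğ",    # replaced before their single-character parts
    "Ã¼": "ü",
    "Ã§": "ç",
}

def fix_chars(text):
    for bad, good in FIXES.items():
        text = text.replace(bad, good)
    return text

print(fix_chars("deðiåÿtirdi"))  # değiştirdi
```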
### Lowercase

```python
line = "İstanbul ve Iğdır ''arası'' 1528 km'dir."
print(CharFix.tr_lowercase(line))
# istanbul ve ığdır ''arası'' 1528 km'dir.
```
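Turkish casing cannot rely on plain `str.lower()`, because dotted İ must map to i and dotless I to ı. A minimal sketch of the idea (CharFix's implementation may differ):

```python
def tr_lower(text):
    # Handle the Turkish dotted/dotless I pairs explicitly before the
    # generic lowering, which would otherwise mishandle İ and I.
    return text.replace("İ", "i").replace("I", "ı").lower()

print(tr_lower("İstanbul ve Iğdır"))  # istanbul ve ığdır
```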
### Fix Quotes

```python
line = "İstanbul ve Iğdır ''arası'' 1528 km'dir."
print(CharFix.fix_quote(line))
# İstanbul ve Iğdır "arası" 1528 km'dir.
```
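The quote repair shown above normalizes doubled apostrophes into standard double quotes while leaving single apostrophes such as `km'dir` alone. A minimal sketch covering just this case (the library may handle more quote styles):

```python
def fix_quote(text):
    # Collapse doubled apostrophes into a standard double quote;
    # single apostrophes (e.g. km'dir) are left untouched.
    return text.replace("''", '"')

print(fix_quote("İstanbul ve Iğdır ''arası'' 1528 km'dir."))
# İstanbul ve Iğdır "arası" 1528 km'dir.
```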


## TokenCheck

This class is used to pass input tokens to the tokenizer for further analysis, but it can also be used on its own for various tasks.<br>
The tags are "Valid_Word", "Exception_Word", "Eng_Word", "Date", "Hour", "In_Parenthesis", "In_Quotes", "Smiley", "Inner_Char", "Abbr", "Number", "Non_Prefix_URL", "Prefix_URL", "Emoticon", "Mention", "HashTag", "Percentage_Numbers", "Percentage_Number_Chars", "Num_Char_Seq", "Multiple_Smiley", "Punc", "Underscored", "Hyphenated", "Hyphen_In", "Multiple_Emoticon", "Copyright", "Email", "Registered", "Three_or_More"

### token_tagger

```python
from ts_tokenizer.token_check import TokenCheck
```

### Default Usage
```python
word = "Parça"
print(TokenCheck.token_tagger(word))
# Valid_Word

print(TokenCheck.token_tagger(word, output="all", output_format="tuple"))
# ('Parça', 'Parça', 'Valid_Word')

print(TokenCheck.token_tagger(word, output="all", output_format="list"))
# ['Parça', 'Parça', 'Valid_Word']

word = "#tstokenizer"
print(TokenCheck.token_tagger(word, output='all', output_format='tuple'))  # Returns a tuple
# ('#tstokenizer', '#tstokenizer', 'HashTag')

word = "@tanerim"
print(TokenCheck.token_tagger(word, output='all', output_format='list'))   # Returns a list
# ['@tanerim', '@tanerim', 'Mention']

word = ":):):)"
print(TokenCheck.token_tagger(word, output='all', output_format='string'))  # Returns a tab-separated string
# :):):)  :):):)  Multiple_Smiley
```

```python
line = "Queen , 31.10.1975 tarihinde çıkardıðı A Night at the Opera albümüyle dünya müziðini deðiåÿtirdi ."

for word in line.split(" "):
    token_tag = TokenCheck.token_tagger(word, output='all', output_format='list')
    print(token_tag)

# ['Queen', 'Queen', 'Eng_Word']
# [',', ',', 'Punc']
# ['31.10.1975', '31.10.1975', 'Date']
# ['tarihinde', 'tarihinde', 'Valid_Word']
# ['çıkardıðı', 'çıkardığı', 'Valid_Word']
# ['A', 'A', 'OOV']
# ['Night', 'Night', 'Eng_Word']
# ['at', 'at', 'Valid_Word']
# ['the', 'the', 'Eng_Word']
# ['Opera', 'Opera', 'Valid_Word']
# ['albümüyle', 'albümüyle', 'Valid_Word']
# ['dünya', 'dünya', 'Valid_Word']
# ['müziðini', 'müziğini', 'Valid_Word']
# ['deðiåÿtirdi', 'değiştirdi', 'Valid_Word']
# ['.', '.', 'Punc']
```

            
