# UniTok V3: An SQL-like Data Preprocessing Toolkit
Updated on 2023.11.04
## 1. Introduction
UniTok is the first SQL-like data preprocessing toolkit, providing a complete suite of tools for packaging and editing data.
UniTok consists of two main components: `UniTok`, which performs unified data processing, and `UniDep`, which handles data loading and further editing:
- `UniTok` uses tokenizers (Tokenizers), data columns (Columns), and other components to tokenize raw data and convert it into IDs, ultimately storing the result as a single data table in numpy array format.
- `UniDep` loads the data table and metadata (such as vocabulary information) generated by `UniTok`. It can be plugged directly into a PyTorch Dataset, and also supports further editing, merging with other data tables, exporting, and more.
- Since version 3.1.9, we also provide the `Fut` component, a replacement for `UniTok` that completes data preprocessing faster.
## 2. Installation
Install with pip:
```bash
pip install 'unitok>=3.4.8'
```
## 3. Main Features
### 3.1 UniTok
UniTok provides a complete set of data preprocessing tools, including various types of tokenizers and data column management. Specifically, UniTok offers multiple tokenizer types to cover the tokenization needs of different kinds of data. Every tokenizer inherits from the `BaseTok` class.
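For illustration, a custom tokenizer can be sketched by subclassing `BaseTok`. Everything below is a hedged sketch: the `t(obj)` hook, the `return_list` flag, and `self.vocab.append(...)` follow the pattern of the built-in tokenizers as we understand it; check the exact contract against the `UniTok.tok` source.
```python
from UniTok import Vocab
from UniTok.tok import BaseTok

class LowerSplitTok(BaseTok):
    """Hypothetical tokenizer: lowercase the text and split on whitespace.

    Assumption: BaseTok subclasses implement t(obj) and convert
    tokens to ids through self.vocab.
    """
    return_list = True  # each cell becomes a list of token ids (assumed flag)

    def __init__(self, name='lower_split', vocab=None):
        super().__init__(name=name, vocab=vocab or Vocab(name))

    def t(self, obj):
        # Append each token to the vocab (creating an id if unseen) and return its id.
        return [self.vocab.append(token) for token in str(obj).lower().split()]
```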
In addition, UniTok provides the `Column` class to manage data columns. Each `Column` object holds a tokenizer (Tokenizer) and a sequence operator (SeqOperator).
Taking a news recommendation scenario as an example, the dataset may contain the following parts:
- News content data (`news.tsv`): each line is one news article, with features such as news ID, title, abstract, category, and subcategory, separated by `\t`.
- User history data (`user.tsv`): each line is one user, containing the user ID and the list of news IDs the user historically clicked; the news IDs are separated by spaces (` `).
- Interaction data: training (`train.tsv`), validation (`dev.tsv`), and test (`test.tsv`) data. Each line is one interaction record, containing the user ID, news ID, and whether the user clicked, separated by `\t` (illustrative snippets of all three files follow).
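To make the formats concrete, here are hypothetical example rows, built from the sample values in the table below (fields are separated by real tab characters):
```text
# news.tsv — nid \t title \t abstract \t category \t subcat
N1234	After 10 years, the iPhone is still the best smartphone in the world	The iPhone 11 Pro is the best smartphone you can buy right now.	Technology	Mobile

# user.tsv — uid \t history (news ids separated by spaces)
U1234	N1234 N1235 N1236

# train.tsv / dev.tsv / test.tsv — uid \t nid \t label
U1234	N1234	1
```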
We first analyze the data type of each attribute:
| File | Attribute | Type | Example | Notes |
|-----------|----------|-----|----------------------------------------------------------------------|-------------------------|
| news.tsv | nid | str | N1234 | News ID, unique identifier |
| news.tsv | title | str | After 10 years, the iPhone is still the best smartphone in the world | News title, usually tokenized with BertTokenizer |
| news.tsv | abstract | str | The iPhone 11 Pro is the best smartphone you can buy right now. | News abstract, usually tokenized with BertTokenizer |
| news.tsv | category | str | Technology | News category, atomic (not splittable) |
| news.tsv | subcat | str | Mobile | News subcategory, atomic (not splittable) |
| user.tsv | uid | str | U1234 | User ID, unique identifier |
| user.tsv | history | str | N1234 N1235 N1236 | User history, separated by spaces (` `) |
| train.tsv | uid | str | U1234 | User ID, consistent with `user.tsv` |
| train.tsv | nid | str | N1234 | News ID, consistent with `news.tsv` |
| train.tsv | label | int | 1 | Click label: 0 = not clicked, 1 = clicked |
We can classify these attributes as follows:
| Attribute | Type | Preset Tokenizer | Notes |
|------------------|-----|-----------|-------------------------------------|
| nid, uid, index | str | IdTok | Unique identifiers |
| title, abstract | str | BertTok | Pass `vocab_dir="bert-base-uncased"` |
| category, subcat | str | EntTok | Atomic, not splittable |
| history | str | SplitTok | Pass `sep=' '` |
| label | int | NumberTok | Pass `vocab_size=2`; only 0 and 1 occur |
With the following code, we can build a UniTok object for each file:
```python
from UniTok import UniTok, Column, Vocab
from UniTok.tok import IdTok, BertTok, EntTok, SplitTok, NumberTok
# Create a news id vocab, commonly used in news data, history data, and interaction data.
nid_vocab = Vocab('nid')
# Create a bert tokenizer, commonly used in tokenizing title and abstract.
eng_tok = BertTok(vocab_dir='bert-base-uncased', name='eng')
# Create a news UniTok object.
news_ut = UniTok()
# Add columns to the news UniTok object.
news_ut.add_col(Column(
# Specify the vocab. The column name will be set to 'nid' automatically if not specified.
tok=IdTok(vocab=nid_vocab),
)).add_col(Column(
# The column name will be set to 'title', rather than the name of eng_tok 'eng'.
name='title',
tok=eng_tok,
max_length=20, # Specify the max length. The exceeding part will be truncated.
)).add_col(Column(
name='abstract',
tok=eng_tok, # Abstract and title use the same tokenizer.
max_length=30,
)).add_col(Column(
name='category',
tok=EntTok, # Vocab will be created automatically, and the vocab name will be set to 'category'.
)).add_col(Column(
name='subcat',
tok=EntTok, # Vocab will be created automatically, and the vocab name will be set to 'subcat'.
))
# Read the data file.
news_ut.read('news.tsv', sep='\t')
# Tokenize the data.
news_ut.tokenize()
# Store the tokenized data.
news_ut.store('data/news')
# Create a user id vocab, commonly used in user data and interaction data.
uid_vocab = Vocab('uid')
# Create a user UniTok object.
user_ut = UniTok()
# Add columns to the user UniTok object.
user_ut.add_col(Column(
tok=IdTok(vocab=uid_vocab),
)).add_col(Column(
name='history',
tok=SplitTok(sep=' '), # The news id in the history data is separated by space.
))
# Read the data file.
user_ut.read('user.tsv', sep='\t')
# Tokenize the data.
user_ut.tokenize()
# Store the tokenized data.
user_ut.store('data/user')
def inter_tokenize(mode):
    # Create an interaction UniTok object.
    inter_ut = UniTok()
    # Add columns to the interaction UniTok object.
    inter_ut.add_index_col(
        # The index column of the interaction data is generated automatically; no tokenizer needs to be specified.
    ).add_col(Column(
        # Align with the uid column in user_ut.
        tok=EntTok(vocab=uid_vocab),
    )).add_col(Column(
        # Align with the nid column in news_ut.
        tok=EntTok(vocab=nid_vocab),
    )).add_col(Column(
        name='label',
        # The label column in the interaction data only has two values, 0 and 1.
        tok=NumberTok(vocab_size=2),  # NumberTok requires UniTok >= 3.0.11.
    ))
    # Read the data file (train.tsv / dev.tsv / test.tsv).
    inter_ut.read(f'{mode}.tsv', sep='\t')
    # Tokenize the data.
    inter_ut.tokenize()
    # Store the tokenized data.
    inter_ut.store(f'data/{mode}')

inter_tokenize('train')
inter_tokenize('dev')
inter_tokenize('test')
```
### 3.2 UniDep
UniDep is a data-dependency class for loading and accessing data preprocessed by UniTok. It includes vocabularies (Vocabs), metadata (Meta), and more.
The `Vocabs` class centrally manages all vocabularies. Each `Vocab` object holds an object-to-index mapping, an index-to-object mapping, and some other attributes and methods.
The `Meta` class manages metadata, including loading, saving, and upgrading it.
Here is a simple usage example:
```python
from UniTok import UniDep
# Load the data.
dep = UniDep('data/news')
# Get sample size.
print(len(dep))
# Get the first sample.
print(dep[0])
```
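Because `UniDep` supports `len()` and integer indexing, as shown above, it can back a PyTorch Dataset directly. Below is a minimal sketch, assuming the `data/news` store produced earlier; the `NewsDataset` wrapper is our illustration, not part of the UniTok API.
```python
from torch.utils.data import Dataset, DataLoader
from UniTok import UniDep

class NewsDataset(Dataset):
    """Thin wrapper exposing a UniDep store as a PyTorch Dataset."""

    def __init__(self, path):
        self.dep = UniDep(path)

    def __len__(self):
        return len(self.dep)

    def __getitem__(self, index):
        # Each sample is the tokenized row stored by UniTok.
        return self.dep[index]

dataset = NewsDataset('data/news')
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```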