kobert-transformers

Name: kobert-transformers
Version: 0.6.0
Home page: https://github.com/monologg/KoBERT-Transformers
Summary: Transformers library for KoBERT, DistilKoBERT
Upload time: 2024-08-20 11:15:56
Author: Jangwon Park
Requires Python: >=3.6
License: Apache License 2.0
Keywords: distilkobert, kobert, bert, pytorch, transformers, lightweight
Requirements: torch, transformers, sentencepiece
# KoBERT-Transformers

`KoBERT` & `DistilKoBERT` on 🤗 Huggingface Transformers 🤗

The KoBERT model is identical to the one in the [official repo](https://github.com/SKTBrain/KoBERT). This repo exists to **support the full Huggingface tokenizer API** for it.

## 🚨 Important! 🚨

### 🙏 TL;DR

1. Be sure to install `transformers` `v3.0` or higher!
2. Use this repo's `kobert_transformers/tokenization_kobert.py` as the `tokenizer`!

### 1. Tokenizer compatibility

Starting with `v2.9.0`, `Huggingface Transformers` changed parts of its tokenization API. The existing `tokenization_kobert.py` has been updated to work with these newer versions.
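
For example, the call-style encoding API introduced alongside these changes works with the updated tokenizer. A minimal sketch, assuming `tokenization_kobert.py` has been copied locally (see below) and `transformers >= 3.0` is installed:

```python
from tokenization_kobert import KoBertTokenizer

tokenizer = KoBertTokenizer.from_pretrained("monologg/kobert")
# __call__-style encoding with padding and truncation (newer tokenizer API)
encoded = tokenizer("한국어 모델을 공유합니다.", padding="max_length", max_length=16, truncation=True)
print(encoded["input_ids"])       # token ids, padded with KoBERT's pad id (1)
print(encoded["attention_mask"])  # 1 for real tokens, 0 for padding
```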

### 2. The Embedding padding_idx issue

Previously, `padding_idx=0` was **hard-coded** in `BertModel`'s `BertEmbeddings` (see the code below).

```python
class BertEmbeddings(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
```

However, SentencePiece defaults to `pad_token_id=1` and `unk_token_id=0` (KoBERT uses the same convention), so a `BertModel` that keeps the hard-coded value can produce unintended results.
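
To see why this matters: `padding_idx` pins that embedding row, so with the hard-coded `0` it is KoBERT's `[UNK]` token (id 0), not `[PAD]`, that gets pinned. A minimal illustration:

```python
import torch.nn as nn

emb = nn.Embedding(8002, 768, padding_idx=0)
print(emb.weight[0].abs().sum())  # tensor(0., grad_fn=<SumBackward0>) — row 0 starts as all zeros
# Row 0 also receives no gradient updates during training, so with KoBERT's
# vocabulary this silently freezes the [UNK] embedding instead of [PAD].
```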

Huggingface recently recognized this issue and fixed it in `v2.9.0` ([related PR #3793](https://github.com/huggingface/transformers/pull/3793)): `pad_token_id=1` can now be set in the config, which resolves the problem.

```python
class BertEmbeddings(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
```
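
A quick way to verify that a checkpoint picked up the fix; a sketch assuming the hub config for `monologg/kobert` carries the corrected value:

```python
from transformers import BertConfig

config = BertConfig.from_pretrained("monologg/kobert")
print(config.pad_token_id)  # expected: 1, matching KoBERT's [PAD] id
```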

However, `v2.9.0` left this issue unresolved for `DistilBERT`, `ALBERT`, and others, so I submitted a PR myself ([related PR #3965](https://github.com/huggingface/transformers/pull/3965)), **and the fix was finally released in `v2.9.1`.**

์•„๋ž˜๋Š” ์ด์ „๊ณผ ํ˜„์žฌ ๋ฒ„์ „์˜ ์ฐจ์ด์ ์„ ๋ณด์—ฌ์ฃผ๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

```python
# Transformers v2.7.0
>>> from transformers import BertModel, DistilBertModel
>>> model = BertModel.from_pretrained("monologg/kobert")
>>> model.embeddings.word_embeddings
Embedding(8002, 768, padding_idx=0)
>>> model = DistilBertModel.from_pretrained("monologg/distilkobert")
>>> model.embeddings.word_embeddings
Embedding(8002, 768, padding_idx=0)


# Transformers v2.9.1
>>> from transformers import BertModel, DistilBertModel
>>> model = BertModel.from_pretrained("monologg/kobert")
>>> model.embeddings.word_embeddings
Embedding(8002, 768, padding_idx=1)
>>> model = DistilBertModel.from_pretrained("monologg/distilkobert")
>>> model.embeddings.word_embeddings
Embedding(8002, 768, padding_idx=1)
```

## KoBERT / DistilKoBERT on 🤗 Transformers 🤗

### Dependencies

- torch>=1.1.0
- transformers>=3,<5

### How to Use

```python
>>> from transformers import BertModel, DistilBertModel
>>> bert_model = BertModel.from_pretrained('monologg/kobert')
>>> distilbert_model = DistilBertModel.from_pretrained('monologg/distilkobert')
```
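
Loading through the `Auto` classes should resolve to the same architectures, since the hub configs declare the model types; a brief sketch:

```python
>>> from transformers import AutoModel
>>> AutoModel.from_pretrained('monologg/kobert')        # resolves to BertModel
>>> AutoModel.from_pretrained('monologg/distilkobert')  # resolves to DistilBertModel
```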

**To use the tokenizer, copy [`kobert_transformers/tokenization_kobert.py`](https://github.com/monologg/KoBERT-Transformers/blob/master/kobert_transformers/tokenization_kobert.py) into your project and import `KoBertTokenizer` from it.**

- KoBERT and DistilKoBERT use the same tokenizer.
- **The original KoBERT had an issue where special tokens were not split properly**; this has been fixed here. ([Issue link](https://github.com/SKTBrain/KoBERT/issues/11))

```python
>>> from tokenization_kobert import KoBertTokenizer
>>> tokenizer = KoBertTokenizer.from_pretrained('monologg/kobert')  # same for monologg/distilkobert
>>> tokenizer.tokenize("[CLS] 한국어 모델을 공유합니다. [SEP]")
['[CLS]', '▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.', '[SEP]'])
[2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3]
```
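
Because the tokenizer supports the standard Huggingface API, `encode` adds the special tokens automatically; the expected ids below are taken from the example above:

```python
>>> tokenizer.encode("한국어 모델을 공유합니다.")  # [CLS] and [SEP] are added automatically
[2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3]
```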

## Kobert-Transformers (Pip library)

[![PyPI](https://img.shields.io/pypi/v/kobert-transformers)](https://pypi.org/project/kobert-transformers/)
[![license](https://img.shields.io/badge/license-Apache%202.0-red)](https://github.com/monologg/DistilKoBERT/blob/master/LICENSE)
[![Downloads](https://pepy.tech/badge/kobert-transformers)](https://pepy.tech/project/kobert-transformers)

- A Python library that wraps `tokenization_kobert.py`
- Provides KoBERT and DistilKoBERT in the Huggingface Transformers library format
- From `v0.5.1` onward, `transformers v3.0` or higher is installed by default. (Works without issues up through `transformers v4.0`.)

### Install Kobert-Transformers

```bash
pip3 install kobert-transformers
```

### How to Use

```python
>>> import torch
>>> from kobert_transformers import get_kobert_model, get_distilkobert_model
>>> model = get_kobert_model()
>>> model.eval()
>>> input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
>>> attention_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
>>> token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
>>> sequence_output, pooled_output = model(input_ids, attention_mask, token_type_ids)
>>> sequence_output[0]
tensor([[-0.2461,  0.2428,  0.2590,  ..., -0.4861, -0.0731,  0.0756],
        [-0.2478,  0.2420,  0.2552,  ..., -0.4877, -0.0727,  0.0754],
        [-0.2472,  0.2420,  0.2561,  ..., -0.4874, -0.0733,  0.0765]],
       grad_fn=<SelectBackward>)
```
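
Note that the tuple unpacking above matches `transformers` `v3` behavior. Under `v4` (also allowed by the dependency pin), the model returns a `ModelOutput` by default, so access the tensors by attribute instead:

```python
>>> outputs = model(input_ids, attention_mask, token_type_ids)
>>> sequence_output = outputs.last_hidden_state  # (batch, seq_len, hidden)
>>> pooled_output = outputs.pooler_output
```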

```python
>>> from kobert_transformers import get_tokenizer
>>> tokenizer = get_tokenizer()
>>> tokenizer.tokenize("[CLS] 한국어 모델을 공유합니다. [SEP]")
['[CLS]', '▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '▁한국', '어', '▁모델', '을', '▁공유', '합니다', '.', '[SEP]'])
[2, 4958, 6855, 2046, 7088, 1050, 7843, 54, 3]
```
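
`get_distilkobert_model` (imported above) works the same way; a minimal sketch, noting that DistilBERT models take no `token_type_ids`:

```python
>>> from kobert_transformers import get_distilkobert_model
>>> model = get_distilkobert_model()
>>> model.eval()
>>> outputs = model(input_ids, attention_mask)  # DistilBERT has no token_type_ids
```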

## Reference

- [KoBERT](https://github.com/SKTBrain/KoBERT)
- [DistilKoBERT](https://github.com/monologg/DistilKoBERT)
- [Huggingface Transformers](https://github.com/huggingface/transformers)