<p align="center">
<br>
<img src="https://huggingface.co/landing/assets/tokenizers/tokenizers-logo.png" width="600"/>
<br>
</p>
<p align="center">
<a href="https://badge.fury.io/py/tokenizers">
<img alt="Build" src="https://badge.fury.io/py/tokenizers.svg">
</a>
<a href="https://github.com/huggingface/tokenizers/blob/master/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/tokenizers.svg?color=blue">
</a>
</p>
<br>
# Tokenizers
Provides an implementation of today's most used tokenizers, with a focus on performance and
versatility.
These are Python bindings over the [Rust](https://github.com/huggingface/tokenizers/tree/master/tokenizers) implementation.
If you are interested in the high-level design, you can read about it there.
Otherwise, let's dive in!
## Main features:
- Train new vocabularies and tokenize using 4 pre-made tokenizers (BERT WordPiece and the 3
most common BPE versions).
- Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes
less than 20 seconds to tokenize a GB of text on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignment tracking. It's always possible to get the part of the
original sentence that corresponds to a given token.
- Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
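To make the last two points concrete, here is a plain-Python sketch of the ideas, not the library's implementation. The helper names `tokenize_with_offsets` and `truncate_and_pad`, and the `[CLS]`/`[SEP]`/`[PAD]` strings, are assumptions chosen for illustration only.

```python
import re

def tokenize_with_offsets(text):
    """Lowercase each word/punctuation token, keeping (start, end) offsets into `text`."""
    return [(m.group().lower(), m.span()) for m in re.finditer(r"\w+|[^\w\s]+", text)]

def truncate_and_pad(tokens, max_length, cls="[CLS]", sep="[SEP]", pad="[PAD]"):
    """Truncate, add two special tokens, then pad to exactly `max_length` tokens."""
    body = tokens[: max_length - 2]  # leave room for the two special tokens
    out = [cls] + body + [sep]
    return out + [pad] * (max_length - len(out))

text = "I can FEEL the magic"
pairs = tokenize_with_offsets(text)
tokens = [tok for tok, _ in pairs]

# The normalized token differs from the source text,
# but its offsets still recover the original span.
tok, (start, end) = pairs[2]
original_span = text[start:end]
print(tok, "->", original_span)

padded = truncate_and_pad(tokens, max_length=8)
truncated = truncate_and_pad(tokens, max_length=5)
print(padded)
print(truncated)
```

In the library itself, the corresponding information is exposed on the encoding returned by `encode` (for example `encoded.tokens` and `encoded.ids`, as shown further down).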
### Installation
#### With pip:
```bash
pip install tokenizers
```
#### From sources:
To use this method, you need to have Rust installed:
```bash
# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"
```
Once Rust is installed, you can compile the bindings as follows:
```bash
git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python
# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate
# Install `tokenizers` in the current virtual env
pip install -e .
```
### Load a pretrained tokenizer from the Hub
```python
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_pretrained("bert-base-cased")
```
### Using the provided Tokenizers
We provide some pre-built tokenizers to cover the most common cases. You can easily load one of
these using some `vocab.json` and `merges.txt` files:
```python
from tokenizers import CharBPETokenizer
# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)
# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
```
And you can train them just as simply:
```python
from tokenizers import CharBPETokenizer
# Initialize a tokenizer
tokenizer = CharBPETokenizer()
# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])
# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")
# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
```
#### Provided Tokenizers
- `CharBPETokenizer`: The original BPE
- `ByteLevelBPETokenizer`: The byte level version of the BPE
- `SentencePieceBPETokenizer`: A BPE implementation compatible with the one used by SentencePiece
- `BertWordPieceTokenizer`: The famous BERT tokenizer, using WordPiece
All of these can be used and trained as explained above!
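For readers unfamiliar with BPE, the following is a minimal plain-Python sketch of how merges are learned. It is a simplification written for illustration, not the code behind `CharBPETokenizer`: the function name `learn_bpe` is hypothetical, there is no end-of-word marker, and ties are broken arbitrarily (here, by the lexicographically larger pair).

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (character-level BPE)."""
    # Each distinct word is a tuple of symbols, weighted by its frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically larger pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite each word, fusing every occurrence of the chosen pair.
        merged = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        corpus = merged
    return merges

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

Each learned merge becomes a new vocabulary entry, which is roughly what a `merges.txt` file records, one pair per line.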
### Build your own
Whenever these provided tokenizers don't give you enough freedom, you can build your own tokenizer,
by putting all the different parts you need together.
You can check how we implemented the [provided tokenizers](https://github.com/huggingface/tokenizers/tree/master/bindings/python/py_src/tokenizers/implementations) and adapt them easily to your own needs.
#### Building a byte-level BPE
Here is an example showing how to build your own byte-level BPE by putting all the different pieces
together, and then saving it to a single file:
```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors
# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())
# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)
# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)
# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)
```
Now, when you want to use this tokenizer, it is as simple as:
```python
from tokenizers import Tokenizer
tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")
encoded = tokenizer.encode("I can feel the magic, can you?")
```
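If you want to try the same pipeline without any files on disk, the sketch below trains on a small in-memory corpus with `train_from_iterator` and checks that decoding round-trips. The corpus and the small `vocab_size` are made up for illustration, and the exact tokens you get will depend on your data.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Tiny in-memory corpus (made up for illustration).
corpus = ["I can feel the magic, can you?", "Can you feel the magic?"] * 50

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

trainer = trainers.BpeTrainer(
    vocab_size=300,        # deliberately small for a toy corpus
    min_frequency=2,
    show_progress=False,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

text = "I can feel the magic, can you?"
encoded = tokenizer.encode(text)
decoded = tokenizer.decode(encoded.ids)

print(encoded.tokens)
# `add_prefix_space=True` may leave a leading space in the decoded string.
print(repr(decoded))
```

Because the initial alphabet covers every byte, any input string can be encoded, and stripping the possible leading space from `decoded` gives back the original text.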