pyonmttok

Name	pyonmttok
Version	1.36.0
home_page	https://opennmt.net
Summary	Fast and customizable text tokenization library with BPE and SentencePiece support
upload_time	2023-01-11 13:46:07
maintainer
docs_url	None
author	OpenNMT
requires_python	>=3.6,<3.12
license	MIT
keywords	tokenization opennmt unicode bpe sentencepiece subword
VCS
bugtrack_url
requirements	No requirements were recorded.

# pyonmttok

**pyonmttok** is the Python wrapper for [OpenNMT/Tokenizer](https://github.com/OpenNMT/Tokenizer), a fast and customizable text tokenization library with BPE and SentencePiece support.

**Installation:**

```bash
pip install pyonmttok
```

**Requirements:**

* OS: Linux, macOS, Windows
* Python version: >= 3.6, < 3.12
* pip version: >= 19.3

**Table of contents**

1. [Tokenization](#tokenization)
1. [Subword learning](#subword-learning)
1. [Vocabulary](#vocabulary)
1. [Token API](#token-api)
1. [Utilities](#utilities)

## Tokenization

### Example

```python
>>> import pyonmttok
>>> tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True)
>>> tokens = tokenizer("Hello World!")
>>> tokens
['Hello', 'World', '￭!']
>>> tokenizer.detokenize(tokens)
'Hello World!'
```
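
To give some intuition for what `joiner_annotate=True` records, here is a minimal pure-Python sketch. It is not the library's implementation: `tokenize_with_joiners` and `detokenize` are hypothetical helpers, and the splitting rule (runs of word characters versus single punctuation characters) is a simplifying assumption that covers far fewer cases than the real "aggressive" mode.

```python
import re

JOINER = "￭"  # default joiner marker listed in the constructor signature


def tokenize_with_joiners(text):
    """Split on whitespace, then separate punctuation and mark what was attached."""
    tokens = []
    for chunk in text.split():
        # Assumption: a token is a run of word characters or one other character.
        pieces = re.findall(r"\w+|[^\w\s]", chunk)
        for position, piece in enumerate(pieces):
            # Pieces after the first were glued to the previous piece in the input.
            tokens.append(piece if position == 0 else JOINER + piece)
    return tokens


def detokenize(tokens):
    """Undo the annotation: a joiner on the left means 'no space before me'."""
    return " ".join(tokens).replace(" " + JOINER, "")


tokens = tokenize_with_joiners("Hello World!")
print(tokens)
print(detokenize(tokens))
```

Because the marker stores where spaces were missing, the token list alone is enough to rebuild the original string.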

### Interface

#### Constructor

```python
tokenizer = pyonmttok.Tokenizer(
    mode: str,
    *,
    lang: Optional[str] = None,
    bpe_model_path: Optional[str] = None,
    bpe_dropout: float = 0,
    vocabulary: Optional[List[str]] = None,
    vocabulary_path: Optional[str] = None,
    vocabulary_threshold: int = 0,
    sp_model_path: Optional[str] = None,
    sp_nbest_size: int = 0,
    sp_alpha: float = 0.1,
    joiner: str = "￭",
    joiner_annotate: bool = False,
    joiner_new: bool = False,
    support_prior_joiners: bool = False,
    spacer_annotate: bool = False,
    spacer_new: bool = False,
    case_feature: bool = False,
    case_markup: bool = False,
    soft_case_regions: bool = False,
    no_substitution: bool = False,
    with_separators: bool = False,
    preserve_placeholders: bool = False,
    preserve_segmented_tokens: bool = False,
    segment_case: bool = False,
    segment_numbers: bool = False,
    segment_alphabet_change: bool = False,
    segment_alphabet: Optional[List[str]] = None,
)

# SentencePiece-compatible tokenizer.
tokenizer = pyonmttok.SentencePieceTokenizer(
    model_path: str,
    vocabulary_path: Optional[str] = None,
    vocabulary_threshold: int = 0,
    nbest_size: int = 0,
    alpha: float = 0.1,
)

# Copy constructor.
tokenizer = pyonmttok.Tokenizer(tokenizer: pyonmttok.Tokenizer)

# Return the tokenization options (excluding options related to subword).
tokenizer.options
```

See the [documentation](https://github.com/OpenNMT/Tokenizer/blob/master/docs/options.md) for a description of each tokenization option.

#### Tokenization

```python
# Tokenize a text.
# When training=False, subword regularization such as BPE dropout is disabled.
tokenizer.__call__(text: str, training: bool = True) -> List[str]

# Tokenize a text and return optional features.
# When as_token_objects=True, the method returns Token objects (see below).
tokenizer.tokenize(
    text: str,
    as_token_objects: bool = False,
    training: bool = True,
) -> Union[Tuple[List[str], Optional[List[List[str]]]], List[pyonmttok.Token]]

# Tokenize a batch of text.
tokenizer.tokenize_batch(
    batch_text: List[str],
    as_token_objects: bool = False,
    training: bool = True,
) -> Union[Tuple[List[List[str]], List[Optional[List[List[str]]]]], List[List[pyonmttok.Token]]]

# Tokenize a file.
tokenizer.tokenize_file(
    input_path: str,
    output_path: str,
    num_threads: int = 1,
    verbose: bool = False,
    training: bool = True,
    tokens_delimiter: str = " ",
)
```

#### Detokenization

```python
# The detokenize method converts a list of tokens back to a string.
tokenizer.detokenize(
    tokens: List[str],
    features: Optional[List[List[str]]] = None,
) -> str
tokenizer.detokenize(tokens: List[pyonmttok.Token]) -> str

# The detokenize_with_ranges method also returns a dictionary mapping a token
# index to a range in the detokenized text.
# Set merge_ranges=True to merge consecutive ranges, e.g. subwords of the same
# token in case of subword tokenization.
# Set unicode_ranges=True to return ranges over Unicode characters instead of bytes.
tokenizer.detokenize_with_ranges(
    tokens: Union[List[str], List[pyonmttok.Token]],
    merge_ranges: bool = False,
    unicode_ranges: bool = False,
) -> Tuple[str, Dict[int, Tuple[int, int]]]

# Detokenize a file.
tokenizer.detokenize_file(
    input_path: str,
    output_path: str,
    tokens_delimiter: str = " ",
)
```
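
The mapping returned by `detokenize_with_ranges` can be pictured with the following pure-Python sketch. It is only an illustration under stated assumptions: the function below is a stand-in written for this example, it handles joiner markers only, it does not implement `merge_ranges`, and the use of inclusive end offsets is an assumption of the sketch.

```python
def detokenize_with_ranges(tokens, joiner="￭", unicode_ranges=False):
    """Return the detokenized text and a {token index: (start, end)} mapping."""

    def length(text):
        # Count Unicode characters or UTF-8 bytes, depending on the flag.
        return len(text) if unicode_ranges else len(text.encode("utf-8"))

    text = ""
    ranges = {}
    glue = False  # True when the previous token carried a joiner on its right
    for index, token in enumerate(tokens):
        join_left = token.startswith(joiner)
        if join_left:
            token = token[len(joiner):]
        join_right = token.endswith(joiner)
        if join_right:
            token = token[: -len(joiner)]
        if text and not (glue or join_left):
            text += " "
        if token:
            start = length(text)
            text += token
            # Assumption: the end offset is inclusive.
            ranges[index] = (start, length(text) - 1)
        glue = join_right
    return text, ranges


print(detokenize_with_ranges(["Hel￭", "lo", "World"]))
print(detokenize_with_ranges(["测￭", "试"], unicode_ranges=False))
print(detokenize_with_ranges(["测￭", "试"], unicode_ranges=True))
```

The last two calls show why `unicode_ranges` matters: each CJK character occupies three bytes in UTF-8, so byte offsets and character offsets diverge.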

## Subword learning

### Example

The Python wrapper supports BPE and SentencePiece subword learning through a common interface:

**1\. Create the subword learner with the tokenization you want to apply, e.g.:**

```python
# BPE is trained and applied on the tokenization output before joiner (or spacer) annotations.
tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True, segment_numbers=True)
learner = pyonmttok.BPELearner(tokenizer=tokenizer, symbols=32000)

# SentencePiece can learn from raw sentences, so a tokenizer is not required.
learner = pyonmttok.SentencePieceLearner(vocab_size=32000, character_coverage=0.98)
```

**2\. Feed some raw data:**

```python
# Feed detokenized sentences:
learner.ingest("Hello world!")
learner.ingest("How are you?")

# or detokenized text files:
learner.ingest_file("/data/train1.en")
learner.ingest_file("/data/train2.en")
```

**3\. Start the learning process:**

```python
tokenizer = learner.learn("/data/model-32k")
```

The returned `tokenizer` instance can be used to apply subword tokenization on new data.
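
As rough intuition for what BPE learning does with the ingested text, the sketch below repeatedly merges the most frequent pair of adjacent symbols. It is a textbook simplification, not the learner used by the library: `learn_bpe` is a hypothetical function, `num_merges` only loosely corresponds to the `symbols` argument, and options such as `min_frequency` are ignored.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Return the list of symbol pairs merged, most frequent first."""
    # Each word starts as a tuple of single characters, weighted by its count.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the weighted vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the best pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


corpus = ["low", "low", "lower", "newest", "newest", "newest"]
print(learn_bpe(corpus, num_merges=2))
```

The learned merge list is what a BPE model file stores; applying the model replays those merges on new words.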

### Interface

```python
# See https://github.com/rsennrich/subword-nmt/blob/master/subword_nmt/learn_bpe.py
# for argument documentation.
learner = pyonmttok.BPELearner(
    tokenizer: Optional[pyonmttok.Tokenizer] = None,  # Defaults to tokenization mode "space".
    symbols: int = 10000,
    min_frequency: int = 2,
    total_symbols: bool = False,
)

# See https://github.com/google/sentencepiece/blob/master/src/spm_train_main.cc
# for available training options.
learner = pyonmttok.SentencePieceLearner(
    tokenizer: Optional[pyonmttok.Tokenizer] = None,  # Defaults to tokenization mode "none".
    keep_vocab: bool = False,  # Keep the generated vocabulary (model_path will act like model_prefix in spm_train)
    **training_options,
)

learner.ingest(text: str)
learner.ingest_file(path: str)
learner.ingest_token(token: Union[str, pyonmttok.Token])

learner.learn(model_path: str, verbose: bool = False) -> pyonmttok.Tokenizer
```

## Vocabulary

### Example

```python
tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True)

with open("train.txt") as train_file:
    vocab = pyonmttok.build_vocab_from_lines(
        train_file,
        tokenizer=tokenizer,
        maximum_size=32000,
        special_tokens=["<blank>", "<unk>", "<s>", "</s>"],
    )

with open("vocab.txt", "w") as vocab_file:
    for token in vocab.ids_to_tokens:
        vocab_file.write("%s\n" % token)
```
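
The effect of `maximum_size`, `minimum_frequency` and `special_tokens` can be sketched in plain Python. This is an illustration only: `build_vocab` and `lookup` are hypothetical helpers, lines are split on spaces (the documented behavior when no tokenizer is set), and counting the special tokens toward `maximum_size` is an assumption of the sketch.

```python
from collections import Counter


def build_vocab(lines, maximum_size=0, minimum_frequency=1, special_tokens=None):
    """Return (ids_to_tokens, tokens_to_ids) for space-separated lines."""
    special_tokens = list(special_tokens or [])
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Special tokens always come first and are never filtered out.
    for token in special_tokens:
        counts.pop(token, None)
    kept = [tok for tok, count in counts.most_common() if count >= minimum_frequency]
    if maximum_size > 0:
        # Assumption: the size limit includes the special tokens.
        kept = kept[: max(maximum_size - len(special_tokens), 0)]
    ids_to_tokens = special_tokens + kept
    tokens_to_ids = {token: index for index, token in enumerate(ids_to_tokens)}
    return ids_to_tokens, tokens_to_ids


def lookup(tokens_to_ids, token):
    """Fall back to the <unk> id if present, else to len(vocab)."""
    default_id = tokens_to_ids.get("<unk>", len(tokens_to_ids))
    return tokens_to_ids.get(token, default_id)


ids_to_tokens, tokens_to_ids = build_vocab(
    ["a b a", "b c a"], maximum_size=3, special_tokens=["<unk>"]
)
print(ids_to_tokens)
print(lookup(tokens_to_ids, "c"))
```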

### Interface

```python
# Special tokens are added with ids 0, 1, etc., and are never removed by a resize.
vocab = pyonmttok.Vocab(special_tokens: Optional[List[str]] = None)

# Read-only properties.
vocab.tokens_to_ids -> Dict[str, int]
vocab.ids_to_tokens -> List[str]
vocab.counters -> List[int]

# Get or set the ID returned for out-of-vocabulary tokens.
# By default, it is the ID of the token <unk> if present in the vocabulary, len(vocab) otherwise.
vocab.default_id -> int

vocab.lookup_token(token: str) -> int
vocab.lookup_index(index: int) -> str

# Calls lookup_token on a batch of tokens.
vocab.__call__(tokens: List[str]) -> List[int]

vocab.__len__() -> int                  # Implements: len(vocab)
vocab.__contains__(token: str) -> bool  # Implements: "hello" in vocab
vocab.__getitem__(token: str) -> int    # Implements: vocab["hello"]

# Add tokens to the vocabulary after tokenization.
# If a tokenizer is not set, the text is split on spaces.
vocab.add_from_text(text: str, tokenizer: Optional[pyonmttok.Tokenizer] = None) -> None
vocab.add_from_file(path: str, tokenizer: Optional[pyonmttok.Tokenizer] = None) -> None
vocab.add_token(token: str, count: int = 1) -> None

vocab.resize(maximum_size: int = 0, minimum_frequency: int = 1) -> None


# Build a vocabulary from an iterator of lines.
# If a tokenizer is not set, the lines are split on spaces.
pyonmttok.build_vocab_from_lines(
    lines: Iterable[str],
    tokenizer: Optional[pyonmttok.Tokenizer] = None,
    maximum_size: int = 0,
    minimum_frequency: int = 1,
    special_tokens: Optional[List[str]] = None,
) -> pyonmttok.Vocab

# Build a vocabulary from an iterator of tokens.
pyonmttok.build_vocab_from_tokens(
    tokens: Iterable[str],
    maximum_size: int = 0,
    minimum_frequency: int = 1,
    special_tokens: Optional[List[str]] = None,
) -> pyonmttok.Vocab
```

## Token API

The Token API lets you tokenize text into `pyonmttok.Token` objects. This is useful for applying logic at the token level while retaining enough information to write the tokenization to disk or to detokenize.

### Example

```python
>>> tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True)
>>> tokens = tokenizer.tokenize("Hello World!", as_token_objects=True)
>>> tokens
[Token('Hello'), Token('World'), Token('!', join_left=True)]
>>> tokens[-1].surface
'!'
>>> tokenizer.serialize_tokens(tokens)[0]
['Hello', 'World', '￭!']
>>> tokens[-1].surface = '.'
>>> tokenizer.serialize_tokens(tokens)[0]
['Hello', 'World', '￭.']
>>> tokenizer.detokenize(tokens)
'Hello World.'
```

### Interface

The `pyonmttok.Token` class has the following attributes:

* `surface`: a string, the token value
* `type`: a `pyonmttok.TokenType` value, the type of the token
* `join_left`: a boolean, whether the token should be joined to the token on the left or not
* `join_right`: a boolean, whether the token should be joined to the token on the right or not
* `preserve`: a boolean, whether joiners and spacers can be attached to this token or not
* `features`: a list of strings, the features attached to the token
* `spacer`: a boolean, whether the token is prefixed by a SentencePiece spacer or not (only set when using SentencePiece)
* `casing`: a `pyonmttok.Casing` value, the casing of the token (only set when tokenizing with `case_feature` or `case_markup`)

The `pyonmttok.TokenType` enumeration identifies tokens that were split by subword tokenization. The enumeration has the following values:

* `TokenType.WORD`
* `TokenType.LEADING_SUBWORD`
* `TokenType.TRAILING_SUBWORD`

The `pyonmttok.Casing` enumeration is used to identify the original casing of a token that was lowercased by the `case_feature` or `case_markup` tokenization options. The enumeration has the following values:

* `Casing.LOWERCASE`
* `Casing.UPPERCASE`
* `Casing.MIXED`
* `Casing.CAPITALIZED`
* `Casing.NONE`
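
To make the casing categories concrete, the sketch below classifies a string into the same five names. The `Casing` enum here is a local stand-in for `pyonmttok.Casing`, `detect_casing` is a hypothetical helper, and the classification rules are an assumption; the library's own rules may differ, for example on single-letter tokens.

```python
from enum import Enum


class Casing(Enum):
    # Local stand-in mirroring the names of pyonmttok.Casing.
    NONE = 0
    LOWERCASE = 1
    UPPERCASE = 2
    MIXED = 3
    CAPITALIZED = 4


def detect_casing(token):
    """Classify the casing of a token (assumed rules, for illustration)."""
    # Keep only characters that actually have distinct lower and upper forms.
    cased = [c for c in token if c.lower() != c.upper()]
    if not cased:
        return Casing.NONE
    if all(c.islower() for c in cased):
        return Casing.LOWERCASE
    if cased[0].isupper() and all(c.islower() for c in cased[1:]):
        return Casing.CAPITALIZED
    if all(c.isupper() for c in cased):
        return Casing.UPPERCASE
    return Casing.MIXED


for word in ["hello", "Hello", "HELLO", "hELLo", "123"]:
    print(word, detect_casing(word).name)
```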

The `Tokenizer` instances provide methods to serialize or deserialize `Token` objects:

```python
# Serialize Token objects to strings that can be saved on disk.
tokenizer.serialize_tokens(
    tokens: List[pyonmttok.Token],
) -> Tuple[List[str], Optional[List[List[str]]]]

# Deserialize strings into Token objects.
tokenizer.deserialize_tokens(
    tokens: List[str],
    features: Optional[List[List[str]]] = None,
) -> List[pyonmttok.Token]
```
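
The relationship between token objects and their serialized strings can be sketched as follows. `SimpleToken`, `serialize` and `deserialize` are hypothetical stand-ins written for this example; they model only `surface`, `join_left` and `join_right`, and ignore features, spacers, casing and the `preserve` flag.

```python
from dataclasses import dataclass

JOINER = "￭"  # default joiner marker listed in the constructor signature


@dataclass
class SimpleToken:
    # Reduced stand-in for pyonmttok.Token.
    surface: str
    join_left: bool = False
    join_right: bool = False


def serialize(tokens, joiner=JOINER):
    """Turn token objects into strings, encoding join flags as joiner markers."""
    return [
        (joiner if t.join_left else "") + t.surface + (joiner if t.join_right else "")
        for t in tokens
    ]


def deserialize(strings, joiner=JOINER):
    """Parse joiner markers back into join flags."""
    tokens = []
    for text in strings:
        join_left = text.startswith(joiner)
        if join_left:
            text = text[len(joiner):]
        join_right = text.endswith(joiner)
        if join_right:
            text = text[: -len(joiner)]
        tokens.append(SimpleToken(text, join_left, join_right))
    return tokens


tokens = [SimpleToken("Hello"), SimpleToken("World"), SimpleToken("!", join_left=True)]
tokens[-1].surface = "."  # edit the surface without touching the join flags
print(serialize(tokens))
```

Keeping the join information as attributes rather than as characters inside the string is what makes it safe to edit `surface` directly.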

## Utilities

### Interface

```python
# Returns True if the string has the placeholder format.
pyonmttok.is_placeholder(token: str)

# Sets the random seed for reproducible tokenization.
pyonmttok.set_random_seed(seed: int)

# Checks if the language code is valid.
pyonmttok.is_valid_language(lang: str)
```

Raw data

            {
    "_id": null,
    "home_page": "https://opennmt.net",
    "name": "pyonmttok",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6,<3.12",
    "maintainer_email": "",
    "keywords": "tokenization opennmt unicode bpe sentencepiece subword",
    "author": "OpenNMT",
    "author_email": "guillaume.klein@systrangroup.com",
    "download_url": "",
    "platform": null,
    "description": "# pyonmttok\n\n**pyonmttok** is the Python wrapper for [OpenNMT/Tokenizer](https://github.com/OpenNMT/Tokenizer), a fast and customizable text tokenization library with BPE and SentencePiece support.\n\n**Installation:**\n\n```bash\npip install pyonmttok\n```\n\n**Requirements:**\n\n* OS: Linux, macOS, Windows\n* Python version: >= 3.6\n* pip version: >= 19.3\n\n**Table of contents**\n\n1. [Tokenization](#tokenization)\n1. [Subword learning](#subword-learning)\n1. [Vocabulary](#vocabulary)\n1. [Token API](#token-api)\n1. [Utilities](#utilities)\n\n## Tokenization\n\n### Example\n\n```python\n>>> import pyonmtok\n>>> tokenizer = pyonmttok.Tokenizer(\"aggressive\", joiner_annotate=True)\n>>> tokens = tokenizer(\"Hello World!\")\n>>> tokens\n['Hello', 'World', '\uffed!']\n>>> tokenizer.detokenize(tokens)\n'Hello World!'\n```\n\n### Interface\n\n#### Constructor\n\n```python\ntokenizer = pyonmttok.Tokenizer(\n    mode: str,\n    *,\n    lang: Optional[str] = None,\n    bpe_model_path: Optional[str] = None,\n    bpe_dropout: float = 0,\n    vocabulary: Optional[List[str]] = None,\n    vocabulary_path: Optional[str] = None,\n    vocabulary_threshold: int = 0,\n    sp_model_path: Optional[str] = None,\n    sp_nbest_size: int = 0,\n    sp_alpha: float = 0.1,\n    joiner: str = \"\uffed\",\n    joiner_annotate: bool = False,\n    joiner_new: bool = False,\n    support_prior_joiners: bool = False,\n    spacer_annotate: bool = False,\n    spacer_new: bool = False,\n    case_feature: bool = False,\n    case_markup: bool = False,\n    soft_case_regions: bool = False,\n    no_substitution: bool = False,\n    with_separators: bool = False,\n    preserve_placeholders: bool = False,\n    preserve_segmented_tokens: bool = False,\n    segment_case: bool = False,\n    segment_numbers: bool = False,\n    segment_alphabet_change: bool = False,\n    segment_alphabet: Optional[List[str]] = None,\n)\n\n# SentencePiece-compatible tokenizer.\ntokenizer = 
pyonmttok.SentencePieceTokenizer(\n    model_path: str,\n    vocabulary_path: Optional[str] = None,\n    vocabulary_threshold: int = 0,\n    nbest_size: int = 0,\n    alpha: float = 0.1,\n)\n\n# Copy constructor.\ntokenizer = pyonmttok.Tokenizer(tokenizer: pyonmttok.Tokenizer)\n\n# Return the tokenization options (excluding options related to subword).\ntokenizer.options\n```\n\nSee the [documentation](https://github.com/OpenNMT/Tokenizer/blob/master/docs/options.md) for a description of each tokenization option.\n\n#### Tokenization\n\n```python\n# Tokenize a text.\n# When training=False, subword regularization such as BPE dropout is disabled.\ntokenizer.__call__(text: str, training: bool = True) -> List[str]\n\n# Tokenize a text and return optional features.\n# When as_token_objects=True, the method returns Token objects (see below).\ntokenizer.tokenize(\n    text: str,\n    as_token_objects: bool = False,\n    training: bool = True,\n) -> Union[Tuple[List[str], Optional[List[List[str]]]], List[pyonmttok.Token]]\n\n# Tokenize a batch of text.\ntokenizer.tokenize_batch(\n    batch_text: List[str],\n    as_token_objects: bool = False,\n    training: bool = True,\n) -> Union[Tuple[List[List[str]], List[Optional[List[List[str]]]]], List[List[pyonmttok.Token]]]\n\n# Tokenize a file.\ntokenizer.tokenize_file(\n    input_path: str,\n    output_path: str,\n    num_threads: int = 1,\n    verbose: bool = False,\n    training: bool = True,\n    tokens_delimiter: str = \" \",\n)\n```\n\n#### Detokenization\n\n```python\n# The detokenize method converts a list of tokens back to a string.\ntokenizer.detokenize(\n    tokens: List[str],\n    features: Optional[List[List[str]]] = None,\n) -> str\ntokenizer.detokenize(tokens: List[pyonmttok.Token]) -> str\n\n# The detokenize_with_ranges method also returns a dictionary mapping a token\n# index to a range in the detokenized text.\n# Set merge_ranges=True to merge consecutive ranges, e.g. 
subwords of the same\n# token in case of subword tokenization.\n# Set unicode_ranges=True to return ranges over Unicode characters instead of bytes.\ntokenizer.detokenize_with_ranges(\n    tokens: Union[List[str], List[pyonmttok.Token]],\n    merge_ranges: bool = False,\n    unicode_ranges: bool = False,\n) -> Tuple[str, Dict[int, Tuple[int, int]]]\n\n# Detokenize a file.\ntokenizer.detokenize_file(\n    input_path: str,\n    output_path: str,\n    tokens_delimiter: str = \" \",\n)\n```\n\n## Subword learning\n\n### Example\n\nThe Python wrapper supports BPE and SentencePiece subword learning through a common interface:\n\n**1\\. Create the subword learner with the tokenization you want to apply, e.g.:**\n\n```python\n# BPE is trained and applied on the tokenization output before joiner (or spacer) annotations.\ntokenizer = pyonmttok.Tokenizer(\"aggressive\", joiner_annotate=True, segment_numbers=True)\nlearner = pyonmttok.BPELearner(tokenizer=tokenizer, symbols=32000)\n\n# SentencePiece can learn from raw sentences so a tokenizer in not required.\nlearner = pyonmttok.SentencePieceLearner(vocab_size=32000, character_coverage=0.98)\n```\n\n**2\\. Feed some raw data:**\n\n```python\n# Feed detokenized sentences:\nlearner.ingest(\"Hello world!\")\nlearner.ingest(\"How are you?\")\n\n# or detokenized text files:\nlearner.ingest_file(\"/data/train1.en\")\nlearner.ingest_file(\"/data/train2.en\")\n```\n\n**3\\. 
Start the learning process:**\n\n```python\ntokenizer = learner.learn(\"/data/model-32k\")\n```\n\nThe returned `tokenizer` instance can be used to apply subword tokenization on new data.\n\n### Interface\n\n```python\n# See https://github.com/rsennrich/subword-nmt/blob/master/subword_nmt/learn_bpe.py\n# for argument documentation.\nlearner = pyonmttok.BPELearner(\n    tokenizer: Optional[pyonmttok.Tokenizer] = None,  # Defaults to tokenization mode \"space\".\n    symbols: int = 10000,\n    min_frequency: int = 2,\n    total_symbols: bool = False,\n)\n\n# See https://github.com/google/sentencepiece/blob/master/src/spm_train_main.cc\n# for available training options.\nlearner = pyonmttok.SentencePieceLearner(\n    tokenizer: Optional[pyonmttok.Tokenizer] = None,  # Defaults to tokenization mode \"none\".\n    keep_vocab: bool = False,  # Keep the generated vocabulary (model_path will act like model_prefix in spm_train)\n    **training_options,\n)\n\nlearner.ingest(text: str)\nlearner.ingest_file(path: str)\nlearner.ingest_token(token: Union[str, pyonmttok.Token])\n\nlearner.learn(model_path: str, verbose: bool = False) -> pyonmttok.Tokenizer\n```\n\n## Vocabulary\n\n### Example\n\n```python\ntokenizer = pyonmttok.Tokenizer(\"aggressive\", joiner_annotate=True)\n\nwith open(\"train.txt\") as train_file:\n    vocab = pyonmttok.build_vocab_from_lines(\n        train_file,\n        tokenizer=tokenizer,\n        maximum_size=32000,\n        special_tokens=[\"<blank>\", \"<unk>\", \"<s>\", \"</s>\"],\n    )\n\nwith open(\"vocab.txt\", \"w\") as vocab_file:\n    for token in vocab.ids_to_tokens:\n        vocab_file.write(\"%s\\n\" % token)\n```\n\n### Interface\n\n```python\n# Special tokens are added with ids 0, 1, etc., and are never removed by a resize.\nvocab = pyonmttok.Vocab(special_tokens: Optional[List[str]] = None)\n\n# Read-only properties.\nvocab.tokens_to_ids -> Dict[str, int]\nvocab.ids_to_tokens -> List[str]\nvocab.counters -> List[int]\n\n# Get or set the 
ID returned for out-of-vocabulary tokens.\n# By default, it is the ID of the token <unk> if present in the vocabulary, len(vocab) otherwise.\nvocab.default_id -> int\n\nvocab.lookup_token(token: str) -> int\nvocab.lookup_index(index: int) -> str\n\n# Calls lookup_token on a batch of tokens.\nvocab.__call__(tokens: List[str]) -> List[int]\n\nvocab.__len__() -> int                  # Implements: len(vocab)\nvocab.__contains__(token: str) -> bool  # Implements: \"hello\" in vocab\nvocab.__getitem__(token: str) -> int    # Implements: vocab[\"hello\"]\n\n# Add tokens to the vocabulary after tokenization.\n# If a tokenizer is not set, the text is split on spaces.\nvocab.add_from_text(text: str, tokenizer: Optional[pyonmttok.Tokenizer] = None) -> None\nvocab.add_from_file(path: str, tokenizer: Optional[pyonmttok.Tokenizer] = None) -> None\nvocab.add_token(token: str, count: int = 1) -> None\n\nvocab.resize(maximum_size: int = 0, minimum_frequency: int = 1) -> None\n\n\n# Build a vocabulary from an iterator of lines.\n# If a tokenizer is not set, the lines are split on spaces.\npyonmttok.build_vocab_from_lines(\n    lines: Iterable[str],\n    tokenizer: Optional[pyonmttok.Tokenizer] = None,\n    maximum_size: int = 0,\n    minimum_frequency: int = 1,\n    special_tokens: Optional[List[str]] = None,\n) -> pyonmttok.Vocab\n\n# Build a vocabulary from an iterator of tokens.\npyonmttok.build_vocab_from_tokens(\n    tokens: Iterable[str],\n    maximum_size: int = 0,\n    minimum_frequency: int = 1,\n    special_tokens: Optional[List[str]] = None,\n) -> pyonmttok.Vocab\n```\n\n## Token API\n\nThe Token API allows to tokenize text into `pyonmttok.Token` objects. 
This API can be useful to apply some logics at the token level but still retain enough information to write the tokenization on disk or detokenize.\n\n### Example\n\n```python\n>>> tokenizer = pyonmttok.Tokenizer(\"aggressive\", joiner_annotate=True)\n>>> tokens = tokenizer.tokenize(\"Hello World!\", as_token_objects=True)\n>>> tokens\n[Token('Hello'), Token('World'), Token('!', join_left=True)]\n>>> tokens[-1].surface\n'!'\n>>> tokenizer.serialize_tokens(tokens)[0]\n['Hello', 'World', '\uffed!']\n>>> tokens[-1].surface = '.'\n>>> tokenizer.serialize_tokens(tokens)[0]\n['Hello', 'World', '\uffed.']\n>>> tokenizer.detokenize(tokens)\n'Hello World.'\n```\n\n### Interface\n\nThe `pyonmttok.Token` class has the following attributes:\n\n* `surface`: a string, the token value\n* `type`: a `pyonmttok.TokenType` value, the type of the token\n* `join_left`: a boolean, whether the token should be joined to the token on the left or not\n* `join_right`: a boolean, whether the token should be joined to the token on the right or not\n* `preserve`: a boolean, whether joiners and spacers can be attached to this token or not\n* `features`: a list of string, the features attached to the token\n* `spacer`: a boolean, whether the token is prefixed by a SentencePiece spacer or not (only set when using SentencePiece)\n* `casing`: a `pyonmttok.Casing` value, the casing of the token (only set when tokenizing with `case_feature` or `case_markup`)\n\nThe `pyonmttok.TokenType` enumeration is used to identify tokens that were split by a subword tokenization. The enumeration has the following values:\n\n* `TokenType.WORD`\n* `TokenType.LEADING_SUBWORD`\n* `TokenType.TRAILING_SUBWORD`\n\nThe `pyonmttok.Casing` enumeration is used to identify the original casing of a token that was lowercased by the `case_feature` or `case_markup` tokenization options. 
The enumeration has the following values:\n\n* `Casing.LOWERCASE`\n* `Casing.UPPERCASE`\n* `Casing.MIXED`\n* `Casing.CAPITALIZED`\n* `Casing.NONE`\n\nThe `Tokenizer` instances provide methods to serialize or deserialize `Token` objects:\n\n```python\n# Serialize Token objects to strings that can be saved on disk.\ntokenizer.serialize_tokens(\n    tokens: List[pyonmttok.Token],\n) -> Tuple[List[str], Optional[List[List[str]]]]\n\n# Deserialize strings into Token objects.\ntokenizer.deserialize_tokens(\n    tokens: List[str],\n    features: Optional[List[List[str]]] = None,\n) -> List[pyonmttok.Token]\n```\n\n## Utilities\n\n### Interface\n\n```python\n# Returns True if the string has the placeholder format.\npyonmttok.is_placeholder(token: str)\n\n# Sets the random seed for reproducible tokenization.\npyonmttok.set_random_seed(seed: int)\n\n# Checks if the language code is valid.\npyonmttok.is_valid_language(lang: str).\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Fast and customizable text tokenization library with BPE and SentencePiece support",
    "version": "1.36.0",
    "split_keywords": [
        "tokenization",
        "opennmt",
        "unicode",
        "bpe",
        "sentencepiece",
        "subword"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0ee4614c1a43905178659ceb63e316d26ff9b8ead054cfc2438c04f747f54c4f",
                "md5": "547d7e80d535a41288f99735fd429ca4",
                "sha256": "dd6ee135d8f9ce8e7d1b4debccf2c0304216fedd7541fd662ea279944ce48174"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp310-cp310-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "547d7e80d535a41288f99735fd429ca4",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.6,<3.12",
            "size": 14354360,
            "upload_time": "2023-01-11T13:46:07",
            "upload_time_iso_8601": "2023-01-11T13:46:07.655992Z",
            "url": "https://files.pythonhosted.org/packages/0e/e4/614c1a43905178659ceb63e316d26ff9b8ead054cfc2438c04f747f54c4f/pyonmttok-1.36.0-cp310-cp310-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2fbc1f83574f9dfed42d8041a9682e2577f8ec24ee73eb40f72ec3512654ee8f",
                "md5": "4caf709da3c4b07c20420a3302de81f2",
                "sha256": "d12ff8faccff0a2023dd1253eddb20160f185cdd4c91a4fb3e73791928d52eee"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "4caf709da3c4b07c20420a3302de81f2",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.6,<3.12",
            "size": 13992746,
            "upload_time": "2023-01-11T13:46:11",
            "upload_time_iso_8601": "2023-01-11T13:46:11.151921Z",
            "url": "https://files.pythonhosted.org/packages/2f/bc/1f83574f9dfed42d8041a9682e2577f8ec24ee73eb40f72ec3512654ee8f/pyonmttok-1.36.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0974324e71946f830d24e77c9034679458c3f452e19fbd70e943ead6842b3068",
                "md5": "19836d206245bcc89e86a4013ca6efb3",
                "sha256": "d51ee235c8e12f29c2322e04eb171b559eb185ded265b8261325ad8592d47a55"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "19836d206245bcc89e86a4013ca6efb3",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.6,<3.12",
            "size": 16655459,
            "upload_time": "2023-01-11T13:46:13",
            "upload_time_iso_8601": "2023-01-11T13:46:13.675558Z",
            "url": "https://files.pythonhosted.org/packages/09/74/324e71946f830d24e77c9034679458c3f452e19fbd70e943ead6842b3068/pyonmttok-1.36.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5edc83abf6a10e159f7cc1ab6d251afaebe0ba18b8bb2a88fb378b0bce71df45",
                "md5": "3a211a868dc391cc1b3b67cfdc35807c",
                "sha256": "39dc36182dc614aeea69263104d40fe408e1ba6e092d5b42de22baa786457cdc"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "3a211a868dc391cc1b3b67cfdc35807c",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.6,<3.12",
            "size": 16727406,
            "upload_time": "2023-01-11T13:46:16",
            "upload_time_iso_8601": "2023-01-11T13:46:16.825040Z",
            "url": "https://files.pythonhosted.org/packages/5e/dc/83abf6a10e159f7cc1ab6d251afaebe0ba18b8bb2a88fb378b0bce71df45/pyonmttok-1.36.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3db43e6cd14593252d72c37e120d7b62623d82341ece8faf914170ebbd40f4ab",
                "md5": "75da3f8601ad6437e6d62bdc7b9d8685",
                "sha256": "d479c29240c7836496eefe15a8198dc03faa0a9e0781bc01ce384bfdc2744f27"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp310-cp310-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "75da3f8601ad6437e6d62bdc7b9d8685",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.6,<3.12",
            "size": 14114230,
            "upload_time": "2023-01-11T13:46:19",
            "upload_time_iso_8601": "2023-01-11T13:46:19.597785Z",
            "url": "https://files.pythonhosted.org/packages/3d/b4/3e6cd14593252d72c37e120d7b62623d82341ece8faf914170ebbd40f4ab/pyonmttok-1.36.0-cp310-cp310-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8a94d243529c9e9be4aa3e826fb65b9c88d622f15a68b50309ae79558f5c64c5",
                "md5": "d13b810b3c397ce583ed6e7e749c9601",
                "sha256": "09cb3cfa40d33cb73f0d0ec48581c9eb26cf94793bf40926b7ccdd375f729636"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp311-cp311-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "d13b810b3c397ce583ed6e7e749c9601",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.6,<3.12",
            "size": 14354321,
            "upload_time": "2023-01-11T13:46:22",
            "upload_time_iso_8601": "2023-01-11T13:46:22.015160Z",
            "url": "https://files.pythonhosted.org/packages/8a/94/d243529c9e9be4aa3e826fb65b9c88d622f15a68b50309ae79558f5c64c5/pyonmttok-1.36.0-cp311-cp311-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90bc19349d1a6e36978a38d99f087c5b9d9f027dafca7119c753c821094ae632",
                "md5": "ba5c955b1c8d893c1e4f36a905f07dd7",
                "sha256": "31f9ed03b1be18cfc0fbbb483f3087c572dec1ebbb9a6290777dd900b742640d"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "ba5c955b1c8d893c1e4f36a905f07dd7",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.6,<3.12",
            "size": 13992737,
            "upload_time": "2023-01-11T13:46:25",
            "upload_time_iso_8601": "2023-01-11T13:46:25.085688Z",
            "url": "https://files.pythonhosted.org/packages/90/bc/19349d1a6e36978a38d99f087c5b9d9f027dafca7119c753c821094ae632/pyonmttok-1.36.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c8f764d6fcfd40f0a6555a126399de26059ecaf15375bf15c0c86b3b84831b88",
                "md5": "c87599800ab35ba57e714e05014548a9",
                "sha256": "5809048d68bad532c648e50be15e18e501fa1f1e1c521f2897b34ca7c06672b2"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "c87599800ab35ba57e714e05014548a9",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.6,<3.12",
            "size": 16664935,
            "upload_time": "2023-01-11T13:46:27",
            "upload_time_iso_8601": "2023-01-11T13:46:27.643791Z",
            "url": "https://files.pythonhosted.org/packages/c8/f7/64d6fcfd40f0a6555a126399de26059ecaf15375bf15c0c86b3b84831b88/pyonmttok-1.36.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "32eef8ab366bd730bd1b06239204b1d4388f92e1c27739bed107962a52425b3d",
                "md5": "faf900cbce06d93e590310afd04fe51c",
                "sha256": "c0c542b28cad05788a4bf48a10477ee2a4035d51e9738c1077bec2de969638a7"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "faf900cbce06d93e590310afd04fe51c",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.6,<3.12",
            "size": 16738872,
            "upload_time": "2023-01-11T13:46:30",
            "upload_time_iso_8601": "2023-01-11T13:46:30.408153Z",
            "url": "https://files.pythonhosted.org/packages/32/ee/f8ab366bd730bd1b06239204b1d4388f92e1c27739bed107962a52425b3d/pyonmttok-1.36.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d8119deb98c101df767d5c7e37a205fa61a2e4c91f95df02a16f2e059003ceaa",
                "md5": "0a219a3c520512566c775e22e03f1506",
                "sha256": "897aec90d2ac52718951629d576da9571fa4ea2cdd8c20ae12eb679fd193c739"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "0a219a3c520512566c775e22e03f1506",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.6,<3.12",
            "size": 14114261,
            "upload_time": "2023-01-11T13:46:33",
            "upload_time_iso_8601": "2023-01-11T13:46:33.814067Z",
            "url": "https://files.pythonhosted.org/packages/d8/11/9deb98c101df767d5c7e37a205fa61a2e4c91f95df02a16f2e059003ceaa/pyonmttok-1.36.0-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5cebb17cbd26bbad0791d650c9823794ae8c2dd658ba55a1f809eac66111b11b",
                "md5": "1dba8cb00c2fd7a9ec99712c28a475d5",
                "sha256": "3e7393ed9bc7b143edbe4b6a290e428b15e610e691d664cb6c909deb25750dcc"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp36-cp36m-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "1dba8cb00c2fd7a9ec99712c28a475d5",
            "packagetype": "bdist_wheel",
            "python_version": "cp36",
            "requires_python": ">=3.6,<3.12",
            "size": 14348856,
            "upload_time": "2023-01-11T13:46:37",
            "upload_time_iso_8601": "2023-01-11T13:46:37.037351Z",
            "url": "https://files.pythonhosted.org/packages/5c/eb/b17cbd26bbad0791d650c9823794ae8c2dd658ba55a1f809eac66111b11b/pyonmttok-1.36.0-cp36-cp36m-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4c180f009acdfa2237dcbc638d93d5d5373d0ff7600889b761d695b9410abe48",
                "md5": "6a231665973913278d68f12dd7a02cf1",
                "sha256": "d79bcd92ad736530658393fb225f94015da932baffdcf74fb7f94f21fb78c3cf"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "6a231665973913278d68f12dd7a02cf1",
            "packagetype": "bdist_wheel",
            "python_version": "cp36",
            "requires_python": ">=3.6,<3.12",
            "size": 16854037,
            "upload_time": "2023-01-11T13:46:39",
            "upload_time_iso_8601": "2023-01-11T13:46:39.978871Z",
            "url": "https://files.pythonhosted.org/packages/4c/18/0f009acdfa2237dcbc638d93d5d5373d0ff7600889b761d695b9410abe48/pyonmttok-1.36.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "15f10766152549486438e540df1c0897216594fdaddec6ebabf6b4506b436158",
                "md5": "7409a8bdd0773641136907fa549eaf1e",
                "sha256": "19f1e827dca6342072c23f6b55a5b4f9f67862802f97ebf295b1e1af90894f72"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "7409a8bdd0773641136907fa549eaf1e",
            "packagetype": "bdist_wheel",
            "python_version": "cp36",
            "requires_python": ">=3.6,<3.12",
            "size": 16927719,
            "upload_time": "2023-01-11T13:46:43",
            "upload_time_iso_8601": "2023-01-11T13:46:43.940456Z",
            "url": "https://files.pythonhosted.org/packages/15/f1/0766152549486438e540df1c0897216594fdaddec6ebabf6b4506b436158/pyonmttok-1.36.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f1df14c463ee152be119a5e35860f3188abe2507d47744b637637da0dc6857a4",
                "md5": "4ff02d31457a0f26d57512d59f1258ce",
                "sha256": "3cf7f0518d67b37b1ee66bedfebc5e2441e40d4586c2761b2990bfe6ecf850d9"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp36-cp36m-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "4ff02d31457a0f26d57512d59f1258ce",
            "packagetype": "bdist_wheel",
            "python_version": "cp36",
            "requires_python": ">=3.6,<3.12",
            "size": 14118812,
            "upload_time": "2023-01-11T13:46:46",
            "upload_time_iso_8601": "2023-01-11T13:46:46.965882Z",
            "url": "https://files.pythonhosted.org/packages/f1/df/14c463ee152be119a5e35860f3188abe2507d47744b637637da0dc6857a4/pyonmttok-1.36.0-cp36-cp36m-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bbbb90ec5389ea883aff8903427694c1a138f80d736db11167ce0dc2d26f071d",
                "md5": "221b6d11965b2d093195a9d5ed70ac78",
                "sha256": "b0d0fdf5b353e8826188a7f2ce3bd3df988af979f74dbb8b1e2fd08b973384b1"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp37-cp37m-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "221b6d11965b2d093195a9d5ed70ac78",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.6,<3.12",
            "size": 14348942,
            "upload_time": "2023-01-11T13:46:50",
            "upload_time_iso_8601": "2023-01-11T13:46:50.738016Z",
            "url": "https://files.pythonhosted.org/packages/bb/bb/90ec5389ea883aff8903427694c1a138f80d736db11167ce0dc2d26f071d/pyonmttok-1.36.0-cp37-cp37m-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ce1f4608ce00c88ac42c617cde4229ee2e6d27ff5c6b645cc76f85f99cd7e46d",
                "md5": "6204ce95be5edffd4707e06c8472fb2d",
                "sha256": "98a5683d6bdb1a8bcaef2ac6a72683e7547dc033bcb6eeb54aed5eb22da666b6"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "6204ce95be5edffd4707e06c8472fb2d",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.6,<3.12",
            "size": 16854706,
            "upload_time": "2023-01-11T13:46:53",
            "upload_time_iso_8601": "2023-01-11T13:46:53.855170Z",
            "url": "https://files.pythonhosted.org/packages/ce/1f/4608ce00c88ac42c617cde4229ee2e6d27ff5c6b645cc76f85f99cd7e46d/pyonmttok-1.36.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ed1193f22243237ec6b5554f2c4705c0722d650d83c7a9c9cec1f431a1047436",
                "md5": "034e8065e60ac43bc9a55adf46de31bf",
                "sha256": "a4d1a89cbefdb4d0f6f34c68038147f6f249b31127154a141fd2010a066a557b"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "034e8065e60ac43bc9a55adf46de31bf",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.6,<3.12",
            "size": 16927999,
            "upload_time": "2023-01-11T13:46:56",
            "upload_time_iso_8601": "2023-01-11T13:46:56.899751Z",
            "url": "https://files.pythonhosted.org/packages/ed/11/93f22243237ec6b5554f2c4705c0722d650d83c7a9c9cec1f431a1047436/pyonmttok-1.36.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "911f10477bcb6a2f14e6fa01172dba1d601dea470bdbc3c991217d92798d21a3",
                "md5": "638a30b33e8739c8ed575b7abd1f587b",
                "sha256": "036904449139fbfcef1bc0d7c048d89e34305aacdf178d0d701598c5fe5043f0"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp37-cp37m-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "638a30b33e8739c8ed575b7abd1f587b",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.6,<3.12",
            "size": 14115659,
            "upload_time": "2023-01-11T13:46:59",
            "upload_time_iso_8601": "2023-01-11T13:46:59.416188Z",
            "url": "https://files.pythonhosted.org/packages/91/1f/10477bcb6a2f14e6fa01172dba1d601dea470bdbc3c991217d92798d21a3/pyonmttok-1.36.0-cp37-cp37m-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ecd299c364bbf59541d7bbb56ce365205ddb68d9d6c3e4dee7b99d75ee478c6b",
                "md5": "078641ef497b1c4209694beb3b3809de",
                "sha256": "ef32a68a905a51e5dedbf704679d530b8e7af32d7a2603622875292c79da1cde"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp38-cp38-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "078641ef497b1c4209694beb3b3809de",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.6,<3.12",
            "size": 14354384,
            "upload_time": "2023-01-11T13:47:01",
            "upload_time_iso_8601": "2023-01-11T13:47:01.879173Z",
            "url": "https://files.pythonhosted.org/packages/ec/d2/99c364bbf59541d7bbb56ce365205ddb68d9d6c3e4dee7b99d75ee478c6b/pyonmttok-1.36.0-cp38-cp38-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4a8d87862795cd1c63d2d1bf8c977948dd557943a60146585750527ddb2c5779",
                "md5": "cc035eade9038102c1efa7743372f52c",
                "sha256": "8010913929e4a7008fad1f6def3d08c695d7ecc0201a09bc356e9dac24925c4a"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp38-cp38-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "cc035eade9038102c1efa7743372f52c",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.6,<3.12",
            "size": 13992689,
            "upload_time": "2023-01-11T13:47:04",
            "upload_time_iso_8601": "2023-01-11T13:47:04.409752Z",
            "url": "https://files.pythonhosted.org/packages/4a/8d/87862795cd1c63d2d1bf8c977948dd557943a60146585750527ddb2c5779/pyonmttok-1.36.0-cp38-cp38-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ce7d6744a68ac4349fc0578f7ffc6b65d8472078bc0abc0f3d7f0f3ec718a354",
                "md5": "096b843edcb6c4fe6ae07c150d3b7742",
                "sha256": "a3ddb450a8e3631e3430a48587bccabfc5fa8eb6d62545309e03a421c12383e3"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "096b843edcb6c4fe6ae07c150d3b7742",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.6,<3.12",
            "size": 16662003,
            "upload_time": "2023-01-11T13:47:07",
            "upload_time_iso_8601": "2023-01-11T13:47:07.043502Z",
            "url": "https://files.pythonhosted.org/packages/ce/7d/6744a68ac4349fc0578f7ffc6b65d8472078bc0abc0f3d7f0f3ec718a354/pyonmttok-1.36.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f2ed2c819f0fa9601ec0d8b7f1393399cdd8b73e7f5682f65c252cd6feac01fe",
                "md5": "689ef930fd0412d38f4ca17e8a5cb271",
                "sha256": "c4cef7c073f143e6c6a45a9a4d95fb7e3c452c253ccca939fcf201a6d5367b24"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "689ef930fd0412d38f4ca17e8a5cb271",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.6,<3.12",
            "size": 16735041,
            "upload_time": "2023-01-11T13:47:10",
            "upload_time_iso_8601": "2023-01-11T13:47:10.111462Z",
            "url": "https://files.pythonhosted.org/packages/f2/ed/2c819f0fa9601ec0d8b7f1393399cdd8b73e7f5682f65c252cd6feac01fe/pyonmttok-1.36.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b23d27b8c2e5a09c7be3adbd210aa28c9600672ed5341e15f4d0a729dd02dda4",
                "md5": "aa981d64715244904675c4aa3047149a",
                "sha256": "f1c8e8703b624fec17675e254c29c308dbf0c13697c0c7fe4bd46ee768753859"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp38-cp38-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "aa981d64715244904675c4aa3047149a",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.6,<3.12",
            "size": 14114098,
            "upload_time": "2023-01-11T13:47:13",
            "upload_time_iso_8601": "2023-01-11T13:47:13.367338Z",
            "url": "https://files.pythonhosted.org/packages/b2/3d/27b8c2e5a09c7be3adbd210aa28c9600672ed5341e15f4d0a729dd02dda4/pyonmttok-1.36.0-cp38-cp38-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e5d9623f9c863130c501e693b6c6f9308b2d7b579c0b7681da1e310011bf2a3d",
                "md5": "62e4db8957d535cd636fc10a862bb92a",
                "sha256": "a99513c72744b71ad7f8f7c06a2279073613a11c87b717bcb4a1d14c8ca5c39b"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp39-cp39-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "62e4db8957d535cd636fc10a862bb92a",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.6,<3.12",
            "size": 14354569,
            "upload_time": "2023-01-11T13:47:15",
            "upload_time_iso_8601": "2023-01-11T13:47:15.782926Z",
            "url": "https://files.pythonhosted.org/packages/e5/d9/623f9c863130c501e693b6c6f9308b2d7b579c0b7681da1e310011bf2a3d/pyonmttok-1.36.0-cp39-cp39-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3a2745fb01f67f3bbf583d272724a62f4b28a04ad090d527bc0b5f15b60b2e6d",
                "md5": "da3bc3e03bb50ad359c6b159d9ab2b6e",
                "sha256": "c6f762f50d8c7313cbe1d6f50fb3ef5dda5d35b7a66e144fb80a8fab66e7c9e8"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "da3bc3e03bb50ad359c6b159d9ab2b6e",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.6,<3.12",
            "size": 13992903,
            "upload_time": "2023-01-11T13:47:18",
            "upload_time_iso_8601": "2023-01-11T13:47:18.621931Z",
            "url": "https://files.pythonhosted.org/packages/3a/27/45fb01f67f3bbf583d272724a62f4b28a04ad090d527bc0b5f15b60b2e6d/pyonmttok-1.36.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dd446193b655c30664b8e82b2027136a729258b0d3761d054d9eda60cc8eace1",
                "md5": "5f6ff0744d4f1437bcf9f635f3cba21f",
                "sha256": "723843c9ae46e025867d9972cdf57408760f42075b0497d54374181a5913c006"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "has_sig": false,
            "md5_digest": "5f6ff0744d4f1437bcf9f635f3cba21f",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.6,<3.12",
            "size": 16653175,
            "upload_time": "2023-01-11T13:47:21",
            "upload_time_iso_8601": "2023-01-11T13:47:21.356396Z",
            "url": "https://files.pythonhosted.org/packages/dd/44/6193b655c30664b8e82b2027136a729258b0d3761d054d9eda60cc8eace1/pyonmttok-1.36.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "12fca05a3d1da44b92b56beda378dd17950344884472bdfc0da22b3a23c4bd59",
                "md5": "fdac7004ac09340791130936bc44f2e1",
                "sha256": "267e38a4e536555847b0a77eb900ba3133902597c4de453cbfb51f50fa4d7a33"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "fdac7004ac09340791130936bc44f2e1",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.6,<3.12",
            "size": 16725924,
            "upload_time": "2023-01-11T13:47:24",
            "upload_time_iso_8601": "2023-01-11T13:47:24.418704Z",
            "url": "https://files.pythonhosted.org/packages/12/fc/a05a3d1da44b92b56beda378dd17950344884472bdfc0da22b3a23c4bd59/pyonmttok-1.36.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "863e9761395cd1fb9725a0734565daed7935f1c8f36907ed0f0fccd58bf5441a",
                "md5": "4c174ba9bb509e778599eadb357f9363",
                "sha256": "e4e1e60c6f755ae7e178c98d17ce6633825f7528e5dc11d3311f342c18ca5abf"
            },
            "downloads": -1,
            "filename": "pyonmttok-1.36.0-cp39-cp39-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "4c174ba9bb509e778599eadb357f9363",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.6,<3.12",
            "size": 14109672,
            "upload_time": "2023-01-11T13:47:27",
            "upload_time_iso_8601": "2023-01-11T13:47:27.018310Z",
            "url": "https://files.pythonhosted.org/packages/86/3e/9761395cd1fb9725a0734565daed7935f1c8f36907ed0f0fccd58bf5441a/pyonmttok-1.36.0-cp39-cp39-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-11 13:46:07",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "pyonmttok"
}
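
Each file entry above lists a `sha256` digest, which can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `matches_listed_digest` are hypothetical and not part of pyonmttok, and the wheel is assumed to have been downloaded to the current directory already.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large wheels are never loaded in full.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, expected_sha256):
    """Compare a file's SHA-256 digest with the value from the metadata."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Filename and digest copied from the cp39 win_amd64 entry in the listing above.
# The local path is an assumption: adjust it to wherever the wheel was saved.
wheel = Path("pyonmttok-1.36.0-cp39-cp39-win_amd64.whl")
listed = "e4e1e60c6f755ae7e178c98d17ce6633825f7528e5dc11d3311f342c18ca5abf"

if wheel.exists():
    print("digest OK" if matches_listed_digest(wheel, listed) else "digest MISMATCH")
```

pip performs an equivalent check itself when a requirements file pins hashes and `--require-hashes` is used, so a manual comparison like this is mainly useful for wheels fetched outside pip.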