[![build status](https://github.com/asottile/tokenize-rt/actions/workflows/main.yml/badge.svg)](https://github.com/asottile/tokenize-rt/actions/workflows/main.yml)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/asottile/tokenize-rt/main.svg)](https://results.pre-commit.ci/latest/github/asottile/tokenize-rt/main)
tokenize-rt
===========
The stdlib `tokenize` module does not properly roundtrip. This wrapper
around the stdlib provides two additional tokens `ESCAPED_NL` and
`UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and
`tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on python
tokenization.
## Installation
```bash
pip install tokenize-rt
```
## Usage
### datastructures
#### `tokenize_rt.Offset(line=None, utf8_byte_offset=None)`
A token offset, useful as a key when cross referencing the `ast` and the
tokenized source.
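
For illustration, a minimal sketch of cross referencing tokens against `ast`
nodes by offset (this example is not from the upstream docs; it assumes the
common pattern of keying on `(lineno, col_offset)`, where CPython's
`ast.col_offset` is a utf-8 byte offset and so lines up with
`Offset.utf8_byte_offset`):

```python
import ast

from tokenize_rt import Offset, src_to_tokens

src = 'x = 1\ny = 2\n'

# offsets of every assignment statement in the parsed source
assignment_offsets = {
    Offset(node.lineno, node.col_offset)
    for node in ast.walk(ast.parse(src))
    if isinstance(node, ast.Assign)
}

for token in src_to_tokens(src):
    if token.offset in assignment_offsets:
        print('token starting an assignment:', token)
```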
#### `tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)`
Construct a token
- `name`: one of the token names listed in `token.tok_name` or
`ESCAPED_NL` or `UNIMPORTANT_WS`
- `src`: token's source as text
- `line`: the line number that this token appears on.
- `utf8_byte_offset`: the utf8 byte offset at which this token appears in the
 line.
#### `tokenize_rt.Token.offset`
Retrieves an `Offset` for this token.
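
For example (a small illustrative snippet, not from the upstream docs):

```pycon
>>> from tokenize_rt import Token
>>> tok = Token('NAME', 'print', line=1, utf8_byte_offset=0)
>>> tok.offset
Offset(line=1, utf8_byte_offset=0)
```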
### converting to and from `Token` representations
#### `tokenize_rt.src_to_tokens(text: str) -> List[Token]`
#### `tokenize_rt.tokens_to_src(Iterable[Token]) -> str`
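
These two functions are inverses of each other -- that is the library's
roundtrip guarantee. A quick check:

```pycon
>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> src = 'x = 1  # comment\n'
>>> tokens_to_src(src_to_tokens(src)) == src
True
```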
### additional tokens added by `tokenize-rt`
#### `tokenize_rt.ESCAPED_NL`
#### `tokenize_rt.UNIMPORTANT_WS`
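
Both are plain string constants naming the extra token types. A short sketch
(an illustrative example, not from the upstream docs) showing them appear when
tokenizing a backslash-continued statement:

```python
from tokenize_rt import ESCAPED_NL, UNIMPORTANT_WS, src_to_tokens

src = 'x = 1 + \\\n    2\n'  # statement continued with a backslash
names = {token.name for token in src_to_tokens(src)}

assert ESCAPED_NL in names      # the '\\\n' continuation "token"
assert UNIMPORTANT_WS in names  # spaces around '=' / '+' and the indentation
```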
### helpers
#### `tokenize_rt.NON_CODING_TOKENS`
A `frozenset` containing tokens which may appear between other tokens without
affecting control flow or code:
- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
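
For example, a rewriter often wants the next "real" token after a given index;
a hypothetical helper (`next_coding_index` is not part of the library) might
look like:

```python
from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

def next_coding_index(tokens, i):
    """Return the index of the first token after ``i`` that isn't a
    comment, blank line, escaped newline, or insignificant whitespace."""
    i += 1
    while tokens[i].name in NON_CODING_TOKENS:
        i += 1
    return i

tokens = src_to_tokens('x = (  # comment\n    1\n)\n')
open_paren = next(i for i, tok in enumerate(tokens) if tok.src == '(')
assert tokens[next_coding_index(tokens, open_paren)].src == '1'
```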
#### `tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]`
parse a string literal into its prefix and string content
```pycon
>>> parse_string_literal('f"foo"')
('f', '"foo"')
```
#### `tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]`
yields `(index, token)` pairs. Useful for rewriting source.
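
For instance, a hypothetical comment-stripping rewrite (`strip_comments` is
not part of the library) can mutate the token list while iterating in reverse
so that earlier indices stay valid:

```python
from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

def strip_comments(src):
    tokens = src_to_tokens(src)
    for i, token in reversed_enumerate(tokens):
        if token.name == 'COMMENT':
            # blank out the comment; surrounding whitespace is left alone
            tokens[i] = token._replace(src='')
    return tokens_to_src(tokens)

print(repr(strip_comments('x = 1  # drop me\n')))  # -> 'x = 1  \n'
```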
#### `tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]`
find the indices of the string parts of a (joined) string literal
- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals
```pycon
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
```
## Differences from `tokenize`
- `tokenize-rt` adds `ESCAPED_NL` for a backslash-escaped newline "token"
- `tokenize-rt` adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- `tokenize-rt` normalizes string prefixes, even if they are not parsed -- for
instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in
python 2.
- `tokenize-rt` normalizes python 2 long literals (`4l` / `4L`) and octal
literals (`0755`) in python 3 (for easier rewriting of python 2 code while
running python 3).
## Sample usage
- https://github.com/asottile/add-trailing-comma
- https://github.com/asottile/future-annotations
- https://github.com/asottile/future-fstrings
- https://github.com/asottile/pyupgrade
- https://github.com/asottile/yesqa