<p align="center">
Get identifiers, names, paths, URLs and words from the command output.<br>
The <a href="https://github.com/anki-code/xontrib-output-search">xontrib-output-search</a> for <a href="https://xon.sh/">xonsh shell</a> uses this library.
</p>
<p align="center">
If you like the idea, click ⭐ on the repo and stay tuned by watching releases.
</p>
## Install
```shell script
pip install -U tokenize-output
```
## Usage
You can use the `tokenize-output` command as well as import the tokenizers in Python:
```python
from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}
```
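Each tokenizer returns `final` tokens, which go straight into the result, and `new` tokens, which are tokenized further (see Development below). A hedged illustration with the strip tokenizer, continuing the snippet above — the name `tokenizer_strip` and the exact output are assumptions by analogy with `tokenizer_split`, not verified behavior:
```python
# Assumed by analogy with tokenizer_split; output shape is an expectation.
tokenizer_strip("world!")
# expected: {'final': {'world'}, 'new': set()}
```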
#### Word tokenizing
```shell script
echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh
```
#### JSON, Python dict and JavaScript object tokenizing
```shell script
echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell
```
#### env tokenizing
```shell script
echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH
```
## Development
### Tokenizers
A tokenizer is a function that extracts tokens from text.
| Priority | Tokenizer | Text example | Tokens |
| ---------| ---------- | ----- | ------ |
| 1 | **dict** | `{"key": "val as str"}` | `key`, `val as str` |
| 2 | **env** | `PATH=/bin:/etc` | `PATH`, `/bin:/etc`, `/bin`, `/etc` |
| 3 | **split** | `Split me \n now!` | `Split`, `me`, `now!` |
| 4 | **strip** | `{Hello}!.` | `Hello` |
You can create your own tokenizer and add it to `tokenizers_all` in `tokenize_output.py`.
Tokenizing is a recursive process in which every tokenizer returns `final` and `new` tokens.
The `final` tokens go directly into the resulting list of tokens. The `new` tokens are passed
through all tokenizers again to find more tokens. As a result, if the output contains a mix of
JSON and env data, both will be found and tokenized appropriately.
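The loop below is a minimal sketch of that process, not the library's actual code; the name `run_tokenizers` and its structure are illustrative only:
```python
def run_tokenizers(text, tokenizers):
    """Illustrative sketch: collect `final` tokens, re-tokenize `new` ones."""
    result, seen = set(), set()
    queue = [text]
    while queue:
        chunk = queue.pop()
        if chunk in seen:       # guard against re-processing the same text
            continue
        seen.add(chunk)
        for tokenizer in tokenizers:
            tokens = tokenizer(chunk)    # {'final': set(), 'new': set()}
            result |= tokens['final']    # final tokens go straight to the result
            queue.extend(tokens['new'])  # new tokens are tokenized again
    return result
```
With the tokenizers ordered by the priorities above, a line mixing env and JSON data is first split by the env tokenizer, and the extracted value is then caught by the dict tokenizer on the next pass.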
### How to add a tokenizer
You can start from the `env` tokenizer (a minimal skeleton follows this list):
1. [Prepare regexp](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L10)
2. [Prepare tokenizer function](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L57-L70)
3. [Add the function to the list](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L139-L144) and [to the preset](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L147).
4. [Add test](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tests/test_tokenize.py#L34-L35).
5. Now you can test and debug (see below).
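A hedged skeleton of such a tokenizer, modeled on the env tokenizer; the regexp, the function name and the exact `final`/`new` split are placeholders, not the library's actual code:
```python
import re

# Placeholder pattern for NAME=value lines, modeled on the env tokenizer.
_example_env_re = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.+)$')

def tokenizer_example_env(text):
    """Return the standard {'final': ..., 'new': ...} dict like every tokenizer."""
    match = _example_env_re.match(text)
    if not match:
        return {'final': set(), 'new': set()}
    name, value = match.group(1), match.group(2)
    # The variable name and the whole value become tokens; the colon-separated
    # parts go to `new` so the other tokenizers can break them up further.
    return {'final': {name, value}, 'new': set(value.split(':'))}
```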
### Test and debug
Run tests:
```shell script
cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/
```
To debug the tokenizer:
```shell script
echo "Hello world" | ./tokenize-output -p
```
## Related projects
* [xontrib-output-search][XONTRIB_OUTPUT_SEARCH] for [xonsh shell][XONSH]
[XONTRIB_OUTPUT_SEARCH]: https://github.com/anki-code/xontrib-output-search
[XONSH]: https://xon.sh/