<p align="center">
Get identifiers, names, paths, URLs and words from the command output.<br>
The <a href="https://github.com/anki-code/xontrib-output-search">xontrib-output-search</a> for <a href="https://xon.sh/">xonsh shell</a> uses this library.
</p>
<p align="center">
If you like the idea, click ⭐ on the repo and stay tuned by watching releases.
</p>
## Install
```shell script
pip install -U tokenize-output
```
## Usage
You can use the `tokenize-output` command as well as import the tokenizers in Python:
```python
from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}
```
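Each tokenizer returns `final` tokens, which go straight into the result, and `new` tokens, which are tokenized further (see Development below). A hedged illustration with the strip tokenizer, continuing the snippet above — the name `tokenizer_strip` and the exact output are assumptions by analogy with `tokenizer_split`, not verified behavior:
```python
# Assumed by analogy with tokenizer_split; output shape is an expectation.
tokenizer_strip("world!")
# expected: {'final': {'world'}, 'new': set()}
```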
#### Word tokenizing
```shell script
echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh
```
#### JSON, Python dict and JavaScript object tokenizing
```shell script
echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell
```
#### env tokenizing
```shell script
echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH
```
## Development
### Tokenizers
A tokenizer is a function that extracts tokens from text.
| Priority | Tokenizer | Text example | Tokens |
| ---------| ---------- | ----- | ------ |
| 1 | **dict** | `{"key": "val as str"}` | `key`, `val as str` |
| 2 | **env** | `PATH=/bin:/etc` | `PATH`, `/bin:/etc`, `/bin`, `/etc` |
| 3 | **split** | `Split me \n now!` | `Split`, `me`, `now!` |
| 4 | **strip** | `{Hello}!.` | `Hello` |
You can create your own tokenizer and add it to `tokenizers_all` in `tokenize_output.py`.
Tokenizing is a recursive process in which every tokenizer returns `final` and `new` tokens.
The `final` tokens go directly into the resulting list of tokens. The `new` tokens are passed
through all tokenizers again to find more tokens. As a result, if the output contains a mix of
JSON and env data, both will be found and tokenized appropriately.
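The loop below is a minimal sketch of that process, not the library's actual code; the name `run_tokenizers` and its structure are illustrative only:
```python
def run_tokenizers(text, tokenizers):
    """Illustrative sketch: collect `final` tokens, re-tokenize `new` ones."""
    result, seen = set(), set()
    queue = [text]
    while queue:
        chunk = queue.pop()
        if chunk in seen:       # guard against re-processing the same text
            continue
        seen.add(chunk)
        for tokenizer in tokenizers:
            tokens = tokenizer(chunk)    # {'final': set(), 'new': set()}
            result |= tokens['final']    # final tokens go straight to the result
            queue.extend(tokens['new'])  # new tokens are tokenized again
    return result
```
With the tokenizers ordered by the priorities above, a line mixing env and JSON data is first split by the env tokenizer, and the extracted value is then caught by the dict tokenizer on the next pass.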
### How to add a tokenizer
You can start from the `env` tokenizer (a minimal skeleton follows this list):
1. [Prepare regexp](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L10)
2. [Prepare tokenizer function](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L57-L70)
3. [Add the function to the list](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L139-L144) and [to the preset](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L147).
4. [Add test](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tests/test_tokenize.py#L34-L35).
5. Now you can test and debug (see below).
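A hedged skeleton of such a tokenizer, modeled on the env tokenizer; the regexp, the function name and the exact `final`/`new` split are placeholders, not the library's actual code:
```python
import re

# Placeholder pattern for NAME=value lines, modeled on the env tokenizer.
_example_env_re = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.+)$')

def tokenizer_example_env(text):
    """Return the standard {'final': ..., 'new': ...} dict like every tokenizer."""
    match = _example_env_re.match(text)
    if not match:
        return {'final': set(), 'new': set()}
    name, value = match.group(1), match.group(2)
    # The variable name and the whole value become tokens; the colon-separated
    # parts go to `new` so the other tokenizers can break them up further.
    return {'final': {name, value}, 'new': set(value.split(':'))}
```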
### Test and debug
Run tests:
```shell script
cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/
```
To debug the tokenizer:
```shell script
echo "Hello world" | ./tokenize-output -p
```
## Related projects
* [xontrib-output-search][XONTRIB_OUTPUT_SEARCH] for [xonsh shell][XONSH]
[XONTRIB_OUTPUT_SEARCH]: https://github.com/anki-code/xontrib-output-search
[XONSH]: https://xon.sh/