tixent


Name: tixent
Version: 0.0.3
Home page: None
Summary: A text splitting tool
Upload time: 2023-12-07 05:26:52
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.8
License: None
Keywords: openai, tiktoken, token
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

# Tixent: A Text Splitting Tool

**PyPI**
[![PyPI - Version](https://img.shields.io/pypi/v/tixent.svg)](https://pypi.org/project/tixent)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tixent.svg)](https://pypi.org/project/tixent)
[![License](https://img.shields.io/pypi/l/tixent.svg)](https://github.com/sincekmori/tixent/blob/main/LICENSE)

**CI/CD**
[![test](https://github.com/sincekmori/tixent/actions/workflows/test.yml/badge.svg)](https://github.com/sincekmori/tixent/actions/workflows/test.yml)
[![lint](https://github.com/sincekmori/tixent/actions/workflows/lint.yml/badge.svg)](https://github.com/sincekmori/tixent/actions/workflows/lint.yml)

**Build System**
[![Hatch project](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)

**Code**
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json)](https://github.com/charliermarsh/ruff)

**Docstrings**
[![docformatter](https://img.shields.io/badge/%20formatter-docformatter-fedcba.svg)](https://github.com/PyCQA/docformatter)
[![numpy](https://img.shields.io/badge/%20style-numpy-459db9.svg)](https://numpydoc.readthedocs.io/en/latest/format.html)

---

## Installation

```console
pip install tixent
```

## Example

Suppose you have a template function that builds a string from a list of texts, and a long list of texts to pass to it.
Applying the whole list to the template function at once produces a single long string.

Tixent splits the list of texts into groups and applies the template function to each group, so that the value returned by _counter_ for each resulting string is at most a given maximum (`max_count`).

Here, _counter_ is a function that maps a string to an integer.
Examples of such functions are `len`, which measures the length of a string, and `tiktoken_counter("text-davinci-003")`, which measures the number of tokens in a string.
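
Since a counter is described only as a function from a string to an integer, a counter can presumably be any callable of that shape. The following is a minimal sketch using only the standard library; `CounterFn`, `char_counter`, and `word_counter` are illustrative names and are not part of tixent.

```python
from typing import Callable

# Illustrative alias for "a function that maps a string to an integer".
CounterFn = Callable[[str], int]

# The built-in len already has the right shape: it counts characters.
char_counter: CounterFn = len


def word_counter(text: str) -> int:
    """Count whitespace-separated words (a hypothetical custom counter)."""
    return len(text.split())


sample = "Lorem ipsum dolor sit amet"
print(char_counter(sample))  # 26
print(word_counter(sample))  # 5
```

Which counter fits depends on the limit being enforced: characters for a fixed-width field, tokens for a language model's context window.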

```python
from typing import List

from tixent import split, tiktoken_counter


def summarization_template(texts: List[str]) -> str:
    text = " ".join(texts)
    t = "Summarize the following text.\n"
    t += f'Text: """{text}"""'
    return t


texts = [
    "Lorem ipsum dolor sit amet",
    "consectetur adipiscing elit",
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua",
    "Ut enim ad minim veniam",
    "quis nostrud exercitation ullamco laboris nisi",
    "ut aliquip ex ea commodo consequat",
    "Duis aute irure dolor in reprehenderit in voluptate velit",
    "esse cillum dolore eu fugiat nulla pariatur",
    "Excepteur sint occaecat cupidatat non proident",
    "sunt in culpa qui officia deserunt mollit anim id est laborum",
]
counter = tiktoken_counter("text-davinci-003")
max_count = 60

split_texts = split(texts, summarization_template, counter, max_count)
for text in split_texts:
    count = counter(text)
    assert count <= max_count
    print(f"count: {count}")
    print(text)
    print()
```

```console
count: 60
Summarize the following text.
Text: """Lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim veniam"""

count: 58
Summarize the following text.
Text: """quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat Duis aute irure dolor in reprehenderit in voluptate velit"""

count: 43
Summarize the following text.
Text: """esse cillum dolore eu fugiat nulla pariatur Excepteur sint occaecat cupidatat non proident"""

count: 31
Summarize the following text.
Text: """sunt in culpa qui officia deserunt mollit anim id est laborum"""
```
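
The output above is consistent with a greedy grouping strategy: keep adding texts to the current group until the templated string would exceed `max_count`, then start a new group. The sketch below illustrates that idea with the standard library only. `greedy_split` and `bracket_template` are hypothetical names, and this is not necessarily how tixent's `split` is implemented.

```python
from typing import Callable, List


def greedy_split(
    texts: List[str],
    template: Callable[[List[str]], str],
    counter: Callable[[str], int],
    max_count: int,
) -> List[str]:
    """Hypothetical greedy splitter; an illustration, not tixent's code."""
    results: List[str] = []
    chunk: List[str] = []
    for text in texts:
        # Extend the current group while the templated string still fits.
        if counter(template(chunk + [text])) <= max_count:
            chunk.append(text)
            continue
        # Otherwise close the current group and start a new one.
        if chunk:
            results.append(template(chunk))
        chunk = [text]
        # A text that does not fit even on its own cannot be placed.
        if counter(template(chunk)) > max_count:
            raise ValueError(f"text does not fit on its own: {text!r}")
    if chunk:
        results.append(template(chunk))
    return results


def bracket_template(texts: List[str]) -> str:
    """Toy template: join the texts and wrap them in brackets."""
    return "[" + " ".join(texts) + "]"


pieces = greedy_split(["aaa", "bbb", "ccc", "dd"], bracket_template, len, 9)
print(pieces)  # ['[aaa bbb]', '[ccc dd]']
```

Note that the counter is applied to the templated string rather than to the raw texts, so the fixed overhead of the template (here the two brackets) counts toward the limit for every group.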

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tixent",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "openai,tiktoken,token",
    "author": null,
    "author_email": "Shinsuke Mori <sincekmori@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/1d/34/a38d076c40a35dae1edd784c4eb2543b186212260aec42e8aa9ee85f32f3/tixent-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A text splitting tool",
    "version": "0.0.3",
    "project_urls": {
        "Documentation": "https://github.com/sincekmori/tixent#readme",
        "Issues": "https://github.com/sincekmori/tixent/issues",
        "Source": "https://github.com/sincekmori/tixent"
    },
    "split_keywords": [
        "openai",
        "tiktoken",
        "token"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a5e05ac3b710c107bfc52faee283a413d6563f8d0ac376de778b7c8cdf93da4d",
                "md5": "bd5e10b40f9cad21e96b55eeab7353b0",
                "sha256": "3edee261041b62f17a8da48d7852baf71b6e1e0fa34194c10abc7a6432973b48"
            },
            "downloads": -1,
            "filename": "tixent-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bd5e10b40f9cad21e96b55eeab7353b0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 5520,
            "upload_time": "2023-12-07T05:26:54",
            "upload_time_iso_8601": "2023-12-07T05:26:54.575580Z",
            "url": "https://files.pythonhosted.org/packages/a5/e0/5ac3b710c107bfc52faee283a413d6563f8d0ac376de778b7c8cdf93da4d/tixent-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1d34a38d076c40a35dae1edd784c4eb2543b186212260aec42e8aa9ee85f32f3",
                "md5": "a578e2ebb2febec9a2b094a40b6c8b86",
                "sha256": "72084f3d107d953435b72c004c93af334a121330d115d92a8a1d2fa2e41c9488"
            },
            "downloads": -1,
            "filename": "tixent-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "a578e2ebb2febec9a2b094a40b6c8b86",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 6474,
            "upload_time": "2023-12-07T05:26:52",
            "upload_time_iso_8601": "2023-12-07T05:26:52.824728Z",
            "url": "https://files.pythonhosted.org/packages/1d/34/a38d076c40a35dae1edd784c4eb2543b186212260aec42e8aa9ee85f32f3/tixent-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-07 05:26:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sincekmori",
    "github_project": "tixent#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tixent"
}
        