# semantic-text-splitter
[![Documentation Status](https://readthedocs.org/projects/semantic-text-splitter/badge/?version=stable)](https://semantic-text-splitter.readthedocs.io/en/latest/?badge=latest) [![Licence](https://img.shields.io/crates/l/text-splitter)](https://github.com/benbrandt/text-splitter/blob/main/LICENSE.txt)
Large language models (LLMs) can be used for many tasks, but often have a limited context size that can be smaller than the documents you might want to use. To use longer documents, you often have to split the text into chunks that fit within this context size.
This crate provides methods for splitting longer pieces of text into smaller chunks, aiming to maximize a desired chunk size, but still splitting at semantically sensible boundaries whenever possible.
## Get Started
### By Number of Characters
```python
from semantic_text_splitter import TextSplitter
# Maximum number of characters in a chunk
max_characters = 1000
splitter = TextSplitter(max_characters)
# Optionally, you can have the splitter not trim whitespace for you
# splitter = TextSplitter(max_characters, trim=False)
chunks = splitter.chunks("your document text")
```
### Using a Range for Chunk Capacity
You can also specify the chunk capacity as a range.
Once a chunk reaches a length that falls within the range, it is returned.
A chunk may still be returned that is shorter than the `start` value, because adding the next piece of text would have made it larger than the `end` capacity.
```python
from semantic_text_splitter import TextSplitter
# Range of characters allowed in a chunk. The splitter fills up the
# chunk until its length is somewhere in this range.
splitter = TextSplitter((200, 1000))
chunks = splitter.chunks("your document text")
```
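The range behaviour described above can be pictured with a small pure-Python sketch. It is not the library's implementation: `fill_chunk` is a hypothetical helper, the pieces are assumed to be pre-split, and treating both bounds as inclusive is an assumption.

```python
def fill_chunk(pieces: list[str], start: int, end: int) -> str:
    """Greedily join pieces until the length lands in [start, end].

    Hypothetical helper for illustration only.
    """
    chunk = ""
    for piece in pieces:
        if len(chunk) + len(piece) > end:
            break  # the next piece would overshoot the `end` capacity
        chunk += piece
        if len(chunk) >= start:
            break  # the length is now within the range, so stop here
    return chunk


# Reaches the range after two pieces.
in_range = fill_chunk(["aaaa ", "bbbb ", "cccc"], 8, 12)
# Stops below `start` because the next piece is too long to add.
below_start = fill_chunk(["aaaa ", "cccccccccccc"], 8, 12)
print(repr(in_range), repr(below_start))
```

The second call shows why a chunk shorter than `start` can come back: the only way to grow it would exceed `end`.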
### Using a Hugging Face Tokenizer
```python
from semantic_text_splitter import TextSplitter
from tokenizers import Tokenizer
# Maximum number of tokens in a chunk
max_tokens = 1000
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, max_tokens)
chunks = splitter.chunks("your document text")
```
### Using a Tiktoken Tokenizer
```python
from semantic_text_splitter import TextSplitter
# Maximum number of tokens in a chunk
max_tokens = 1000
splitter = TextSplitter.from_tiktoken_model("gpt-3.5-turbo", max_tokens)
chunks = splitter.chunks("your document text")
```
### Using a Custom Callback
```python
from semantic_text_splitter import TextSplitter
splitter = TextSplitter.from_callback(lambda text: len(text), 1000)
chunks = splitter.chunks("your document text")
```
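As the example above suggests, the callback takes a piece of text and returns its size as an integer. Two alternative size functions are sketched below; the names `utf8_byte_length` and `word_count` are made up for illustration and are not part of the library.

```python
def utf8_byte_length(text: str) -> int:
    """Size of the text in UTF-8 bytes rather than characters."""
    return len(text.encode("utf-8"))


def word_count(text: str) -> int:
    """Size of the text in whitespace-separated words."""
    return len(text.split())


print(utf8_byte_length("héllo"), word_count("split this text"))
```

Either function could be passed in place of the `lambda` in the example above, with the capacity then interpreted in bytes or words respectively.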
### Markdown
All of the above examples also work with Markdown text. You can use the `MarkdownSplitter` in the same ways as the `TextSplitter`.
```python
from semantic_text_splitter import MarkdownSplitter
# Maximum number of characters in a chunk
max_characters = 1000
splitter = MarkdownSplitter(max_characters)
# Optionally, you can have the splitter not trim whitespace for you
# splitter = MarkdownSplitter(max_characters, trim=False)
chunks = splitter.chunks("# Header\n\nyour document text")
```
## Method
To preserve as much semantic meaning within a chunk as possible, each chunk is composed of the largest semantic units that can fit in the next given chunk. For each splitter type, there is a defined set of semantic levels. Here is an example of the steps used:
1. Split the text by increasing semantic levels.
2. Check the first item for each level and select the highest level whose first item still fits within the chunk size.
3. Merge as many of these neighboring sections of this level or above into a chunk to maximize chunk length. Boundaries of higher semantic levels are always included when merging, so that the chunk doesn't inadvertently cross semantic boundaries.
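The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's algorithm: the regexes in `LEVELS` are rough stand-ins for the real semantic levels, no whitespace trimming is done, and the rule about including higher-level boundaries when merging is left out.

```python
import re

# Simplified stand-ins for semantic levels, ordered lowest to highest.
# These patterns are assumptions for illustration, not the Unicode rules.
LEVELS = [
    re.compile(r".", re.DOTALL),                # single characters
    re.compile(r"\S+\s*|\s+"),                  # words plus trailing whitespace
    re.compile(r"[^.!?]+[.!?]*\s*|[.!?]+\s*"),  # rough sentences
    re.compile(r"[^\n]+\n*|\n+"),               # lines plus trailing newlines
]


def next_chunk(text: str, max_len: int) -> str:
    """Return the next chunk taken from the start of `text`."""
    for pattern in reversed(LEVELS):
        pieces = pattern.findall(text)            # step 1: split at this level
        if pieces and len(pieces[0]) <= max_len:  # step 2: first item fits
            chunk = ""
            for piece in pieces:                  # step 3: merge neighbors
                if len(chunk) + len(piece) > max_len:
                    break
                chunk += piece
            return chunk
    return text[:max_len]  # not reached when max_len >= 1


def chunks(text: str, max_len: int) -> list[str]:
    """Split `text` into chunks of at most `max_len` characters."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    out = []
    while text:
        chunk = next_chunk(text, max_len)
        out.append(chunk)
        text = text[len(chunk):]
    return out


sample = "One two. Three four five.\nSix."
result = chunks(sample, 12)
print(result)
```

In this sketch, the first chunk ends at a sentence boundary, while the second falls back to word boundaries because the next sentence is longer than the limit.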
The boundaries used to split the text when using the `chunks` method, in ascending order:
### `TextSplitter` Semantic Levels
1. Characters
2. [Unicode Grapheme Cluster Boundaries](https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
3. [Unicode Word Boundaries](https://www.unicode.org/reports/tr29/#Word_Boundaries)
4. [Unicode Sentence Boundaries](https://www.unicode.org/reports/tr29/#Sentence_Boundaries)
5. Ascending sequence length of newlines. (Newline is `\r\n`, `\n`, or `\r`) Each unique length of consecutive newline sequences is treated as its own semantic level. So a sequence of 2 newlines is a higher level than a sequence of 1 newline, and so on.
Splitting doesn't occur below the character level, otherwise you could get partial bytes of a char, which may not be a valid Unicode string.
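To make the newline levels in item 5 concrete, the sketch below lists the distinct run lengths found in a text, each of which would act as its own level. The helper `newline_levels` is hypothetical, and counting `\r\n` as a single newline is an assumption based on the definition given in that item.

```python
import re

# A single newline: `\r\n` is tried first so it counts once, not twice.
NEWLINE = re.compile(r"\r\n|\n|\r")
# A run of one or more consecutive newlines.
NEWLINE_RUN = re.compile(r"(?:\r\n|\n|\r)+")


def newline_levels(text: str) -> list[int]:
    """Return the distinct newline run lengths in `text`, ascending."""
    lengths = {
        len(NEWLINE.findall(match.group()))
        for match in NEWLINE_RUN.finditer(text)
    }
    return sorted(lengths)


levels = newline_levels("a\nb\n\nc\r\n\r\n\r\nd\ne")
print(levels)
```

Here a run of three newlines would rank above a run of two, which in turn ranks above a single newline.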
### `MarkdownSplitter` Semantic Levels
Markdown is parsed according to the `CommonMark` spec, along with some optional features such as GitHub Flavored Markdown.
1. Characters
2. [Unicode Grapheme Cluster Boundaries](https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
3. [Unicode Word Boundaries](https://www.unicode.org/reports/tr29/#Word_Boundaries)
4. [Unicode Sentence Boundaries](https://www.unicode.org/reports/tr29/#Sentence_Boundaries)
5. Soft line breaks (single newlines), which aren't necessarily a new element in Markdown.
6. Inline elements such as: text nodes, emphasis, strong, strikethrough, link, image, table cells, inline code, footnote references, task list markers, and inline html.
7. Block elements such as: paragraphs, code blocks, footnote definitions, metadata. Also, a block quote or row/item within a table or list that can contain other "block" type elements, and a list or table that contains items.
8. Thematic breaks or horizontal rules.
9. Headings by level.
Splitting doesn't occur below the character level, otherwise you could get partial bytes of a char, which may not be a valid Unicode string.
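The problem with splitting below the character level can be seen by cutting a UTF-8 byte sequence in the middle of a multi-byte character. The sketch below uses Python `bytes` to illustrate the point; `is_valid_utf8` is a hypothetical helper, not a library function.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = "héllo".encode("utf-8")  # "é" takes two bytes in UTF-8

# Cutting after 2 bytes leaves only the first half of "é".
print(is_valid_utf8(encoded[:2]))
# Cutting after 3 bytes keeps "é" whole.
print(is_valid_utf8(encoded[:3]))
```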
### Note on sentences
There are lots of methods of determining sentence breaks, all to varying degrees of accuracy, and many requiring ML models to do so. Rather than trying to find the perfect sentence breaks, we rely on the Unicode method of sentence boundaries, which in most cases is good enough for finding a decent semantic breaking point if a paragraph is too large, and avoids the performance penalties of many other methods.
## Inspiration
This crate was inspired by [LangChain's TextSplitter](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html#langchain_text_splitters.character.RecursiveCharacterTextSplitter). But, looking into the implementation, there was potential for better performance as well as better semantic chunking.
A big thank you to the Unicode team for their [icu_segmenter](https://crates.io/crates/icu_segmenter) crate that manages a lot of the complexity of matching the Unicode rules for words and sentences.
Raw data
{
"_id": null,
"home_page": null,
"name": "semantic-text-splitter",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "text, split, tokenizer, nlp, ai",
"author": "Ben Brandt <benjamin.j.brandt@gmail.com>",
"author_email": "Ben Brandt <benjamin.j.brandt@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/88/82/b4ac669f8d7e8ff7ee65796c989db8ae27540cff199f1c2e46f850fefad5/semantic_text_splitter-0.20.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.",
"version": "0.20.1",
"project_urls": {
"Source Code": "https://github.com/benbrandt/text-splitter"
},
"split_keywords": [
"text",
" split",
" tokenizer",
" nlp",
" ai"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "216b9834ccec8140b13b5c0a70cc6aea240f8ff685114cc9c2f68c1869a87f07",
"md5": "d671378c57c1953dbdfa430fdacc356a",
"sha256": "bd5e50abf89a7877840fb85a6d6f62b8b8267c849183733f22eec28cb18b4b79"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-macosx_10_12_x86_64.whl",
"has_sig": false,
"md5_digest": "d671378c57c1953dbdfa430fdacc356a",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8091579,
"upload_time": "2025-01-01T20:27:27",
"upload_time_iso_8601": "2025-01-01T20:27:27.052368Z",
"url": "https://files.pythonhosted.org/packages/21/6b/9834ccec8140b13b5c0a70cc6aea240f8ff685114cc9c2f68c1869a87f07/semantic_text_splitter-0.20.1-cp39-abi3-macosx_10_12_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "1821ccbbacee1c6c67546afa83b56c9a9086447e75009d985af37c4ea4266d9c",
"md5": "02e8d81402bca884f72625a15a402374",
"sha256": "674a1342ebc71df957e09eaacb4bf1394d7d0e28fecc0640336f09b408e5bb2f"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "02e8d81402bca884f72625a15a402374",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8077342,
"upload_time": "2025-01-01T20:27:30",
"upload_time_iso_8601": "2025-01-01T20:27:30.600669Z",
"url": "https://files.pythonhosted.org/packages/18/21/ccbbacee1c6c67546afa83b56c9a9086447e75009d985af37c4ea4266d9c/semantic_text_splitter-0.20.1-cp39-abi3-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "07d6a479ac3012b2d56d10ecdb9297bb9105b7eb4e2ce417379aee48739a0717",
"md5": "b62e18b99ddeb1b3388ac6f444b803d2",
"sha256": "5ba9623eb3ea46c5bf73f12bd894162b7d92a9cfbf7733d61d02583252032d09"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "b62e18b99ddeb1b3388ac6f444b803d2",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8299304,
"upload_time": "2025-01-01T20:27:34",
"upload_time_iso_8601": "2025-01-01T20:27:34.342791Z",
"url": "https://files.pythonhosted.org/packages/07/d6/a479ac3012b2d56d10ecdb9297bb9105b7eb4e2ce417379aee48739a0717/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "60fa2f9c38ba17b324286c06a00d5dcb98cfdf3c45fb35e357c10ab53b8068f6",
"md5": "1f7fb2c0b11e43fa6fb63c87f54cdb48",
"sha256": "e544ccdda690cc2bc8540e25c1baa7dd419a208d9832f4dd0065e4cf6f1aa72e"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"has_sig": false,
"md5_digest": "1f7fb2c0b11e43fa6fb63c87f54cdb48",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8190823,
"upload_time": "2025-01-01T20:27:36",
"upload_time_iso_8601": "2025-01-01T20:27:36.625925Z",
"url": "https://files.pythonhosted.org/packages/60/fa/2f9c38ba17b324286c06a00d5dcb98cfdf3c45fb35e357c10ab53b8068f6/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d30162706b62eeadbb43eaf86cf936de4d7e6505813966e5fcf20310f3a26015",
"md5": "6908171e4e2b980f0c21f981ec6f3a1b",
"sha256": "665cd5508064bc1a8dbb819ec8242691da4199becb725b3963cc4c6a1bdcfb09"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "6908171e4e2b980f0c21f981ec6f3a1b",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8453613,
"upload_time": "2025-01-01T20:27:38",
"upload_time_iso_8601": "2025-01-01T20:27:38.828980Z",
"url": "https://files.pythonhosted.org/packages/d3/01/62706b62eeadbb43eaf86cf936de4d7e6505813966e5fcf20310f3a26015/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8430381fe4775736f4d27e1263f1a163f7accd1964cc586173c55653e315d834",
"md5": "a83d15fc0fe7d2682ed65dc485041bce",
"sha256": "1a0f9ab9cb8e45a904d2637dcb293d99197b0ec36f459c191f13b846be748a79"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl",
"has_sig": false,
"md5_digest": "a83d15fc0fe7d2682ed65dc485041bce",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8510853,
"upload_time": "2025-01-01T20:27:42",
"upload_time_iso_8601": "2025-01-01T20:27:42.830059Z",
"url": "https://files.pythonhosted.org/packages/84/30/381fe4775736f4d27e1263f1a163f7accd1964cc586173c55653e315d834/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "eba5781c5d545fc6f6cc3bfaed86bb1ce7548bb3c2d96810a7f8d82bea4fd7eb",
"md5": "d8349c1e78ba556b89385050c377f1ce",
"sha256": "4cded452bb7ef5841c67ff78e82281a7fca774277dec4cd2de1aa7cf835cea06"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl",
"has_sig": false,
"md5_digest": "d8349c1e78ba556b89385050c377f1ce",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8771215,
"upload_time": "2025-01-01T20:27:47",
"upload_time_iso_8601": "2025-01-01T20:27:47.114321Z",
"url": "https://files.pythonhosted.org/packages/eb/a5/781c5d545fc6f6cc3bfaed86bb1ce7548bb3c2d96810a7f8d82bea4fd7eb/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b3ace3cbc7b63d7f083cadf7d3f3946f7cefef0af56aca2c75bb2d7da50fa3bb",
"md5": "89e482d2871586d2ffab61fb342acdd0",
"sha256": "f85f21be2c4687c873fb7d7aa78d9ecaae76215ca0adf1731666ffd46db8beef"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "89e482d2871586d2ffab61fb342acdd0",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 8397034,
"upload_time": "2025-01-01T20:27:52",
"upload_time_iso_8601": "2025-01-01T20:27:52.241768Z",
"url": "https://files.pythonhosted.org/packages/b3/ac/e3cbc7b63d7f083cadf7d3f3946f7cefef0af56aca2c75bb2d7da50fa3bb/semantic_text_splitter-0.20.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7a8e4ed6ffc4bd48e7807ba00bb2b03e8f020318763beefdc99f3b8fa87cbb58",
"md5": "0b7603371325198c76f2afefdbb4891a",
"sha256": "aa343bcb976eddbe5694f4b84ce963ae2f9575d46bb9726120b50ca5442feaad"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-win32.whl",
"has_sig": false,
"md5_digest": "0b7603371325198c76f2afefdbb4891a",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 7640492,
"upload_time": "2025-01-01T20:27:55",
"upload_time_iso_8601": "2025-01-01T20:27:55.025125Z",
"url": "https://files.pythonhosted.org/packages/7a/8e/4ed6ffc4bd48e7807ba00bb2b03e8f020318763beefdc99f3b8fa87cbb58/semantic_text_splitter-0.20.1-cp39-abi3-win32.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6c52b2e7cf01b2d3e043e61387984ff1d5f7397e033a8cbd82cafbb22c8bd850",
"md5": "361b1fb6f484914ec037c8badea68a44",
"sha256": "945a26e9429664a3d17a3ad36d51d1ac53cc0ed5ae4e18d16d78b27c036a92c7"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1-cp39-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "361b1fb6f484914ec037c8badea68a44",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 7812211,
"upload_time": "2025-01-01T20:27:57",
"upload_time_iso_8601": "2025-01-01T20:27:57.240261Z",
"url": "https://files.pythonhosted.org/packages/6c/52/b2e7cf01b2d3e043e61387984ff1d5f7397e033a8cbd82cafbb22c8bd850/semantic_text_splitter-0.20.1-cp39-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8882b4ac669f8d7e8ff7ee65796c989db8ae27540cff199f1c2e46f850fefad5",
"md5": "d386a2cc92e0b4f86fab964ed3ca2f69",
"sha256": "cf20fbc897e53dfe573432abd63472901cb1166d2089c4f6999067becb16f2b9"
},
"downloads": -1,
"filename": "semantic_text_splitter-0.20.1.tar.gz",
"has_sig": false,
"md5_digest": "d386a2cc92e0b4f86fab964ed3ca2f69",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 286746,
"upload_time": "2025-01-01T20:28:00",
"upload_time_iso_8601": "2025-01-01T20:28:00.028853Z",
"url": "https://files.pythonhosted.org/packages/88/82/b4ac669f8d7e8ff7ee65796c989db8ae27540cff199f1c2e46f850fefad5/semantic_text_splitter-0.20.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-01 20:28:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "benbrandt",
"github_project": "text-splitter",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "semantic-text-splitter"
}