# Jajula Chunking
A comprehensive Python library for text chunking strategies optimized for RAG (Retrieval-Augmented Generation) applications.
## Features
- **9 Different Chunking Strategies** - From simple fixed-size to advanced semantic chunking
- **RAG-Optimized** - Designed specifically for retrieval-augmented generation workflows
- **Easy to Use** - Simple, consistent API across all chunkers
- **Extensible** - Base classes for creating custom chunking strategies
- **Production Ready** - Comprehensive error handling and validation
## Installation
```bash
pip install jajula-chunking
```
## Quick Start
```python
from jajula_chunking import FixedSizeChunker, SemanticChunker
# Fixed-size chunking
chunker = FixedSizeChunker(chunk_size=500, overlap=50)
chunks = chunker.chunk("Your long text here...")
# Semantic chunking
semantic_chunker = SemanticChunker(similarity_threshold=0.6)
semantic_chunks = semantic_chunker.chunk("Your text here...")
# Inspect the ID, content, and metadata of each fixed-size chunk
for chunk in chunks:
    print(f"ID: {chunk.chunk_id}")
    print(f"Content: {chunk.content}")
    print(f"Metadata: {chunk.metadata}")
    print("---")
```
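To make the `chunk_size` and `overlap` parameters concrete, here is a minimal, self-contained sketch of character-based fixed-size chunking with overlap. It is not the library's implementation: the `Chunk` dataclass and the `fixed_size_chunks` function are hypothetical stand-ins that only mirror the attribute names used in the Quick Start (`chunk_id`, `content`, `metadata`), and the `start`/`end` metadata keys are an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk object (not the library's class)."""
    chunk_id: int
    content: str
    metadata: Dict[str, int] = field(default_factory=dict)


def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[Chunk] = []
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(len(chunks), text[start:end], {"start": start, "end": end}))
        if end == len(text):  # the window reached the end of the text
            break
    return chunks


if __name__ == "__main__":
    for chunk in fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1):
        print(chunk.chunk_id, repr(chunk.content), chunk.metadata)
```

Each window starts `chunk_size - overlap` characters after the previous one, so consecutive chunks repeat the last `overlap` characters. That repetition is what keeps a sentence cut at a boundary retrievable from either neighbouring chunk.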
## Available Chunkers
1. **FixedSizeChunker** - Chunking by a fixed number of characters or words
2. **SentenceBasedChunker** - Sentence boundary-based chunking
3. **ParagraphBasedChunker** - Paragraph boundary-based chunking
4. **SemanticChunker** - AI-powered semantic chunking
5. **HierarchicalChunker** - Multi-level hierarchical chunking
6. **StructureBasedChunker** - Document structure-based chunking (HTML/Markdown)
7. **TokenBasedChunker** - Token-count-based chunking
8. **RecursiveChunker** - Recursive text splitting with multiple separators
9. **AdaptiveChunker** - Adaptive chunking based on content analysis
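
As an illustration of the idea behind recursive splitting with multiple separators, the sketch below tries the coarsest separator first and falls back to finer ones only for pieces that are still too long. It is a generic, self-contained version of the technique, not the code behind `RecursiveChunker`: the function name `recursive_split`, the `max_len` parameter, and the default separator order are all assumptions made for this example.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    max_len: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones on oversized parts."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard character cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    buffer = ""
    for part in text.split(sep):
        # Greedily pack neighbouring parts into one chunk while they fit.
        candidate = part if not buffer else buffer + sep + part
        if len(candidate) <= max_len:
            buffer = candidate
            continue
        if buffer.strip():
            chunks.append(buffer)
        buffer = ""
        if len(part) <= max_len:
            buffer = part
        else:
            # This part alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(part, max_len, finer))
    if buffer.strip():
        chunks.append(buffer)
    return chunks


if __name__ == "__main__":
    sample = "alpha beta gamma\n\ndelta epsilon"
    for piece in recursive_split(sample, max_len=12):
        print(repr(piece))
```

Ordering the separators from paragraph breaks down to single spaces means a chunk boundary falls on the most meaningful break that still respects the length limit. Hard character cuts happen only when no separator helps.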
## Documentation
For detailed documentation and examples, visit the [documentation](https://jajula-chunking.readthedocs.io).
## Contributing
Contributions are welcome! Please read our contributing guidelines for details.
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "jajula-chunking",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "chunking, rag, nlp, text-processing, ai, machine-learning",
"author": null,
"author_email": "Jajula <contact@jajula.com>",
"download_url": "https://files.pythonhosted.org/packages/58/09/e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187/jajula_chunking-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A comprehensive text chunking library for RAG applications with multiple strategies",
"version": "0.1.1",
"project_urls": {
"Documentation": "https://jajula-chunking.readthedocs.io",
"Homepage": "https://github.com/jajula/jajula-chunking",
"Issues": "https://github.com/jajula/jajula-chunking/issues",
"Repository": "https://github.com/jajula/jajula-chunking"
},
"split_keywords": [
"chunking",
" rag",
" nlp",
" text-processing",
" ai",
" machine-learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e6176787ebde3614210d2a74261a83adab12e352352ae1888a83b5536104df4a",
"md5": "3d4245ecfff3333d9a9bd7d703d6927a",
"sha256": "6befe23570b1f12dd0bcd52d7cdf6767c8c86fb79cf706000e1de0e570163c70"
},
"downloads": -1,
"filename": "jajula_chunking-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3d4245ecfff3333d9a9bd7d703d6927a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 26500,
"upload_time": "2025-08-26T18:54:16",
"upload_time_iso_8601": "2025-08-26T18:54:16.105168Z",
"url": "https://files.pythonhosted.org/packages/e6/17/6787ebde3614210d2a74261a83adab12e352352ae1888a83b5536104df4a/jajula_chunking-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5809e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187",
"md5": "79d3190b5f4ddb31dcd6c1b1de7093f7",
"sha256": "2421610f8008acbe733819a1365f50335cde46e9da6d3df9d5f6f019e5f72540"
},
"downloads": -1,
"filename": "jajula_chunking-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "79d3190b5f4ddb31dcd6c1b1de7093f7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 51668,
"upload_time": "2025-08-26T18:54:18",
"upload_time_iso_8601": "2025-08-26T18:54:18.003797Z",
"url": "https://files.pythonhosted.org/packages/58/09/e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187/jajula_chunking-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-26 18:54:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jajula",
"github_project": "jajula-chunking",
"github_not_found": true,
"lcname": "jajula-chunking"
}