jajula-chunking


| Field | Value |
| --- | --- |
| Name | jajula-chunking |
| Version | 0.1.1 |
| Summary | A comprehensive text chunking library for RAG applications with multiple strategies |
| Upload time | 2025-08-26 18:54:18 |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | chunking, rag, nlp, text-processing, ai, machine-learning |
| Requirements | No requirements were recorded. |

# Jajula Chunking

A comprehensive Python library for text chunking strategies optimized for RAG (Retrieval-Augmented Generation) applications.

## Features

- **9 Different Chunking Strategies** - From simple fixed-size to advanced semantic chunking
- **RAG-Optimized** - Designed specifically for retrieval-augmented generation workflows
- **Easy to Use** - Simple, consistent API across all chunkers
- **Extensible** - Base classes for creating custom chunking strategies
- **Production Ready** - Comprehensive error handling and validation
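
The "consistent API" and "Extensible" points can be pictured with a small sketch. This README does not show the package's real base class, so `Chunk`, `BaseChunker`, and `LineChunker` below are hypothetical stand-ins; they only mirror the `chunk()` method and the `chunk_id`, `content`, and `metadata` attributes that appear in the Quick Start example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Chunk:
    """Hypothetical chunk record; field names mirror the Quick Start output."""
    chunk_id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseChunker(ABC):
    """Hypothetical base class, not the library's real one."""

    @abstractmethod
    def chunk(self, text: str) -> List[Chunk]:
        """Split text into a list of Chunk objects."""


class LineChunker(BaseChunker):
    """Illustrative custom strategy: one chunk per non-empty line."""

    def chunk(self, text: str) -> List[Chunk]:
        if not isinstance(text, str):
            raise TypeError("text must be a string")
        # Drop blank lines and surrounding whitespace before numbering.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        return [
            Chunk(chunk_id=f"line-{i}", content=line, metadata={"line_index": i})
            for i, line in enumerate(lines)
        ]


chunks = LineChunker().chunk("first line\n\nsecond line")
print([c.content for c in chunks])
```

Keeping every strategy behind the same `chunk(text)` call is what would let a retrieval pipeline swap strategies without touching the surrounding code.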

## Installation

```bash
pip install jajula-chunking
```

## Quick Start

```python
from jajula_chunking import FixedSizeChunker, SemanticChunker

# Fixed-size chunking
chunker = FixedSizeChunker(chunk_size=500, overlap=50)
chunks = chunker.chunk("Your long text here...")

# Semantic chunking
semantic_chunker = SemanticChunker(similarity_threshold=0.6)
semantic_chunks = semantic_chunker.chunk("Your text here...")

for chunk in chunks:
    print(f"ID: {chunk.chunk_id}")
    print(f"Content: {chunk.content}")
    print(f"Metadata: {chunk.metadata}")
    print("---")
```
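
To make the `chunk_size` and `overlap` parameters concrete, here is a standalone sketch of character-based fixed-size splitting. It assumes the common convention that each chunk starts `chunk_size - overlap` characters after the previous one; this is not confirmed to be how `FixedSizeChunker` works internally, and `fixed_size_spans` is a hypothetical helper.

```python
from typing import List, Tuple


def fixed_size_spans(length: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for overlapping fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        spans.append((start, end))
        if end == length:
            break  # the final chunk reached the end of the text
        start += step
    return spans


text = "x" * 1200
spans = fixed_size_spans(len(text), chunk_size=500, overlap=50)
print(spans)
```

Under that assumption, a 1200-character text with `chunk_size=500` and `overlap=50` yields three chunks, with the last one shorter than the others.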

## Available Chunkers

1. **FixedSizeChunker** - Fixed character/word-based chunking
2. **SentenceBasedChunker** - Sentence boundary-based chunking
3. **ParagraphBasedChunker** - Paragraph boundary-based chunking
4. **SemanticChunker** - AI-powered semantic chunking
5. **HierarchicalChunker** - Multi-level hierarchical chunking
6. **StructureBasedChunker** - Document structure-based chunking (HTML/Markdown)
7. **TokenBasedChunker** - Token-count based chunking
8. **RecursiveChunker** - Recursive text splitting with multiple separators
9. **AdaptiveChunker** - Adaptive chunking based on content analysis
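
As an illustration of the idea behind recursive splitting with multiple separators, the sketch below tries coarse separators first and falls back to finer ones only for pieces that are still too long. It is a generic, simplified version of the technique (separators are dropped, and small pieces are not merged back together). `recursive_split` is a hypothetical function, not `RecursiveChunker` itself.

```python
from typing import List, Sequence


def recursive_split(text: str, max_len: int,
                    separators: Sequence[str] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    """Split text so that every piece has at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_len:
        return [text]
    if not separators:
        # No separator left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    head, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(head):
        # Pieces that already fit are returned as-is; longer ones recurse
        # with the remaining, finer-grained separators.
        pieces.extend(recursive_split(part, max_len, rest))
    return pieces


doc = "Intro paragraph.\n\nA longer paragraph. It has two sentences."
print(recursive_split(doc, max_len=25))
```

The separator order encodes a preference: paragraph breaks are kept intact whenever possible, and word-level or hard cuts are used only as a last resort.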

## Documentation

For detailed documentation and examples, visit https://jajula-chunking.readthedocs.io.

## Contributing

Contributions are welcome! Please read our contributing guidelines for details.

## License

MIT License - see LICENSE file for details.


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "jajula-chunking",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "chunking, rag, nlp, text-processing, ai, machine-learning",
    "author": null,
    "author_email": "Jajula <contact@jajula.com>",
    "download_url": "https://files.pythonhosted.org/packages/58/09/e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187/jajula_chunking-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive text chunking library for RAG applications with multiple strategies",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://jajula-chunking.readthedocs.io",
        "Homepage": "https://github.com/jajula/jajula-chunking",
        "Issues": "https://github.com/jajula/jajula-chunking/issues",
        "Repository": "https://github.com/jajula/jajula-chunking"
    },
    "split_keywords": [
        "chunking",
        " rag",
        " nlp",
        " text-processing",
        " ai",
        " machine-learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e6176787ebde3614210d2a74261a83adab12e352352ae1888a83b5536104df4a",
                "md5": "3d4245ecfff3333d9a9bd7d703d6927a",
                "sha256": "6befe23570b1f12dd0bcd52d7cdf6767c8c86fb79cf706000e1de0e570163c70"
            },
            "downloads": -1,
            "filename": "jajula_chunking-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d4245ecfff3333d9a9bd7d703d6927a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 26500,
            "upload_time": "2025-08-26T18:54:16",
            "upload_time_iso_8601": "2025-08-26T18:54:16.105168Z",
            "url": "https://files.pythonhosted.org/packages/e6/17/6787ebde3614210d2a74261a83adab12e352352ae1888a83b5536104df4a/jajula_chunking-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5809e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187",
                "md5": "79d3190b5f4ddb31dcd6c1b1de7093f7",
                "sha256": "2421610f8008acbe733819a1365f50335cde46e9da6d3df9d5f6f019e5f72540"
            },
            "downloads": -1,
            "filename": "jajula_chunking-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "79d3190b5f4ddb31dcd6c1b1de7093f7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 51668,
            "upload_time": "2025-08-26T18:54:18",
            "upload_time_iso_8601": "2025-08-26T18:54:18.003797Z",
            "url": "https://files.pythonhosted.org/packages/58/09/e48186cc0ce7291d64708b17175a8923ec17108cab2212c7acdad7e26187/jajula_chunking-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-26 18:54:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jajula",
    "github_project": "jajula-chunking",
    "github_not_found": true,
    "lcname": "jajula-chunking"
}
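
The `digests` entries in the record above can be used to check a downloaded file before installing it. The sketch below uses only the standard library; the local file path is an assumption, and `sha256_of` and `matches_expected` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

# SHA-256 of jajula_chunking-0.1.1-py3-none-any.whl, copied from the record above.
EXPECTED_SHA256 = "6befe23570b1f12dd0bcd52d7cdf6767c8c86fb79cf706000e1de0e570163c70"


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA-256 with the published hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of a manually downloaded wheel.
    wheel = Path("jajula_chunking-0.1.1-py3-none-any.whl")
    if wheel.exists():
        print("digest OK" if matches_expected(wheel) else "digest MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```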
        