ragctl

| Field | Value |
|-------|-------|
| Name | ragctl |
| Version | 0.1.3 |
| Summary | RAG Studio - Production-ready RAG toolkit with advanced OCR, semantic chunking, and intelligent document processing |
| Upload time | 2025-10-30 10:41:20 |
| Requires Python | <3.13,>=3.10 |
| Keywords | rag, document-processing, ocr, chunking, nlp, machine-learning, embeddings, semantic-search |

# 🚀 RAG Studio

**Production-ready document processing CLI for RAG applications**

Process documents, extract text with advanced OCR, chunk intelligently, and prepare data for RAG systems - all from the command line with `ragctl`.

[![Version](https://img.shields.io/badge/version-0.1.3-blue.svg)](https://github.com/horiz-data/ragstudio)
[![PyPI](https://img.shields.io/badge/pypi-ragctl-blue.svg)](https://pypi.org/project/ragctl/)
[![Status](https://img.shields.io/badge/status-beta-yellow.svg)](https://github.com/horiz-data/ragstudio)
[![Tests](https://img.shields.io/badge/tests-496%20passed-success.svg)](https://github.com/horiz-data/ragstudio)
[![Coverage](https://img.shields.io/badge/coverage-41%25-yellow.svg)](https://github.com/horiz-data/ragstudio)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

---

## 🎯 What is RAG Studio?

RAG Studio (`ragctl`) is a **command-line tool** for processing documents into chunks ready for Retrieval-Augmented Generation (RAG) systems. It handles the dirty work of document ingestion, OCR, and intelligent chunking so you can focus on building your RAG application.

**Key capabilities:**
- 📄 Universal document loading (PDF, DOCX, images, HTML, Markdown, etc.)
- 🔍 Advanced OCR with automatic fallback (EasyOCR → PaddleOCR → pytesseract)
- ✂️ Intelligent semantic chunking using LangChain
- 📦 Production-ready batch processing with auto-retry
- 💾 Multiple export formats (JSON, JSONL, CSV)
- 🗄️ Direct ingestion into Qdrant vector store

---

## ✨ Features

### 📄 Universal Document Processing
- **Supported formats**: PDF, DOCX, ODT, TXT, HTML, Markdown, Images (JPEG, PNG)
- **Smart OCR cascade**:
  1. EasyOCR (best quality, multi-language)
  2. PaddleOCR (fast, good for complex layouts)
  3. pytesseract (fallback, most tolerant)
- **Quality detection**: Automatically rejects unreadable documents
- **Multi-language**: French, English, German, Spanish, Italian, Portuguese, and more
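
The cascade above amounts to trying engines in order and accepting the first result that passes a quality check. A minimal sketch of that idea, assuming each engine is wrapped as a plain callable; `run_cascade`, the stand-in engines, and the `min_chars` check are hypothetical and are not ragctl's API:

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: an engine takes a file path and returns extracted text.
OcrEngine = Callable[[str], str]


def run_cascade(
    path: str,
    engines: Sequence[Tuple[str, OcrEngine]],
    min_chars: int = 20,
) -> Tuple[Optional[str], str]:
    """Try each engine in order; return (engine_name, text) for the first usable result."""
    for name, engine in engines:
        try:
            text = engine(path)
        except Exception:
            # An engine that crashes or is not installed falls through to the next one.
            continue
        if len(text.strip()) >= min_chars:
            return name, text
    # Nothing produced usable text: the caller can treat the document as unreadable.
    return None, ""


# Stand-in engines for demonstration only (no real OCR library is called).
def failing_engine(path: str) -> str:
    raise RuntimeError("engine unavailable")


def working_engine(path: str) -> str:
    return "Text recovered from the scanned page by the fallback engine."


engine_used, text = run_cascade(
    "scan.png",
    [("primary", failing_engine), ("fallback", working_engine)],
)
print(engine_used)
```

Here the character-count check stands in for whatever quality detection the tool applies; a real check would likely look at more than length.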

### ✂️ Intelligent Chunking
- **Semantic chunking**: Context-aware text splitting using LangChain RecursiveCharacterTextSplitter
- **Multiple strategies**:
  - `semantic` - Smart splitting by meaning (default)
  - `sentence` - Split by sentences
  - `token` - Fixed token-based splitting
- **Configurable**: Token limits (50-2000), overlap (0-500), model selection
- **Rich metadata**: Source file, chunk index, token count, strategy, timestamps
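
To make the token limit and overlap settings concrete, here is a sketch of fixed-size splitting with overlap. It counts whitespace-separated words as tokens, which is a simplifying assumption; ragctl's own tokenizer is not described here, and the function name `chunk_tokens` is made up for illustration:

```python
from typing import List


def chunk_tokens(text: str, max_tokens: int = 400, overlap: int = 50) -> List[str]:
    """Split text into windows of at most max_tokens words.

    Each window repeats the last `overlap` words of the previous one.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    tokens = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_tokens(words, max_tokens=4, overlap=1):
    print(chunk)
```

With ten words, a limit of 4, and an overlap of 1, the window advances three words at a time, so the last word of each chunk reappears as the first word of the next.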

### 🔄 Production-Ready Batch Processing
- **Automatic retry**: Up to 3 attempts with exponential backoff (1s, 2s, 4s...)
- **Error-handling modes**:
  - `interactive` - Prompt user on each error (default)
  - `auto-continue` - Continue on errors (CI/CD mode)
  - `auto-stop` - Stop on first error (validation mode)
  - `auto-skip` - Skip failed files automatically
- **Complete history**: Every run saved to `~/.atlasrag/history/`
- **Retry capability**: `ragctl retry` to rerun failed files only
- **Per-file output**: One chunk file per document for better traceability
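
The retry behaviour described above (up to 3 attempts, waiting 1s, then 2s, then 4s) follows the usual exponential backoff pattern. A sketch under that assumption; `retry_with_backoff` and its parameters are illustrative names, not ragctl internals:

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on any exception with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


# Demo: a task that fails twice, then succeeds. The delays are recorded
# instead of slept so the example runs instantly.
calls = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is a design choice that keeps the function testable without real waiting.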

### 💾 Flexible Export & Storage
- **Export formats**: JSON, JSONL (streaming), CSV (Excel-compatible)
- **Vector store integration**: Direct ingestion into Qdrant
- **No database required**: Pure file-based export for easy sharing
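
The three export formats differ mainly in how records are laid out on disk. The sketch below writes the same records in each layout using only the standard library; the field names (`chunk_index`, `source`, `text`) are assumptions for illustration and may not match ragctl's actual output schema:

```python
import csv
import io
import json

# Hypothetical chunk records; the real field names in ragctl output may differ.
chunks = [
    {"chunk_index": 0, "source": "doc1.pdf", "text": "First chunk."},
    {"chunk_index": 1, "source": "doc1.pdf", "text": "Second chunk."},
]

# JSON: one array holding every record; it must be parsed as a whole.
as_json = json.dumps(chunks, indent=2)

# JSONL: one JSON object per line, so a reader can stream record by record.
as_jsonl = "\n".join(json.dumps(record) for record in chunks)

# CSV: a header row plus one row per record, readable by spreadsheet tools.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["chunk_index", "source", "text"])
writer.writeheader()
writer.writerows(chunks)
as_csv = buffer.getvalue()

print(as_jsonl)
```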

### ⚙️ Configuration System
- **Hierarchical config**: CLI flags > Environment variables > YAML file > Defaults
- **Example config**: `config.example.yml` with detailed documentation
- **Easy customization**: Override any setting via command line

---

## 🚀 Quick Start

### Installation

#### From PyPI (Recommended)

```bash
# Install from PyPI
pip install ragctl

# Verify installation
ragctl --version
```

#### From Source

```bash
# Clone repository
git clone git@github.com:horiz-data/ragstudio.git
cd ragstudio

# Install with pip
pip install -e .

# Verify installation
ragctl --version
```

### Basic Usage

```bash
# Process a single document
ragctl chunk document.pdf --show

# Process with advanced OCR for scanned documents
ragctl chunk scanned.pdf --advanced-ocr -o chunks.json

# Batch process a folder
ragctl batch ./documents --output ./chunks/

# Batch in CI/CD mode (continue on errors instead of prompting)
ragctl batch ./documents --output ./chunks/ --auto-continue
```

---

## 💡 Usage Examples

### Single Document Processing

```bash
# Simple text file
ragctl chunk document.txt --show

# PDF with semantic chunking (default)
ragctl chunk report.pdf -o report_chunks.json

# Scanned image with OCR
ragctl chunk contract.jpeg --advanced-ocr --show

# Custom chunking parameters
ragctl chunk document.pdf \
  --strategy semantic \
  --max-tokens 500 \
  --overlap 100 \
  -o output.jsonl
```

### Batch Processing

```bash
# Process all files in a directory
ragctl batch ./documents --output ./chunks/

# Process only PDFs recursively
ragctl batch ./documents \
  --pattern "*.pdf" \
  --recursive \
  --output ./chunks/

# CI/CD mode - continue on errors
ragctl batch ./documents \
  --output ./chunks/ \
  --auto-continue \
  --save-history

# Per-file output (default):
# chunks/
# ├── doc1_chunks.jsonl  (25 chunks)
# ├── doc2_chunks.jsonl  (42 chunks)
# └── doc3_chunks.jsonl  (18 chunks)

# Single-file output (all chunks combined):
ragctl batch ./documents \
  --output ./all_chunks.jsonl \
  --single-file
```
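
Per-file JSONL output is straightforward to consume downstream, since each line is one JSON record. The sketch below counts the records in every `*_chunks.jsonl` file of an output directory. It assumes only that each non-empty line is a JSON object; the `text` field in the demo data and the helper names are made up:

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator


def iter_chunks(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_chunks(output_dir: Path) -> Dict[str, int]:
    """Map each *_chunks.jsonl file name to the number of records it holds."""
    return {
        path.name: sum(1 for _ in iter_chunks(path))
        for path in sorted(output_dir.glob("*_chunks.jsonl"))
    }


# Demo on a temporary directory with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "doc1_chunks.jsonl").write_text('{"text": "a"}\n{"text": "b"}\n', encoding="utf-8")
    (out / "doc2_chunks.jsonl").write_text('{"text": "c"}\n', encoding="utf-8")
    counts = count_chunks(out)

print(counts)
```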

### Retry Failed Files

```bash
# Show last failed run
ragctl retry --show

# Retry all failed files from last run
ragctl retry

# Retry specific run by ID
ragctl retry run_20251028_133403
```

### Vector Store Integration

```bash
# Ingest chunks into Qdrant
ragctl ingest chunks.jsonl \
  --collection my-docs \
  --url http://localhost:6333

# Get system info
ragctl info
```

### Evaluate Chunking Quality

```bash
# Evaluate chunking strategy
ragctl eval document.pdf \
  --strategies semantic sentence token \
  --metrics coverage overlap coherence

# Compare strategies with visualization
ragctl eval document.pdf --compare --output eval_results.json
```

---

## 📚 Documentation

| Document | Description |
|----------|-------------|
| **[Getting Started](docs/getting-started.md)** | Installation and first steps |
| **[CLI Guide](docs/cli-guide.md)** | Complete command reference |
| **[Security](docs/security/)** | Security features and best practices |
| **[Full Documentation](docs/)** | Complete documentation index |

---

## ⚙️ Configuration

Create `~/.atlasrag/config.yml` or use CLI flags:

```yaml
# OCR settings
ocr:
  use_advanced_ocr: false
  enable_fallback: true

# Chunking settings
chunking:
  strategy: semantic
  max_tokens: 400
  overlap: 50

# Output settings
output:
  format: jsonl
  include_metadata: true
  pretty_print: true
```

**Configuration hierarchy**: CLI flags > Environment variables > YAML config > Defaults
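
One way to picture this hierarchy is as a series of dictionary merges applied from lowest to highest priority. The sketch below is an illustration of that precedence rule only; `deep_merge` and `resolve_config` are hypothetical helpers, and how ragctl maps environment variables to settings is not shown in this README:

```python
from collections.abc import Mapping
from typing import Any, Dict, Optional


def deep_merge(base: Mapping, override: Mapping) -> Dict[str, Any]:
    """Return base updated with override, merging nested mappings key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    defaults: Mapping,
    yaml_file: Optional[Mapping] = None,
    env: Optional[Mapping] = None,
    cli: Optional[Mapping] = None,
) -> Dict[str, Any]:
    """Apply layers from lowest to highest priority: defaults, YAML, env, CLI."""
    config = dict(defaults)
    for layer in (yaml_file, env, cli):
        if layer:
            config = deep_merge(config, layer)
    return config


defaults = {"chunking": {"strategy": "semantic", "max_tokens": 400, "overlap": 50}}
yaml_file = {"chunking": {"max_tokens": 300}}
env = {"chunking": {"overlap": 100}}
cli = {"chunking": {"max_tokens": 500}}

config = resolve_config(defaults, yaml_file, env, cli)
print(config)
```

In this run the CLI value for `max_tokens` wins over the YAML value, the environment sets `overlap`, and `strategy` falls back to the default.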

---

## 🧪 Testing

```bash
# Run all tests
make test

# Run CLI tests
make test-cli

# Quick validation
ragctl --version
ragctl chunk tests/data/sample.txt --show
```

**Test Coverage**: 496 tests passing, 41% coverage (figures from the badges at the top of this README)

---

## 📊 Performance

### Processing Speed
- **Text documents**: ~100-200 docs/minute
- **PDFs with OCR**: ~5-10 docs/minute (depends on page count)
- **Batch processing**: Parallel-ready with retry mechanism

### Quality Metrics
- **OCR accuracy**: 95%+ with EasyOCR on clear scans
- **Chunk quality**: 90% readability threshold enforced
- **Semantic coherence**: LangChain's RecursiveCharacterTextSplitter splits on paragraph breaks first, then line breaks, then spaces, so chunks tend to end at natural boundaries

---

## 🛠️ CLI Commands

| Command | Description |
|---------|-------------|
| `ragctl chunk` | Process a single document |
| `ragctl batch` | Batch process multiple files |
| `ragctl retry` | Retry failed files from history |
| `ragctl ingest` | Ingest chunks into Qdrant |
| `ragctl eval` | Evaluate chunking quality |
| `ragctl info` | System information |

Run `ragctl COMMAND --help` for detailed options.

---

## 🐛 Troubleshooting

### Common Issues

**NumPy incompatibility**
```bash
# For OCR support, use NumPy 1.x
pip install "numpy<2.0"
```

**Missing system dependencies**
```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils

# macOS
brew install tesseract poppler
```

**"Document unreadable" errors**
- Try lowering quality threshold: `--ocr-threshold 0.2`
- Use advanced OCR: `--advanced-ocr`
- Check document is not corrupted

**Import errors**
```bash
# Reinstall dependencies
pip install -e .
```

More help: [Getting Started Guide](docs/getting-started.md#troubleshooting)

---

## 🔧 Development

```bash
# Install dev dependencies
make install-dev

# Format code
make format

# Run linters
make lint

# Install pre-commit hooks
make pre-commit-install

# Run all CI checks
make ci-all
```

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines (coming soon).

---

## 📧 Support

- **Documentation**: [docs/](docs/)
- **Issues**: [GitHub Issues](https://github.com/horiz-data/ragstudio/issues)
- **Discussions**: [GitHub Discussions](https://github.com/horiz-data/ragstudio/discussions)

---

## 🙏 Acknowledgments

Built with:
- [LangChain](https://github.com/langchain-ai/langchain) - Text splitting and document loading
- [EasyOCR](https://github.com/JaidedAI/EasyOCR) - OCR engine
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) - Alternative OCR engine
- [Unstructured](https://github.com/Unstructured-IO/unstructured) - Document parsing
- [Typer](https://github.com/tiangolo/typer) - CLI framework
- [Rich](https://github.com/Textualize/rich) - Terminal formatting

---

**Version**: 0.1.3 | **Status**: Beta | **License**: MIT
Raw data

{
    "_id": null,
    "home_page": null,
    "name": "ragctl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": "Horiz Data <sekkaahmed@gmail.com>",
    "keywords": "rag, document-processing, ocr, chunking, nlp, machine-learning, embeddings, semantic-search",
    "author": null,
    "author_email": "Horiz Data <sekkaahmed@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/70/6b/daa081d1d0cd34ffba0b8c7efa72a6160797480022f7cd54c2c81f517a29/ragctl-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "RAG Studio - Production-ready RAG toolkit with advanced OCR, semantic chunking, and intelligent document processing",
    "version": "0.1.3",
    "project_urls": {
        "Changelog": "https://github.com/horiz-data/ragstudio/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/horiz-data/ragstudio/tree/main/docs",
        "Homepage": "https://github.com/horiz-data/ragstudio",
        "Issues": "https://github.com/horiz-data/ragstudio/issues",
        "Repository": "https://github.com/horiz-data/ragstudio"
    },
    "split_keywords": [
        "rag",
        " document-processing",
        " ocr",
        " chunking",
        " nlp",
        " machine-learning",
        " embeddings",
        " semantic-search"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8aa1db784d275041a8be04430611304dc8e44c60b9db839f4d8120ea5ac7e5a2",
                "md5": "80fe24de313aa100dc976e30fd3a8a3f",
                "sha256": "679a676bba612c92233655c7136ee2e3108fb77f97d540b7e2599847f89086f7"
            },
            "downloads": -1,
            "filename": "ragctl-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "80fe24de313aa100dc976e30fd3a8a3f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.10",
            "size": 238051,
            "upload_time": "2025-10-30T10:41:18",
            "upload_time_iso_8601": "2025-10-30T10:41:18.659397Z",
            "url": "https://files.pythonhosted.org/packages/8a/a1/db784d275041a8be04430611304dc8e44c60b9db839f4d8120ea5ac7e5a2/ragctl-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "706bdaa081d1d0cd34ffba0b8c7efa72a6160797480022f7cd54c2c81f517a29",
                "md5": "ac00cf7376ecc735e6e36f55789ea15c",
                "sha256": "66fa9f87ca5236037c37576360bfd710915fa721ae4fd88101bc7b3393048a0a"
            },
            "downloads": -1,
            "filename": "ragctl-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "ac00cf7376ecc735e6e36f55789ea15c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.10",
            "size": 192770,
            "upload_time": "2025-10-30T10:41:20",
            "upload_time_iso_8601": "2025-10-30T10:41:20.429716Z",
            "url": "https://files.pythonhosted.org/packages/70/6b/daa081d1d0cd34ffba0b8c7efa72a6160797480022f7cd54c2c81f517a29/ragctl-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-30 10:41:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "horiz-data",
    "github_project": "ragstudio",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "ragctl"
}