tinbox

Name	tinbox
Version	0.1.0
home_page	None
Summary	A CLI translation tool using LLMs for document translation
upload_time	2025-02-16 19:16:56
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords	cli docx llm pdf translation

# 🔄 Tinbox: Your Ultimate CLI Translation Tool

**Tinbox** is a command-line tool for translating large documents, especially PDFs, using Large Language Models (LLMs). It splits documents into pieces a model can process and works around the size limits and copyright-related refusals that often stop a translation sent as a single request.

**Why Choose Tinbox?**
- **Handles Large Documents**: Efficiently processes large PDFs and other document types.
- **Overcomes Model Limitations**: Bypasses common model refusals due to size or copyright concerns.
- **No OCR Needed**: Directly translates PDFs using advanced multimodal models.
- **Smart Algorithms**: Page-by-page translation with seam repair and a sliding-window mode keep translations consistent across sections.
- **Local and Cloud Support**: Use models locally or in the cloud, depending on your preference.

**Quick Start Example:**
```bash
tinbox --to es document.pdf
```

## 🎯 The Problems Tinbox Solves

1. **PDF Translation Challenges**
   - Most tools require OCR, leading to formatting loss and errors
   - Tinbox uses multimodal models to directly understand PDFs as images

2. **Large Document Limitations**
   - Traditional tools often fail with large documents
   - Models frequently refuse or time out on big files
   - Tinbox smartly splits and processes documents while maintaining context

3. **Model Refusal Issues**
   - Many models refuse translation tasks due to:
     - Copyright concerns
     - Document size limitations
     - Rate limiting
   - Tinbox's algorithms work around these limitations intelligently

4. **Quality and Consistency**
   - Smart algorithms ensure consistent translations across document sections
   - Maintains context between pages and segments
   - Repairs potential inconsistencies at section boundaries


🔍 **Key Highlights:**
- Translate PDFs without OCR using advanced AI models
- Handle documents of any size with smart splitting algorithms
- Work around common model limitations and refusals
- Track costs and performance with built-in benchmarking

## ✨ Features

### 📄 Smart Document Handling
- **PDFs**: Processed directly as images - no OCR needed!
- **Word (docx)**: Preserves formatting while translating
- **Text files**: Efficient processing for large files

### 🧠 Intelligent Translation
- **Smart Algorithms**:
  - Page-by-Page with Seam Repair (default for PDF)
  - Sliding Window for long text documents
  - Automatic context preservation between sections
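
The context-preservation idea can be illustrated with a minimal sketch. This is not Tinbox's implementation: `translate_pages`, `fake_translate`, and the `context_chars` parameter are hypothetical names chosen for illustration, pages are plain strings rather than images, and the seam-repair step is not shown. It only demonstrates passing the tail of the previous translation into the next call.

```python
from typing import Callable, List


def translate_pages(
    pages: List[str],
    translate: Callable[[str, str], str],
    context_chars: int = 200,
) -> List[str]:
    """Translate pages in order, handing each call the tail of the
    previous translation as context (hypothetical helper)."""
    translated: List[str] = []
    context = ""
    for page in pages:
        result = translate(page, context)
        translated.append(result)
        # Keep only the end of the latest translation for the next page.
        context = result[-context_chars:] if context_chars > 0 else ""
    return translated


def fake_translate(text: str, context: str) -> str:
    # Stand-in for a model call: uppercases the text so the flow is visible.
    return text.upper()


if __name__ == "__main__":
    print(translate_pages(["first page", "second page"], fake_translate))
```

A real `translate` callable would send both the page and the context to a model; the context lets the model keep terminology and sentence flow consistent across the page break.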

### 🤖 Flexible Model Support
- Use powerful cloud models (GPT-4V, Claude 3.5 Sonnet)
- Run translations locally with Ollama
- Mix and match models for different tasks

### 🌐 Language Support
- Flexible source/target language specification using ISO 639-1 codes
- Common three-letter aliases are also accepted (e.g., 'eng', 'chi', 'spa')

### 📊 Benchmarking
- Track overall translation time and token usage/cost
- Compare algorithms or model providers side-by-side

## 🚀 Getting Started

### Quick Install

```bash
# Install base package
pip install tinbox

# For PDF support (recommended)
pip install tinbox[pdf]

# For Word document support
pip install tinbox[docx]

# Install everything
pip install tinbox[all]
```

### Basic Usage

1. **Translate a PDF to Spanish**
   ```bash
   tinbox --to es document.pdf
   ```

2. **Translate a Word document from Chinese to English**
   ```bash
   tinbox --from zh --to en document.docx
   ```

3. **Handle a large text file with custom settings**
   ```bash
   tinbox --to fr --algorithm sliding-window large_document.txt
   ```

### 💡 Tips for Best Results

1. **For Large Documents**
   - Use the sliding window algorithm: `--algorithm sliding-window`
   - Adjust window size if needed: `--window-size 3000`

2. **For PDFs**
   - The default page-by-page algorithm works best
   - No OCR needed - just point to your PDF!

3. **For Best Performance**
   - Use local models via Ollama for faster processing
   - Use cloud models (GPT-4V, Claude) for the highest quality

## 📖 Detailed Documentation

### Command-Line Options

#### Core Options
| Option              | Description                                           | Example                    |
|--------------------|-------------------------------------------------------|----------------------------|
| `--from, -f`       | Source language (auto-detect if not specified)        | `--from zh`               |
| `--to, -t`         | Target language (default: English)                    | `--to es`                 |
| `--model`          | Model to use for translation                          | `--model gpt-4v`          |
| `--output, -o`     | Output file (default: print to console)              | `--output translated.txt`  |

#### Algorithm Options
| Option              | Description                                           | Default                    |
|--------------------|-------------------------------------------------------|----------------------------|
| `--algorithm, -a`  | Translation algorithm (`page` or `sliding-window`)    | `page` for PDF            |
| `--window-size`    | Size of translation window                            | 2000 tokens               |
| `--overlap-size`   | Overlap between windows                               | 200 tokens                |
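
To make the interaction between window size and overlap concrete, here is a minimal sketch of overlapping windows. It assumes a pre-tokenized list and uses the defaults from the table above; the function name `sliding_windows` is hypothetical, and how Tinbox actually tokenizes and splits text is not documented here.

```python
from typing import List


def sliding_windows(
    tokens: List[str],
    window_size: int = 2000,
    overlap_size: int = 200,
) -> List[List[str]]:
    """Split tokens into windows where each window repeats the last
    `overlap_size` tokens of the previous one (illustrative only)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap_size < window_size:
        raise ValueError("overlap_size must be in [0, window_size)")

    step = window_size - overlap_size
    windows: List[List[str]] = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return windows


if __name__ == "__main__":
    # Whitespace-separated words stand in for model tokens here.
    words = ("lorem ipsum " * 2500).split()
    for index, window in enumerate(sliding_windows(words)):
        print(index, len(window))
```

Each new window starts `window_size - overlap_size` tokens after the previous one, so a larger overlap gives the model more shared context at the cost of more windows and more tokens processed.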

#### Output Format Options
| Option              | Description                                           | Example Output             |
|--------------------|-------------------------------------------------------|----------------------------|
| `--format, -F`     | Output format (text, json, markdown)                  | See examples below         |
| `--benchmark, -b`  | Include performance metrics                           | Translation time, costs    |

### Supported Languages

Common language codes (ISO 639-1):

| Code | Language    | Also Accepts |
|------|-------------|--------------|
| en   | English     | eng          |
| es   | Spanish     | spa          |
| zh   | Chinese     | chi, cmn     |
| fr   | French      | fra          |
| de   | German      | deu, ger     |
| ja   | Japanese    | jpn          |
| ko   | Korean      | kor          |
| ru   | Russian     | rus          |
| ar   | Arabic      | ara          |
| hi   | Hindi       | hin          |
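
The table above amounts to a lookup from aliases to two-letter codes. The following sketch shows one way such a lookup could work; `normalize_language` and the `ALIASES` mapping are hypothetical and only mirror the rows listed here, not Tinbox's internal code.

```python
# Two-letter code -> extra aliases, copied from the table above.
ALIASES = {
    "en": ("eng",),
    "es": ("spa",),
    "zh": ("chi", "cmn"),
    "fr": ("fra",),
    "de": ("deu", "ger"),
    "ja": ("jpn",),
    "ko": ("kor",),
    "ru": ("rus",),
    "ar": ("ara",),
    "hi": ("hin",),
}

# Flatten into alias -> code, including each code mapping to itself.
LOOKUP = {
    alias: code
    for code, aliases in ALIASES.items()
    for alias in (code, *aliases)
}


def normalize_language(value: str) -> str:
    """Return the two-letter code for a code or alias (hypothetical helper)."""
    key = value.strip().lower()
    try:
        return LOOKUP[key]
    except KeyError:
        raise ValueError(f"Unsupported language code: {value!r}") from None


if __name__ == "__main__":
    print(normalize_language("chi"))
    print(normalize_language("DEU"))
```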

### Output Format Examples

#### 1. Plain Text (Default)
```bash
tinbox translate document.pdf --to es
# Output: Translated text...
```

#### 2. JSON Output
```bash
tinbox translate document.pdf --to es --format json
```

Example response:
```json
{
  "metadata": {
    "source_lang": "en",
    "target_lang": "es",
    "model": "claude-3-sonnet",
    "algorithm": "page"
  },
  "result": {
    "text": "Translated text...",
    "tokens_used": 1500,
    "cost": 0.045,
    "time_taken": 12.5
  }
}
```
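
A response shaped like the example above can be post-processed with a few lines of Python. This sketch assumes the field names shown in the example and nothing more; `summarize` is a hypothetical helper, and the derived metrics are plain arithmetic rather than anything Tinbox reports itself.

```python
import json

# The example response from above, embedded so the sketch runs on its own.
RAW = """
{
  "metadata": {
    "source_lang": "en",
    "target_lang": "es",
    "model": "claude-3-sonnet",
    "algorithm": "page"
  },
  "result": {
    "text": "Translated text...",
    "tokens_used": 1500,
    "cost": 0.045,
    "time_taken": 12.5
  }
}
"""


def summarize(report: dict) -> dict:
    """Derive per-token figures from one report (hypothetical helper)."""
    result = report["result"]
    return {
        "model": report["metadata"]["model"],
        "cost_per_1k_tokens": result["cost"] / result["tokens_used"] * 1000,
        "tokens_per_second": result["tokens_used"] / result["time_taken"],
    }


if __name__ == "__main__":
    print(summarize(json.loads(RAW)))
```

Normalizing cost and time by token count makes runs on documents of different lengths easier to compare.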

#### 3. Markdown Report
```bash
tinbox translate document.pdf --to es --format markdown
```

### Advanced Usage

1. **Handling Very Large Documents**
   ```bash
   tinbox --to es --algorithm sliding-window \
          --window-size 3000 --overlap-size 300 \
          large_document.pdf
   ```

2. **Using Local Models**
   ```bash
   tinbox --to fr --model ollama:mistral-small document.txt
   ```

3. **Benchmarking Different Models**
   ```bash
   tinbox --to de --benchmark --model gpt-4v document.pdf
   ```
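
To compare several models on the same document, the benchmarking command above can be generated once per model. The sketch below only builds and prints the command lines, using flags and model names that appear in this README; `build_benchmark_command` is a hypothetical helper, and whether every model accepts every document type is not verified here.

```python
import shlex
from typing import List


def build_benchmark_command(model: str, target_lang: str, document: str) -> List[str]:
    """Return the argument list for one benchmarked run (hypothetical helper)."""
    return ["tinbox", "--to", target_lang, "--benchmark", "--model", model, document]


# Model names taken from the examples in this README.
MODELS = ["gpt-4v", "ollama:mistral-small"]

if __name__ == "__main__":
    for model in MODELS:
        command = build_benchmark_command(model, "de", "document.pdf")
        # Printed rather than executed; pass the list to subprocess.run to run it.
        print(shlex.join(command))
```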

## 🛠 Project Structure

```
tinbox/
├── src/
│   └── tinbox/
│       ├── cli.py                 # Command-line interface
│       ├── core/                  # Core functionality
│       │   ├── cost.py           # Cost tracking
│       │   ├── processor/        # Document processors
│       │   └── translation/      # Translation algorithms
│       └── utils/                # Utilities
└── tests/                        # Test suite
```

## 🔜 Future Plans

1. **Enhanced Output Formats**
   - PDF output with original formatting
   - Word document export
   - HTML with parallel text

2. **Advanced Features**
   - AI-powered section detection
   - Custom terminology support
   - Interactive translation review
   - Domain-specific model fine-tuning

3. **Performance Improvements**
   - Parallel processing
   - Better caching
   - Reduced API costs

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tinbox",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "cli, docx, llm, pdf, translation",
    "author": null,
    "author_email": "strickvl <your.email@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/54/d0/104db21e1fe4e66cdda6a14e435b3d9df168227865818044ea4437ec5ef0/tinbox-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A CLI translation tool using LLMs for document translation",
    "version": "0.1.0",
    "project_urls": null,
    "split_keywords": [
        "cli",
        " docx",
        " llm",
        " pdf",
        " translation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bcc26e6ea1d27e1cd9aac79b4fc3b8a5eaf014c94dc5639e88053c5713584caa",
                "md5": "a982648191089f1f990dce3bc234ca0d",
                "sha256": "89e18df20e4f87104226024a728682db8e82ddd5961fc8a28d1596978b0d1dc2"
            },
            "downloads": -1,
            "filename": "tinbox-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a982648191089f1f990dce3bc234ca0d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 38371,
            "upload_time": "2025-02-16T19:16:49",
            "upload_time_iso_8601": "2025-02-16T19:16:49.413238Z",
            "url": "https://files.pythonhosted.org/packages/bc/c2/6e6ea1d27e1cd9aac79b4fc3b8a5eaf014c94dc5639e88053c5713584caa/tinbox-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "54d0104db21e1fe4e66cdda6a14e435b3d9df168227865818044ea4437ec5ef0",
                "md5": "c199b74dbea99c7e236f14da2e567dc7",
                "sha256": "08ce0331860edb09731966bc1aa4a8a42a9de9f3511cfdaca56c042d1b601236"
            },
            "downloads": -1,
            "filename": "tinbox-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c199b74dbea99c7e236f14da2e567dc7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 352030,
            "upload_time": "2025-02-16T19:16:56",
            "upload_time_iso_8601": "2025-02-16T19:16:56.337832Z",
            "url": "https://files.pythonhosted.org/packages/54/d0/104db21e1fe4e66cdda6a14e435b3d9df168227865818044ea4437ec5ef0/tinbox-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-16 19:16:56",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "tinbox"
}
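
The `sha256` digests in the raw data can be used to check a downloaded file. A minimal sketch, assuming the source archive has been saved in the current directory under its original filename; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.lower()


# Digest of tinbox-0.1.0.tar.gz as listed in the raw data above.
EXPECTED = "08ce0331860edb09731966bc1aa4a8a42a9de9f3511cfdaca56c042d1b601236"

if __name__ == "__main__":
    archive = Path("tinbox-0.1.0.tar.gz")
    if archive.exists():
        print(verify_download(archive, EXPECTED))
    else:
        print(f"{archive} not found; download it first")
```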
