comiq 0.1.2

- Summary: Comic-Focused Hybrid OCR Python Library
- Requires Python: >=3.10
- Requirements: openai, python-dotenv, pydantic, easyocr, paddleocr (<3.0), paddlepaddle (<3.0)
- Homepage: https://github.com/StoneSteel27/ComiQ
- Uploaded: 2025-07-10 09:25:23

# ComiQ: Comic-Focused Hybrid OCR Library

ComiQ is an advanced Optical Character Recognition (OCR) library specifically designed for comics. It combines traditional OCR engines with a powerful AI model to provide accurate text detection and grouping in comic images.

For examples of ComiQ's capabilities, visit the [examples directory](examples/ReadME.md).

## Features

- **Hybrid OCR:** Combines multiple OCR engines for improved accuracy and robustness.
- **Extensible:** Register your own custom OCR engines.
- **AI-Powered Grouping:** Uses an AI model to intelligently group detected text into coherent bubbles and captions.
- **Configurable:** Easily configure the AI model, OCR engines, and other parameters.
- **Flexible:** Supports both file paths and in-memory image arrays.
- **Environment-Friendly:** Automatically loads your API key from environment variables.

## Installation

You can install ComiQ and its dependencies with a single pip command:

```bash
pip install comiq
```

This will install the CPU versions of the required OCR libraries. For GPU acceleration, see the section below.

### Optional: GPU Acceleration

For significantly better performance, you can install the GPU-enabled versions of PyTorch and PaddlePaddle. It is highly recommended to do this in a clean virtual environment.

**1. Uninstall CPU Versions (if necessary):**
If you have already installed the CPU versions, uninstall them first:
```bash
pip uninstall torch paddlepaddle
```

**2. Install GPU-Enabled PyTorch:**
Visit the official [PyTorch website](https://pytorch.org/get-started/locally/) and use the command generator to find the correct installation command for your specific system (OS, package manager, and CUDA version).

*Example for Windows with an NVIDIA GPU (CUDA 12.1):*
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

**3. Install GPU-Enabled PaddlePaddle:**
To ensure compatibility, install a version below 3.0:
```bash
pip install "paddlepaddle-gpu<3.0"
```

After installing the GPU versions, the OCR engines in ComiQ will automatically use them.
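
To verify that the GPU builds are actually in use, you can run a quick check before calling ComiQ (this snippet only queries the two frameworks directly; it is not part of the ComiQ API):

```python
import torch
import paddle

# True means the CUDA-enabled build of each framework is installed and active.
print("PyTorch CUDA available:", torch.cuda.is_available())
print("PaddlePaddle built with CUDA:", paddle.device.is_compiled_with_cuda())
```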

## Quick Start

1.  **Set your API Key:** ComiQ requires a Gemini API key. You can either pass it to the `ComiQ` constructor or set it as an environment variable named `GEMINI_API_KEY`.

    You can create a `.env` file in your project's root directory:
    ```
    GEMINI_API_KEY="your-api-key-here"
    ```

2.  **Use the `ComiQ` class:**

```python
from comiq import ComiQ
import cv2

# Initialize ComiQ. It will automatically load the API key from the .env file.
comiq = ComiQ()

# Process an image from a file path
image_path = "path/to/your/comic/image.jpg"
data = comiq.extract(image_path)

# Or process an image from a numpy array
image_array = cv2.imread(image_path)
data_from_array = comiq.extract(image_array)

# 'data' now contains the detected text bubbles with their text and locations.
print(data)
```
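
If you want to keep the extracted data around for later processing, a plain JSON dump is one option. This is only a sketch: it assumes the returned dictionary is JSON-serializable, which the documentation below does not guarantee.

```python
import json

from comiq import ComiQ

comiq = ComiQ()
data = comiq.extract("path/to/your/comic/image.jpg")

# Write the result to disk. Assumes the returned data is JSON-serializable.
with open("comic_text.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```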

## API Reference

### `ComiQ(api_key: str = None, model_name: str = "gemini-1.5-flash", **kwargs)`

Initializes the ComiQ instance.

- **`api_key` (str, optional):** Your Gemini API key. If not provided, it will be loaded from the `GEMINI_API_KEY` environment variable.
- **`model_name` (str, optional):** The name of the AI model to use. Defaults to `"gemini-1.5-flash"`.
- **`**kwargs`:** Additional configuration for the OCR and AI models. See "Custom Configuration" for more details.
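
For example, the API key can be supplied explicitly instead of being read from the environment:

```python
import os

from comiq import ComiQ

# Equivalent to relying on automatic loading, but with the key passed explicitly.
comiq = ComiQ(
    api_key=os.environ["GEMINI_API_KEY"],
    model_name="gemini-1.5-flash",
)
```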

### `extract(image: Union[str, 'numpy.ndarray'], ocr: Union[str, List[str]] = "paddleocr")`

Extracts and groups text from the given comic image.

- **`image` (str or numpy.ndarray):** The path to the image file or the image as a NumPy array.
- **`ocr` (str or list, optional):** The OCR engine(s) to use. Can be `"paddleocr"`, `"easyocr"`, or a list like `["paddleocr", "easyocr"]`. Defaults to `"paddleocr"`.

**Returns:**
- `dict`: A dictionary containing the processed data, including text bubbles, their locations, and other metadata.
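
For example, both bundled engines can be combined on a single page. The exact schema of the returned dictionary is not documented here, so this sketch only inspects it generically:

```python
from comiq import ComiQ

comiq = ComiQ()

# Run both bundled OCR engines and let the AI model group their detections.
data = comiq.extract("path/to/page.jpg", ocr=["paddleocr", "easyocr"])

# Inspect the result without assuming specific key names.
print(type(data))
print(list(data.keys()))
```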

### `register_ocr_engine(name: str, engine: Callable)`
Registers a new OCR engine.

- **`name` (str):** The name to identify the engine.
- **`engine` (Callable):** The function that implements the engine. See "Advanced Usage" for details.

### `get_available_ocr_engines() -> List[str]`
Returns a list of all registered OCR engine names.

## Advanced Usage: Registering a Custom OCR Engine

You can extend ComiQ by adding your own OCR engine. Your custom engine must be a function that adheres to the following contract:

- **Input:** It must accept two parameters:
    1.  `image` (a `numpy.ndarray` in BGR format).
    2.  `**kwargs` (arbitrary keyword arguments for any configuration your engine needs).
- **Output:** It must return a list of dictionaries, where each dictionary represents a detected text box and has two keys:
    1.  `"text_box"`: A list of four integers `[ymin, xmin, ymax, xmax]`.
    2.  `"text"`: The detected text as a string.

**Example:**

Here is how you could wrap the `pytesseract` library as a custom engine:

```python
import comiq
import pytesseract
import cv2
import numpy as np

# 1. Define the custom engine function
def pytesseract_engine(image: np.ndarray, **kwargs) -> list:
    """
    A custom OCR engine using pytesseract.
    It expects a 'config' kwarg, e.g., pytesseract_engine(image, config="--psm 6")
    """
    # pytesseract works with RGB images
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Use pytesseract to get structured data
    data = pytesseract.image_to_data(
        rgb_image,
        output_type=pytesseract.Output.DICT,
        config=kwargs.get("config", "")
    )
    
    results = []
    for i in range(len(data['text'])):
        if int(data['conf'][i]) > 60: # Filter by confidence
            (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
            text = data['text'][i]
            if text.strip():
                results.append({
                    "text_box": [y, x, y + h, x + w],
                    "text": text
                })
    return results

# 2. Register the new engine with ComiQ
comiq.register_ocr_engine("pytesseract", pytesseract_engine)

# 3. Now you can use it!
# Per-engine configuration is passed to the constructor (see "Custom Configuration").
tesseract_config = {"ocr": {"pytesseract": {"config": "--psm 6"}}}
my_comiq = comiq.ComiQ(**tesseract_config)
data = my_comiq.extract("path/to/image.png", ocr="pytesseract")

print(comiq.get_available_ocr_engines())
# Output: ['paddleocr', 'easyocr', 'pytesseract']
```

## Custom Configuration

You can pass additional configuration to the `ComiQ` constructor to customize the behavior of the OCR and AI models.

```python
from comiq import ComiQ

# Custom configuration
config = {
    "ocr": {
        "paddleocr": {
            "lang": "japan", # Use Japanese language model for PaddleOCR
        },
        "easyocr": {
            "reader": {"gpu": False}, # Disable GPU for EasyOCR
        }
    },
    "ai": {
        "temperature": 0.5, # Make the AI model more creative
    }
}

comiq = ComiQ(model_name="gemini-1.5-pro", **config)

data = comiq.extract("path/to/manga.jpg", ocr="paddleocr")
```

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for more details.

## License

ComiQ is licensed under the [MIT License](LICENSE).

            
