# ComiQ: Comic-Focused Hybrid OCR Library
ComiQ is an advanced Optical Character Recognition (OCR) library specifically designed for comics. It combines traditional OCR engines with a powerful AI model to provide accurate text detection and grouping in comic images.
For examples of ComiQ's capabilities, visit the [examples directory](examples/ReadME.md).
## Features
- **Hybrid OCR:** Combines multiple OCR engines for improved accuracy and robustness.
- **Extensible:** Register your own custom OCR engines.
- **AI-Powered Grouping:** Uses an AI model to intelligently group detected text into coherent bubbles and captions.
- **Configurable:** Easily configure the AI model, OCR engines, and other parameters.
- **Flexible:** Supports both file paths and in-memory image arrays.
- **Environment-Aware:** Automatically loads your API key from environment variables.
## Installation
You can install ComiQ and its dependencies with a single pip command:
```bash
pip install comiq
```
This installs the CPU versions of the required OCR libraries. For GPU acceleration, see the section below.
### Optional: GPU Acceleration
For significantly better performance, you can install the GPU-enabled versions of PyTorch and PaddlePaddle. It is highly recommended to do this in a clean virtual environment.
**1. Uninstall CPU Versions (if necessary):**
If you have already installed the CPU versions, uninstall them first:
```bash
pip uninstall torch paddlepaddle
```
**2. Install GPU-Enabled PyTorch:**
Visit the official [PyTorch website](https://pytorch.org/get-started/locally/) and use the command generator to find the correct installation command for your specific system (OS, package manager, and CUDA version).
*Example for Windows with an NVIDIA GPU (CUDA 12.1):*
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
**3. Install GPU-Enabled PaddlePaddle:**
To ensure compatibility, install a version below 3.0:
```bash
pip install "paddlepaddle-gpu<3.0"
```
After installing the GPU versions, the OCR engines in ComiQ will automatically use them.
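To confirm the GPU builds are active, you can query the underlying frameworks directly; this checks PyTorch and PaddlePaddle themselves, independent of ComiQ:
```python
import torch
import paddle

# True if this PyTorch build can see a CUDA device
print("PyTorch CUDA available:", torch.cuda.is_available())

# True if this PaddlePaddle build was compiled with CUDA support
print("PaddlePaddle CUDA build:", paddle.device.is_compiled_with_cuda())
```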
## Quick Start
1. **Set your API Key:** ComiQ requires a Gemini API key. You can either pass it to the `ComiQ` constructor or set it as an environment variable named `GEMINI_API_KEY`.
You can create a `.env` file in your project's root directory:
```
GEMINI_API_KEY="your-api-key-here"
```
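Alternatively, pass the key directly to the constructor instead of relying on the environment:
```python
from comiq import ComiQ

comiq = ComiQ(api_key="your-api-key-here")
```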
2. **Use the `ComiQ` class:**
```python
from comiq import ComiQ
import cv2
# Initialize ComiQ. It will automatically load the API key from the .env file.
comiq = ComiQ()
# Process an image from a file path
image_path = "path/to/your/comic/image.jpg"
data = comiq.extract(image_path)
# Or process an image from a numpy array
image_array = cv2.imread(image_path)
data_from_array = comiq.extract(image_array)
# 'data' contains the extracted text grouped into bubbles, with their text and locations.
print(data)
```
## API Reference
### `ComiQ(api_key: str = None, model_name: str = "gemini-1.5-flash", **kwargs)`
Initializes the ComiQ instance.
- **`api_key` (str, optional):** Your Gemini API key. If not provided, it will be loaded from the `GEMINI_API_KEY` environment variable.
- **`model_name` (str, optional):** The name of the AI model to use. Defaults to `"gemini-1.5-flash"`.
- **`**kwargs`:** Additional configuration for the OCR and AI models. See "Custom Configuration" for more details.
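For example, supplying the key explicitly and selecting a different Gemini model:
```python
from comiq import ComiQ

# Both parameters are documented above; the model name is just an example value.
comiq = ComiQ(api_key="your-api-key-here", model_name="gemini-1.5-pro")
```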
### `extract(image: Union[str, 'numpy.ndarray'], ocr: Union[str, List[str]] = "paddleocr")`
Extracts and groups text from the given comic image.
- **`image` (str or numpy.ndarray):** The path to the image file or the image as a NumPy array.
- **`ocr` (str or list, optional):** The OCR engine(s) to use. Can be `"paddleocr"`, `"easyocr"`, or a list like `["paddleocr", "easyocr"]`. Defaults to `"paddleocr"`.
**Returns:**
- `dict`: A dictionary containing the processed data, including text bubbles, their locations, and other metadata.
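For example, running both bundled engines on a single page so their detections are merged during the AI grouping step:
```python
from comiq import ComiQ

comiq = ComiQ()  # API key read from GEMINI_API_KEY

# Use both PaddleOCR and EasyOCR on the same image
data = comiq.extract("path/to/comic/page.jpg", ocr=["paddleocr", "easyocr"])
```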
### `register_ocr_engine(name: str, engine: Callable)`
Registers a new OCR engine.
- **`name` (str):** The name to identify the engine.
- **`engine` (Callable):** The function that implements the engine. See "Advanced Usage" for details.
### `get_available_ocr_engines() -> List[str]`
Returns a list of all registered OCR engine names.
## Advanced Usage: Registering a Custom OCR Engine
You can extend ComiQ by adding your own OCR engine. Your custom engine must be a function that adheres to the following contract:
- **Input:** It must accept two arguments:
1. `image` (a `numpy.ndarray` in BGR format).
2. `**kwargs` (a dictionary for any configuration your engine needs).
- **Output:** It must return a list of dictionaries, where each dictionary represents a detected text box and has two keys:
1. `"text_box"`: A list of four integers `[ymin, xmin, ymax, xmax]`.
2. `"text"`: The detected text as a string.
**Example:**
Here is a fuller, working example that wraps the `pytesseract` library as a custom engine:
```python
import comiq
import pytesseract
import cv2
import numpy as np
# 1. Define the custom engine function
def pytesseract_engine(image: np.ndarray, **kwargs) -> list:
    """
    A custom OCR engine using pytesseract.
    It accepts an optional 'config' kwarg, e.g., pytesseract_engine(image, config="--psm 6").
    """
    # pytesseract works with RGB images
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Use pytesseract to get structured data
    data = pytesseract.image_to_data(
        rgb_image,
        output_type=pytesseract.Output.DICT,
        config=kwargs.get("config", "")
    )
    results = []
    for i in range(len(data['text'])):
        if float(data['conf'][i]) > 60:  # Filter out low-confidence detections
            (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
            text = data['text'][i]
            if text.strip():
                results.append({
                    "text_box": [y, x, y + h, x + w],  # [ymin, xmin, ymax, xmax]
                    "text": text
                })
    return results
# 2. Register the new engine with ComiQ
comiq.register_ocr_engine("pytesseract", pytesseract_engine)
# 3. Now you can use it!
my_comiq = comiq.ComiQ()
tesseract_config = {"ocr": {"pytesseract": {"config": "--psm 6"}}}
data = my_comiq.extract(
"path/to/image.png",
ocr="pytesseract",
**tesseract_config
)
print(comiq.get_available_ocr_engines())
# Output: ['paddleocr', 'easyocr', 'pytesseract']
```
## Custom Configuration
You can pass additional configuration to the `ComiQ` constructor to customize the behavior of the OCR and AI models.
```python
from comiq import ComiQ
# Custom configuration
config = {
    "ocr": {
        "paddleocr": {
            "lang": "japan",  # Use the Japanese language model for PaddleOCR
        },
        "easyocr": {
            "reader": {"gpu": False},  # Disable GPU for EasyOCR
        }
    },
    "ai": {
        "temperature": 0.5,  # Sampling temperature for the AI model's grouping step
    }
}
comiq = ComiQ(model_name="gemini-1.5-pro", **config)
data = comiq.extract("path/to/manga.jpg", ocr="paddleocr")
```
## Contributing
Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for more details.
## License
ComiQ is licensed under the [MIT License](LICENSE).