# Comicbox
A comic book archive metadata reader and writer.
## ✨ Features
### 📚 Comic Formats
Comicbox reads CBZ, CBR, CBT, and optionally PDF. It writes CBZ archives and PDF
metadata.
### 🏷️ Metadata Formats
Comicbox reads and writes:
- [ComicRack ComicInfo.xml v2.1 (draft) schema](https://anansi-project.github.io/docs/comicinfo/schemas/v2.1)
- [Metron MetronInfo.xml v1.0](https://metron-project.github.io/docs/category/metroninfo)
- [Comic Book Lover ComicBookInfo schema](https://code.google.com/archive/p/comicbookinfo/)
- [CoMet schema](https://github.com/wdhongtw/comet-utils)
- [PDF Metadata](https://pymupdf.readthedocs.io/en/latest/tutorial.html#accessing-meta-data)
  - Embedding ComicInfo.xml or MetronInfo.xml inside PDFs.
- A variety of filename schemes that encode metadata.
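As a rough illustration of what a filename scheme that encodes metadata looks like, here is a minimal sketch built on a single regular expression. The pattern, the `parse_filename` helper and the key names are assumptions made for this example; Comicbox relies on comicfn2dict, which handles far more schemes than this.

```python
import re

# Hypothetical pattern for names like "Series Name #012 (2019).cbz".
# Real-world filename schemes vary far more than this.
FILENAME_RE = re.compile(
    r"^(?P<series>.+?)\s+#(?P<issue>\d+)\s*(?:\((?P<year>\d{4})\))?\.(?P<ext>cb[zrt]|pdf)$",
    re.IGNORECASE,
)


def parse_filename(name: str) -> dict[str, str]:
    """Return the metadata fields found in a filename, or an empty dict."""
    match = FILENAME_RE.match(name)
    if match is None:
        return {}
    # Drop groups that did not take part in the match (e.g. a missing year).
    return {key: value for key, value in match.groupdict().items() if value is not None}


print(parse_filename("Small Comics #012 (2019).cbz"))
```

A filename that does not fit the pattern yields an empty dict rather than raising, which keeps the helper easy to use in a loop over a directory.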
### Usefulness
Comicbox's primary purpose is to serve as a library for the
[Codex comic reader](https://github.com/ajslater/codex/). The API isn't well
documented, but you can infer what it does fairly easily from
[comicbox.comic_archive](https://github.com/ajslater/comicbox/blob/main/comicbox/comic_archive.py),
the primary interface.
The command line can perform most of comicbox's functions including reading and
writing metadata recursively, converting between metadata formats and extracting
pages.
### Limitations and Alternatives
Comicbox does _not_ use popular metadata database APIs or have a GUI!
[Comictagger](https://github.com/comictagger/comictagger) is probably the most
useful comic book tagger. It does most of what Comicbox does but also
automatically tags comics with the ComicVine API and has a desktop UI.
## 📜 News
Comicbox has a [NEWS file](NEWS.md) to summarize changes that affect users.
## 🕸️ HTML Docs
[HTML formatted docs are available here](https://comicbox.readthedocs.io)
## 📦 Installation
<!-- eslint-skip -->
```sh
pip install comicbox
```
Comicbox supports PDFs as an extra when installed like:
<!-- eslint-skip -->
```sh
pip install "comicbox[pdf]"
```
### Dependencies
#### Base
Comicbox generally works without any binary dependencies but requires `unrar` be
on the path to convert CBR into CBZ or extract files from CBRs.
#### PDF
The pymupdf dependency has wheels that install a local version of libmupdf. On
some platforms (e.g. Linux on ARM, Windows), however, it may require libstdc++
and C/C++ build tools to compile libmupdf. More detail on this is
available in the
[pymupdf docs](https://pymupdf.readthedocs.io/en/latest/installation.html#installation-when-a-suitable-wheel-is-not-available).
##### Installing Comicbox on ARM (AARCH64) with Python 3.13
Pymupdf has no pre-built wheels for AARCH64, so pip must build it, and the build
fails on Python 3.13 unless this environment variable is set:
```sh
PYMUPDF_SETUP_PY_LIMITED_API=0 pip install comicbox
```
You will also need the `build-essential` and `python3-dev` packages, or their
equivalents, installed on your Linux system.
## ⌨️ Use
##### Related Projects
Comicbox makes use of two of my other small projects:
[comicfn2dict](https://github.com/ajslater/comicfn2dict), which parses metadata
in comic filenames into Python dicts. This library is also used by Comictagger.
[pdffile](https://github.com/ajslater/pdffile), which presents a ZipFile-like
interface for PDF files.
### Console
Type
<!-- eslint-skip -->
```sh
comicbox -h
```
to see the CLI help.
#### Examples
<!-- eslint-skip -->
```sh
comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d: 1, e: '', f: 3}}" -m "Publisher: SmallComics" -w cr
```
This writes those tags to ComicInfo.xml in the archive.
Be sure to add spaces after colons so they parse as valid YAML key value pairs.
This is easy to forget.
It's probably better to use the `--print` action to see what Comicbox is going
to do before you actually write to the archive:
<!-- eslint-skip -->
```sh
comicbox test.cbz -m "{Tags: a,b,c, story_arcs: {d: 1, e: '', f: 3}}" -m "Publisher: SmallComics" -p
```
A recursive example:
<!-- eslint-skip -->
```sh
comicbox --recurse -m "publisher: 'SC Comics'" -w cr ./SmallComicsComics/
```
This recursively changes the publisher to "SC Comics" for every comic found
under the SmallComicsComics directory.
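If you drive the CLI from a script, the preview-before-write habit can be wrapped in a small helper. This is a sketch and not part of Comicbox: `build_command` is a hypothetical function, and it uses only the `--recurse`, `-m`, `-p` and `-w cr` options that appear in the examples above.

```python
import shlex


def build_command(path: str, tags: list[str], *, write: bool = False, recurse: bool = False) -> list[str]:
    """Build a comicbox argv list; previews by default, writes ComicInfo.xml on request."""
    argv = ["comicbox"]
    if recurse:
        argv.append("--recurse")
    for tag in tags:
        # Each tag is a YAML fragment such as "Publisher: SmallComics".
        argv += ["-m", tag]
    # -p previews the result; "-w cr" writes it, as in the examples above.
    argv += ["-w", "cr"] if write else ["-p"]
    argv.append(path)
    return argv


preview = build_command("./SmallComicsComics/", ["publisher: 'SC Comics'"], recurse=True)
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(preview))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(preview, check=True)`. Passing a list avoids a second layer of shell quoting around the YAML.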
#### Escaping YAML
The `-m` command line argument accepts the YAML language for tags. Certain
characters like `\,:;_()$%^@` are part of the YAML language. To successfully
include them as data in your tags, look up
["Escaping YAML" documentation online](https://www.w3schools.io/file/yaml-escape-characters/).
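When `-m` values are generated by a script, one way to sidestep manual escaping is to serialize them as JSON, since YAML 1.2 is designed as a superset of JSON. The sketch below relies on that property. Whether every JSON edge case is accepted depends on the YAML parser Comicbox uses, so it is worth previewing with `-p` first. The `to_yaml_arg` name is hypothetical.

```python
import json


def to_yaml_arg(tags: dict) -> str:
    """Serialize tags as JSON, which a YAML parser reads as a flow mapping."""
    # json.dumps quotes every string and puts a space after each colon,
    # so characters like ":" or "#" inside values stay data.
    return json.dumps(tags, ensure_ascii=False)


arg = to_yaml_arg({"publisher": "SC Comics: The Return", "series": "Who's #1?"})
print(arg)
```

The resulting string can be passed as a single `-m` argument. `ensure_ascii=False` keeps non-ASCII characters literal instead of turning them into `\u` escapes.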
##### Deleting Metadata
To delete metadata from the CLI, you're best off exporting the current metadata,
editing the file, and then re-importing it with the delete previous metadata
option:
<!-- eslint-skip -->
```sh
# export the current metadata
comicbox --export cix "My Overtagged Comic.cbz"
# Adjust the metadata in an editor.
nvim comicinfo.xml
# Check that importing the metadata will look how you like
comicbox --import comicinfo.xml -p "My Overtagged Comic.cbz"
# Delete all previous metadata from the comic (careful!)
comicbox --delete-all-tags "My Overtagged Comic.cbz"
# Import the metadata into the file and write it.
comicbox --import comicinfo.xml --write cix "My Overtagged Comic.cbz"
```
#### Quirks
##### --metadata parses all formats.
The comicbox.yaml format represents the ComicInfo.xml Web tag as an
`identifiers.<NID>.url` sub-tag. But fear not, you don't have to remember this. The
CLI accepts heterogeneous tag types with the `-m` option, so you can type:
<!-- eslint-skip -->
```sh
comicbox -p -m "Web: https://foo.com" mycomic.cbz
```
and the identifier tag should appear in comicbox.yaml as:
```yaml
identifiers:
  foo.com:
    id_key: ""
    url: https://foo.com
```
You don't even need the root tag.
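To make the mapping concrete, here is a sketch of how a URL could be turned into that structure, keyed by host name. It mirrors the YAML above but is only an illustration: `identifier_from_url` is a hypothetical helper, not Comicbox's implementation, which may choose the `<NID>` differently for sites it knows.

```python
from urllib.parse import urlparse


def identifier_from_url(url: str) -> dict:
    """Build an identifiers mapping keyed by the URL's host name."""
    host = urlparse(url).netloc
    # id_key is left empty because no ID is extracted from the URL here.
    return {"identifiers": {host: {"id_key": "", "url": url}}}


print(identifier_from_url("https://foo.com"))
```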
##### Setting Title when Stories are present.
If the metadata contains Stories (MetronInfo.xml only), the title is computed
from the Stories. If you wish to set the title regardless, use the `--replace`
option, e.g.:
```sh
comicbox -m "series: 'G.I. Robot', title: 'Foreign and Domestic'" -Rp
```
Be aware that this also creates a story named after the new title.
##### Identifiers
Comicbox aggregates IDs, GTINs and URLs from other formats into a common
Identifiers structure.
##### Reprints
Comicbox aggregates Alternate Names, Aliases and IsVersionOf from other formats
into a common Reprints list.
##### URNs
Because the Notes field is commonly abused in ComicInfo.xml to represent fields
that ComicInfo does not (yet?) support, Comicbox parses the Notes field heavily,
looking for embedded data. Comicbox also writes identifiers into the Notes field
using a
[Uniform Resource Name](https://en.wikipedia.org/wiki/Uniform_Resource_Name)
format.
Comicbox also looks for identifiers in Tag fields of formats that don't have
their own Identifiers field.
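For readers unfamiliar with URNs, the general shape is `urn:<NID>:<NSS>`. The sketch below formats identifiers that way and pulls them back out of free text with a regular expression. The exact URN layout Comicbox writes is not specified here, so the pattern, the `comicvine` and `isbn` namespaces and the helper names are assumptions for illustration.

```python
import re

# Loose match for "urn:<nid>:<nss>"; real URN syntax (RFC 8141) is stricter.
URN_RE = re.compile(r"urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]*):(?P<nss>[^\s,;]+)")


def to_urn(nid: str, nss: str) -> str:
    """Format an identifier as a URN string."""
    return f"urn:{nid}:{nss}"


def find_urns(notes: str) -> dict[str, str]:
    """Collect every URN embedded in a notes string as {nid: nss}."""
    return {m["nid"]: m["nss"] for m in URN_RE.finditer(notes)}


notes = f"Tagged by hand. {to_urn('comicvine', '12345')} {to_urn('isbn', '9780000000002')}"
print(find_urns(notes))
```

Embedding identifiers in this form lets them survive in a free-text field while remaining machine-readable.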
##### Prettified Fields
Comicbox liberally accepts all kinds of values that may be enums in other
formats, like AgeRating, Formats and Credit Roles. In a weak attempt to
standardize these values, Comicbox title-cases values submitted to these
fields. When writing to standard formats, Comicbox attempts to transform these
values into enums supported by the output format.
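The two steps can be pictured as title-casing on input, then matching against an enum on output. The enum values and both helper functions below are made up for the example and do not reflect any particular schema or Comicbox's own code.

```python
from typing import Optional

# Hypothetical set of age ratings allowed by some output format.
ALLOWED_RATINGS = {"Everyone", "Teen", "Mature 17+"}


def prettify(value: str) -> str:
    """Normalize free-form input by collapsing whitespace and title-casing."""
    return " ".join(value.split()).title()


def to_enum(value: str, allowed: set[str]) -> Optional[str]:
    """Return the allowed value that matches case-insensitively, else None."""
    lookup = {item.casefold(): item for item in allowed}
    return lookup.get(value.casefold())


print(prettify("  mature   17+ "))
print(to_enum(prettify("teen"), ALLOWED_RATINGS))
```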
#### Packages
Comicbox actually installs three different packages:
- `comicbox`: The main API and CLI script.
- `comicfn2dict`: A separate library for parsing comic filenames into dicts. It
  also includes a CLI script.
- `pdffile`: A utility library for reading and writing PDF files with an API like
  Python's ZipFile.
### ⚙️ Config
Comicbox accepts command line arguments but also an optional config file and
environment variables.
The variables have defaults specified in
[a default yaml](https://github.com/ajslater/comicbox/blob/main/comicbox/config_default.yaml).
The environment variables are the variable name prefixed with `COMICBOX_` (e.g.
`COMICBOX_COMICINFOXML=0`).
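In practice, the prefix rule means a wrapper script can collect overrides as in the sketch below. It only demonstrates the naming convention: `prefixed_settings` is a hypothetical helper, lowercasing the names is an assumption, and this is not how Comicbox itself loads its configuration.

```python
import os
from typing import Mapping, Optional

PREFIX = "COMICBOX_"


def prefixed_settings(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return variables that start with COMICBOX_, keyed by the bare lowercase name."""
    source = os.environ if environ is None else environ
    return {
        key[len(PREFIX):].lower(): value
        for key, value in source.items()
        if key.startswith(PREFIX)
    }


# An explicit mapping is used here; call with no argument to read os.environ.
print(prefixed_settings({"COMICBOX_COMICINFOXML": "0", "HOME": "/home/me"}))
```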
#### Log Level
Change the logging level:
<!-- eslint-skip -->
```sh
LOGLEVEL=ERROR comicbox -p <path>
```
## 🛠 API
Comicbox is mostly used by me in [Codex](https://github.com/ajslater/codex/) as
a metadata extractor. Here's a brief example, but the API remains undocumented.
```python
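# Import path assumed for comicbox 2.x; adjust if your version differs.
from comicbox.box import Comicbox

path_to_comic = "test.cbz"  # any supported comic archive

with Comicbox(path_to_comic) as cb:
    metadata = cb.to_dict()
    page_count = cb.page_count()
    file_type = cb.get_file_type()
    mtime = cb.get_metadata_mtime()
    image_data = cb.get_cover_page(to_pixmap=True)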
```
Attached to these docs in the navigation header there are some auto generated
API docs that might be better than nothing.
## 📋 Schemas
Comicbox supports most popular comicbook metadata schema definitions. These are
defined on the [SCHEMAS page](SCHEMAS.md).
## 🔀 Tag Translations
A rough [table](TAGS.md) of how Comicbox handles tag translations between
popular comic book metadata formats.
## 🛠 Development
Comicbox code is hosted at [GitHub](https://github.com/ajslater/comicbox).
You may access most development tasks from the makefile. Run `make` to see
documentation.
### Environment variables
There is a special environment variable, `DEBUG_TRANSFORM`, that will print
verbose schema transform information.
Raw data
{
"_id": null,
"home_page": null,
"name": "comicbox",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": "cb7, cbr, cbt, cbz, comet, comic, comicbookinfo, comicinfo, metroninfo, pdf",
"author": null,
"author_email": "AJ Slater <aj@slater.net>",
"download_url": "https://files.pythonhosted.org/packages/72/66/1efeeb555c5f0288f5e0ab100543b6cef284b57d3cd4bc0b65b3a069d093/comicbox-2.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPL-3.0-only",
"summary": "Comic book archive multi format metadata read/write/transform tool and image extractor.",
"version": "2.0.2",
"project_urls": {
"documentation": "https://comicbox.readthedocs.io",
"repository": "https://github.com/ajslater/comicbox"
},
"split_keywords": [
"cb7",
" cbr",
" cbt",
" cbz",
" comet",
" comic",
" comicbookinfo",
" comicinfo",
" metroninfo",
" pdf"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bec12b9984c3c61bf00c33545fa8d5502a4336322b78859eedf1c10246ec4e9c",
"md5": "41b1ccd60027c335d0d9696d738a2554",
"sha256": "d7fe159bc4c94ad50d35fa17e0c9c56b70a4cfc24d8592384cc63a9b4a272e44"
},
"downloads": -1,
"filename": "comicbox-2.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "41b1ccd60027c335d0d9696d738a2554",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 154422,
"upload_time": "2025-08-02T00:52:40",
"upload_time_iso_8601": "2025-08-02T00:52:40.320273Z",
"url": "https://files.pythonhosted.org/packages/be/c1/2b9984c3c61bf00c33545fa8d5502a4336322b78859eedf1c10246ec4e9c/comicbox-2.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "72661efeeb555c5f0288f5e0ab100543b6cef284b57d3cd4bc0b65b3a069d093",
"md5": "94cf5c6ebb25ab070cc9598e2df1f2e0",
"sha256": "853ecc8d8f5c8b0554c87b3b8b54c5121b0af59a7086161dbb3ee0e075e5fba4"
},
"downloads": -1,
"filename": "comicbox-2.0.2.tar.gz",
"has_sig": false,
"md5_digest": "94cf5c6ebb25ab070cc9598e2df1f2e0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 70901279,
"upload_time": "2025-08-02T00:52:43",
"upload_time_iso_8601": "2025-08-02T00:52:43.364214Z",
"url": "https://files.pythonhosted.org/packages/72/66/1efeeb555c5f0288f5e0ab100543b6cef284b57d3cd4bc0b65b3a069d093/comicbox-2.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-02 00:52:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ajslater",
"github_project": "comicbox",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"circle": true,
"lcname": "comicbox"
}