# opencc_purepy
[PyPI](https://pypi.org/project/opencc-purepy/) · [License](https://github.com/laisuk/opencc_pyo3/blob/main/LICENSE) · [Downloads](https://pepy.tech/project/opencc-purepy) · [Release workflow](https://github.com/laisuk/opencc_purepy/actions/workflows/release.yml)

**opencc_purepy** is a **pure Python** implementation of [OpenCC (Open Chinese Convert)](https://github.com/BYVoid/OpenCC), supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
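
The original OpenCC segments text by maximum forward matching against its phrase dictionaries, so a context-dependent character is resolved by the longest phrase that contains it: 发 becomes 發 on its own but 髮 inside 头发. Below is a minimal sketch of greedy longest-match conversion, assuming a tiny hypothetical dictionary; `max_match_convert` and `phrases` are illustrative names, and this is not opencc_purepy's actual implementation or dictionary format.

```python
from typing import Dict


def max_match_convert(text: str, phrases: Dict[str, str]) -> str:
    """Hypothetical helper: convert text by greedy forward maximum matching.

    At each position the longest dictionary key that matches is replaced;
    characters without an entry are copied unchanged.
    """
    if not phrases:
        return text
    max_len = max(len(key) for key in phrases)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += length
                break
        else:
            # No entry starts at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical toy dictionary: the phrase entry 头发 overrides the
# single-character default 发 -> 發.
phrases = {"头发": "頭髮", "头": "頭", "发": "發", "现": "現"}
print(max_match_convert("我发现头发", phrases))  # 我發現頭髮
```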

---
## 🚩 Features

- **Pure Python**: no native dependencies
- **Multiple locale conversions** (Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji)
- **Punctuation style conversion** (optional)
- **Automatic script detection** (Simplified vs. Traditional)
- **CLI** with Office document support (`.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.epub`)
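> 🐍 The `opencc_purepy` core library is compatible with **Python 2.7+** when used as an imported module. The `opencc-purepy` CLI tool requires **Python 3.7 or later**; it uses f-strings, which are not available in Python 2. Note that the published PyPI package itself declares `requires_python >= 3.7`.

---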
## 🔁 Supported Conversion Configs

| Code    | Description                                     |
|---------|-------------------------------------------------|
| `s2t`   | Simplified → Traditional                        |
| `t2s`   | Traditional → Simplified                        |
| `s2tw`  | Simplified → Traditional (Taiwan)               |
| `tw2s`  | Traditional (Taiwan) → Simplified               |
| `s2twp` | Simplified → Traditional (Taiwan) with idioms   |
| `tw2sp` | Traditional (Taiwan) → Simplified with idioms   |
| `s2hk`  | Simplified → Traditional (Hong Kong)            |
| `hk2s`  | Traditional (Hong Kong) → Simplified            |
| `t2tw`  | Traditional → Traditional (Taiwan)              |
| `tw2t`  | Traditional (Taiwan) → Traditional              |
| `t2twp` | Traditional → Traditional (Taiwan) with idioms  |
| `tw2tp` | Traditional (Taiwan) → Traditional with idioms  |
| `t2hk`  | Traditional → Traditional (Hong Kong)           |
| `hk2t`  | Traditional (Hong Kong) → Traditional           |
| `t2jp`  | Japanese Kyujitai → Shinjitai                   |
| `jp2t`  | Japanese Shinjitai → Kyujitai                   |
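
Every code in the table follows the pattern `<source>2<target>`, and a trailing `p` on the target marks the "with idioms" variants. A minimal sketch of a hypothetical validation helper that could run before a code is passed to `OpenCC(config)`; `SUPPORTED_CONFIGS` and `parse_config` are illustrative names, not part of opencc_purepy.

```python
from typing import Tuple

# Hypothetical constant: config codes copied from the table above.
SUPPORTED_CONFIGS = (
    "s2t", "t2s", "s2tw", "tw2s", "s2twp", "tw2sp", "s2hk", "hk2s",
    "t2tw", "tw2t", "t2twp", "tw2tp", "t2hk", "hk2t", "t2jp", "jp2t",
)


def parse_config(code: str) -> Tuple[str, str, bool]:
    """Hypothetical helper: split a config code into (source, target, with_idioms)."""
    if code not in SUPPORTED_CONFIGS:
        raise ValueError(f"unsupported config: {code!r}")
    source, target = code.split("2")
    # A trailing "p" marks the idiom variants; "jp" is a locale name, not a suffix.
    with_idioms = target.endswith("p") and target != "jp"
    if with_idioms:
        target = target[:-1]
    return source, target, with_idioms


print(parse_config("s2twp"))  # ('s', 'tw', True)
print(parse_config("t2jp"))   # ('t', 'jp', False)
```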
---
## 📦 Installation
```bash
pip install opencc-purepy
```
---
## 🚀 Usage
### Python
```python
from opencc_purepy import OpenCC

text = "“春眠不觉晓，处处闻啼鸟。”"
opencc = OpenCC("s2t")
# punctuation=True also converts the curly quotes “” into corner brackets 「」
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉，處處聞啼鳥。」
```
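
In the example above, `punctuation=True` turns the curly quotes “” into the corner brackets 「」. That kind of character-for-character substitution can be sketched with `str.translate`. The table below is hypothetical: only the “” → 「」 pair is confirmed by the example output, the ‘’ → 『』 pair is an assumed convention, and `convert_punctuation` is an illustrative helper rather than the library's internal code.

```python
# Hypothetical Simplified-style -> Traditional-style punctuation table.
# “” -> 「」 matches the example output above; ‘’ -> 『』 is an assumption.
S2T_PUNCT = {"“": "「", "”": "」", "‘": "『", "’": "』"}
# The reverse direction is simply the inverted table.
T2S_PUNCT = {v: k for k, v in S2T_PUNCT.items()}


def convert_punctuation(text: str, table: dict) -> str:
    """Replace each character found in `table`; leave all others unchanged."""
    return text.translate(str.maketrans(table))


print(convert_punctuation("“春眠不觉晓”", S2T_PUNCT))  # 「春眠不觉晓」
```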
### CLI
#### Text File Conversion
```sh
python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
```
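
The same text-file conversion can also be scripted with the Python API. A minimal sketch, assuming UTF-8 input and output (the CLI's own encoding handling is not documented here) and assuming `-p` corresponds to `punctuation=True`; `convert_text_file` is a hypothetical helper that accepts any object with a `convert(text, punctuation=...)` method, such as `OpenCC("s2t")`.

```python
from pathlib import Path


def convert_text_file(src, dst, converter, punctuation=False, encoding="utf-8"):
    """Hypothetical helper: read `src`, convert it, and write the result to `dst`.

    `converter` is expected to expose convert(text, punctuation=...),
    like opencc_purepy.OpenCC. Returns the number of characters written.
    """
    text = Path(src).read_text(encoding=encoding)
    result = converter.convert(text, punctuation=punctuation)
    Path(dst).write_text(result, encoding=encoding)
    return len(result)


# Usage, roughly equivalent to `opencc-purepy convert -i input.txt -o output.txt -c s2t -p`
# (assuming -p enables punctuation conversion):
#   from opencc_purepy import OpenCC
#   convert_text_file("input.txt", "output.txt", OpenCC("s2t"), punctuation=True)
```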
#### Office Document Conversion (`office` subcommand)
Supported formats: `.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.epub`
```sh
# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font
# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext
# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx
```
> ℹ️ With the `office` subcommand, the input is processed as an Office or EPUB document, and OpenCC conversion is applied internally.
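
Office Open XML (`.docx`, `.xlsx`, `.pptx`), OpenDocument (`.odt`, `.ods`, `.odp`) and EPUB files are all ZIP archives of XML/XHTML parts. A minimal stdlib sketch of the general idea for `.docx`, assuming that converting the raw XML of the main body part (`word/document.xml`) is acceptable; `convert_docx_body` is a hypothetical helper and not how `opencc-purepy office` is actually implemented.

```python
import zipfile
from typing import Callable


def convert_docx_body(src: str, dst: str, convert: Callable[[str], str]) -> None:
    """Hypothetical helper: copy a .docx, passing word/document.xml through `convert`.

    Other parts (headers, footnotes, comments, ...) are copied unchanged.
    Entries keep their original order and compression settings, which also
    matters for ODF/EPUB, where the `mimetype` entry must stay first and stored.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                # Chinese conversion is expected to touch only CJK text, so the ASCII
                # markup survives; a phrase split across two <w:t> runs may be missed.
                data = convert(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)


# Usage (assuming opencc_purepy is installed):
#   from opencc_purepy import OpenCC
#   cc = OpenCC("t2s")
#   convert_docx_body("example.docx", "example_t2s.docx", cc.convert)
```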
---
## 🧩 API Reference

### `OpenCC` class

- `OpenCC(config: str = "s2t")`  
  Create a converter with the specified config (one of the codes listed under Supported Conversion Configs).
- `convert(input: str, punctuation: bool = False) -> str`  
  Convert text; set `punctuation=True` to also convert punctuation style.
- `zho_check(input: str) -> int`  
  Detect whether the input text is Traditional or Simplified Chinese. Returns `1` for Traditional, `2` for Simplified, and `0` for anything else.
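
`zho_check` can drive the conversion direction automatically. A minimal sketch, assuming the return codes listed above; `pick_config` and `flip_script` are hypothetical helpers, and `opencc_cls` is expected to behave like the `OpenCC` class (default config `s2t`, plus `zho_check` and `convert`).

```python
from typing import Optional


def pick_config(code: int) -> Optional[str]:
    """Hypothetical helper: map a zho_check() result to the config that flips the script."""
    if code == 2:   # Simplified -> convert to Traditional
        return "s2t"
    if code == 1:   # Traditional -> convert to Simplified
        return "t2s"
    return None     # 0: neither detected, leave the text alone


def flip_script(text: str, opencc_cls) -> str:
    """Hypothetical helper: convert Simplified text to Traditional and vice versa."""
    detector = opencc_cls()  # default config; only used for detection here
    config = pick_config(detector.zho_check(text))
    if config is None:
        return text
    return opencc_cls(config).convert(text)


# Usage (assuming opencc_purepy is installed):
#   from opencc_purepy import OpenCC
#   print(flip_script("春眠不觉晓", OpenCC))
```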
---
## 🛠 Development

- Python package: [`opencc_purepy/__init__.py`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/__init__.py); type stubs: [`opencc_purepy/opencc_purepy.pyi`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/opencc_purepy.pyi)
- CLI: [`opencc_purepy/__main__.py`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/__main__.py)
---
## ⚡ Benchmark

> Measured on a local machine using the default `s2t` configuration.  
> Each result is the average of 20 runs with preloaded dictionaries.

| Input Size        | Avg. Time (ms) |
|-------------------|---------------:|
| **100 chars**     |           0.15 |
| **1,000 chars**   |           0.93 |
| **10,000 chars**  |           8.76 |
| **100,000 chars** |          86.05 |

*Timings exclude initialization and measure conversion speed only.*
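
A comparable measurement can be sketched with the standard library. This is an illustrative harness, not the script that produced the numbers above: `average_ms` is a hypothetical helper, the converter is assumed to load its dictionaries when it is constructed (so building it once outside the timed region excludes initialization), and the 20-run default mirrors the note above.

```python
import time
from typing import Callable


def average_ms(func: Callable[[], object], runs: int = 20) -> float:
    """Hypothetical helper: mean wall-clock time of func() over `runs` calls, in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total * 1000.0 / runs


# Usage (assuming opencc_purepy is installed):
#   from opencc_purepy import OpenCC
#   cc = OpenCC("s2t")            # build once, outside the timed region
#   sample = "春眠不觉晓" * 200     # 1,000 characters
#   print("%.2f ms" % average_ms(lambda: cc.convert(sample)))
```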

---
## 📄 License

This project is licensed under the [MIT License](https://github.com/laisuk/opencc_purepy/blob/master/LICENSE).

---

Powered by **Pure Python** and **OpenCC** Lexicons.