# opencc_purepy

- **Name:** opencc-purepy
- **Version:** 1.1.0
- **Summary:** Pure Python implementation of OpenCC for Chinese text conversion
- **Upload time:** 2025-08-13 22:25:00
- **Requires Python:** >=3.7
- **License:** MIT
- **Keywords:** opencc, chinese, text, conversion, pure-python

[![PyPI version](https://img.shields.io/pypi/v/opencc-purepy)](https://pypi.org/project/opencc-purepy/)
[![License](https://img.shields.io/github/license/laisuk/opencc_purepy)](https://github.com/laisuk/opencc_purepy/blob/master/LICENSE)
[![Downloads](https://static.pepy.tech/personalized-badge/opencc-purepy?period=month&units=international_system&left_color=black&right_color=orange&left_text=Downloads)](https://pepy.tech/project/opencc-purepy)
[![Build & Release](https://github.com/laisuk/opencc_purepy/actions/workflows/release.yml/badge.svg)](https://github.com/laisuk/opencc_purepy/actions/workflows/release.yml)

**opencc_purepy** is a **pure Python** implementation of [OpenCC (Open Chinese Convert)](https://github.com/BYVoid/OpenCC), supporting conversion between Simplified, Traditional, Hong Kong, Taiwan, and Japanese Kanji.  
It uses dictionary-based segmentation and mapping logic inspired by the original OpenCC.
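
To illustrate what dictionary-based segmentation with longest-match lookup generally looks like, here is a minimal sketch. It is not the library's actual implementation: the function name `convert_longest_match` and the three-entry dictionary `TOY_S2T` are invented for this example, and the real lexicons are far larger.

```python
from typing import Dict, Optional


def convert_longest_match(text: str, phrases: Dict[str, str],
                          max_len: Optional[int] = None) -> str:
    """Greedy forward longest-match replacement (illustrative only)."""
    if max_len is None:
        # The longest key bounds how far ahead we ever need to look.
        max_len = max((len(key) for key in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += length
                break
        else:
            # No dictionary entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical toy dictionary, assumed for this sketch only.
TOY_S2T = {"头发": "頭髮", "发展": "發展", "发": "發"}

print(convert_longest_match("头发发展", TOY_S2T))  # 頭髮發展
```

The phrase-level entries are what let the same Simplified character map to 髮 in one word and 發 in another, which is why segmentation comes before character mapping.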

---

## 🚩 Features

- **Pure Python** – no native dependencies
- **Multiple Chinese locale conversions** (Simplified, Traditional, HK, TW, JP)
- **Punctuation style conversion** (optional)
- **Automatic script detection** (Simplified/Traditional)
- **CLI** with Office document support (`.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.epub`)

> 🐍 The `opencc_purepy` core library is compatible with **Python 2.7+** when imported as a module.  
> The `opencc-purepy` CLI tool requires **Python 3.7 or later** because it uses f-strings.

---

## πŸ” Supported Conversion Configs

| Code    | Description                                    |
|---------|------------------------------------------------|
| `s2t`   | Simplified β†’ Traditional                       |
| `t2s`   | Traditional β†’ Simplified                       |
| `s2tw`  | Simplified β†’ Traditional (Taiwan)              |
| `tw2s`  | Traditional (Taiwan) β†’ Simplified              |
| `s2twp` | Simplified β†’ Traditional (Taiwan) with idioms  |
| `tw2sp` | Traditional (Taiwan) β†’ Simplified with idioms  |
| `s2hk`  | Simplified β†’ Traditional (Hong Kong)           |
| `hk2s`  | Traditional (Hong Kong) β†’ Simplified           |
| `t2tw`  | Traditional β†’ Traditional (Taiwan)             |
| `tw2t`  | Traditional (Taiwan) β†’ Traditional             |
| `t2twp` | Traditional β†’ Traditional (Taiwan) with idioms |
| `tw2tp` | Traditional (Taiwan) β†’ Traditional with idioms |
| `t2hk`  | Traditional β†’ Traditional (Hong Kong)          |
| `hk2t`  | Traditional (Hong Kong) β†’ Traditional          |
| `t2jp`  | Japanese Kyujitai β†’ Shinjitai                  |
| `jp2t`  | Japanese Shinjitai β†’ Kyujitai                  |
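
The sixteen configs in the table form eight forward/reverse pairs. The sketch below looks up the opposite direction for a given code. `reverse_config` and `REVERSE` are hypothetical helper names for this example, not part of the package.

```python
# Pairs read directly from the config table; each code appears exactly once.
_PAIRS = [
    ("s2t", "t2s"),
    ("s2tw", "tw2s"),
    ("s2twp", "tw2sp"),
    ("s2hk", "hk2s"),
    ("t2tw", "tw2t"),
    ("t2twp", "tw2tp"),
    ("t2hk", "hk2t"),
    ("t2jp", "jp2t"),
]

# Build a lookup that works in both directions.
REVERSE = {}
for forward, backward in _PAIRS:
    REVERSE[forward] = backward
    REVERSE[backward] = forward


def reverse_config(code: str) -> str:
    """Return the config code that converts in the opposite direction."""
    key = code.strip().lower()
    if key not in REVERSE:
        raise ValueError("unknown config: {!r}".format(code))
    return REVERSE[key]


print(reverse_config("s2twp"))  # tw2sp
```

An explicit table is used instead of string manipulation because a rule such as "strip the trailing `p`" would mishandle `t2jp`. Note that a round trip through a pair is not guaranteed to reproduce the original text, since some characters have one-to-many mappings.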

---

## 📦 Installation

```bash
pip install opencc-purepy
```

---

## 🚀 Usage

### Python

```python
from opencc_purepy import OpenCC

text = "“春眠不觉晓，处处闻啼鸟。”"
opencc = OpenCC("s2t")
converted = opencc.convert(text, punctuation=True)
print(converted)  # 「春眠不覺曉，處處聞啼鳥。」
```
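
With `punctuation=True`, the example above turns curly double quotes into corner brackets. The following standalone sketch shows how such a quote mapping can be expressed with `str.translate`. It does not use the library: the helper name `to_corner_quotes` is hypothetical, and the single-quote pair is an assumption added for illustration (only the double-quote pair is confirmed by the example).

```python
# Double-quote pair is taken from the usage example; the single-quote
# pair is an assumption and may differ from what the library does.
_CORNER_QUOTES = str.maketrans({
    "“": "「",
    "”": "」",
    "‘": "『",
    "’": "』",
})


def to_corner_quotes(text: str) -> str:
    """Replace curly quotes with corner brackets; other characters pass through."""
    return text.translate(_CORNER_QUOTES)


print(to_corner_quotes("“春眠不觉晓”"))  # 「春眠不觉晓」
```

Only the quotation marks change here; the characters between them stay Simplified because this sketch performs no dictionary conversion.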

### CLI

#### Text File Conversion

```sh
python -m opencc_purepy convert -i input.txt -o output.txt -c s2t -p
# or, if installed as a script:
opencc-purepy convert -i input.txt -o output.txt -c s2t -p
```
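
For converting many text files, the CLI invocation above can be driven from a script. This is a minimal sketch that uses only the `convert` subcommand and the `-i`, `-o`, `-c`, and `-p` flags shown above. The helper names `build_convert_cmd` and `convert_tree` and the `*.txt` glob are assumptions for this example.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_convert_cmd(src: Path, dst: Path, config: str = "s2t",
                      punctuation: bool = False) -> List[str]:
    """Build the argument list for `python -m opencc_purepy convert ...`."""
    cmd = [
        sys.executable, "-m", "opencc_purepy", "convert",
        "-i", str(src),
        "-o", str(dst),
        "-c", config,
    ]
    if punctuation:
        cmd.append("-p")
    return cmd


def convert_tree(src_dir: Path, dst_dir: Path, config: str = "s2t") -> None:
    """Convert every .txt file directly inside src_dir into dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob("*.txt")):
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_convert_cmd(src, dst_dir / src.name, config),
                       check=True)
```

Using `sys.executable` keeps the subprocess in the same interpreter and environment where the package is installed. When startup cost matters, importing `OpenCC` and reusing one instance in-process is likely to be faster than spawning one process per file.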

#### Office Document Conversion subcommand (`office`)

Supports: `.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.epub`

```sh
# Convert Word document with font preservation
opencc-purepy office -i example.docx -c t2s --keep-font

# Convert EPUB and auto-detect output name
opencc-purepy office -i book.epub -c s2t --auto-ext

# Convert Excel and specify output path and format
opencc-purepy office -i sheet.xlsx -o result.xlsx -c s2tw --format xlsx
```

> ℹ️ With the `office` subcommand, the input is processed as an Office or EPUB document and OpenCC conversion is applied internally.

---

## 🧩 API Reference

### `OpenCC` class

- `OpenCC(config: str = "s2t")`  
  Create a converter with the specified config.
- `convert(input: str, punctuation: bool = False) -> str`  
  Convert text with optional punctuation conversion.
- `zho_check(input: str) -> int`  
  Detect the script of the input text and return a code:  
  `1` for Traditional, `2` for Simplified, `0` for anything else.
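
The return codes of `zho_check` can be used to pick a conversion direction automatically. The sketch below assumes only the three methods listed above. `pick_config` and `flip_script` are hypothetical helper names, and mapping detected Traditional text to `t2s` (and Simplified to `s2t`) is one possible policy, not something the library prescribes.

```python
from typing import Optional

# Return codes documented for OpenCC.zho_check in this API reference.
TRADITIONAL, SIMPLIFIED, OTHER = 1, 2, 0


def pick_config(code: int) -> Optional[str]:
    """Map a zho_check() result to the config that flips the script."""
    if code == TRADITIONAL:
        return "t2s"
    if code == SIMPLIFIED:
        return "s2t"
    return None  # not recognisably Chinese: nothing to convert


def flip_script(text: str) -> str:
    """Convert Simplified to Traditional or vice versa; leave other text as-is."""
    # Imported lazily so pick_config stays usable without the package installed.
    from opencc_purepy import OpenCC

    detector = OpenCC("s2t")
    config = pick_config(detector.zho_check(text))
    if config is None:
        return text
    return OpenCC(config).convert(text)
```

`flip_script` creates converter objects on every call for simplicity; code that converts many strings would probably keep one `OpenCC` instance per config and reuse it.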

---

## 🛠 Development

- Python bindings: [`opencc_purepy/__init__.py`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/__init__.py), [`opencc_purepy/opencc_purepy.pyi`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/opencc_purepy.pyi)
- CLI: [`opencc_purepy/__main__.py`](https://github.com/laisuk/opencc_purepy/blob/master/opencc_purepy/__main__.py)

---

## ⚡ Benchmark

> Measured on a local machine using the default "s2t" configuration.  
> Each test averaged over 20 runs with preloaded dictionaries.

| Input Size        | Avg. Time (ms) |
|-------------------|---------------:|
| **100 chars**     |           0.15 |
| **1,000 chars**   |           0.93 |
| **10,000 chars**  |           8.76 |
| **100,000 chars** |          86.05 |

*Timings exclude initialization; focus is on pure conversion speed.*
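
A comparable measurement can be sketched with `time.perf_counter`. The exact harness behind the table is not documented here, so this is an assumption-laden approximation: `average_ms` is a hypothetical helper that times any `str -> str` callable, performs one untimed warm-up call, and averages over 20 runs by default.

```python
import time
from typing import Callable


def average_ms(convert: Callable[[str], str], text: str, runs: int = 20) -> float:
    """Average wall-clock time of convert(text) in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    convert(text)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        convert(text)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Example with a stand-in callable; with the package installed one could
# pass OpenCC("s2t").convert instead, so construction stays outside the timing.
print("{:.3f} ms".format(average_ms(str.upper, "abc" * 1000)))
```

Passing a bound method of an already constructed converter mirrors the note that the timings exclude initialization. Results will vary with hardware and Python version.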

---

## 📄 License

This project is licensed under the [MIT License](https://github.com/laisuk/opencc_purepy/blob/master/LICENSE).

---

Powered by **Pure Python** and **OpenCC** Lexicons.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "opencc-purepy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "opencc, chinese, text, conversion, pure-python",
    "author": null,
    "author_email": "laisuk <laisuk@yahoo.com>",
    "download_url": "https://files.pythonhosted.org/packages/84/99/147a5c95b8ed2c4161a6d2bf78f4406fd1a46c6b44407983d76bad099822/opencc_purepy-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Pure Python implementation of OpenCC for Chinese text conversion",
    "version": "1.1.0",
    "project_urls": {
        "ChangeLog": "https://github.com/laisuk/opencc_purepy/blob/master/CHANGELOG.md",
        "Homepage": "https://github.com/laisuk/opencc_purepy",
        "Issues": "https://github.com/laisuk/opencc_purepy/issues"
    },
    "split_keywords": [
        "opencc",
        " chinese",
        " text",
        " conversion",
        " pure-python"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "602645a2783fffa5c914f1da5f4f7e8a550fb0370e782b3d796a219789688156",
                "md5": "b2d9a0f552f9dec49b3afbb348ce8aa3",
                "sha256": "4ecbcccd0c4a79c802eb9ef9b2ceee3dd5ccbd881bbed6aa6d634990f06a7c58"
            },
            "downloads": -1,
            "filename": "opencc_purepy-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b2d9a0f552f9dec49b3afbb348ce8aa3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 1000052,
            "upload_time": "2025-08-13T22:24:57",
            "upload_time_iso_8601": "2025-08-13T22:24:57.359889Z",
            "url": "https://files.pythonhosted.org/packages/60/26/45a2783fffa5c914f1da5f4f7e8a550fb0370e782b3d796a219789688156/opencc_purepy-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8499147a5c95b8ed2c4161a6d2bf78f4406fd1a46c6b44407983d76bad099822",
                "md5": "0807bbdad84469cea71acdadfc05e29e",
                "sha256": "c0e59480b2fa5936986b423e38264831587be9af4a86d6c68553cc45f93d8915"
            },
            "downloads": -1,
            "filename": "opencc_purepy-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "0807bbdad84469cea71acdadfc05e29e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 1000211,
            "upload_time": "2025-08-13T22:25:00",
            "upload_time_iso_8601": "2025-08-13T22:25:00.007097Z",
            "url": "https://files.pythonhosted.org/packages/84/99/147a5c95b8ed2c4161a6d2bf78f4406fd1a46c6b44407983d76bad099822/opencc_purepy-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-13 22:25:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "laisuk",
    "github_project": "opencc_purepy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "opencc-purepy"
}
        