pingze-classifier


- **Name**: pingze-classifier
- **Version**: 0.11
- **Home page**: https://github.com/rbnyng/pingze_classifier
- **Summary**: A Python package for classifying Chinese characters based on the Pingshui rhyme scheme
- **Upload time**: 2024-09-04 16:19:51
- **Author**: rbnyng
- **Requires Python**: >=3.6
- **License**: none recorded
# Ping-ze Classifier

**Ping-ze Classifier** is a Python package that classifies Chinese characters as 'ping' (平, level tone) or 'ze' (仄, oblique tone) according to the [Pingshui Rhyme](https://zh.wikisource.org/wiki/%E5%B9%B3%E6%B0%B4%E9%9F%BB) (平水韻) rhyme dictionary. You can read more about ping-ze tonal patterns [here](https://en.wikipedia.org/wiki/Tone_pattern). The package ships with a pre-scraped JSON file containing the complete data from the rhyme dictionary, and it also provides a scraper to update that data if necessary.

## Features

- **Classify Chinese characters**: Easily classify characters as 'ping' or 'ze' based on the Pingshui Rhyme scheme.
- **Pre-packaged data**: Includes a pre-scraped JSON file with the complete rhyme data.
- **Scraper**: An optional scraper is included to regenerate the JSON file when the source data changes.
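
The idea behind the first two features can be pictured as a table lookup. The sketch below is only an illustration under stated assumptions: `TOY_TONES` and `classify_with_table` are hypothetical names, the table holds just the eight characters from the example in the Usage section, and it does not reflect the layout of the packaged JSON file or the package's internals.

```python
from typing import Dict, List

# Hypothetical toy table, NOT the package's data file. The labels are copied
# from the example output shown in the Usage section of this README.
TOY_TONES: Dict[str, str] = {
    "知": "ping", "否": "ze", "應": "ping", "是": "ze",
    "綠": "ze", "肥": "ping", "紅": "ping", "瘦": "ze",
}


def classify_with_table(text: str, table: Dict[str, str]) -> List[str]:
    """Return one label per character; anything not in the table is 'unknown'."""
    return [table.get(ch, "unknown") for ch in text]


print(classify_with_table("綠肥紅瘦。", TOY_TONES))
# ['ze', 'ping', 'ping', 'ze', 'unknown']
```

Characters that are missing from the table, such as punctuation, fall through to `'unknown'`, which matches the labels in the Usage example.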

## Installation

You can install the package by running:

```bash
pip install pingze_classifier
```

## Usage

### Classifying Characters

Once installed, you can use `PingZeClassifier` to classify Chinese characters according to the Pingshui Rhyme (平水韻) scheme:

```python
from pingze_classifier import PingZeClassifier

# Initialize the classifier (uses pre-packaged JSON by default)
classifier = PingZeClassifier()

# Classify a sentence (one label per character; punctuation is reported as 'unknown')
sentence = "知否？知否？應是綠肥紅瘦。"
result = classifier.classify(sentence)
print(result)
# Output: ['ping', 'ze', 'unknown', 'ping', 'ze', 'unknown', 'ping', 'ze', 'ze', 'ping', 'ping', 'ze', 'unknown']
```
```
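
Since the example output contains one label per input character, the list can be lined up with the sentence for display. The following is a minimal sketch of that post-processing, not part of the package: `pair_labels`, `to_pattern`, and the `SYMBOLS` mapping are hypothetical helpers, and the labels are copied from the output above so the snippet runs without the package installed.

```python
from typing import List, Tuple

# Assumed display symbols; '-' marks characters with no ping/ze label.
SYMBOLS = {"ping": "平", "ze": "仄", "unknown": "-"}


def pair_labels(sentence: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Pair each character with its label, checking that the lengths agree."""
    if len(sentence) != len(labels):
        raise ValueError("expected exactly one label per character")
    return list(zip(sentence, labels))


def to_pattern(labels: List[str]) -> str:
    """Collapse a label list into a compact tonal-pattern string."""
    return "".join(SYMBOLS.get(label, "-") for label in labels)


# Labels copied from the README's example output.
sentence = "知否？知否？應是綠肥紅瘦。"
labels = ["ping", "ze", "unknown", "ping", "ze", "unknown", "ping",
          "ze", "ze", "ping", "ping", "ze", "unknown"]

for char, label in pair_labels(sentence, labels):
    print(char, label)

print(to_pattern(labels))
# 平仄-平仄-平仄仄平平仄-
```

With the package installed, `labels` would instead come from `classifier.classify(sentence)`.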

### Scraping and Regenerating the JSON Data

The package includes a pre-generated JSON file, but if the Pingshui Rhyme source on Wikisource changes, you can run the scraper to rebuild it.

#### Running the Scraper

By default, the scraper won't run if the JSON file already exists. To force a refresh and regenerate the JSON data, run the following command:

```bash
scrape-pingze --force-refresh
```

This will scrape the latest data from the source and regenerate the `organized_ping_ze_rhyme_dict.json` file.
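
The skip-unless-forced behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the package's actual code: `should_scrape` and `parse_args` are hypothetical helpers, and only the `--force-refresh` flag and the JSON file name come from this README.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def should_scrape(json_path: Path, force_refresh: bool) -> bool:
    """Scrape when forced, or when no JSON file exists yet."""
    return force_refresh or not json_path.exists()


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a command line that mirrors `scrape-pingze --force-refresh`."""
    parser = argparse.ArgumentParser(prog="scrape-pingze")
    parser.add_argument(
        "--force-refresh",
        action="store_true",
        help="regenerate the JSON file even if it already exists",
    )
    return parser.parse_args(argv)


args = parse_args(["--force-refresh"])
# The path is relative here only for illustration; the real file location is not stated in this README.
target = Path("organized_ping_ze_rhyme_dict.json")
print(should_scrape(target, args.force_refresh))
# True
```

Without the flag, the same check returns `True` only when the file is missing, which is the default behaviour the section describes.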

## Dependencies

- `requests`: For web scraping the Pingshui Rhyme data.
- `beautifulsoup4`: For parsing the HTML content from the Pingshui Rhyme source page.

            
