vietnamese-poem-classifier


* Name: vietnamese-poem-classifier
* Version: 0.1.4
* Home page: https://github.com/Anshler/vietnamese-poem-classifier
* Summary: Classify genre and score Vietnamese poems
* Upload time: 2023-12-30 13:13:47
* Author: Huynh Minh Triet
* License: MIT
* Keywords: poem
* Requirements: none recorded

# Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifier built on [BertForSequenceClassification](https://huggingface.co/trituenhantaoio/bert-base-vietnamese-uncased), with an accuracy of ```99.7%```

This is a side project developed while building our [Vietnamese poem generator](https://github.com/Anshler/poem_generator)

## Features

* Classify Vietnamese poems into five genres: ```4 chu```, ```5 chu```, ```7 chu```, ```luc bat``` and ```8 chu```
* Score the quality of each poem based solely on how well it conforms to the strict rules of its genre, using three criteria, Length (L), Tone (T) and Rhyme (R), combined as follows: ```score = L/10 + 3T/10 + 6R/10```
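
The weighting in the formula above can be written out as a small function. This is a minimal sketch and not the package's own code: the name `poem_score` and the range check (which assumes each sub-score lies in [0, 1], as in the sample output under Inference) are illustrative assumptions.

```python
def poem_score(l_score: float, t_score: float, r_score: float) -> float:
    """Weighted sum from the formula above: score = L/10 + 3T/10 + 6R/10."""
    # Assumption: every sub-score is a fraction between 0 and 1.
    for name, value in (("l_score", l_score), ("t_score", t_score), ("r_score", r_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (l_score + 3 * t_score + 6 * r_score) / 10


# Sub-scores copied from the sample output in the Inference section.
print(round(poem_score(1.0, 1.0, 0.5833333333333333), 2))  # 0.75
```

Rhyme carries six times the weight of length, so a poem with perfect length and tone but no rhyme scores at most 0.4.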

The rules for each genre are defined below:

| Genre | Length | Tone | Rhyme |
|------------------|------------------|--------------|------------------------|
| 4 chu    | - 4 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) <br>- Vice versa | Last word (4th) of each line: <br>- Continuous rhyme (gieo vần tiếp) <br>- Alternating rhyme (gieo vần tréo) <br>- Three-line rhyme (gieo vần ba)|
| 5 chu    | - 5 words per line <br>- 4 lines per stanza (optional)  | Same as "4 chu" | Same as "4 chu" |
| 7 chu    | - 7 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br> - The 5th word and the last word (7th) must have different tones | The last words of the 1st, 2nd and 4th lines of each stanza must have the same tone and rhyme |
| luc bat    | - 6 words in odd lines <br>- 8 words in even lines <br>- 4 lines per stanza (optional) | For each 6-word line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br><br> For each 8-word line: <br>- Must follow the same pattern as the previous 6-word line <br>- The last word (8th) must have the same tone as the 6th word | The last word (6th) of a 6-word line must rhyme with the 6th word of the next 8-word line and the 8th word of the previous 8-word line |
| 8 chu    | - 8 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)| Same as "4 chu" |
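
The Length and Tone columns can be illustrated with a short rule check. The sketch below is not the package's implementation; all names (`tone_class`, `split_words`, `tones_alternate`, `genre_by_length`) are hypothetical. It relies on the standard classification of Vietnamese tones: unmarked (ngang) and huyền syllables are even (bằng), while sắc, hỏi, ngã and nặng are uneven (trắc). Rhyme checking is left out.

```python
import unicodedata
from typing import List, Optional, Sequence

# Combining marks of the four uneven (trắc) tones: sắc, hỏi, ngã, nặng.
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}
PUNCTUATION = ".,;:!?\"'()"


def tone_class(word: str) -> str:
    """Return "trac" if the syllable carries a trắc tone mark, else "bang"."""
    # NFD splits each vowel into base letter + combining marks, so tone marks
    # can be told apart from vowel marks such as the circumflex or horn.
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"


def split_words(line: str) -> List[str]:
    """Split a line on whitespace and drop surrounding punctuation."""
    stripped = (w.strip(PUNCTUATION) for w in line.split())
    return [w for w in stripped if w]


def tones_alternate(line: str, positions: Sequence[int]) -> bool:
    """True if the tone classes at the given 1-based positions alternate."""
    words = split_words(line)
    if max(positions) > len(words):
        return False
    classes = [tone_class(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(classes, classes[1:]))


def genre_by_length(poem: str) -> Optional[str]:
    """Guess the genre from word counts alone (Length column only)."""
    counts = [len(split_words(l)) for l in poem.splitlines() if l.strip()]
    if not counts:
        return None
    if len(set(counts)) == 1 and counts[0] in (4, 5, 7, 8):
        return f"{counts[0]} chu"
    # luc bat: 6 words on odd lines (1st, 3rd, ...), 8 words on even lines.
    if all(n == (6 if i % 2 == 0 else 8) for i, n in enumerate(counts)):
        return "luc bat"
    return None


poem = """Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên"""
print(genre_by_length(poem))                             # luc bat
print(tones_alternate(poem.splitlines()[0], (2, 4, 6)))  # True
```

For an `8 chu` line the same helper would be called with positions `(3, 5, 8)`, and for `4 chu` or `5 chu` with `(2, 4)`.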




## Data

A collection of 171188 Vietnamese poems in five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download it [here](https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip)

For more details, refer to the _Acknowledgments_ section

## Training

The training code is in our [Vietnamese poem generator](https://github.com/Anshler/poem_generator) repo

Run:
```
python poem_classifier_training.py
```

## Installation

```
pip install vietnamese-poem-classifier
```
Or install directly from GitHub:

```
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
```

## Inference

```python
from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
```
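
The sample output above is a list containing one dictionary with the keys `label`, `confidence`, `poem_score`, `l_score`, `t_score` and `r_score`. As a minimal sketch of how such a result might be turned into a readable line, the hypothetical helper `summarize` below works on a copy of that sample output, so it runs without loading the model. It assumes other predictions have the same shape as the sample.

```python
def summarize(prediction: dict) -> str:
    """Format one prediction dictionary as a single human-readable line."""
    return (
        f"{prediction['label']} "
        f"(confidence {prediction['confidence']:.2%}): "
        f"score {prediction['poem_score']:.2f} "
        f"[L={prediction['l_score']:.2f}, "
        f"T={prediction['t_score']:.2f}, "
        f"R={prediction['r_score']:.2f}]"
    )


# Copied from the sample output above; in real use this would be the
# return value of classifier.predict(poem).
sample = [{'label': 'luc bat', 'confidence': 0.9999017715454102,
           'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0,
           'r_score': 0.5833333333333333}]

for item in sample:
    print(summarize(item))
# luc bat (confidence 99.99%): score 0.75 [L=1.00, T=1.00, R=0.58]
```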

## Model

The model's weights are published on Hugging Face at [Anshler/vietnamese-poem-classifier](https://huggingface.co/Anshler/vietnamese-poem-classifier)

## Acknowledgments

_This project was inspired by the evaluation method from ```fsoft-ailab```'s_ [SP-GPT2 Poem-Generator](https://github.com/fsoft-ailab/Poem-Generator)

_The dataset is also taken from their repo_

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Anshler/vietnamese-poem-classifier",
    "name": "vietnamese-poem-classifier",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "poem",
    "author": "Huynh Minh Triet",
    "author_email": "huynhminhtriet2002@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a5/d0/459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf/vietnamese-poem-classifier-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Classify genre and score Vietnamese poems",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/Anshler/vietnamese-poem-classifier"
    },
    "split_keywords": [
        "poem"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "829ec5e411a9a2390f0ee5e9e3a09e61d4305a144ce0910c6e0bcbce2f997ea4",
                "md5": "5fa095506099d214665c689f155b0edb",
                "sha256": "6aa95720762743fcb57c2d8146cfa0268ce9d799ca2038a7035a84b394d9596b"
            },
            "downloads": -1,
            "filename": "vietnamese_poem_classifier-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5fa095506099d214665c689f155b0edb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 15460,
            "upload_time": "2023-12-30T13:13:45",
            "upload_time_iso_8601": "2023-12-30T13:13:45.198845Z",
            "url": "https://files.pythonhosted.org/packages/82/9e/c5e411a9a2390f0ee5e9e3a09e61d4305a144ce0910c6e0bcbce2f997ea4/vietnamese_poem_classifier-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a5d0459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf",
                "md5": "0ccea0dab0cd93e0db305ccc19e27d90",
                "sha256": "d96193ca33a50429a86bfbdccd1d02e214209383888edbd1a70d07de8554b223"
            },
            "downloads": -1,
            "filename": "vietnamese-poem-classifier-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "0ccea0dab0cd93e0db305ccc19e27d90",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16263,
            "upload_time": "2023-12-30T13:13:47",
            "upload_time_iso_8601": "2023-12-30T13:13:47.085114Z",
            "url": "https://files.pythonhosted.org/packages/a5/d0/459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf/vietnamese-poem-classifier-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-30 13:13:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Anshler",
    "github_project": "vietnamese-poem-classifier",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "vietnamese-poem-classifier"
}
        