# Vietnamese poem classification and evaluation 📜🔍
A Vietnamese poem classifier built on [BertForSequenceClassification](https://huggingface.co/trituenhantaoio/bert-base-vietnamese-uncased), with an accuracy of ```99.7%```
This is a side project that came out of building our [Vietnamese poem generator](https://github.com/Anshler/poem_generator)
## Features
* Classify Vietnamese poems into five genres: ```4 chu```, ```5 chu```, ```7 chu```, ```luc bat``` and ```8 chu```
* Score the quality of each poem based solely on how well it conforms to the strict rules of its genre. Three criteria are used, Length (L), Tone (T) and Rhyme (R), combined as follows: ```score = L/10 + 3T/10 + 6R/10```
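
As a worked example of the scoring formula, the sketch below plugs in the sub-scores reported for the luc bat poem in the Inference section. The function name `poem_score` is hypothetical and is not taken from the package; the sketch only assumes each sub-score lies in the range 0 to 1.

```python
def poem_score(l_score, t_score, r_score):
    """Weighted sum of the Length, Tone and Rhyme sub-scores (each in [0, 1])."""
    return l_score / 10 + 3 * t_score / 10 + 6 * r_score / 10


# Sub-scores from the Inference example: perfect length and tone, partial rhyme.
score = poem_score(1.0, 1.0, 0.5833333333333333)
print(round(score, 2))  # 0.75
```

Rhyme carries 60% of the weight, so the imperfect rhyme score alone accounts for the 0.25 lost in this example.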
The rules for each genre are defined below:
| Genre | Length | Tone | Rhyme |
|------------------|------------------|--------------|------------------------|
| 4 chu | - 4 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) <br>- Vice versa | Last word (4th) of each line: <br>- Continuous rhyme (gieo vần tiếp) <br>- Alternating rhyme (gieo vần tréo) <br>- Three-line rhyme (gieo vần ba)|
| 5 chu | - 5 words per line <br>- 4 lines per stanza (optional) | Same as "4 chu" | Same as "4 chu" |
| 7 chu | - 7 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br>- The 5th word and the last word (7th) must have different tones | The last words of the 1st, 2nd and 4th lines of each stanza must share the same tone and rhyme |
| luc bat | - 6 words in odd lines <br>- 8 words in even lines <br>- 4 lines per stanza (optional) | For each 6-word line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br><br> For each 8-word line: <br>- The tone pattern must match the previous 6-word line <br>- The last word (8th) must have the same tone as the 6th word | The last word (6th) of a 6-word line must rhyme with the 6th word of the next 8-word line and with the 8th word of the previous 8-word line |
| 8 chu | - 8 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)| Same as "4 chu" |
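
The tone rules in the table can be checked mechanically once each word is mapped to bằng or trắc. The sketch below is not the package's implementation. It is a minimal illustration that assumes the tone can be read from the diacritic (no tone mark or a grave accent is bằng; acute, hook above, tilde and dot below are trắc) and that the rule amounts to alternating tones at the listed positions, in either direction. The names `tone_class` and `tones_alternate` are hypothetical.

```python
import unicodedata

# Combining marks for the four trắc tones: sắc, hỏi, ngã, nặng.
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}


def tone_class(word):
    """Return 'trac' if the word carries a trắc tone mark, else 'bang'."""
    # NFD splits each letter into a base character plus combining marks,
    # so the tone mark can be found regardless of the vowel it sits on.
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"


def tones_alternate(line, positions=(2, 4, 6)):
    """Check that tones alternate at the given 1-based word positions."""
    words = line.split()
    if len(words) < max(positions):
        return False
    classes = [tone_class(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(classes, classes[1:]))


line = "Người đi theo gió đuổi mây"
print([tone_class(w) for w in line.split()])
# ['bang', 'bang', 'bang', 'trac', 'trac', 'bang']
print(tones_alternate(line))  # True
```

For the "8 chu" genre the same helper could be called with `positions=(3, 5, 8)`. Rhyme checking is harder because it requires comparing the rime of each syllable, so it is left out of this sketch.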
## Data
A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download it [here](https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip)
For more details, see the _Acknowledgments_ section
## Training
The training code is in our [Vietnamese poem generator](https://github.com/Anshler/poem_generator) repo
From that repo, run:
```
python poem_classifier_training.py
```
## Installation
```
pip install vietnamese-poem-classifier
```
Or
```
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
```
## Inference
```python
from vietnamese_poem_classifier.poem_classifier import PoemClassifier

# Load the classifier
classifier = PoemClassifier()

# A four-line luc bat stanza (alternating 6-word and 8-word lines)
poem = '''Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời.'''

# Returns the predicted genre, the model's confidence,
# and the overall and per-criterion scores
classifier.predict(poem)
#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
```
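
The returned dictionaries can be post-processed with plain Python. The sketch below works on the output shown above, copied in as a literal so that it runs without loading the model. `summarize` and `weakest_criterion` are hypothetical helpers for illustration, not part of the package.

```python
# Output copied from the example above, so no model download is needed.
results = [{'label': 'luc bat', 'confidence': 0.9999017715454102,
            'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0,
            'r_score': 0.5833333333333333}]

# Map each sub-score key to a readable criterion name.
CRITERIA = {'l_score': 'length', 't_score': 'tone', 'r_score': 'rhyme'}


def weakest_criterion(result):
    """Name of the criterion with the lowest sub-score."""
    key = min(CRITERIA, key=lambda k: result[k])
    return CRITERIA[key]


def summarize(result):
    """One-line, human-readable summary of a single prediction."""
    return (f"{result['label']} ({result['confidence']:.2%} confident), "
            f"score {result['poem_score']:.2f}, "
            f"weakest: {weakest_criterion(result)}")


for result in results:
    print(summarize(result))
# luc bat (99.99% confident), score 0.75, weakest: rhyme
```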
## Model
The model's weights are published on Hugging Face at [Anshler/vietnamese-poem-classifier](https://huggingface.co/Anshler/vietnamese-poem-classifier)
## Acknowledgments
_This project was inspired by the evaluation method from ```fsoft-ailab```'s_ [SP-GPT2 Poem-Generator](https://github.com/fsoft-ailab/Poem-Generator)
_The dataset is also taken from their repo_
Raw data
{
"_id": null,
"home_page": "https://github.com/Anshler/vietnamese-poem-classifier",
"name": "vietnamese-poem-classifier",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "poem",
"author": "Huynh Minh Triet",
"author_email": "huynhminhtriet2002@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a5/d0/459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf/vietnamese-poem-classifier-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Classify genre and score Vietnamese poems",
"version": "0.1.4",
"project_urls": {
"Homepage": "https://github.com/Anshler/vietnamese-poem-classifier"
},
"split_keywords": [
"poem"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "829ec5e411a9a2390f0ee5e9e3a09e61d4305a144ce0910c6e0bcbce2f997ea4",
"md5": "5fa095506099d214665c689f155b0edb",
"sha256": "6aa95720762743fcb57c2d8146cfa0268ce9d799ca2038a7035a84b394d9596b"
},
"downloads": -1,
"filename": "vietnamese_poem_classifier-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5fa095506099d214665c689f155b0edb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15460,
"upload_time": "2023-12-30T13:13:45",
"upload_time_iso_8601": "2023-12-30T13:13:45.198845Z",
"url": "https://files.pythonhosted.org/packages/82/9e/c5e411a9a2390f0ee5e9e3a09e61d4305a144ce0910c6e0bcbce2f997ea4/vietnamese_poem_classifier-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a5d0459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf",
"md5": "0ccea0dab0cd93e0db305ccc19e27d90",
"sha256": "d96193ca33a50429a86bfbdccd1d02e214209383888edbd1a70d07de8554b223"
},
"downloads": -1,
"filename": "vietnamese-poem-classifier-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "0ccea0dab0cd93e0db305ccc19e27d90",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16263,
"upload_time": "2023-12-30T13:13:47",
"upload_time_iso_8601": "2023-12-30T13:13:47.085114Z",
"url": "https://files.pythonhosted.org/packages/a5/d0/459bb8d29c0ece80165be971cd8c36e5d4df26154d3d6e0a8c478432dddf/vietnamese-poem-classifier-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-30 13:13:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Anshler",
"github_project": "vietnamese-poem-classifier",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "vietnamese-poem-classifier"
}