Name | span-to-ibo |
Version | 0.1.0 |
home_page | https://github.com/inoueakimitsu/span-to-ibo |
Summary | This is a script to convert the output file of doccano to a format that is easy to handle with sklearn-crfsuite. |
upload_time | 2023-03-29 11:44:48 |
maintainer | |
docs_url | None |
author | Akimitsu Inoue |
requires_python | >=3.9,<4.0 |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Span to IBO
This is a script to convert the output file of `doccano`
to a format that is easy to handle with `sklearn-crfsuite`.
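
The core idea, turning character-offset spans into one B-/I-/O label per token, can be sketched as below. This is a minimal illustration and not the package's actual code: the function name `spans_to_iob` is hypothetical, the tokens are assumed to come from some tokenizer and to appear in order in the text, and span ends are assumed to be exclusive.

```python
def spans_to_iob(text, tokens, spans):
    """Assign a B-/I-/O label to each token (hypothetical helper).

    text   : the original sentence
    tokens : token strings, in the order they occur in `text`
    spans  : (start, end, label) tuples with an exclusive end offset (assumed)
    """
    labels = []
    cursor = 0
    for token in tokens:
        # Locate the token in the text, searching from the previous token's end.
        start = text.index(token, cursor)
        end = start + len(token)
        cursor = end

        label = "O"
        for span_start, span_end, name in spans:
            # The token belongs to the span if the two ranges overlap.
            if start < span_end and end > span_start:
                # The first token of a span gets B-, later tokens get I-.
                prefix = "B" if start <= span_start else "I"
                label = f"{prefix}-{name}"
                break
        labels.append(label)
    return labels


example_text = "Tokyo Shibuya is big"
example_tokens = ["Tokyo", "Shibuya", "is", "big"]
example_spans = [(0, 13, "LOC")]
print(spans_to_iob(example_text, example_tokens, example_spans))
```

Tokens that only partly overlap a span are still labeled as part of it in this sketch; a real converter may choose a stricter rule.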
## Usage
```bash
python doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>
```
## Input file format
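The input file is a JSONL file exported from `doccano`, with one JSON object per line. Each object holds the `text` and a `labels` list of `[start, end, label]` entries.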
```json
{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}
{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}
...
```
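
To make the input structure concrete, here is a small sketch that parses such lines with the standard `json` module. The helper name `read_doccano_jsonl` is hypothetical, and the sketch assumes that the offsets are character positions with an exclusive end.

```python
import json


def read_doccano_jsonl(lines):
    """Parse doccano JSONL lines into (text, spans) pairs (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each label entry is [start, end, label].
        spans = [(start, end, label) for start, end, label in record["labels"]]
        records.append((record["text"], spans))
    return records


sample_lines = ['{"text": "Tokyo Shibuya is big", "labels": [[0, 13, "LOC"]]}']
for text, spans in read_doccano_jsonl(sample_lines):
    for start, end, label in spans:
        # Slice the annotated substring out of the text.
        print(label, repr(text[start:end]))
```

For a real export, the same function can be fed an open file object, for example `open(path, encoding="utf-8")`, since iterating a file yields its lines.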
## Output file format
The output file is a JSON file in the following format, a list of sentences where each sentence is a list of per-token objects:
```json
[
[
{"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート", "BOS": true, "EOS": false},
{"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク", "BOS": false, "EOS": false},
...
],
...,
]
```
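
One possible way to consume this output is to split each token object into a feature dict and a label, which is the list-of-lists shape that `sklearn-crfsuite` expects for training. The sketch below is an assumption about typical usage, not part of this package; `split_features_and_labels` is a hypothetical name, and the inline sample stands in for a real output file.

```python
import json


def split_features_and_labels(sentences):
    """Separate token dicts into features (X) and labels (y) (hypothetical helper)."""
    X, y = [], []
    for sentence in sentences:
        # Everything except "label" is treated as a feature.
        X.append([{k: v for k, v in token.items() if k != "label"} for token in sentence])
        y.append([token["label"] for token in sentence])
    return X, y


# Stand-in for json.load() on a real output file.
sample_output = json.loads("""
[
  [
    {"word": "Tokyo", "label": "B-LOC", "pos_tag": "NOUN", "BOS": true, "EOS": false},
    {"word": "Shibuya", "label": "I-LOC", "pos_tag": "NOUN", "BOS": false, "EOS": true}
  ]
]
""")

X, y = split_features_and_labels(sample_output)
print(y)
# X and y could then be passed to a model, e.g. sklearn_crfsuite.CRF().fit(X, y)
# (assumes sklearn-crfsuite is installed; not executed here).
```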
## Reference
This program is mainly based on the following repository.
https://github.com/ToshihikoSakai/jsontoconll
All mistakes in this script are mine.
Raw data
{
"_id": null,
"home_page": "https://github.com/inoueakimitsu/span-to-ibo",
"name": "span-to-ibo",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Akimitsu Inoue",
"author_email": "inoue.akimitsu@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/3d/da/c894d124ce96836b421d08971689e84aa169ffb4359a617ab34a6a0c26c5/span_to_ibo-0.1.0.tar.gz",
"platform": null,
"description": "# Span to IBO\n\nThis is a script to convert the output file of `doccano` \nto a format that is easy to handle with `sklearn-crfsuite`.\n\n## Usage\n\n```bash\npython doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>\n```\n\n## Input file format\n\nThe input file is a jsonl file exported from `doccano`.\n\n```json\n{\"text\": \"\u6771\u4eac\u90fd\u6e0b\u8c37\u533a\u6e0b\u8c37 \uff12\u4e01\u76ee\uff12\u2212\uff18 \u6e0b\u8c37\u30de\u30fc\u30af\u30b7\u30c6\u30a3\", \"labels\": [[0, 9, \"LOC\"]]}\n{\"text\": \"\u6771\u4eac\u90fd\u6e0b\u8c37\u533a\u795e\u5357 \uff11\u4e01\u76ee\uff11\u2212\uff11\", \"labels\": [[0, 7, \"LOC\"]]}\n...\n```\n\n## Output file format\n\nThe output file is a json file of the following format:\n\n```json\n[\n [\n {\"word\": \"\u6771\u4eac\u90fd\", \"label\": \"B-LOC\", \"pos_tag\": \"\u540d\u8a5e\", \"pos_tag[:2]\": \"\u540d\u8a5e,\u56fa\u6709\u540d\u8a5e\", \"pos_tag_all\": \"\u540d\u8a5e,\u56fa\u6709\u540d\u8a5e,\u5730\u57df,\u4e00\u822c,*,*,\u6771\u4eac\u90fd,\u30c8\u30a6\u30ad\u30e7\u30a6\u30c8,\u30c8\u30fc\u30ad\u30e7\u30fc\u30c8\", \"BOS\": true, \"EOS\": false},\n {\"word\": \"\u6e0b\u8c37\u533a\", \"label\": \"I-LOC\", \"pos_tag\": \"\u540d\u8a5e\", \"pos_tag[:2]\": \"\u540d\u8a5e,\u56fa\u6709\u540d\u8a5e\", \"pos_tag_all\": \"\u540d\u8a5e,\u56fa\u6709\u540d\u8a5e,\u5730\u57df,\u4e00\u822c,*,*,\u6e0b\u8c37\u533a,\u30b7\u30d6\u30e4\u30af,\u30b7\u30d6\u30e4\u30af\", \"BOS\": false, \"EOS\": false},\n ...\n ],\n ...,\n]\n```\n\n## Reference\n\nThis program is mainly based on the following repository.\nhttps://github.com/ToshihikoSakai/jsontoconll\n\nAll mistakes in this script are mine.\n",
"bugtrack_url": null,
"license": "",
"summary": "This is a script to convert the output file of doccano to a format that is easy to handle with sklearn-crfsuite.",
"version": "0.1.0",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e09f4469d9c5b57368745459034f3b2c7bd09fdbd64ef6e4f5ac81d5dcf9a932",
"md5": "eb6966ec82683cff004508cef0374d3a",
"sha256": "06074b8fb2bdd1b39ecf061c80197e9c46b32e47a473a119160c5f2f249d0342"
},
"downloads": -1,
"filename": "span_to_ibo-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eb6966ec82683cff004508cef0374d3a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 5150,
"upload_time": "2023-03-29T11:44:46",
"upload_time_iso_8601": "2023-03-29T11:44:46.690103Z",
"url": "https://files.pythonhosted.org/packages/e0/9f/4469d9c5b57368745459034f3b2c7bd09fdbd64ef6e4f5ac81d5dcf9a932/span_to_ibo-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3ddac894d124ce96836b421d08971689e84aa169ffb4359a617ab34a6a0c26c5",
"md5": "093297582092752a99c3aa186c058d03",
"sha256": "0db8dc420d54c7dcb0508adcd9ce75f876a31a4023a6ae95e52bc5f4a725e2e9"
},
"downloads": -1,
"filename": "span_to_ibo-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "093297582092752a99c3aa186c058d03",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 4468,
"upload_time": "2023-03-29T11:44:48",
"upload_time_iso_8601": "2023-03-29T11:44:48.999464Z",
"url": "https://files.pythonhosted.org/packages/3d/da/c894d124ce96836b421d08971689e84aa169ffb4359a617ab34a6a0c26c5/span_to_ibo-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-29 11:44:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "inoueakimitsu",
"github_project": "span-to-ibo",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "span-to-ibo"
}