span-to-ibo


Name: span-to-ibo
Version: 0.1.0
Home page: https://github.com/inoueakimitsu/span-to-ibo
Summary: This is a script to convert the output file of doccano to a format that is easy to handle with sklearn-crfsuite.
Upload time: 2023-03-29 11:44:48
Author: Akimitsu Inoue
Requires Python: >=3.9,<4.0
Requirements: none recorded
# Span to IBO

This script converts the span annotations in a `doccano` export
into per-token IOB-style labels (such as `B-LOC` and `I-LOC`), a format that is easy to use with `sklearn-crfsuite`.
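The README does not spell out the labeling rule, so the following is a minimal sketch, not the package's implementation. It assumes tokens come with character offsets and that spans are doccano-style `[start, end, label]` triples with an exclusive `end`. The first token overlapping a span is tagged `B-<label>`, later overlapping tokens `I-<label>`, and all other tokens `O`. The output format suggests the real script uses a Japanese morphological analyzer to produce tokens and POS tags; here that step is replaced by a pre-split word list. `spans_to_iob` and `tokenize_with_offsets` are hypothetical names.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]   # (start, end, label); end assumed exclusive
Token = Tuple[str, int, int]  # (surface, start, end) character offsets


def spans_to_iob(tokens: List[Token], spans: List[Span]) -> List[str]:
    """Assign an IOB tag to each token from character-offset spans.

    The first token overlapping a span gets B-<label>, later overlapping
    tokens get I-<label>, and tokens overlapping no span keep "O".
    If spans overlap each other, the later span's tags win.
    """
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False  # becomes True once the span's first token is tagged
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start < span_end and tok_end > span_start:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


def tokenize_with_offsets(text: str, words: List[str]) -> List[Token]:
    """Hypothetical helper: locate pre-split words in text to recover offsets.

    Raises ValueError if a word cannot be found after the previous one.
    """
    tokens: List[Token] = []
    pos = 0
    for word in words:
        start = text.index(word, pos)
        tokens.append((word, start, start + len(word)))
        pos = start + len(word)
    return tokens


# Example: the span (0, 6) covers the first two tokens.
tokens = tokenize_with_offsets("東京都渋谷区神南", ["東京都", "渋谷区", "神南"])
print(spans_to_iob(tokens, [(0, 6, "LOC")]))  # ['B-LOC', 'I-LOC', 'O']
```

Tagging by overlap rather than exact boundary match means a span that ends mid-token still labels that whole token; whether that is the desired behavior depends on the tokenizer and annotation guidelines.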

## Usage

```bash
python doccano.py --input_path <path to doccano exported jsonl file> --output_path <path to output file>
```

## Input file format

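The input file is a JSONL file exported from `doccano`: each line is a JSON object holding the raw `text` and a `labels` list of `[start, end, label]` character-offset spans.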

```json
{"text": "東京都渋谷区渋谷 2丁目2−8 渋谷マークシティ", "labels": [[0, 9, "LOC"]]}
{"text": "東京都渋谷区神南 1丁目1−1", "labels": [[0, 7, "LOC"]]}
...
```
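As a minimal sketch of reading this format, assuming every non-empty line has the `text`/`labels` shape shown in the example (other doccano export layouts are not handled), the hypothetical helper below yields `(text, spans)` pairs and rejects spans that fall outside the text.

```python
import json
from typing import Iterator, List, Tuple


def read_doccano_jsonl(path: str) -> Iterator[Tuple[str, List[Tuple[int, int, str]]]]:
    """Yield (text, spans) pairs from a doccano JSONL export.

    Assumes each non-empty line is a JSON object with a "text" string and a
    "labels" list of [start, end, label] entries.
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            text = record["text"]
            spans = []
            for start, end, label in record.get("labels", []):
                # Reject offsets that cannot index into the text.
                if not (0 <= start < end <= len(text)):
                    raise ValueError(f"line {line_no}: span {start}-{end} out of range")
                spans.append((start, end, label))
            yield text, spans
```

Printing `text[start:end]` for each span is a quick way to check that the offsets cover exactly the characters intended before converting.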

## Output file format

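The output file is a JSON array with one inner array per input text. Each inner array holds one object per token: the surface form (`word`), its IOB label (`label`), part-of-speech features (`pos_tag`, `pos_tag[:2]`, `pos_tag_all`), and the `BOS`/`EOS` flags marking the first and last token of the sequence. For example: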

```json
[
    [
        {"word": "東京都", "label": "B-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,東京都,トウキョウト,トーキョート", "BOS": true, "EOS": false},
        {"word": "渋谷区", "label": "I-LOC", "pos_tag": "名詞", "pos_tag[:2]": "名詞,固有名詞", "pos_tag_all": "名詞,固有名詞,地域,一般,*,*,渋谷区,シブヤク,シブヤク", "BOS": false, "EOS": false},
        ...
    ],
    ...
]
```
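The README does not show how this output is consumed. `sklearn-crfsuite`'s `CRF.fit(X, y)` takes `X` as a list of sequences of feature dicts and `y` as a list of sequences of label strings, so a minimal sketch, assuming the file is a JSON array like the one above, keeps every field except `label` as a feature and collects `label` as the target. `load_crfsuite_data` is a hypothetical helper, not part of the package.

```python
import json
from typing import Any, Dict, List, Tuple

Features = Dict[str, Any]


def load_crfsuite_data(path: str) -> Tuple[List[List[Features]], List[List[str]]]:
    """Split the converter's output into X (feature dicts) and y (labels)."""
    with open(path, encoding="utf-8") as f:
        sentences = json.load(f)
    X: List[List[Features]] = []
    y: List[List[str]] = []
    for sentence in sentences:
        # Every field except the target label becomes a CRF feature.
        X.append([{k: v for k, v in token.items() if k != "label"} for token in sentence])
        y.append([token["label"] for token in sentence])
    return X, y


# Training would then look roughly like this (requires sklearn-crfsuite):
# from sklearn_crfsuite import CRF
# X, y = load_crfsuite_data("output.json")
# crf = CRF(algorithm="lbfgs", max_iterations=100)
# crf.fit(X, y)
```

Keeping `word` as a feature is a design choice; exclude it in the dict comprehension if only the POS and position features are wanted.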

## Reference

This program is mainly based on the following repository.
https://github.com/ToshihikoSakai/jsontoconll

All mistakes in this script are mine.

            
