![](docs/docs/img/logo_title_wide.png)
# ja-timex
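A rule-based analyzer that extracts and normalizes temporal expressions written in natural language
## Overview
`ja-timex` is a rule-based analyzer that extracts temporal expressions from natural sentences written in contemporary Japanese and converts them to an annotation specification called `TIMEX3`, normalizing them into a form that programs can use.
It provides the following features:
- Rule-based extraction of temporal expressions such as dates, times, durations, and frequencies from Japanese text
- Support for a variety of formats, including Arabic and kanji numerals and Western (Gregorian) and Japanese era calendars
- Support for converting temporal expressions to datetime/timedelta objects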
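
To give a feel for what normalizing a Japanese era date involves, here is a minimal standalone sketch. It is not ja-timex's implementation: the `ERA_START_YEAR` table and the `era_to_gregorian_year` helper are hypothetical names introduced only for illustration, and the sketch handles only Arabic digits and the special first-year marker `元`.

```python
import re

# Hypothetical lookup table (not taken from ja-timex):
# the Gregorian year in which each modern era began.
ERA_START_YEAR = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}

# Matches an era name followed by a year number (or 元 for the first year) and 年.
_ERA_PATTERN = re.compile(r"(明治|大正|昭和|平成|令和)(元|[0-9]+)年")


def era_to_gregorian_year(text: str) -> int:
    """Return the Gregorian year for the first era-style year found in text."""
    match = _ERA_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no era year found in {text!r}")
    era, year_in_era = match.group(1), match.group(2)
    n = 1 if year_in_era == "元" else int(year_in_era)
    # Year 1 of an era is the era's starting Gregorian year.
    return ERA_START_YEAR[era] + n - 1


print(era_to_gregorian_year("平成20年4月"))  # 2008
print(era_to_gregorian_year("令和元年"))  # 2019
```

A real analyzer also has to deal with kanji numerals, full-width digits, and abbreviated era names, which this sketch deliberately leaves out.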
### Input
```python
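from ja_timex import TimexParser

# Input sentence: "Since April 2008, he has gone jogging three times a week, for one hour from 8 a.m."
timexes = TimexParser().parse("彼は2008年4月から週に3回のジョギングを、朝8時から1時間行ってきた")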
```
### Output
```python
[<TIMEX3 tid="t0" type="DATE" value="2008-04-XX" text="2008年4月">,
<TIMEX3 tid="t1" type="SET" value="P1W" freq="3X" text="週に3回">,
<TIMEX3 tid="t2" type="TIME" value="T08-XX-XX" text="朝8時">,
<TIMEX3 tid="t3" type="DURATION" value="PT1H" text="1時間">]
```
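
In the `DATE` value above, the unspecified day appears as the placeholder `XX`. The following sketch shows one way such a value could be split into its fields. It uses only the standard library, and `split_date_value` is a hypothetical helper written for this illustration, not part of the ja-timex API.

```python
from typing import Optional, Tuple


def split_date_value(value: str) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Split a 'YYYY-MM-DD' style value into (year, month, day).

    Fields written with placeholder X characters (e.g. 'XX') become None.
    """
    parts = value.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three '-'-separated fields, got {value!r}")

    def to_int(field: str) -> Optional[int]:
        return None if "X" in field else int(field)

    year, month, day = (to_int(part) for part in parts)
    return year, month, day


print(split_date_value("2008-04-XX"))  # (2008, 4, None)
```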
### Converting to datetime/timedelta
```python
# <TIMEX3 tid="t0" type="DATE" value="2008-04-XX" text="2008年4月">
In []: timexes[0].to_datetime()
Out[]: DateTime(2008, 4, 1, 0, 0, 0, tzinfo=Timezone('Asia/Tokyo'))
```
```python
# <TIMEX3 tid="t3" type="DURATION" value="PT1H" text="1時間">
In []: timexes[3].to_duration()
Out[]: Duration(hours=1)
```
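
For comparison, a `DURATION` value such as `PT1H` can also be turned into a standard-library `timedelta` by hand. The sketch below is an assumption-laden illustration, not how ja-timex does it: `duration_value_to_timedelta` is a hypothetical helper, and it only accepts whole-number weeks, days, hours, minutes, and seconds (years and months have no fixed length, so `timedelta` cannot represent them).

```python
import re
from datetime import timedelta

# Accepts values like 'P1W', 'P2D', 'PT1H', or 'P2DT30M' with integer fields only.
_DURATION_PATTERN = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_value_to_timedelta(value: str) -> timedelta:
    """Convert a simple ISO 8601-style duration string to a timedelta."""
    match = _DURATION_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unsupported duration value: {value!r}")
    # Keep only the fields that were actually present in the string.
    fields = {name: int(num) for name, num in match.groupdict().items() if num is not None}
    if not fields:
        raise ValueError(f"duration value has no fields: {value!r}")
    return timedelta(**fields)


print(duration_value_to_timedelta("PT1H"))  # 1:00:00
```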
## Installation
```
pip install ja-timex
```
## Documentation
[ja\-timex documentation](https://ja-timex.github.io/docs/)
### Reference specifications
This package is based on the temporal information annotation framework proposed in the following papers.
- [1] [小西光, 浅原正幸, & 前川喜久雄. (2013). 『現代日本語書き言葉均衡コーパス』 に対する時間情報アノテーション. 自然言語処理, 20(2), 201-221.](https://www.jstage.jst.go.jp/article/jnlp/20/2/20_201/_article/-char/ja/)
- [2] [成澤克麻 (2014)「自然言語処理における数量表現の取り扱い」東北大学大学院 修士論文](http://www.cl.ecei.tohoku.ac.jp/publications/2015/mthesis2013_narisawa_submitted.pdf)
Raw data
{
"_id": null,
"home_page": "https://github.com/yagays/ja-timex",
"name": "ja-timex",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8.1,<4.0.0",
"maintainer_email": "",
"keywords": "ja_timex,NLP,japanese",
"author": "Yuki Okuda",
"author_email": "y.okuda@dr-ubie.com",
"download_url": "https://files.pythonhosted.org/packages/69/97/c8cd9fc4e6c0e80aabaae1a822f7d1737e44bf149cc4a2a4e0f8934d788a/ja_timex-0.2.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Analyze and parse natural language temporal expression from Japanese sentences",
"version": "0.2.8",
"project_urls": {
"Homepage": "https://github.com/yagays/ja-timex",
"Repository": "https://github.com/yagays/ja-timex"
},
"split_keywords": [
"ja_timex",
"nlp",
"japanese"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1cdddc73c7871daa0e927cb12d5ddbe8357255dd92af94ea3095238f576adde8",
"md5": "3406bf7a3f71f9617a95d03fdbefc1ed",
"sha256": "5a68dd432ebd56dafb4831bd3cae3b49d3517030c6a3e6a7b9c540f5d1ea887d"
},
"downloads": -1,
"filename": "ja_timex-0.2.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3406bf7a3f71f9617a95d03fdbefc1ed",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1,<4.0.0",
"size": 27705,
"upload_time": "2023-11-04T05:59:57",
"upload_time_iso_8601": "2023-11-04T05:59:57.248176Z",
"url": "https://files.pythonhosted.org/packages/1c/dd/dc73c7871daa0e927cb12d5ddbe8357255dd92af94ea3095238f576adde8/ja_timex-0.2.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6997c8cd9fc4e6c0e80aabaae1a822f7d1737e44bf149cc4a2a4e0f8934d788a",
"md5": "ccfaee4b4d68f8c35bba52f490d56fc0",
"sha256": "5942d294050eee520a62ae4c7c6c1fc9132d2a00f137ff53013c901c9104d4bd"
},
"downloads": -1,
"filename": "ja_timex-0.2.8.tar.gz",
"has_sig": false,
"md5_digest": "ccfaee4b4d68f8c35bba52f490d56fc0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1,<4.0.0",
"size": 23771,
"upload_time": "2023-11-04T05:59:58",
"upload_time_iso_8601": "2023-11-04T05:59:58.595738Z",
"url": "https://files.pythonhosted.org/packages/69/97/c8cd9fc4e6c0e80aabaae1a822f7d1737e44bf149cc4a2a4e0f8934d788a/ja_timex-0.2.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-04 05:59:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yagays",
"github_project": "ja-timex",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "ja-timex"
}