# SudachiPy
[Documentation](https://worksapplications.github.io/sudachi.rs/python)
SudachiPy is a Python version of [Sudachi](https://github.com/WorksApplications/Sudachi), a Japanese morphological analyzer.
It is not a pure Python implementation, but a set of bindings for
[Sudachi.rs](https://github.com/WorksApplications/sudachi.rs).
## Binary wheels
We provide binary wheels for macOS (10.14+), Windows, and Linux, for the x86_64 architecture only.
The 32-bit x86 architecture is neither supported nor tested.
macOS source builds seem to work on ARM-based (AArch64) Macs,
but this architecture is also untested and requires installing the Rust toolchain and Cargo.
More information [here](https://worksapplications.github.io/sudachi.rs/python/topics/wheels.html).
## TL;DR
```bash
$ pip install sudachipy sudachidict_core
$ echo "高輪ゲートウェイ駅" | sudachipy
高輪ゲートウェイ駅 名詞,固有名詞,一般,*,*,* 高輪ゲートウェイ駅
EOS
$ echo "高輪ゲートウェイ駅" | sudachipy -m A
高輪 名詞,固有名詞,地名,一般,*,* 高輪
ゲートウェイ 名詞,普通名詞,一般,*,*,* ゲートウェー
駅 名詞,普通名詞,一般,*,*,* 駅
EOS
$ echo "空缶空罐空きカン" | sudachipy -a
空缶 名詞,普通名詞,一般,*,*,* 空き缶 空缶 アキカン 0
空罐 名詞,普通名詞,一般,*,*,* 空き缶 空罐 アキカン 0
空きカン 名詞,普通名詞,一般,*,*,* 空き缶 空きカン アキカン 0
EOS
```
```python
from sudachipy import Dictionary, SplitMode
tokenizer = Dictionary().create()
morphemes = tokenizer.tokenize("国会議事堂前駅")
print(morphemes[0].surface()) # '国会議事堂前駅'
print(morphemes[0].reading_form()) # 'コッカイギジドウマエエキ'
print(morphemes[0].part_of_speech()) # ['名詞', '固有名詞', '一般', '*', '*', '*']
morphemes = tokenizer.tokenize("国会議事堂前駅", SplitMode.A)
print([m.surface() for m in morphemes]) # ['国会', '議事', '堂', '前', '駅']
```
## Setup
You need SudachiPy and a dictionary.
### Step 1. Install SudachiPy
```bash
$ pip install sudachipy
```
### Step 2. Get a Dictionary
You can get a dictionary as a Python package. It may take a while to download the dictionary file (around 70MB for the `core` edition).
```bash
$ pip install sudachidict_core
```
Alternatively, you can choose another dictionary edition. See [this section](#dictionary-edition) for details.
## Usage: As a command
There is a CLI command `sudachipy`.
```bash
$ echo "外国人参政権" | sudachipy
外国人参政権 名詞,普通名詞,一般,*,*,* 外国人参政権
EOS
$ echo "外国人参政権" | sudachipy -m A
外国 名詞,普通名詞,一般,*,*,* 外国
人 接尾辞,名詞的,一般,*,*,* 人
参政 名詞,普通名詞,一般,*,*,* 参政
権 接尾辞,名詞的,一般,*,*,* 権
EOS
```
```bash
$ sudachipy tokenize -h
usage: sudachipy tokenize [-h] [-r file] [-m {A,B,C}] [-o file] [-s string]
[-a] [-d] [-v]
[file [file ...]]
Tokenize Text
positional arguments:
file text written in utf-8
optional arguments:
-h, --help show this help message and exit
-r file the setting file in JSON format
-m {A,B,C} the mode of splitting
-o file the output file
-s string sudachidict type
-a print all of the fields
-d print the debug information
-v, --version print sudachipy version
```
__Note: The Debug option (`-d`) is disabled in version 0.6.0.__
### Output
Columns are tab separated.
- Surface
- Part-of-Speech Tags (comma separated)
- Normalized Form
When you add the `-a` option, it additionally outputs
- Dictionary Form
- Reading Form
- Dictionary ID
- `0` for the system dictionary
- `1` and above for the [user dictionaries](#user-dictionary)
- `-1` if a word is Out-of-Vocabulary (not in the dictionary)
- Synonym group IDs
- `(OOV)` if a word is Out-of-Vocabulary (not in the dictionary)
```bash
$ echo "外国人参政権" | sudachipy -a
外国人参政権 名詞,普通名詞,一般,*,*,* 外国人参政権 外国人参政権 ガイコクジンサンセイケン 0 []
EOS
```
```bash
$ echo "阿quei" | sudachipy -a
阿 名詞,普通名詞,一般,*,*,* 阿 阿 -1 [] (OOV)
quei 名詞,普通名詞,一般,*,*,* quei quei -1 [] (OOV)
EOS
```
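When the CLI output is consumed by another program, the columns listed above can be split back into named fields. The following is a minimal sketch, not part of SudachiPy: `parse_line` and the dictionary key names are hypothetical, and the synonym group column is kept as a raw string because its format for non-empty values is not shown here.

```python
from typing import Any, Dict

# Column order follows the "Output" list above; the key names are illustrative.
BASE_FIELDS = ["surface", "part_of_speech", "normalized_form"]
EXTRA_FIELDS = ["dictionary_form", "reading_form", "dictionary_id", "synonym_group_ids"]


def parse_line(line: str) -> Dict[str, Any]:
    """Parse one tab-separated token line printed by `sudachipy` (with or without `-a`)."""
    columns = line.rstrip("\n").split("\t")
    # zip() stops at the shorter sequence, so lines without `-a` yield only the base fields.
    record: Dict[str, Any] = dict(zip(BASE_FIELDS + EXTRA_FIELDS, columns))
    record["part_of_speech"] = record["part_of_speech"].split(",")
    if "dictionary_id" in record:
        record["dictionary_id"] = int(record["dictionary_id"])
    # An optional eighth column "(OOV)" marks out-of-vocabulary words.
    record["is_oov"] = len(columns) > 7 and columns[7] == "(OOV)"
    return record


sample = "阿\t名詞,普通名詞,一般,*,*,*\t阿\t阿\t\t-1\t[]\t(OOV)"
record = parse_line(sample)
```

Lines consisting only of `EOS` mark the end of a sentence and should be skipped before calling `parse_line`.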
## Usage: As a Python package
### API
See [API reference page](https://worksapplications.github.io/sudachi.rs/python/).
### Example
```python
from sudachipy import Dictionary, SplitMode
tokenizer_obj = Dictionary().create()
```
```python
# Multi-granular Tokenization
# SplitMode.C is the default mode
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.C)]
# => ['国家公務員']
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.B)]
# => ['国家', '公務員']
[m.surface() for m in tokenizer_obj.tokenize("国家公務員", SplitMode.A)]
# => ['国家', '公務', '員']
```
```python
# Morpheme information
m = tokenizer_obj.tokenize("食べ")[0]
m.surface() # => '食べ'
m.dictionary_form() # => '食べる'
m.reading_form() # => 'タベ'
m.part_of_speech() # => ['動詞', '一般', '*', '*', '下一段-バ行', '連用形-一般']
```
```python
# Normalization
mode = SplitMode.C  # the default mode; SplitMode.A or SplitMode.B may also be used
tokenizer_obj.tokenize("附属", mode)[0].normalized_form()
# => '付属'
tokenizer_obj.tokenize("SUMMER", mode)[0].normalized_form()
# => 'サマー'
tokenizer_obj.tokenize("シュミレーション", mode)[0].normalized_form()
# => 'シミュレーション'
```
(Results obtained with the `20210802` `core` dictionary. They may differ with other versions.)
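For downstream processing it can be convenient to gather the accessors shown above into a plain dictionary. The sketch below is a hypothetical helper (`morpheme_to_dict` is not part of SudachiPy); it relies only on the morpheme methods used in the examples above and deliberately does not import `sudachipy`, so it accepts any object exposing those methods.

```python
from typing import Any, Dict


def morpheme_to_dict(m: Any) -> Dict[str, Any]:
    """Collect the fields demonstrated above from one morpheme into a plain dict."""
    return {
        "surface": m.surface(),
        "dictionary_form": m.dictionary_form(),
        "reading_form": m.reading_form(),
        "normalized_form": m.normalized_form(),
        # Convert to a list so the result is the same type whatever sequence is returned.
        "part_of_speech": list(m.part_of_speech()),
    }
```

With a tokenizer created as above, it could be applied as `[morpheme_to_dict(m) for m in tokenizer_obj.tokenize("食べ")]`.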
## Dictionary Edition
There are three editions of the Sudachi Dictionary: `small`, `core`, and `full`. See [WorksApplications/SudachiDict](https://github.com/WorksApplications/SudachiDict) for details.
SudachiPy uses `sudachidict_core` by default.
Dictionaries are installed as Python packages `sudachidict_small`, `sudachidict_core`, and `sudachidict_full`.
* [SudachiDict-small · PyPI](https://pypi.org/project/SudachiDict-small/)
* [SudachiDict-core · PyPI](https://pypi.org/project/SudachiDict-core/)
* [SudachiDict-full · PyPI](https://pypi.org/project/SudachiDict-full/)
The dictionary files are not included in the packages themselves; they are downloaded upon installation.
### Dictionary option: command line
You can specify the dictionary with the tokenize option `-s`.
```bash
$ pip install sudachidict_small
$ echo "外国人参政権" | sudachipy -s small
```
```bash
$ pip install sudachidict_full
$ echo "外国人参政権" | sudachipy -s full
```
### Dictionary option: Python package
You can specify the dictionary with the `Dictionary()` arguments `config_path` or `dict_type`.
```python
class Dictionary(config_path=None, resource_dir=None, dict_type=None)
```
1. `config_path`
* You can specify the file path to the setting file with `config_path` (see [Dictionary in The Setting File](#dictionary-in-the-setting-file) for details).
* If the dictionary file is specified in the setting file as `systemDict`, SudachiPy will use the dictionary.
2. `dict_type`
* You can also specify the dictionary type with `dict_type`.
* The available arguments are `small`, `core`, or `full`.
* If different dictionaries are specified with `config_path` and `dict_type`, **the dictionary specified by `dict_type` overrides** the one defined in the config file.
```python
from sudachipy import Dictionary
# default: sudachidict_core
tokenizer_obj = Dictionary().create()
# The dictionary given by the `systemDict` key in the config file (/path/to/sudachi.json) will be used
tokenizer_obj = Dictionary(config_path="/path/to/sudachi.json").create()
# The dictionary specified by `dict_type` will be set.
tokenizer_obj = Dictionary(dict_type="core").create() # sudachidict_core (same as default)
tokenizer_obj = Dictionary(dict_type="small").create() # sudachidict_small
tokenizer_obj = Dictionary(dict_type="full").create() # sudachidict_full
# The dictionary specified by `dict_type` overrides the one defined in the config file.
# In the following code, `sudachidict_full` will be used regardless of the dictionary defined in the config file.
tokenizer_obj = Dictionary(config_path="/path/to/sudachi.json", dict_type="full").create()
```
### Dictionary in The Setting File
Alternatively, if the dictionary file is specified in the setting file, `sudachi.json`, SudachiPy will use that file.
```js
{
"systemDict" : "relative/path/from/resourceDir/to/system.dic",
...
}
```
The default setting file is [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachi/resources/sudachi.json). You can specify your `sudachi.json` with the `-r` option.
```bash
$ sudachipy -r path/to/sudachi.json
```
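One way to prepare such a file is to copy an existing settings file and change only its `systemDict` value. The sketch below is an illustration under assumptions, not part of SudachiPy: `write_settings` is a hypothetical helper, and the demo uses a throwaway base file with placeholder values rather than the real default `sudachi.json`.

```python
import json
import tempfile
from pathlib import Path


def write_settings(base_path, out_path, system_dict):
    """Copy a JSON settings file, replacing only its `systemDict` value."""
    settings = json.loads(Path(base_path).read_text(encoding="utf-8"))
    settings["systemDict"] = system_dict
    Path(out_path).write_text(
        json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return settings


# Demo in a temporary directory; file names and paths are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base.json"
    base.write_text(json.dumps({"systemDict": "placeholder.dic"}), encoding="utf-8")
    custom = Path(tmp) / "sudachi.json"
    result = write_settings(base, custom, "relative/path/from/resourceDir/to/system.dic")
    reloaded = json.loads(custom.read_text(encoding="utf-8"))
```

Starting from a copy of the default file keeps all other keys intact, so only the dictionary path differs from the defaults.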
## User Dictionary
To use a user dictionary, `user.dic`, place a copy of [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachi/resources/sudachi.json) anywhere you like, and add a `userDict` value containing the relative path from `sudachi.json` to your `user.dic`.
```js
{
"userDict" : ["relative/path/to/user.dic"],
...
}
```
Then specify your `sudachi.json` with the `-r` option.
```bash
$ sudachipy -r path/to/sudachi.json
```
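Because the `userDict` entry is a path relative to `sudachi.json`, it can be computed rather than typed by hand. The following is a minimal sketch under assumptions: `add_user_dict` is a hypothetical helper (not part of SudachiPy), and the directory layout in the demo is made up.

```python
import json
import os
import tempfile
from pathlib import Path


def add_user_dict(config_path, user_dic_path):
    """Append `user_dic_path` to `userDict`, expressed relative to the settings file."""
    config_path = Path(config_path)
    settings = json.loads(config_path.read_text(encoding="utf-8"))
    relative = os.path.relpath(user_dic_path, start=config_path.parent)
    # Forward slashes keep the JSON free of backslash escapes on Windows.
    relative = Path(relative).as_posix()
    user_dicts = settings.setdefault("userDict", [])
    if relative not in user_dicts:
        user_dicts.append(relative)
    config_path.write_text(
        json.dumps(settings, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return settings


# Demo with a made-up layout: <tmp>/conf/sudachi.json and <tmp>/dict/user.dic.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "conf" / "sudachi.json"
    config.parent.mkdir()
    config.write_text("{}", encoding="utf-8")
    add_user_dict(config, Path(tmp) / "dict" / "user.dic")
    # A second call with the same path does not add a duplicate entry.
    settings = add_user_dict(config, Path(tmp) / "dict" / "user.dic")
```

In a real setup the settings file would be a copy of the default `sudachi.json`, not the empty object used in the demo.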
You can build a user dictionary with the subcommand `ubuild`.
```bash
$ sudachipy ubuild -h
usage: sudachipy ubuild [-h] [-d string] [-o file] [-s file] file [file ...]
Build User Dictionary
positional arguments:
file source files with CSV format (one or more)
optional arguments:
-h, --help show this help message and exit
-d string description comment to be embedded on dictionary
-o file output file (default: user.dic)
-s file system dictionary path (default: system core dictionary path)
```
For the dictionary file format, refer to [this document](https://github.com/WorksApplications/Sudachi/blob/develop/docs/user_dict.md) (written in Japanese; an English version is not available yet).
## Customized System Dictionary
```bash
$ sudachipy build -h
usage: sudachipy build [-h] [-o file] [-d string] -m file file [file ...]
Build Sudachi Dictionary
positional arguments:
file source files with CSV format (one of more)
optional arguments:
-h, --help show this help message and exit
-o file output file (default: system.dic)
-d string description comment to be embedded on dictionary
required named arguments:
-m file connection matrix file with MeCab's matrix.def format
```
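When the build is driven from a script, the command line can be assembled from the options listed in the help text above. This is only a sketch: `build_command` is a hypothetical helper, and the file names (`lex.csv`, `matrix.def`) are placeholders. It builds the argument list without running anything.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    sources: Sequence[str],
    matrix: str,
    output: Optional[str] = None,
    description: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `sudachipy build` from the documented options."""
    if not sources:
        raise ValueError("at least one CSV source file is required")
    argv = ["sudachipy", "build", "-m", matrix]  # -m is the required matrix file
    if output is not None:
        argv += ["-o", output]
    if description is not None:
        argv += ["-d", description]
    argv += list(sources)
    return argv


command = build_command(["lex.csv"], matrix="matrix.def", output="system.dic")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the source files exist.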
To use your customized `system.dic`, place a copy of [sudachi.json](https://github.com/WorksApplications/sudachi.rs/blob/develop/python/py_src/sudachi/resources/sudachi.json) anywhere you like, and overwrite the `systemDict` value with the relative path from `sudachi.json` to your `system.dic`.
```js
{
"systemDict" : "relative/path/to/system.dic",
...
}
```
Then specify your `sudachi.json` with the `-r` option.
```bash
$ sudachipy -r path/to/sudachi.json
```
## For Developers
### Build from source
#### Install sdist via pip
1. Install the Python modules `setuptools` and `setuptools-rust`.
2. Run `./build-sdist.sh` in `python` dir.
- The source distribution will be generated under the `python/dist/` dir.
3. Install it via pip: `pip install ./python/dist/SudachiPy-[version].tar.gz`
#### Install develop build
1. Install the Python modules `setuptools` and `setuptools-rust`.
2. Run `python3 setup.py develop`.
- `develop` will create a debug build, while `install` will create a release build.
3. Now you can import the module with `import sudachipy`.
ref: [setuptools-rust](https://github.com/PyO3/setuptools-rust)
### Test
Run `build_and_test.sh` to run the tests.
## Contact
Sudachi and SudachiPy are developed by [WAP Tokushima Laboratory of AI and NLP](http://nlp.worksap.co.jp/).
Open an issue, or come to our Slack workspace for questions and discussion.
https://sudachi-dev.slack.com/ (Get invitation [here](https://join.slack.com/t/sudachi-dev/shared_invite/enQtMzg2NTI2NjYxNTUyLTMyYmNkZWQ0Y2E5NmQxMTI3ZGM3NDU0NzU4NGE1Y2UwYTVmNTViYjJmNDI0MWZiYTg4ODNmMzgxYTQ3ZmI2OWU))
Enjoy tokenization!
Raw data
{
"_id": null,
"home_page": "https://github.com/WorksApplications/sudachi.rs/tree/develop/python",
"name": "SudachiPy",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Works Applications",
"author_email": "sudachi@worksap.co.jp",
"download_url": "https://files.pythonhosted.org/packages/b4/40/11f8f08adce726f89da640a9e6cee987020a2ebcf4162217429367df1b9a/SudachiPy-0.6.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Python version of Sudachi, the Japanese Morphological Analyzer",
"version": "0.6.7",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "395943df810463acc3cae623b38cb09f98ee469d0e123e236c349a5e98c0383d",
"md5": "5f797f472e3989092c96973ad83f40ba",
"sha256": "8f14ae0ffeb5fa90c3c47fd25af0afdcec8cb3714f936c9cd4eb15132336148c"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp310-cp310-macosx_10_12_universal2.whl",
"has_sig": false,
"md5_digest": "5f797f472e3989092c96973ad83f40ba",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 2432973,
"upload_time": "2023-02-15T07:44:05",
"upload_time_iso_8601": "2023-02-15T07:44:05.642081Z",
"url": "https://files.pythonhosted.org/packages/39/59/43df810463acc3cae623b38cb09f98ee469d0e123e236c349a5e98c0383d/SudachiPy-0.6.7-cp310-cp310-macosx_10_12_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dc74db1c83a11e68abca8dadce6719f02c05042a03cc9b9a95080c269705d78f",
"md5": "0a0640cb3d0b046e4346dde82d12fc92",
"sha256": "3e3f54006158d3112d86502f1ff3ab81f1d80c75d00ffeb853522f919a853063"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "0a0640cb3d0b046e4346dde82d12fc92",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 2208970,
"upload_time": "2023-02-15T07:44:07",
"upload_time_iso_8601": "2023-02-15T07:44:07.805350Z",
"url": "https://files.pythonhosted.org/packages/dc/74/db1c83a11e68abca8dadce6719f02c05042a03cc9b9a95080c269705d78f/SudachiPy-0.6.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2fe5bb6064bad2ba3c8e85475d75b772705633a63beea3c883a693185b17d2ff",
"md5": "f071541329f777a6888ff52d5663cbf3",
"sha256": "9fdc7bd9d059821ef84e7600d69b150109ef5b71be0c68dce47616fce5bf8dac"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp310-cp310-win_amd64.whl",
"has_sig": false,
"md5_digest": "f071541329f777a6888ff52d5663cbf3",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 1044905,
"upload_time": "2023-02-15T07:44:10",
"upload_time_iso_8601": "2023-02-15T07:44:10.090582Z",
"url": "https://files.pythonhosted.org/packages/2f/e5/bb6064bad2ba3c8e85475d75b772705633a63beea3c883a693185b17d2ff/SudachiPy-0.6.7-cp310-cp310-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d0665c7013f2fdeb97aab3389ccc24dee4f920993059503c65e51e2698539b82",
"md5": "aa42df30540201c38357f24781d7f20d",
"sha256": "5b7d5882e1450cdfd3235b14e569cde793d878fd018ade2871cb35e6f9f13ea6"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp311-cp311-macosx_10_12_universal2.whl",
"has_sig": false,
"md5_digest": "aa42df30540201c38357f24781d7f20d",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 2432972,
"upload_time": "2023-02-15T07:44:12",
"upload_time_iso_8601": "2023-02-15T07:44:12.261113Z",
"url": "https://files.pythonhosted.org/packages/d0/66/5c7013f2fdeb97aab3389ccc24dee4f920993059503c65e51e2698539b82/SudachiPy-0.6.7-cp311-cp311-macosx_10_12_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "25d754cd33fdc467923ebb1fda2384f6ea61a2614af59e44dc29e0f2f50a3290",
"md5": "20302698b743559d3dfebdc698d1b4f7",
"sha256": "789d448bc5a3c5e0ed1e4bff505eed1be0fab3bf2e3480edd0b885c58bcd4cff"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "20302698b743559d3dfebdc698d1b4f7",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 2208972,
"upload_time": "2023-02-15T07:44:14",
"upload_time_iso_8601": "2023-02-15T07:44:14.213330Z",
"url": "https://files.pythonhosted.org/packages/25/d7/54cd33fdc467923ebb1fda2384f6ea61a2614af59e44dc29e0f2f50a3290/SudachiPy-0.6.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ecdfc6919aa348e69749f8ea9a9dc9ffed6a77fa7142e07b925b287877ccce49",
"md5": "bfbb12afa069f5f27d34b5870a6fe009",
"sha256": "06e9af5af655bd74fcb793305efe0d46689b4827f295807b332f935abd974d83"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp311-cp311-win_amd64.whl",
"has_sig": false,
"md5_digest": "bfbb12afa069f5f27d34b5870a6fe009",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 1044914,
"upload_time": "2023-02-15T07:44:16",
"upload_time_iso_8601": "2023-02-15T07:44:16.070832Z",
"url": "https://files.pythonhosted.org/packages/ec/df/c6919aa348e69749f8ea9a9dc9ffed6a77fa7142e07b925b287877ccce49/SudachiPy-0.6.7-cp311-cp311-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e05ee447a9ae99036b349ff131000e9671559c2ffbb0a21cb12902e8e6b3c73f",
"md5": "523df833b5cb5d16a7d78305a0e92e9f",
"sha256": "c660fb9dfd65d4d03109dfc5ab6b0e7415c79ed78c59fd456cd30e26ea6c340d"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp37-cp37m-macosx_10_12_universal2.whl",
"has_sig": false,
"md5_digest": "523df833b5cb5d16a7d78305a0e92e9f",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 2434051,
"upload_time": "2023-02-15T07:44:17",
"upload_time_iso_8601": "2023-02-15T07:44:17.695531Z",
"url": "https://files.pythonhosted.org/packages/e0/5e/e447a9ae99036b349ff131000e9671559c2ffbb0a21cb12902e8e6b3c73f/SudachiPy-0.6.7-cp37-cp37m-macosx_10_12_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cea7e30cdf3b535602b76906b5a76a1077a0be2736305e1c8bfb1863142d5acc",
"md5": "6dfd555c872fbd442fccff083ebf1230",
"sha256": "8f1aa21328289206520e62547d3ab8a62e1c6ed370a8b025585b302e1de772bc"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "6dfd555c872fbd442fccff083ebf1230",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 2209482,
"upload_time": "2023-02-15T07:44:19",
"upload_time_iso_8601": "2023-02-15T07:44:19.388429Z",
"url": "https://files.pythonhosted.org/packages/ce/a7/e30cdf3b535602b76906b5a76a1077a0be2736305e1c8bfb1863142d5acc/SudachiPy-0.6.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7e311a7dc182d7174476b2d9d9ecea65c93dfab3bd19c0fd695be75e15d8c788",
"md5": "ae141fedd9d9301078bfd1dbeff2763a",
"sha256": "656628cd74b7483f1c10d25dee048983f062404f8cd04dfc3c5e28f987bc5a39"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp37-cp37m-win_amd64.whl",
"has_sig": false,
"md5_digest": "ae141fedd9d9301078bfd1dbeff2763a",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 1044704,
"upload_time": "2023-02-15T07:44:21",
"upload_time_iso_8601": "2023-02-15T07:44:21.443208Z",
"url": "https://files.pythonhosted.org/packages/7e/31/1a7dc182d7174476b2d9d9ecea65c93dfab3bd19c0fd695be75e15d8c788/SudachiPy-0.6.7-cp37-cp37m-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "66801b4851c5fe38c0a511a413a2fcb550eaeef18adddf23c437fbe414bf8aa7",
"md5": "aee17d4b117783210f331f3475266aca",
"sha256": "fb1085d372169df8187206dc505710d1a440f87523dbb287c4c639fb3d1eac78"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp38-cp38-macosx_10_12_universal2.whl",
"has_sig": false,
"md5_digest": "aee17d4b117783210f331f3475266aca",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 2434143,
"upload_time": "2023-02-15T07:44:23",
"upload_time_iso_8601": "2023-02-15T07:44:23.376610Z",
"url": "https://files.pythonhosted.org/packages/66/80/1b4851c5fe38c0a511a413a2fcb550eaeef18adddf23c437fbe414bf8aa7/SudachiPy-0.6.7-cp38-cp38-macosx_10_12_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e415be1410095116908b1ada29b31ea6482c37d492f036324ae69ae8dad9205b",
"md5": "7600c82ff67b8fd33ae04c10a012b13a",
"sha256": "b487890c499c927e8c6af2c76bdea3b8602f7c0a9145fe5e25cd9a82cbb96c05"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "7600c82ff67b8fd33ae04c10a012b13a",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 2209475,
"upload_time": "2023-02-15T07:44:25",
"upload_time_iso_8601": "2023-02-15T07:44:25.800468Z",
"url": "https://files.pythonhosted.org/packages/e4/15/be1410095116908b1ada29b31ea6482c37d492f036324ae69ae8dad9205b/SudachiPy-0.6.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3b52122187d801f4c8fee829c2d218a1c147601e724204dd68e7390bf2bdc5a6",
"md5": "f96937679e18d1b9d66cd2d54bf2ffda",
"sha256": "4f60e20cd50a940f3ef57606a09f20ac2ef4262989f545e552eebb18b5484608"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp38-cp38-win_amd64.whl",
"has_sig": false,
"md5_digest": "f96937679e18d1b9d66cd2d54bf2ffda",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": null,
"size": 1044778,
"upload_time": "2023-02-15T07:44:28",
"upload_time_iso_8601": "2023-02-15T07:44:28.336963Z",
"url": "https://files.pythonhosted.org/packages/3b/52/122187d801f4c8fee829c2d218a1c147601e724204dd68e7390bf2bdc5a6/SudachiPy-0.6.7-cp38-cp38-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8a2d03eb68787fd40ce900105d0dd6ad48d8ce80f40b22144d5b63035219d143",
"md5": "47d929324b01ee18bf72ef3f581fccff",
"sha256": "2d9ad67c54463eadce8728bf991c18d936b0756eedd3ed7661082aa36e7e5d60"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp39-cp39-macosx_10_12_universal2.whl",
"has_sig": false,
"md5_digest": "47d929324b01ee18bf72ef3f581fccff",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 2433487,
"upload_time": "2023-02-15T07:44:30",
"upload_time_iso_8601": "2023-02-15T07:44:30.046578Z",
"url": "https://files.pythonhosted.org/packages/8a/2d/03eb68787fd40ce900105d0dd6ad48d8ce80f40b22144d5b63035219d143/SudachiPy-0.6.7-cp39-cp39-macosx_10_12_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "176af77c7d078ed189fdcf96215c89e9c55698cb11327433f872256cef2352cb",
"md5": "63044cacaf1b567a26f478bc28875486",
"sha256": "d167cb59e044aff1123b534d045c66147ffa76a75196284e188c245af6fed013"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "63044cacaf1b567a26f478bc28875486",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 2209227,
"upload_time": "2023-02-15T07:44:32",
"upload_time_iso_8601": "2023-02-15T07:44:32.066796Z",
"url": "https://files.pythonhosted.org/packages/17/6a/f77c7d078ed189fdcf96215c89e9c55698cb11327433f872256cef2352cb/SudachiPy-0.6.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2909aee4905281cdb831e6772e09564083fbb1647049057cab83cc94dfc57ae0",
"md5": "5ba0363e65e143c22f50c46578ea07a0",
"sha256": "4e368e7bfa83885fd9a6ffb0366fe73ce100caa72ab86f69b149091ce813da27"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7-cp39-cp39-win_amd64.whl",
"has_sig": false,
"md5_digest": "5ba0363e65e143c22f50c46578ea07a0",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 1045177,
"upload_time": "2023-02-15T07:44:33",
"upload_time_iso_8601": "2023-02-15T07:44:33.760988Z",
"url": "https://files.pythonhosted.org/packages/29/09/aee4905281cdb831e6772e09564083fbb1647049057cab83cc94dfc57ae0/SudachiPy-0.6.7-cp39-cp39-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b44011f8f08adce726f89da640a9e6cee987020a2ebcf4162217429367df1b9a",
"md5": "78c8b9c7580c27ef8c47b40e6cba3744",
"sha256": "e4fb026cc367e0dff7d0b8a2fce510e66d5bf7ef4728af76bfe63132b22753cd"
},
"downloads": -1,
"filename": "SudachiPy-0.6.7.tar.gz",
"has_sig": false,
"md5_digest": "78c8b9c7580c27ef8c47b40e6cba3744",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 157139,
"upload_time": "2023-02-15T07:44:34",
"upload_time_iso_8601": "2023-02-15T07:44:34.940637Z",
"url": "https://files.pythonhosted.org/packages/b4/40/11f8f08adce726f89da640a9e6cee987020a2ebcf4162217429367df1b9a/SudachiPy-0.6.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-15 07:44:34",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "sudachipy"
}