yurenizer

Name	yurenizer JSON
Version	0.2.2 JSON
	download
home_page	https://github.com/sea-turt1e/yurenizer
Summary	A library for standardizing terms with spelling variations using a synonym dictionary.
upload_time	2024-12-08 08:03:52
maintainer	None
docs_url	None
author	sea-turt1e
requires_python	<4.0,>=3.11
license	Apache-2.0
keywords	nlp text-processing japanese synonym normalization
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            ![Python](https://img.shields.io/badge/-Python-F9DC3E.svg?logo=python&style=flat)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
![PyPI Downloads](https://static.pepy.tech/badge/yurenizer)

# yurenizer
This is a Japanese text normalizer that resolves spelling inconsistencies.

Japanese README is Here.（日本語のREADMEはこちら）  
https://github.com/sea-turt1e/yurenizer/blob/main/README_ja.md

## Overview
yurenizer is a tool for detecting and unifying variations in Japanese text notation.  
For example, it can unify variations like "パソコン" (pasokon), "パーソナル・コンピュータ" (personal computer), and "パーソナルコンピュータ" into "パーソナルコンピューター".  
These rules follow the [Sudachi Synonym Dictionary](https://github.com/WorksApplications/SudachiDict/blob/develop/docs/synonyms.md).

## web-based Demo
You can try the web-based demo here.  
[yurenizer Web-demo](https://yurenizer.net/)  
<div><video controls src="https://github.com/user-attachments/assets/fdcbaa1a-5692-4c30-a8e1-188d5016443d" muted="false"></video></div>

## Installation
```bash
pip install yurenizer
```

## Download Synonym Dictionary
```bash
curl -L -o synonyms.txt https://raw.githubusercontent.com/WorksApplications/SudachiDict/refs/heads/develop/src/main/text/synonyms.txt
```

## Usage
### Quick Start
```python
from yurenizer import SynonymNormalizer, NormalizerConfig
normalizer = SynonymNormalizer(synonym_file_path="synonyms.txt")
text = "「パソコン」は「パーソナルコンピュータ」の「synonym」で、「パーソナル・コンピュータ」と表記することもあります。"
print(normalizer.normalize(text))
# Output: 「パーソナルコンピューター」は「パーソナルコンピューター」の「シノニム」で、「パーソナルコンピューター」と表記することもあります。
```

### Customizing Settings
You can control normalization by specifying `NormalizerConfig` as an argument to the normalize function.

#### Example with Custom Settings
```python
from yurenizer import SynonymNormalizer, NormalizerConfig
normalizer = SynonymNormalizer(synonym_file_path="synonyms.txt")
text = "「東日本旅客鉄道」は「JR東」や「JR-East」とも呼ばれます"
config = NormalizerConfig(
            taigen=True, 
            yougen=False,
            expansion="from_another", 
            unify_level="lexeme",
            other_language=False,
            alias=False,
            old_name=False,
            misuse=False,
            alphabetic_abbreviation=True, # Normalize only alphabetic abbreviations
            non_alphabetic_abbreviation=False,
            alphabet=False,
            orthographic_variation=False,
            misspelling=False
        )
print(f"Output: {normalizer.normalize(text, config)}")
# Output: 「東日本旅客鉄道」は「JR東」や「東日本旅客鉄道」とも呼ばれます
```


---

## **Configuration Details**

The settings in *yurenizer* are organized hierarchically, allowing you to control the scope and target of normalization.

---

### **1. taigen / yougen (Target Selection)**

Use the `taigen` and `yougen` flags to control which parts of speech are included in the normalization.

| **Setting**   | **Default Value** | **Description**                                                                                              |
|---------------|-------------------|--------------------------------------------------------------------------------------------------------------|
| `taigen`      | `True`            | Includes nouns and other substantives in the normalization. Set to `False` to exclude them.                  |
| `yougen`      | `False`           | Includes verbs and other predicates in the normalization. Set to `True` to include them (normalized to their lemma). |

---

### **2. expansion (Expansion Flag)**

The expansion flag determines how synonyms are expanded based on the synonym dictionary's internal control flags.

| **Value**         | **Description**                                                                                       |
|--------------------|-------------------------------------------------------------------------------------------------------|
| `from_another`   | Expands only the synonyms with a control flag value of `0` in the synonym dictionary.                 |
| `any`            | Expands all synonyms regardless of their control flag value.                                         |

---

### **3. unify_level (Normalization Level)**

Specify the **level of normalization** with the `unify_level` parameter.

| **Value**          | **Description**                                                                                                                                 |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| `lexeme`          | Performs the most comprehensive normalization, targeting **all groups (a, b, c)** mentioned below.                                              |
| `word_form`       | Normalizes by word form, targeting **groups b and c**.                                                                                         |
| `abbreviation`    | Normalizes by abbreviation, targeting **group c** only.                                                                                        |

---

### **4. Detailed Normalization Settings (a, b, c Groups)**

#### **a Group: Comprehensive Lexical Normalization**
Controls normalization based on vocabulary and semantics using the following settings:

| **Setting**       | **Default Value** | **Description**                                                                                              |
|--------------------|-------------------|--------------------------------------------------------------------------------------------------------------|
| `other_language`  | `True`            | Normalizes non-Japanese terms (e.g., English) to Japanese. Set to `False` to disable this feature.            |
| `alias`           | `True`            | Normalizes aliases. Set to `False` to disable this feature.                                                  |
| `old_name`        | `True`            | Normalizes old names. Set to `False` to disable this feature.                                                |
| `misuse`          | `True`            | Normalizes misused terms. Set to `False` to disable this feature.                                            |

---

#### **b Group: Abbreviation Normalization**
Controls normalization of abbreviations using the following settings:

| **Setting**                 | **Default Value** | **Description**                                                                                              |
|------------------------------|-------------------|--------------------------------------------------------------------------------------------------------------|
| `alphabetic_abbreviation`   | `True`            | Normalizes abbreviations written in alphabetic characters. Set to `False` to disable this feature.           |
| `non_alphabetic_abbreviation` | `True`          | Normalizes abbreviations written in non-alphabetic characters (e.g., Japanese). Set to `False` to disable this feature. |

---

#### **c Group: Orthographic Normalization**
Controls normalization of orthographic variations and errors using the following settings:

| **Setting**              | **Default Value** | **Description**                                                                                              |
|---------------------------|-------------------|--------------------------------------------------------------------------------------------------------------|
| `alphabet`               | `True`            | Normalizes alphabetic variations. Set to `False` to disable this feature.                                    |
| `orthographic_variation` | `True`            | Normalizes orthographic variations. Set to `False` to disable this feature.                                  |
| `misspelling`            | `True`            | Normalizes misspellings. Set to `False` to disable this feature.                                             |

---

### **5. custom_synonym (Custom Dictionary)**

If you want to use a custom dictionary, control its behavior with the following setting:

| **Setting**       | **Default Value** | **Description**                                                                                              |
|--------------------|-------------------|--------------------------------------------------------------------------------------------------------------|
| `custom_synonym`   | `True`            | Enables the use of a custom dictionary. Set to `False` to disable it.                                        |

---

This hierarchical configuration allows for flexible normalization by defining the scope and target in detail.

## Custom Dictionary Specification
You can specify your own custom dictionary.  
If the same word exists in both the custom dictionary and Sudachi synonym dictionary, the custom dictionary takes precedence.

### Custom Dictionary Format
The custom dictionary file should be in JSON, CSV, or TSV format.
- JSON file
```json
{
    "Representative word 1": ["Synonym 1_1", "Synonym 1_2", ...],
    "Representative word 2": ["Synonym 2_1", "Synonym 2_2", ...],
}
```
- CSV file
```
Representative word 1,Synonym 1_1,Synonym 1_2,...
Representative word 2,Synonym 2_1,Synonym 2_2,...
```
- TSV file
```
Representative word 1	Synonym 1_1	Synonym 1_2	...
Representative word 2	Synonym 2_1	Synonym 2_2	...
...
```

#### Example
If you create a file like the one below, "幽白", "ゆうはく", and "幽☆遊☆白書" will be normalized to "幽遊白書".

- JSON file
```json
{
    "幽遊白書": ["幽白", "ゆうはく", "幽☆遊☆白書"],
}
```
- CSV file
```csv
幽遊白書,幽白,ゆうはく,幽☆遊☆白書
```
- TSV file
```tsv
幽遊白書	幽白	ゆうはく	幽☆遊☆白書
```

### How to Specify
```python
normalizer = SynonymNormalizer(custom_synonyms_file="path/to/custom_dict_file")
```

## Normalization Using a CSV File
You can also normalize text using a CSV file.

### Example
```csv:input.csv
JR東日本
JR東
JR-East
```

Normalize using `CsvSynonymNormalizer` as shown below.
```python
from yurenizer import CsvSynonymNormalizer
input_file_path = "input.csv"
output_file_path = "output.csv"
csv_normalizer = CsvSynonymNormalizer(synonym_file_path="synonyms.txt")
csv_normalizer.normalize_csv(input_file_path, output_file_path)
```

The `output.csv` file will be output as follows.
```csv:output.csv
raw,normalized
JR東日本,東日本旅客鉄道
JR東,東日本旅客鉄道
JR-East,東日本旅客鉄道
```

## Specifying SudachiDict
The length of text segmentation varies depending on the type of SudachiDict. Default is "full", but you can specify "small" or "core".  
To use "small" or "core", install it and specify in the `SynonymNormalizer()` arguments:
```bash
pip install sudachidict_small
# or
pip install sudachidict_core
```

```python
normalizer = SynonymNormalizer(sudachi_dict="small")
# or
normalizer = SynonymNormalizer(sudachi_dict="core")
```
※ Please refer to [SudachiDict documentation](https://github.com/WorksApplications/SudachiDict) for details.

## License
This project is licensed under the [Apache License 2.0](LICENSE).

### Open Source Software Used
- [Sudachi Synonym Dictionary](https://github.com/WorksApplications/SudachiDict/blob/develop/docs/synonyms.md): Apache License 2.0
- [SudachiPy](https://github.com/WorksApplications/SudachiPy): Apache License 2.0
- [SudachiDict](https://github.com/WorksApplications/SudachiDict): Apache License 2.0

This library uses SudachiPy and its dictionary SudachiDict for morphological analysis. These are also distributed under the Apache License 2.0.

For detailed license information, please check the LICENSE files of each project:
- [Sudachi Synonym Dictionary LICENSE](https://github.com/WorksApplications/SudachiDict/blob/develop/LICENSE-2.0.txt)
※ Provided under the same license as the Sudachi dictionary.
- [SudachiPy LICENSE](https://github.com/WorksApplications/SudachiPy/blob/develop/LICENSE)
- [SudachiDict LICENSE](https://github.com/WorksApplications/SudachiDict/blob/develop/LICENSE-2.0.txt)

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/sea-turt1e/yurenizer",
    "name": "yurenizer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": "nlp, text-processing, japanese, synonym, normalization",
    "author": "sea-turt1e",
    "author_email": "h.yamada.bg@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d9/f4/09327ce929db341cb49303bf7b2df9a4c099c94d68a7506922b1ab4e2188/yurenizer-0.2.2.tar.gz",
    "platform": null,
    "description": "![Python](https://img.shields.io/badge/-Python-F9DC3E.svg?logo=python&style=flat)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n![PyPI Downloads](https://static.pepy.tech/badge/yurenizer)\n\n# yurenizer\nThis is a Japanese text normalizer that resolves spelling inconsistencies.\n\nJapanese README is Here.\uff08\u65e5\u672c\u8a9e\u306eREADME\u306f\u3053\u3061\u3089\uff09  \nhttps://github.com/sea-turt1e/yurenizer/blob/main/README_ja.md\n\n## Overview\nyurenizer is a tool for detecting and unifying variations in Japanese text notation.  \nFor example, it can unify variations like \"\u30d1\u30bd\u30b3\u30f3\" (pasokon), \"\u30d1\u30fc\u30bd\u30ca\u30eb\u30fb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\" (personal computer), and \"\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\" into \"\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc\".  \nThese rules follow the [Sudachi Synonym Dictionary](https://github.com/WorksApplications/SudachiDict/blob/develop/docs/synonyms.md).\n\n## web-based Demo\nYou can try the web-based demo here.  \n[yurenizer Web-demo](https://yurenizer.net/)  \n<div><video controls src=\"https://github.com/user-attachments/assets/fdcbaa1a-5692-4c30-a8e1-188d5016443d\" muted=\"false\"></video></div>\n\n## Installation\n```bash\npip install yurenizer\n```\n\n## Download Synonym Dictionary\n```bash\ncurl -L -o synonyms.txt https://raw.githubusercontent.com/WorksApplications/SudachiDict/refs/heads/develop/src/main/text/synonyms.txt\n```\n\n## Usage\n### Quick Start\n```python\nfrom yurenizer import SynonymNormalizer, NormalizerConfig\nnormalizer = SynonymNormalizer(synonym_file_path=\"synonyms.txt\")\ntext = \"\u300c\u30d1\u30bd\u30b3\u30f3\u300d\u306f\u300c\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u300d\u306e\u300csynonym\u300d\u3067\u3001\u300c\u30d1\u30fc\u30bd\u30ca\u30eb\u30fb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u300d\u3068\u8868\u8a18\u3059\u308b\u3053\u3068\u3082\u3042\u308a\u307e\u3059\u3002\"\nprint(normalizer.normalize(text))\n# Output: \u300c\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc\u300d\u306f\u300c\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc\u300d\u306e\u300c\u30b7\u30ce\u30cb\u30e0\u300d\u3067\u3001\u300c\u30d1\u30fc\u30bd\u30ca\u30eb\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30fc\u300d\u3068\u8868\u8a18\u3059\u308b\u3053\u3068\u3082\u3042\u308a\u307e\u3059\u3002\n```\n\n### Customizing Settings\nYou can control normalization by specifying `NormalizerConfig` as an argument to the normalize function.\n\n#### Example with Custom Settings\n```python\nfrom yurenizer import SynonymNormalizer, NormalizerConfig\nnormalizer = SynonymNormalizer(synonym_file_path=\"synonyms.txt\")\ntext = \"\u300c\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\u300d\u306f\u300cJR\u6771\u300d\u3084\u300cJR-East\u300d\u3068\u3082\u547c\u3070\u308c\u307e\u3059\"\nconfig = NormalizerConfig(\n            taigen=True, \n            yougen=False,\n            expansion=\"from_another\", \n            unify_level=\"lexeme\",\n            other_language=False,\n            alias=False,\n            old_name=False,\n            misuse=False,\n            alphabetic_abbreviation=True, # Normalize only alphabetic abbreviations\n            non_alphabetic_abbreviation=False,\n            alphabet=False,\n            orthographic_variation=False,\n            misspelling=False\n        )\nprint(f\"Output: {normalizer.normalize(text, config)}\")\n# Output: \u300c\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\u300d\u306f\u300cJR\u6771\u300d\u3084\u300c\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\u300d\u3068\u3082\u547c\u3070\u308c\u307e\u3059\n```\n\n\n---\n\n## **Configuration Details**\n\nThe settings in *yurenizer* are organized hierarchically, allowing you to control the scope and target of normalization.\n\n---\n\n### **1. taigen / yougen (Target Selection)**\n\nUse the `taigen` and `yougen` flags to control which parts of speech are included in the normalization.\n\n| **Setting**   | **Default Value** | **Description**                                                                                              |\n|---------------|-------------------|--------------------------------------------------------------------------------------------------------------|\n| `taigen`      | `True`            | Includes nouns and other substantives in the normalization. Set to `False` to exclude them.                  |\n| `yougen`      | `False`           | Includes verbs and other predicates in the normalization. Set to `True` to include them (normalized to their lemma). |\n\n---\n\n### **2. expansion (Expansion Flag)**\n\nThe expansion flag determines how synonyms are expanded based on the synonym dictionary's internal control flags.\n\n| **Value**         | **Description**                                                                                       |\n|--------------------|-------------------------------------------------------------------------------------------------------|\n| `from_another`   | Expands only the synonyms with a control flag value of `0` in the synonym dictionary.                 |\n| `any`            | Expands all synonyms regardless of their control flag value.                                         |\n\n---\n\n### **3. unify_level (Normalization Level)**\n\nSpecify the **level of normalization** with the `unify_level` parameter.\n\n| **Value**          | **Description**                                                                                                                                 |\n|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|\n| `lexeme`          | Performs the most comprehensive normalization, targeting **all groups (a, b, c)** mentioned below.                                              |\n| `word_form`       | Normalizes by word form, targeting **groups b and c**.                                                                                         |\n| `abbreviation`    | Normalizes by abbreviation, targeting **group c** only.                                                                                        |\n\n---\n\n### **4. Detailed Normalization Settings (a, b, c Groups)**\n\n#### **a Group: Comprehensive Lexical Normalization**\nControls normalization based on vocabulary and semantics using the following settings:\n\n| **Setting**       | **Default Value** | **Description**                                                                                              |\n|--------------------|-------------------|--------------------------------------------------------------------------------------------------------------|\n| `other_language`  | `True`            | Normalizes non-Japanese terms (e.g., English) to Japanese. Set to `False` to disable this feature.            |\n| `alias`           | `True`            | Normalizes aliases. Set to `False` to disable this feature.                                                  |\n| `old_name`        | `True`            | Normalizes old names. Set to `False` to disable this feature.                                                |\n| `misuse`          | `True`            | Normalizes misused terms. Set to `False` to disable this feature.                                            |\n\n---\n\n#### **b Group: Abbreviation Normalization**\nControls normalization of abbreviations using the following settings:\n\n| **Setting**                 | **Default Value** | **Description**                                                                                              |\n|------------------------------|-------------------|--------------------------------------------------------------------------------------------------------------|\n| `alphabetic_abbreviation`   | `True`            | Normalizes abbreviations written in alphabetic characters. Set to `False` to disable this feature.           |\n| `non_alphabetic_abbreviation` | `True`          | Normalizes abbreviations written in non-alphabetic characters (e.g., Japanese). Set to `False` to disable this feature. |\n\n---\n\n#### **c Group: Orthographic Normalization**\nControls normalization of orthographic variations and errors using the following settings:\n\n| **Setting**              | **Default Value** | **Description**                                                                                              |\n|---------------------------|-------------------|--------------------------------------------------------------------------------------------------------------|\n| `alphabet`               | `True`            | Normalizes alphabetic variations. Set to `False` to disable this feature.                                    |\n| `orthographic_variation` | `True`            | Normalizes orthographic variations. Set to `False` to disable this feature.                                  |\n| `misspelling`            | `True`            | Normalizes misspellings. Set to `False` to disable this feature.                                             |\n\n---\n\n### **5. custom_synonym (Custom Dictionary)**\n\nIf you want to use a custom dictionary, control its behavior with the following setting:\n\n| **Setting**       | **Default Value** | **Description**                                                                                              |\n|--------------------|-------------------|--------------------------------------------------------------------------------------------------------------|\n| `custom_synonym`   | `True`            | Enables the use of a custom dictionary. Set to `False` to disable it.                                        |\n\n---\n\nThis hierarchical configuration allows for flexible normalization by defining the scope and target in detail.\n\n## Custom Dictionary Specification\nYou can specify your own custom dictionary.  \nIf the same word exists in both the custom dictionary and Sudachi synonym dictionary, the custom dictionary takes precedence.\n\n### Custom Dictionary Format\nThe custom dictionary file should be in JSON, CSV, or TSV format.\n- JSON file\n```json\n{\n    \"Representative word 1\": [\"Synonym 1_1\", \"Synonym 1_2\", ...],\n    \"Representative word 2\": [\"Synonym 2_1\", \"Synonym 2_2\", ...],\n}\n```\n- CSV file\n```\nRepresentative word 1,Synonym 1_1,Synonym 1_2,...\nRepresentative word 2,Synonym 2_1,Synonym 2_2,...\n```\n- TSV file\n```\nRepresentative word 1\tSynonym 1_1\tSynonym 1_2\t...\nRepresentative word 2\tSynonym 2_1\tSynonym 2_2\t...\n...\n```\n\n#### Example\nIf you create a file like the one below, \"\u5e7d\u767d\", \"\u3086\u3046\u306f\u304f\", and \"\u5e7d\u2606\u904a\u2606\u767d\u66f8\" will be normalized to \"\u5e7d\u904a\u767d\u66f8\".\n\n- JSON file\n```json\n{\n    \"\u5e7d\u904a\u767d\u66f8\": [\"\u5e7d\u767d\", \"\u3086\u3046\u306f\u304f\", \"\u5e7d\u2606\u904a\u2606\u767d\u66f8\"],\n}\n```\n- CSV file\n```csv\n\u5e7d\u904a\u767d\u66f8,\u5e7d\u767d,\u3086\u3046\u306f\u304f,\u5e7d\u2606\u904a\u2606\u767d\u66f8\n```\n- TSV file\n```tsv\n\u5e7d\u904a\u767d\u66f8\t\u5e7d\u767d\t\u3086\u3046\u306f\u304f\t\u5e7d\u2606\u904a\u2606\u767d\u66f8\n```\n\n### How to Specify\n```python\nnormalizer = SynonymNormalizer(custom_synonyms_file=\"path/to/custom_dict_file\")\n```\n\n## Normalization Using a CSV File\nYou can also normalize text using a CSV file.\n\n### Example\n```csv:input.csv\nJR\u6771\u65e5\u672c\nJR\u6771\nJR-East\n```\n\nNormalize using `CsvSynonymNormalizer` as shown below.\n```python\nfrom yurenizer import CsvSynonymNormalizer\ninput_file_path = \"input.csv\"\noutput_file_path = \"output.csv\"\ncsv_normalizer = CsvSynonymNormalizer(synonym_file_path=\"synonyms.txt\")\ncsv_normalizer.normalize_csv(input_file_path, output_file_path)\n```\n\nThe `output.csv` file will be output as follows.\n```csv:output.csv\nraw,normalized\nJR\u6771\u65e5\u672c,\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\nJR\u6771,\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\nJR-East,\u6771\u65e5\u672c\u65c5\u5ba2\u9244\u9053\n```\n\n## Specifying SudachiDict\nThe length of text segmentation varies depending on the type of SudachiDict. Default is \"full\", but you can specify \"small\" or \"core\".  \nTo use \"small\" or \"core\", install it and specify in the `SynonymNormalizer()` arguments:\n```bash\npip install sudachidict_small\n# or\npip install sudachidict_core\n```\n\n```python\nnormalizer = SynonymNormalizer(sudachi_dict=\"small\")\n# or\nnormalizer = SynonymNormalizer(sudachi_dict=\"core\")\n```\n\u203b Please refer to [SudachiDict documentation](https://github.com/WorksApplications/SudachiDict) for details.\n\n## License\nThis project is licensed under the [Apache License 2.0](LICENSE).\n\n### Open Source Software Used\n- [Sudachi Synonym Dictionary](https://github.com/WorksApplications/SudachiDict/blob/develop/docs/synonyms.md): Apache License 2.0\n- [SudachiPy](https://github.com/WorksApplications/SudachiPy): Apache License 2.0\n- [SudachiDict](https://github.com/WorksApplications/SudachiDict): Apache License 2.0\n\nThis library uses SudachiPy and its dictionary SudachiDict for morphological analysis. These are also distributed under the Apache License 2.0.\n\nFor detailed license information, please check the LICENSE files of each project:\n- [Sudachi Synonym Dictionary LICENSE](https://github.com/WorksApplications/SudachiDict/blob/develop/LICENSE-2.0.txt)\n\u203b Provided under the same license as the Sudachi dictionary.\n- [SudachiPy LICENSE](https://github.com/WorksApplications/SudachiPy/blob/develop/LICENSE)\n- [SudachiDict LICENSE](https://github.com/WorksApplications/SudachiDict/blob/develop/LICENSE-2.0.txt)\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "A library for standardizing terms with spelling variations using a synonym dictionary.",
    "version": "0.2.2",
    "project_urls": {
        "Homepage": "https://github.com/sea-turt1e/yurenizer",
        "Repository": "https://github.com/sea-turt1e/yurenizer"
    },
    "split_keywords": [
        "nlp",
        " text-processing",
        " japanese",
        " synonym",
        " normalization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "525df337751428a672080736f8ad8561bbb5a9378e9eb54c6f8d28aab911e01d",
                "md5": "4f28d795f96c3f78f40f9cf2e2f818ae",
                "sha256": "452b3925839eea2c8477d853036f534215c64aa2877bd9ab697de9066ab19925"
            },
            "downloads": -1,
            "filename": "yurenizer-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4f28d795f96c3f78f40f9cf2e2f818ae",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 21242,
            "upload_time": "2024-12-08T08:03:51",
            "upload_time_iso_8601": "2024-12-08T08:03:51.546107Z",
            "url": "https://files.pythonhosted.org/packages/52/5d/f337751428a672080736f8ad8561bbb5a9378e9eb54c6f8d28aab911e01d/yurenizer-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d9f409327ce929db341cb49303bf7b2df9a4c099c94d68a7506922b1ab4e2188",
                "md5": "6f69f043e5cb5649ec77c71d15bcf011",
                "sha256": "08344955a29ac3f4d4665764927affa42a3b20f6606b0e8ed0a897b7cc4e6548"
            },
            "downloads": -1,
            "filename": "yurenizer-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "6f69f043e5cb5649ec77c71d15bcf011",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 19550,
            "upload_time": "2024-12-08T08:03:52",
            "upload_time_iso_8601": "2024-12-08T08:03:52.597511Z",
            "url": "https://files.pythonhosted.org/packages/d9/f4/09327ce929db341cb49303bf7b2df9a4c099c94d68a7506922b1ab4e2188/yurenizer-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-08 08:03:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sea-turt1e",
    "github_project": "yurenizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "yurenizer"
}

sea-turt1e