lexifuzz-ner

Name: lexifuzz-ner
Version: 0.0.8
Home page: https://github.com/hanifabd/lexifuzz-ner
Summary: Python package for detecting entities in text based on a dictionary and fuzzy similarity
Upload time: 2024-05-14 06:54:14
Author: Hanif Yuli Abdillah P
Maintainer: None
Docs URL: None
Requires Python: >=3.7
License: None
Keywords: ktp, segmentation, segmentasi, id-card, identity-card
Requirements: No requirements were recorded.
            # **LexiFuzz NER: Named Entity Recognition Based on Dictionary and Fuzzy Matching**

![Image](https://github.com/hanifabd/lexifuzz-ner/blob/master/assets/lexifuzz-mascot.png)

## **About**
LexiFuzz NER is a Named Entity Recognition (NER) package designed to identify and extract named entities from unstructured text data. By combining dictionary-based lookup with fuzzy matching, LexiFuzz NER recognizes named entities accurately across a variety of domains, making it a useful tool for information extraction, natural language understanding, and text analytics.

## **Requirements**
- Python 3.7 or Higher
- NLTK
- TheFuzz

## **Key Features**

1. **Dictionary-Based Recognition**: LexiFuzz NER matches text against a dictionary of named entities, which can cover person names, organizations, locations, dates, and more. Recognition precision depends directly on this dictionary, so keeping it up to date keeps results accurate.

2. **Fuzzy Matching**: The package employs advanced fuzzy matching algorithms to identify named entities even in cases of typographical errors, misspellings, or variations in naming conventions. This ensures robustness in recognizing entities with varying textual representations.

3. **Customization**: LexiFuzz NER allows users to easily customize and expand the entity dictionary to suit specific domain or application requirements. This flexibility makes it adaptable to a wide array of use cases.

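To make the fuzzy-matching idea concrete: the package depends on TheFuzz for scoring, but the same 0-100 similarity concept can be sketched with the standard library's `difflib`. This is an illustrative sketch, not the package's actual internals.

```python
# Illustrative sketch of fuzzy string scoring (lexifuzz-ner itself uses
# TheFuzz; difflib is used here only to demonstrate the concept).
from difflib import SequenceMatcher

def similarity_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

# A misspelled dictionary entry still clears a 90 threshold,
# while an unrelated term scores far below it.
print(similarity_score("tahapan", "tahapn"))
print(similarity_score("tahapan", "gold"))
```

This is why the typo `tahapn` in the example below is still matched against the dictionary entry `tahapan`.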
## Usage
### Manual Installation via GitHub
1. Clone the repository
    ```sh
    git clone https://github.com/hanifabd/lexifuzz-ner.git
    ```
2. Install
    ```sh
    cd lexifuzz-ner && pip install .
    ```
### Installation Using Pip
1. Installation
    ```sh
    pip install lexifuzz-ner
    ```
### Inference
1. Usage
    ```py
    from lexifuzz_ner.ner import find_entity

    dictionary = {
        'individual_product' : ['tahapan', 'xpresi', 'gold', 'berjangka'],
        'brand' : ["bca", "bank central asia"]
    }

    text = "i wanna ask about bca tahapn savings product"
    entities = find_entity(text, dictionary, 90)  # 90 = minimum fuzzy-match score (0-100)
    print(entities)
    ```

2. Result
    ```py
    {
        'entities': [
            {
                'id': '55a20c6b-bd4a-43ee-8853-b961ac537ca8',
                'entity': 'bca',
                'category': 'brand',
                'score': 100,
                'index': {'start': 18, 'end': 20}
            },
            {
                'id': '08917da5-ed51-44bb-9be9-52f17df2640a',
                'entity': 'tahapn',
                'category': 'individual_product',
                'score': 92,
                'index': {'start': 22, 'end': 28}
            }
        ],
        'text': 'i wanna ask about bca tahapn savings product',
        'text_annotated': 'i wanna ask about [bca]{55a20c6b-bd4a-43ee-8853-b961ac537ca8} [tahapn]{08917da5-ed51-44bb-9be9-52f17df2640a} savings product'
    }
    ```
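Since `find_entity` returns a plain Python dictionary, its output can be post-processed directly. A minimal sketch, using hand-copied entities in the shape shown above (ids shortened for brevity), that keeps only high-confidence matches and groups them by category:

```python
from collections import defaultdict

# Sample result in the shape returned by find_entity (ids shortened)
result = {
    "entities": [
        {"id": "55a20c6b", "entity": "bca", "category": "brand",
         "score": 100, "index": {"start": 18, "end": 20}},
        {"id": "08917da5", "entity": "tahapn", "category": "individual_product",
         "score": 92, "index": {"start": 22, "end": 28}},
    ],
    "text": "i wanna ask about bca tahapn savings product",
}

# Keep only confident matches and group entity strings by category
by_category = defaultdict(list)
for ent in result["entities"]:
    if ent["score"] >= 95:
        by_category[ent["category"]].append(ent["entity"])

print(dict(by_category))
```

Raising the score cutoff above the threshold passed to `find_entity` is a simple way to trade recall for precision after the fact.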


            
