schema-classification


Name: schema-classification
Version: 0.1.7
home_page: https://github.com/craigtrim/schema-classification
Summary: Perform Intent Classification using an External Schema
upload_time: 2022-12-01 00:21:59
maintainer: Craig Trim
docs_url: None
author: Craig Trim
requires_python: >=3.8.5,<4.0.0
license: None
keywords: nlp, nlu, ai, classification, intents
# schema-classification
This microservice classifies parse results (tokenized input) against an external intent schema.

## Usage
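The input is a list of token dictionaries. Each token has a `normal` form and, optionally, a `swaps` entry with its canonical form and type: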
```python
input_tokens = [
    {
        "normal": "my",
    },
    {
        "normal": "late",
    },
    {
        "normal": "transport",
    },
    {
        "normal": "late_transport",
        "swaps": {
            "canon": "late_transport",
            "type": "chitchat"
        }
    },
]
```

Calling the service looks like this:
```python
import os

from schema_classification import classify

# Absolute path to the YAML schema that defines the intents
absolute_path = os.path.normpath(
    os.path.join(os.getcwd(), 'resources/testing',
                 'test-intents-0.1.0.yaml'))

svcresult = classify(
    absolute_path=absolute_path,
    input_tokens=input_tokens)
```

The output from this call looks like this:
```python
{
    'result': [{
        'classification': 'Late_Transport',
        'confidence': 99 }],
    'tokens': {
        'late': '',
        'late_transport': 'chitchat',
        'my': '',
        'transport': ''}
}
```
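
Judging only from the example above, the `tokens` mapping in the output appears to pair each token's `normal` form with its `swaps` type, or with an empty string when the token has no `swaps` entry. The sketch below reproduces that shape. `summarize_tokens` is a hypothetical helper written for illustration; it is an assumption about the output format, not part of the package's API.

```python
from typing import Dict, List


def summarize_tokens(input_tokens: List[dict]) -> Dict[str, str]:
    """Map each token's normal form to its swap type ('' if it has none).

    Hypothetical helper: it mirrors the shape of the 'tokens' field in the
    sample output and is not taken from the library itself.
    """
    summary = {}
    for token in input_tokens:
        swaps = token.get("swaps", {})
        summary[token["normal"]] = swaps.get("type", "")
    return summary


input_tokens = [
    {"normal": "my"},
    {"normal": "late"},
    {"normal": "transport"},
    {
        "normal": "late_transport",
        "swaps": {"canon": "late_transport", "type": "chitchat"},
    },
]

token_summary = summarize_tokens(input_tokens)
print(token_summary)
```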


## Classification via Mapping
Classification of unstructured text is a mapping exercise.

The mapping is composed of these elements:
1. Include One Of
2. Include All Of
3. Exclude One Of
4. Exclude All Of

The classifier will map extracted entities from unstructured text using the listed elements.

For example:

```yaml
TEST_INTENT:
  - include_one_of:
    - alpha
    - apple
  - include_all_of:
    - beta
    - gamma
  - exclude_one_of:
    - delta
  - exclude_all_of:
    - epsilon
    - digamma
```

This intent is selected if the set of extracted entities contains at least one of `alpha` or `apple` and contains both `beta` and `gamma`. The intent is discarded if `delta` occurs, or if both `epsilon` and `digamma` occur.

In Python, the rule elements and the extracted entities can be loaded into native `set` structures and compared with native operations such as `difference`, `intersection`, `union`, and `symmetric_difference`.

Because set operations are built in (implemented in C in CPython), evaluating a rule against the extracted entities is very fast.
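
The four elements can be sketched with plain set operations. This is a minimal illustration of the rule semantics described above, not the package's actual implementation. The `matches` function and the flat dictionary layout of `test_intent` are assumptions made for the example (the YAML schema stores each element as a separate list item).

```python
from typing import Dict, Iterable, Set


def matches(rule: Dict[str, Iterable[str]], entities: Set[str]) -> bool:
    """Return True if the entities satisfy the rule (hypothetical helper)."""
    include_one_of = set(rule.get("include_one_of", ()))
    include_all_of = set(rule.get("include_all_of", ()))
    exclude_one_of = set(rule.get("exclude_one_of", ()))
    exclude_all_of = set(rule.get("exclude_all_of", ()))

    # Include One Of: at least one listed entity must be present.
    if include_one_of and not include_one_of.intersection(entities):
        return False
    # Include All Of: every listed entity must be present.
    if not include_all_of.issubset(entities):
        return False
    # Exclude One Of: any single listed entity discards the intent.
    if exclude_one_of.intersection(entities):
        return False
    # Exclude All Of: discard only when every listed entity is present.
    if exclude_all_of and exclude_all_of.issubset(entities):
        return False
    return True


# Same rule as TEST_INTENT above, flattened into one dictionary.
test_intent = {
    "include_one_of": ["alpha", "apple"],
    "include_all_of": ["beta", "gamma"],
    "exclude_one_of": ["delta"],
    "exclude_all_of": ["epsilon", "digamma"],
}

print(matches(test_intent, {"alpha", "beta", "gamma"}))             # True
print(matches(test_intent, {"alpha", "beta", "gamma", "delta"}))    # False
print(matches(test_intent, {"apple", "beta", "gamma", "epsilon"}))  # True
```

Note that `epsilon` alone does not discard the intent in the last call, because Exclude All Of only applies when both `epsilon` and `digamma` are present.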

The system goes further by deciding what to do when a rule excludes `delta` and a descendant of `delta` is present, or when `alpha` should be included and only a sibling or child of `alpha` is present.

In these cases, I usually rely on a heuristic to boost or lower confidence, and tweak it over time to get a good result.
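
One way such a heuristic could look is sketched below. Everything here is an assumption made for illustration: the `PARENT_OF` taxonomy, the entity names `delta_minor` and `alpha_child`, the `adjust_confidence` function, and the penalty of 30 are all hypothetical and do not come from the package.

```python
from typing import Dict, Optional, Set

# Hypothetical taxonomy (child -> parent); names are illustrative only.
PARENT_OF: Dict[str, str] = {
    "delta_minor": "delta",
    "alpha_child": "alpha",
}


def ancestors(entity: str) -> Set[str]:
    """Collect every ancestor of an entity by walking the parent links."""
    found: Set[str] = set()
    current: Optional[str] = PARENT_OF.get(entity)
    while current is not None and current not in found:
        found.add(current)
        current = PARENT_OF.get(current)
    return found


def adjust_confidence(base: int, excluded: Set[str], entities: Set[str],
                      penalty: int = 30) -> int:
    """Lower confidence for each entity that descends from an excluded one.

    The penalty value is an arbitrary placeholder to be tuned over time.
    """
    confidence = base
    for entity in entities:
        if entity not in excluded and ancestors(entity) & excluded:
            confidence -= penalty
    return max(confidence, 0)


print(adjust_confidence(99, {"delta"}, {"alpha", "delta_minor"}))  # 69
print(adjust_confidence(99, {"delta"}, {"alpha"}))                 # 99
```

A softer penalty, rather than an outright discard, keeps the intent as a candidate while signalling that the match is weaker than a clean one.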

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/craigtrim/schema-classification",
    "name": "schema-classification",
    "maintainer": "Craig Trim",
    "docs_url": null,
    "requires_python": ">=3.8.5,<4.0.0",
    "maintainer_email": "craigtrim@gmail.com",
    "keywords": "nlp,nlu,ai,classification,intents",
    "author": "Craig Trim",
    "author_email": "craigtrim@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/88/74/73f366c388696420602c555e75cf9636bf2998d37ab2ca66d5cd4eebfa53/schema-classification-0.1.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "None",
    "summary": "Perform Intent Classification using an External Schema",
    "version": "0.1.7",
    "split_keywords": [
        "nlp",
        "nlu",
        "ai",
        "classification",
        "intents"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "74383ab5435e9f41238d3a7bad5450ca",
                "sha256": "d4f4b5215251e211be44a4a49fdbcd8eb53012b9554b1839971a6a6b92fb0cda"
            },
            "downloads": -1,
            "filename": "schema_classification-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "74383ab5435e9f41238d3a7bad5450ca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.5,<4.0.0",
            "size": 24140,
            "upload_time": "2022-12-01T00:22:01",
            "upload_time_iso_8601": "2022-12-01T00:22:01.779523Z",
            "url": "https://files.pythonhosted.org/packages/93/90/4ca82c2362f447724382f3674400b70e1925414d518943d53ca2fe0e1fbe/schema_classification-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "9f4b8b2438fd5baa92c03bc4ee7a4a05",
                "sha256": "901f1e77b3b84dcbf71c2e5c42f3aff834c803cd32a5cdd5d582315f5286ae20"
            },
            "downloads": -1,
            "filename": "schema-classification-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "9f4b8b2438fd5baa92c03bc4ee7a4a05",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.5,<4.0.0",
            "size": 12997,
            "upload_time": "2022-12-01T00:21:59",
            "upload_time_iso_8601": "2022-12-01T00:21:59.978257Z",
            "url": "https://files.pythonhosted.org/packages/88/74/73f366c388696420602c555e75cf9636bf2998d37ab2ca66d5cd4eebfa53/schema-classification-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-01 00:21:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "craigtrim",
    "github_project": "schema-classification",
    "lcname": "schema-classification"
}
        