<h1 align="center">
Telekom NLU Bridge
</h1>
<p align="center">
<a href="https://github.com/telekom/nlu-bridge/commits" title="Last Commit"><img src="https://img.shields.io/github/last-commit/telekom/nlu-bridge?style=flat"></a>
<a href="https://github.com/telekom/nlu-bridge/issues" title="Open Issues"><img src="https://img.shields.io/github/issues/telekom/nlu-bridge?style=flat"></a>
<a href="https://github.com/telekom/nlu-bridge/blob/main/LICENSE" title="License"><img src="https://img.shields.io/badge/License-MIT-green.svg?style=flat"></a>
</p>
<p align="center">
<a href="#development">Development</a> •
<a href="#documentation">Documentation</a> •
<a href="#support-and-feedback">Support</a> •
<a href="#how-to-contribute">Contribute</a> •
<a href="#licensing">Licensing</a>
</p>
The goal of this project is to provide a unified API to several popular intent recognition
applications.
## About this component
### Installation
The core package, including `NluDataset` and the baseline vendors, can be installed for
Python>=3.8 using pip:
```
pip install nlubridge
```
Note that some vendors come with restrictions regarding the Python version, e.g. Rasa3
requires Python\<3.11.
To include the optional dependencies for a specific vendor, e.g. Watson Assistant, run:
```
pip install "nlubridge[watson]"
```
The following install options are available:
- `watson`
- `fasttext`
- `luis`
- `rasa2`
- `rasa3`
- `spacy`
- `huggingface`
Development tools can be installed with the `develop` option.
Some vendors require access credentials such as API tokens or URLs. These can be passed
when constructing the vendor objects. Alternatively, they can be provided as
environment variables, in which case the vendor looks for variables named
`VENDORNAME_PARAM_NAME`.
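The environment-variable fallback described above can be sketched as follows. This is a minimal illustration of the naming convention only, not nlubridge's actual code: the helper `resolve_param` and the example names `watson` / `api_key` are hypothetical, so check each vendor's constructor for the real parameter names.

```python
import os
from typing import Optional


def resolve_param(vendor_name: str, param_name: str, value: Optional[str] = None) -> str:
    """Return an explicitly passed value, else fall back to VENDORNAME_PARAM_NAME.

    Hypothetical helper illustrating the convention; not part of nlubridge.
    """
    if value is not None:
        return value  # an argument passed on construction always wins
    env_var = f"{vendor_name}_{param_name}".upper()
    try:
        return os.environ[env_var]
    except KeyError:
        raise KeyError(
            f"{param_name!r} was not passed and environment variable {env_var} is not set"
        ) from None


# Usage: placeholder value, not a real credential
os.environ["WATSON_API_KEY"] = "dummy-key"
print(resolve_param("watson", "api_key"))                  # dummy-key
print(resolve_param("watson", "api_key", "explicit-key"))  # explicit-key
```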
Some vendors require additional resources. For example, the Spacy vendor needs a spaCy
language model, which can be downloaded (here for the model de_core_news_sm) with:
```
python -m spacy download de_core_news_sm
```
### Migration from v0
With release 1.0.0 we introduced several changes to the names of files and vendor
classes (see also https://github.com/telekom/nlu-bridge/issues/18).
Most notably:
- `datasets.NLUdataset` -> `nlu_dataset.NluDataset`
- `vendors.vendors.Vendor` -> `vendors.vendor.Vendor`
- new subpackage `dataloaders` that holds all functions for loading data into an `NluDataset`
- new function `nlu_dataset.concat` to concatenate a list of `NluDataset` objects
- `dataloaders`, `NluDataset`, `Vendor`, `OUT_OF_SCOPE_TOKEN`, `EntityKeys` and `concat` can be
  imported directly from `nlubridge`, e.g. `from nlubridge import Vendor`
- vendors are imported like `from nlubridge.vendors import Rasa3`
- former `TelekomModel` now called `CharNgramIntentClassifier`
- Some vendor names changed for clarity and consistency (see "List of supported vendors"
for the new names)
### Usage
Here is an example for the TfidfIntentClassifier:
```python
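import pandas as pd

from nlubridge import NluDataset
from nlubridge.vendors import TfidfIntentClassifier

# Hypothetical input file with one utterance per row in a "text" column
# and its label in an "intent" column; replace with your own data source.
df = pd.read_csv("intents.csv")
texts = df["text"].tolist()
intents = df["intent"].tolist()

dataset = NluDataset(texts, intents)
dataset = dataset.shuffle()

classifier = TfidfIntentClassifier()

train, test = dataset.train_test_split(test_size=0.25, random_state=0)
classifier = classifier.train_intent(train)
predicted = classifier.test_intent(test)

# Put true and predicted intents side by side for inspection
res = pd.DataFrame(list(zip(test.intents, predicted)), columns=["true", "predicted"])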
```
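Once `res` is available, overall accuracy and per-intent recall can be computed from its two columns. The sketch below is plain Python and hypothetical: the `evaluate` helper and the toy labels are not part of nlubridge. With the usage example above it could be called as `evaluate(res["true"].tolist(), res["predicted"].tolist())`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(true: List[str], predicted: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-intent recall for parallel label lists."""
    if len(true) != len(predicted):
        raise ValueError("true and predicted must have the same length")
    correct = sum(t == p for t, p in zip(true, predicted))
    accuracy = correct / len(true) if true else 0.0
    support = Counter(true)  # number of test examples per true intent
    hits = Counter(t for t, p in zip(true, predicted) if t == p)
    recall = {intent: hits[intent] / n for intent, n in support.items()}
    return accuracy, recall


true = ["greet", "greet", "bye", "bye"]
predicted = ["greet", "bye", "bye", "bye"]
accuracy, recall = evaluate(true, predicted)
print(accuracy)  # 0.75
print(recall)    # {'greet': 0.5, 'bye': 1.0}
```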
If you need to configure **stratification**, use the `stratification` parameter. It defaults to `"intents"`, which uses the intents in the dataset as the stratification basis; any _other_ value you pass has to conform to the `stratify` argument of `sklearn.model_selection.train_test_split`:
```python
train, test = dataset.train_test_split(test_size=0.25, random_state=0, stratification=None) # deactivate stratification (sklearn default for train_test_split)
```
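To illustrate what stratification preserves, here is a simplified, hypothetical stratified split in plain Python: it splits each intent's examples separately, so every intent keeps roughly its share in the test set. This is not how nlubridge or scikit-learn implement it; `train_test_split(stratify=...)` allocates test samples across classes differently, so exact counts can differ.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_size: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices so that each label keeps roughly its share in the test set.

    Simplified per-label splitting, for illustration only.
    """
    rng = random.Random(seed)
    indices_by_label = defaultdict(list)
    for i, label in enumerate(labels):
        indices_by_label[label].append(i)
    train_idx, test_idx = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-label test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["greet"] * 8 + ["bye"] * 4
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))  # Counter({'greet': 2, 'bye': 1})
```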
To compare your own vendor or algorithm to the existing vendors in this package, you can
write a `Vendor` subclass for it, and possibly a dataloader function. Feel free to
contribute your implementation to this repository. Similarly, fixes and extensions for the
existing vendors are always welcome.
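As a starting point for your own vendor, the sketch below shows a hypothetical majority-class baseline that exposes the same `train_intent`/`test_intent` calls used in the usage example. It is deliberately self-contained: `ToyDataset` stands in for `NluDataset` and assumes parallel `texts`/`intents` lists, and the class does not inherit from `nlubridge.Vendor`. A real implementation should subclass `Vendor` and implement whatever abstract methods it declares.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDataset:
    # Hypothetical stand-in for NluDataset, assumed to expose parallel lists.
    texts: List[str]
    intents: List[str]


class MajorityIntentClassifier:
    """Hypothetical baseline vendor: always predicts the most frequent training intent."""

    def __init__(self):
        self.majority_intent = None

    def train_intent(self, dataset):
        self.majority_intent = Counter(dataset.intents).most_common(1)[0][0]
        return self  # allows `classifier = classifier.train_intent(train)`

    def test_intent(self, dataset):
        if self.majority_intent is None:
            raise RuntimeError("call train_intent() first")
        # One prediction per input text, regardless of its content
        return [self.majority_intent for _ in dataset.texts]
```

Such a trivial baseline is useful as a lower bound when comparing vendors: any real classifier should clearly beat it.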
### Logging
Most of the code uses Python's `logging` module to report its progress. To get log messages
printed to the console or a Jupyter notebook, a logger needs to be configured before running
the nlubridge code. Since most messages are logged at INFO level, set the level accordingly, like this:
```python
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
```
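A one-call alternative uses `logging.basicConfig`, which attaches a stream handler to the root logger. The second part assumes that nlubridge modules log via loggers named after their module path (the common `logging.getLogger(__name__)` pattern); if that assumption does not hold, the `"nlubridge"` logger setting simply has no effect.

```python
import logging

# Configure the root logger in one call; force=True (Python 3.8+) replaces
# handlers configured earlier, e.g. by a notebook environment.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

# Assumption: nlubridge uses module-named loggers, so its verbosity can be
# tuned separately from other libraries, e.g. set to logging.WARNING to quiet it.
logging.getLogger("nlubridge").setLevel(logging.INFO)
```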
### Concepts / Architecture
- **Vendors**\
The [`vendors`](/nlubridge/vendors/) subpackage implements standardized interfaces to
the specific vendors. Each `Vendor` subclass is responsible for converting the data to
the required format, uploading data to the cloud if applicable, training models, and
making predictions.
- **Datasets**\
The [`nlu_dataset`](/nlubridge/nlu_dataset/) module provides a standard interface to
NLU data. Data stored in a vendor's custom format can be loaded as a dataset
and passed to any other vendor.
- **Data Loaders**\
The [`dataloaders`](/nlubridge/dataloaders/) subpackage provides functions to load
data that are in a vendor-specific format as NluDataset.
### List of supported vendors
| Vendor Class | Status | Intents | Entities | Algorithm |
| ------ | ------ | ------- | -------- | --------- |
| [TfidfIntentClassifier](/nlubridge/vendors/tfidf_intent_classifier.py) | ✓ | ✓ | ✗ | TF-IDF on words + SVM |
| [FastText](https://fasttext.cc) | ✓ | ✓ | ✗ | fasttext |
| [Spacy](https://spacy.io/usage/training#section-textcat) | ✓ | ✓ | ✗ | BoW linear + CNN |
| [WatsonAssistant](https://www.ibm.com/watson/services/conversation/) | ✓ | ✓ | ✗ | Proprietary (probably LR) |
| [Luis](https://www.luis.ai/home) | needs testing | ✓ | ✗ | Proprietary (probably LR) |
| [CharNgramIntentClassifier](/nlubridge/vendors/char_ngram_intent_classifier.py) | ✓ | ✓ | ✗ | TF-IDF on char n-grams + SGD |
| [Rasa2](https://github.com/RasaHQ/rasa) | ✓ | ✓ | ✓ | configurable |
| [Rasa3](https://github.com/RasaHQ/rasa) | ✓ | ✓ | ✓ | configurable |
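To make the "TF-IDF on char n-grams + SGD" entry more concrete, here is a rough scikit-learn approximation of that kind of model. It is only a sketch: the n-gram range, the `char_wb` analyzer and the SGD settings are assumptions, not the actual configuration of `CharNgramIntentClassifier`, and the toy data is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    # char_wb builds character n-grams only inside word boundaries
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    # hinge loss makes SGDClassifier fit a linear SVM by stochastic gradient descent
    SGDClassifier(loss="hinge", random_state=0),
)

texts = ["hello there", "hi, how are you", "good morning",
         "bye for now", "see you later", "goodbye"]
intents = ["greet", "greet", "greet", "bye", "bye", "bye"]
model.fit(texts, intents)
print(model.predict(["hello", "see you"]))
```

Character n-grams make such models fairly robust to typos and inflected word forms, which is one reason they are a common baseline for short utterances.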
### Features
- Abstract class for vendors with convenience methods (e.g. scoring and scikit-learn compatibility)
- Abstract class for datasets with convenience methods (e.g. train_test_split, indexing, iteration)
- Rate limiting to comply with cloud providers' requirements
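The rate-limiting feature can be illustrated with a minimal decorator that enforces a minimum interval between successive calls, e.g. to stay within a cloud API's request quota. This is a hypothetical sketch: the `rate_limited` decorator and the `classify` function are not nlubridge's API, and nlubridge's own mechanism may differ.

```python
import functools
import time


def rate_limited(calls_per_second: float):
    """Decorator that spaces successive calls at least 1/calls_per_second apart."""
    min_interval = 1.0 / calls_per_second

    def decorator(func):
        last_call = float("-inf")  # monotonic timestamp of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            wait = last_call + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)  # block until the minimum interval has passed
            last_call = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical remote call limited to 5 requests per second
@rate_limited(calls_per_second=5)
def classify(text: str) -> str:
    return f"intent_for:{text}"
```

This version is not thread-safe; concurrent callers would need a lock around the timestamp check.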
## Development
_TBD_
### Build
_TBD_
## Code of Conduct
This project has adopted the [Contributor Covenant](https://www.contributor-covenant.org/) in version 2.0 as our code of conduct. Please see the details in our [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md). All contributors must abide by the code of conduct.
## Working Language
We decided to apply _English_ as the primary project language.
Consequently, all content will be made available primarily in English. We also ask all interested people to use English when creating issues, in their code (comments, documentation, etc.) and when sending requests to us. The application itself and all end-user facing content will be made available in other languages as needed.
## Documentation
The full documentation for the telekom nlu-bridge can be found in _TBD_
## Support and Feedback
The following channels are available for discussions, feedback, and support requests:
| Type | Channel |
| ------------------------ | ------------------------------------------------------ |
| **Issues** | <a href="/../../issues/new/choose" title="General Discussion"><img src="https://img.shields.io/github/issues/telekom/nlu-bridge?style=flat-square"></a> |
| **Other Requests** | <a href="mailto:opensource@telekom.de" title="Email Open Source Team"><img src="https://img.shields.io/badge/email-Open%20Source%20Team-green?logo=mail.ru&style=flat-square&logoColor=white"></a> |
## How to Contribute
Contributions and feedback are encouraged and always welcome. For more information about how to contribute, the project structure, and additional contribution information, see our [Contribution Guidelines](./CONTRIBUTING.md). By participating in this project, you agree to abide by its [Code of Conduct](./CODE_OF_CONDUCT.md) at all times.
## Licensing
Copyright (c) 2021 Deutsche Telekom AG.
Licensed under the **MIT License** (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License by reviewing the file [LICENSE](./LICENSE) in the repository.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the [LICENSE](./LICENSE) for the specific language governing permissions and limitations under the License.
Raw data
{
"_id": null,
"home_page": "https://github.com/telekom/nlu-bridge",
"name": "nlubridge",
"maintainer": "Klaus-Peter Engelbrecht",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "nlu,intent recognition,natural language understanding,evaluation,performance",
"author": "Klaus-Peter Engelbrecht",
"author_email": "k.engelbrecht@telekom.de",
"download_url": "https://files.pythonhosted.org/packages/ed/f0/f49a26ea88ef7c9da5766dd5db5e3141e46eaddc72f24ebc1a40ae5fe6ac/nlubridge-1.0.2.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">\n Telekom NLU Bridge\n</h1>\n\n<p align=\"center\">\n <a href=\"https://github.com/telekom/nlu-bridge/commits\" title=\"Last Commit\"><img src=\"https://img.shields.io/github/last-commit/telekom/nlu-bridge?style=flat\"></a>\n <a href=\"https://github.com/telekom/nlu-bridge/issues\" title=\"Open Issues\"><img src=\"https://img.shields.io/github/issues/telekom/nlu-bridge?style=flat\"></a>\n <a href=\"https://github.com/telekom/nlu-bridge/blob/main/LICENSE\" title=\"License\"><img src=\"https://img.shields.io/badge/License-MIT-green.svg?style=flat\"></a>\n</p>\n\n<p align=\"center\">\n <a href=\"#development\">Development</a> \u2022\n <a href=\"#documentation\">Documentation</a> \u2022\n <a href=\"#support-and-feedback\">Support</a> \u2022\n <a href=\"#how-to-contribute\">Contribute</a> \u2022\n <a href=\"#licensing\">Licensing</a>\n</p>\n\nThe goal of this project is to provide a unified API to several popular intent recognition\napplications.\n\n## About this component\n\n### Installation\n\nThe core package including NLUdataset and Baseline vendors can be installed for\nPython>=3.8 using pip\n\n```\npip install nlubridge\n```\n\nNote that some vendors come with restrictions regarding the Python version, e.g. Rasa3\nrequires Python\\<3.11.\n\nTo include optional dependencies for the vendors, e.g. Watson Assistant, type\n\n```\npip install nlubridge[watson]\n```\n\nFollowing install options are available:\n\n- `watson`\n- `fasttext`\n- `luis`\n- `rasa2`\n- `rasa3`\n- `spacy`\n- `huggingface`\n\nDevelopment tools can be installed with option `develop`.\n\nSome vendors require access credentials like API tokens, URLs etc. These can be passed\non construction of the objects. Alternatively, such arguments can be passed as\nenvironment variables, where the vendor will look for variables named variable\nVENDORNAME_PARAM_NAME.\n\nSome vendors require additional dependencies. 
E.g., Spacy requires a model that\ncan be downloaded (for the model de_core_news_sm) with\n\n```\npython -m spacy download de_core_news_sm\n```\n\n### Migration from v0\n\nWith realease 1.0.0 we introduce a couple of changes to the names of files and vendor\nclasses(see also https://github.com/telekom/nlu-bridge/issues/18).\n\nMost notably:\n\n- datasets.NLUdataset -> nlu_dataset.NluDataset\n- vendors.vendors.Vendor -> - vendors.vendor.Vendor\n- new supackage `dataloaders` that holds all functions for loading data into an NluDataset\n- new function `nlu_dataset.concat` to concatenate NluDatasets passed in a list\n- can load dataloaders, NluDataset, Vendor, OUT_OF_SCOPE_TOKEN, EntityKeys, concat,\n directly from nlubridge like `from nlubridge import Vendor`\n- Load vendors like `from nlubridge.vendors import Rasa3`\n- former `TelekomModel` now called `CharNgramIntentClassifier`\n- Some vendor names changed for clarity and consistency (see \"List of supported vendors\"\n for the new names)\n\n### Usage\n\nHere is an example for the TfidfIntentClassifier:\n\n```python\nimport os\n\nimport pandas as pd\n\nfrom nlubridge.vendors import TfidfIntentClassifier\nfrom nlubridge import NluDataset\n\ndataset = NluDataset(texts, intents)\ndataset = dataset.shuffle()\n\nclassifier = TfidfIntentClassifier()\n\ntrain, test = dataset.train_test_split(test_size=0.25, random_state=0)\nclassifier = classifier.train_intent(train)\npredicted = classifier.test_intent(test)\nres = pd.DataFrame(list(zip(test.intents, predicted)), columns=['true', 'predicted'])\n```\n\nIf you need to configure **stratification**, use the `stratification` parameter (defaults to `\"intents\"` and uses the intents in the dataset as stratification basis; whatever _else_ you pass along has to conform to `sklearn.model_selection.train_test_split(stratify=)`:\n\n```python\ntrain, test = dataset.train_test_split(test_size=0.25, random_state=0, stratification=None) # deactivate stratification (sklearn default for 
train_test_split)\n```\n\nTo compare your own vendor or algorithm to existing vendors in this package, you can\nwrite a Vendor Subclass for your vendor, and possibly a dataloader function. Feel free\nto share your implementation using this repo. Similarly, fixes and extensions for the\nexisting vendors are always welcome.\n\n### Logging\n\nMost of the code uses python logging to report its progress. To get logs printed out\nto console or Jupyter notebook, a logger needs to be configured, before the nlutests\ncode. Usually, log messages are on INFO level. This can be configured like this:\n\n```python\nimport logging\n\nlogger = logging.getLogger()\nlogger.setLevel(logging.INFO)\nlogger.addHandler(logging.StreamHandler())\n```\n\n### Concepts / Architecture\n\n- **Vendors**\\\n The [`vendors`](/nlubridge/vendors/) subpackage implements standardized interfaces to\n the specific vendors. A specific `Vendor` instance is in charge of dealing with\n converting the data to the required format, uploading data to the cloud if applicable,\n training models and making predictions.\n\n- **Datasets**\\\n The [`nlu_dataset`](/nlubridge/nlu_dataset/) module provides a standard interface to\n NLU data. 
Data stored in different vendor's custom format can be loaded as a dataset\n and provided to any different vendor.\n\n- **Data Loaders**\\\n The [`dataloaders`](/nlubridge/dataloaders/) subpackage provides functions to load\n data that are in a vendor-specific format as NluDataset.\n\n### List of supported vendors\n\n| Vendor Class | Status | Intents | Entities | Algorithm |\n| ------ | ------ | ------- | -------- | --------- |\n| [TfidfIntentClassifier](/nlubridge/vendors/tfidf_intent_classifier.py) | \u2713 | \u2713 | \u2717 | TFIDF on words + SVM |\n| [FastText](https://fasttext.cc) | \u2713 | \u2713 | \u2717 | fasttext |\n| [Spacy](https://spacy.io/usage/training#section-textcat) | \u2713 | \u2713 | \u2717 | BoW linear + CNN |\n| [WatsonAssistant](https://www.ibm.com/watson/services/conversation/) | \u2713 | \u2713 | \u2717 | Propietary (probably LR) |\n| [Luis](https://www.luis.ai/home) | needs testing | \u2713 | \u2717 | Propietary (probably LR) |\n| [CharNgramIntentClassifier](/nlubridge/vendors/char_ngram_intent_classifier.py) | \u2713 | \u2713 | \u2717 | tf-idf on char n-grams + SGD |\n| [Rasa2](https://github.com/RasaHQ/rasa) | \u2713 | \u2713 | \u2713 | configurable |\n| [Rasa3](https://github.com/RasaHQ/rasa) | \u2713 | \u2713 | \u2713 | configurable |\n\n### Features\n\n- Abstract class for Vendors with convenience methods (ex: scoring and scikit-learn compatibility)\n- Abstract class for datasets with convenience methods (ex: train_test_split, indexing, iteration)\n- Rate limiting to comply with cloud providers requirements\n\n## Development\n\n_TBD_\n\n### Build\n\n_TBD_\n\n## Code of Conduct\n\nThis project has adopted the [Contributor Covenant](https://www.contributor-covenant.org/) in version 2.0 as our code of conduct. Please see the details in our [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md). 
All contributors must abide by the code of conduct.\n\n## Working Language\n\nWe decided to apply _English_ as the primary project language.\n\nConsequently, all content will be made available primarily in English. We also ask all interested people to use English as language to create issues, in their code (comments, documentation etc.) and when you send requests to us. The application itself and all end-user facing content will be made available in other languages as needed.\n\n## Documentation\n\nThe full documentation for the telekom nlu-bridge can be found in _TBD_\n\n## Support and Feedback\n\nThe following channels are available for discussions, feedback, and support requests:\n\n| Type | Channel |\n| ------------------------ | ------------------------------------------------------ |\n| **Issues** | <a href=\"/../../issues/new/choose\" title=\"General Discussion\"><img src=\"https://img.shields.io/github/issues/telekom/nlu-bridge?style=flat-square\"></a> </a> |\n| **Other Requests** | <a href=\"mailto:opensource@telekom.de\" title=\"Email Open Source Team\"><img src=\"https://img.shields.io/badge/email-Open%20Source%20Team-green?logo=mail.ru&style=flat-square&logoColor=white\"></a> |\n\n## How to Contribute\n\nContribution and feedback is encouraged and always welcome. For more information about how to contribute, the project structure, as well as additional contribution information, see our [Contribution Guidelines](./CONTRIBUTING.md). 
By participating in this project, you agree to abide by its [Code of Conduct](./CODE_OF_CONDUCT.md) at all times.\n\n## Licensing\n\nCopyright (c) 2021 Deutsche Telekom AG.\n\nLicensed under the **MIT License** (the \"License\"); you may not use this file except in compliance with the License.\n\nYou may obtain a copy of the License by reviewing the file [LICENSE](./LICENSE) in the repository.\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the [LICENSE](./LICENSE) for the specific language governing permissions and limitations under the License.\n",
"bugtrack_url": null,
"license": "",
"summary": "Provides a unified API to several popular intent recognition applications",
"version": "1.0.2",
"project_urls": {
"Bug Tracker": "https://github.com/telekom/nlu-bridge/issues",
"Code of Conduct": "https://github.com/telekom/nlu-bridge/blob/main/CODE_OF_CONDUCT.md",
"Contributing": "https://github.com/telekom/nlu-bridge/blob/main/CONTRIBUTING.md",
"Homepage": "https://github.com/telekom/nlu-bridge",
"Source Code": "https://github.com/telekom/nlu-bridge"
},
"split_keywords": [
"nlu",
"intent recognition",
"natural language understanding",
"evaluation",
"performance"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "721e81719ba2af53d2572b184506dc8ef9a65057e63d0f577cb12871d3f98cba",
"md5": "69e0a99681852d757da386dde7745685",
"sha256": "cdf4d29728f005d0f0a0fca89f777de9e18853ce5c198bd8a76def7cdeaa3dba"
},
"downloads": -1,
"filename": "nlubridge-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "69e0a99681852d757da386dde7745685",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 44408,
"upload_time": "2023-08-02T15:29:56",
"upload_time_iso_8601": "2023-08-02T15:29:56.045842Z",
"url": "https://files.pythonhosted.org/packages/72/1e/81719ba2af53d2572b184506dc8ef9a65057e63d0f577cb12871d3f98cba/nlubridge-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "edf0f49a26ea88ef7c9da5766dd5db5e3141e46eaddc72f24ebc1a40ae5fe6ac",
"md5": "8b21c37c799b584a6c5f778cc241ae45",
"sha256": "6b9629025d3ec2daaa8ddf47750c310854278c6653d55bde823e4eb5c141e188"
},
"downloads": -1,
"filename": "nlubridge-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "8b21c37c799b584a6c5f778cc241ae45",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 42037,
"upload_time": "2023-08-02T15:29:57",
"upload_time_iso_8601": "2023-08-02T15:29:57.331598Z",
"url": "https://files.pythonhosted.org/packages/ed/f0/f49a26ea88ef7c9da5766dd5db5e3141e46eaddc72f24ebc1a40ae5fe6ac/nlubridge-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-02 15:29:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "telekom",
"github_project": "nlu-bridge",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "nlubridge"
}