basiclingua

- Name: basiclingua
- Version: 2.0.0
- Summary: A Python library based on various LLMs to perform basic and advanced natural language processing (NLP) tasks
- Upload time: 2024-04-29 13:35:40
- Author: Fareed Hassan Khan
- Keywords: python, NLP, Natural Language Processing, Linguistics, Gemini LLM, Google Gemini LLM
- Requirements: none recorded

<!-- omit in toc -->
# BasicLINGUA

[![Documentation](https://img.shields.io/badge/AI%20Powered%20Documentation-Link-yellow)](https://ai-powered-basiclingua-documentation.streamlit.app/) [![License](https://img.shields.io/badge/License-MIT-yellow)](https://opensource.org/licenses/MIT) [![Python](https://img.shields.io/badge/Python-3.9%2B-green)](https://www.python.org/downloads/) [![Version](https://img.shields.io/badge/BasicLingua%20Version-2.0-green)]()

![Basic-Lingua Logo](https://i.ibb.co/KzvQPW9/Group-1000006159-2.png)

BasicLingua is an LLM-based Python library that provides functionality for linguistic tasks such as pattern extraction, intent recognition, and many others (**Imagination is the limit**).

## Why Building this Project?

The problem this project tackles is that text data becomes harder to handle as its size and complexity grow. Existing NLP libraries are either limited in the problems they can solve or require a great deal of human intervention. BasicLingua builds on multiple large language models (both open-source and closed-source), which have shown promising results on text data, to handle this complexity with minimal human intervention. The result is an NLP library designed to handle a broad range of human text-related tasks and produce accurate results.

## BasicLingua Architecture


![Basic-Lingua Architecture](https://i.ibb.co/DwrDFbR/mainnnn-1-1.png)


The BasicLingua architecture is straightforward: a user provides input text, selects a model (either open-source or closed-source), and then chooses the feature that meets their needs. Each feature is designed to be cost-effective, so it can be used fully at minimal cost. The output format depends on the selected feature: it might be a dictionary, a list, or another type.
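The flow described above (pick a model, then pick a feature) can be sketched as a small dispatcher. This is an illustration under stated assumptions, not BasicLingua's internal code: `run_feature` and `FakeLinguaModel` are hypothetical names, and the fake model stands in for `OpenAILingua`, `GeminiLingua`, or `AnyScaleLingua` so the sketch runs without an API key.

```python
from typing import Any, List


class FakeLinguaModel:
    """Hypothetical stand-in for a BasicLingua model class; it makes no API calls."""

    def text_intent(self, user_input: str) -> List[str]:
        # Toy rule instead of an LLM call, just so the sketch runs offline.
        return ["greeting"] if "hello" in user_input.lower() else ["unknown"]


def run_feature(model: Any, feature: str, user_input: str, **kwargs: Any) -> Any:
    """Call the feature method named `feature` on the chosen model object."""
    method = getattr(model, feature, None)
    if not callable(method):
        raise ValueError(f"{type(model).__name__} has no feature named {feature!r}")
    # Whatever the feature returns (dict, list, str, ...) is passed through unchanged.
    return method(user_input, **kwargs)


print(run_feature(FakeLinguaModel(), "text_intent", "Hello there!"))
# ['greeting']
```

With a real model object, the same call would look like `run_feature(openai_model, "extract_patterns", text, patterns="ICD-10 Codes")`, mirroring the `extract_patterns(user_input, patterns=...)` call used in the Usage section.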

One of the biggest challenges in building this architecture was ensuring the cost associated with using the features was minimal. To address this, we use effective `prompt engineering` techniques to optimize the performance of the models and reduce the cost of using the library.

Below is the backend design of one of our features, spellcheck, which uses an OpenAI model to correct spelling mistakes in text:

![Prompt-Engineering-Guide](https://i.ibb.co/GVJJdxb/Group-1000006159-4.png)

The spellcheck feature passes the user input to the selected model, which corrects the spelling mistakes but returns only the corrected words rather than the full text; these corrections are then substituted into the original text. Because the model outputs only the words that change, this approach keeps the response short and reduces the cost of using the feature.
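The README does not show the exact reply format of the spellcheck prompt. Assuming, hypothetically, that the model returns a JSON object mapping each misspelled word to its correction, the substitution step might look like this. `apply_corrections` is an illustrative helper, not part of BasicLingua's API.

```python
import json
import re
from typing import Dict


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Substitute each misspelled word in `text` with its correction (whole words only)."""
    if not corrections:
        return text
    # One alternation of all misspelled words, anchored on word boundaries
    # so that e.g. "teh" does not match inside "tehx".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, corrections)) + r")\b")
    return pattern.sub(lambda match: corrections[match.group(0)], text)


# Hypothetical model reply: a JSON object holding only the words that changed.
model_reply = '{"recieve": "receive", "teh": "the"}'
corrections = json.loads(model_reply)
print(apply_corrections("Did you recieve teh package?", corrections))
# Did you receive the package?
```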


## Evaluation Metrics

We tested the library on three categories of text data: clinical notes, documents, and general text.

| Dataset Category | Dataset Name | Description | Size | Source |
| --- | --- | --- | --- | --- |
| Clinical Notes | MIMIC-IV | Medical Information Mart for Intensive Care | 3,000 randomly sampled notes | [MIMIC-IV](https://mimic-iv.mit.edu/) |
| Documents | Wikipedia Articles | Wikipedia articles on various topics | 1,000 randomly sampled articles | [Wikipedia](https://www.wikipedia.org/) |
| General Text | AI-generated text | Text generated by OpenAI and Gemini models | 5,000 randomly sampled texts | [OpenAI](https://platform.openai.com/), [Gemini](https://ai.google.dev/gemini-api/docs/models/gemini) |

We evaluated the library on several features, including entity extraction and text summarization. The results are shown below:

| Task Name | Task Description | Evaluation Metric | OpenAI<br> (GPT-3.5) | Gemini<br> (Gemini-1.0) | AnyScale<br> (Llama-3-70b) |
| --- | --- | --- | --- | --- | --- |
| Entity Extraction | Extracting entities such as diseases, person names, and locations from text | F1 Score | 0.85 | 0.61 | 0.76 |
| Text Summarization | Summarizing long text into a shorter version | ROUGE Score | 0.78 | 0.68 | 0.72 |
| Text Classification | Classifying text into predefined categories | Accuracy | 0.92 | 0.80 | 0.89 |
| Text Sentiment Analysis | Analyzing the sentiment of text (positive, negative, neutral) | Accuracy | 0.88 | 0.75 | 0.82 |
| Text Coreference Resolution | Resolving coreferences in text | F1 Score | 0.89 | 0.73 | 0.83 |
| Text Intent Recognition | Recognizing the intent of text (e.g., booking a flight, ordering food) | Accuracy | 0.90 | 0.78 | 0.85 |
| Text OCR | Extracting text from images and scanned documents | Accuracy | 0.85 | 0.68 | 0.78 |
| Text Anomaly Detection | Detecting anomalies in text data | F1 Score | 0.80 | 0.65 | 0.72 |
| Text Sense Disambiguation | Disambiguating word senses in text | Accuracy | 0.90 | 0.78 | 0.85 |
| Text Spellcheck | Correcting spelling mistakes in text | Accuracy | 0.85 | 0.68 | 0.78 |

The evaluation metrics show that the library performs well across different tasks. OpenAI (GPT-3.5) achieves the highest score on every task, AnyScale (Llama-3-70b) stays within 0.09 of it on every task, and Gemini (Gemini-1.0) trails OpenAI by 0.10 to 0.24. Both AnyScale and Gemini remain suitable for various text-related tasks. Apart from the evaluation metrics, we have also focused on the **cost-effectiveness** of the library, ensuring that users can access the features at a minimal cost.
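The README does not state how the F1 scores were computed (for example, exact or partial matching, or how scores were averaged across documents). One common choice for entity extraction is exact-match F1 over the sets of predicted and reference entities, sketched below as an assumption; `entity_f1` is a hypothetical helper, not BasicLingua's evaluation code.

```python
from typing import Iterable


def entity_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Exact-match F1 between predicted and reference entity sets."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)  # entities present in both sets
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# One spurious code ("Z00") and no missed codes: precision 2/3, recall 1.
print(round(entity_f1(["E11.9", "I10", "Z00"], ["E11.9", "I10"]), 2))
# 0.8
```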

## AI Powered Documentation

![AI-Powered-Documentation](https://i.ibb.co/BzLdKBM/Group-1000006158-2.png)

Because BasicLingua is LLM-based and can handle a wide range of domain-specific tasks, we provide **AI-powered documentation** for it. Developers can ask questions about the library, and the documentation responds with the exact features they need, along with examples. It can also answer common questions, such as how to get started, which makes the library easier to learn and use.


The AI documentation web app is available here: [![AI Powered Documentation](https://img.shields.io/badge/AI%20Powered%20Documentation%20Link-blue?logo=Meta)](https://ai-powered-basiclingua-documentation.streamlit.app/)


## Updates
- **`2024/4/20`** We have released the second version of the library. The new version includes additional features and new LLMs for text and vision tasks. Our library now supports `OpenAI`, `Gemini`, and `AnyScale` models for various linguistic tasks. We have also improved the performance of the library and added more functionalities to make it more versatile and user-friendly.
- **`2024/4/16`** We have added the AI-powered documentation for the library. The documentation is now available for use. We are currently working on improving the documentation and adding more features to the library.
- **`2024/3/3`** We have released the first version of the library. The library is now available for use. We are currently working on the documentation and the next version of the library. We are also working on the integration of the library with other LLMs.
- **`2024/1/10`** We released an initial preview version of this library containing a limited number of pre-processing features.


<!-- omit in toc -->
## Table of Contents
- [Why Building this Project?](#why-building-this-project)
- [BasicLingua Architecture](#basiclingua-architecture)
- [Evaluation Metrics](#evaluation-metrics)
- [AI Powered Documentation](#ai-powered-documentation)
- [Updates](#updates)
- [Installation / Updation](#installation--updation)
- [Initialization](#initialization)
- [Supported LLMs](#supported-llms)
- [Usage](#usage)
- [Features of the library](#features-of-the-library)
- [Playground](#playground)
- [Acknowledgements](#acknowledgements)

## Installation / Updation

Before installing BasicLingua, make sure Python is installed on your system. BasicLingua has been tested on `Python 3.9` and later; earlier versions may work but are not guaranteed to. To check your Python version, run the following command in your terminal:

```bash
python --version
```

If you don't have Python installed or need to upgrade, visit the [official Python website](https://www.python.org/downloads/) to download and install the latest version.

Once Python is set up, you can install BasicLingua using pip:

```bash
pip install basiclingua
```

or you can upgrade to the latest version using:

```bash
pip install --upgrade basiclingua
```
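After installing or upgrading, one way to confirm which version pip actually installed, and that the interpreter meets the tested Python 3.9 minimum, is the standard-library check below. It is a sketch that relies only on `importlib.metadata` and `sys`; `installed_version` is a hypothetical helper, and the distribution name `basiclingua` matches the `pip install` command used here.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Warn if the interpreter is older than the tested minimum.
if sys.version_info < (3, 9):
    print("Warning: BasicLingua is tested on Python 3.9 or later.")
print("basiclingua:", installed_version("basiclingua") or "not installed")
```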

## Initialization

After installing BasicLingua, you need to import the models you want to use. You can choose to import specific models or all available models.

**Import a Specific Model**

```python
# Importing OpenAI Model
from basiclingua import OpenAILingua

# Importing Google Gemini Model
from basiclingua import GeminiLingua

# Importing Anyscale Model
from basiclingua import AnyScaleLingua
```

**Import All Models at Once**

```python
from basiclingua import OpenAILingua, GeminiLingua, AnyScaleLingua
```

Before using any model, you must set the API key for that specific platform. This is a mandatory step. Each model class has a constructor that takes the API key and optional additional parameters, such as model names.

The **Gemini 1.0** model is free to use within the API limits, and AnyScale offers $10 in free credit to get started.

* Get your OpenAI API key from the [OpenAI Platform](https://platform.openai.com/api-keys)
* Get your Gemini API key from the [Gemini Platform](https://aistudio.google.com/app/apikey)
* Get your AnyScale API key from the [AnyScale Platform](https://app.endpoints.anyscale.com/credentials)
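The initialization examples pass API keys as string literals. A common alternative is to read them from environment variables so they stay out of source code. This is a sketch: `get_api_key` is a hypothetical helper, and the variable names are conventions chosen here, not names BasicLingua reads on its own.

```python
import os


def get_api_key(env_var: str) -> str:
    """Read an API key from an environment variable, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; export it before "
            "initializing a BasicLingua model."
        )
    return key


# Usage (assumes basiclingua is installed and the variables are exported;
# the variable names are illustrative choices):
# from basiclingua import OpenAILingua, GeminiLingua, AnyScaleLingua
# openai_model = OpenAILingua(api_key=get_api_key("OPENAI_API_KEY"))
# gemini_model = GeminiLingua(api_key=get_api_key("GEMINI_API_KEY"))
# anyscale_model = AnyScaleLingua(api_key=get_api_key("ANYSCALE_API_KEY"))
```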

**For Initializing OpenAI**
```python
# Initializing OpenAI Model
openai_model = OpenAILingua(
    api_key="YOUR_OPENAI_API_KEY", # Your OpenAI API Key
    model_name='gpt-3.5-turbo-0125', # Text Model Name
    vision_model_name='gpt-4-turbo' # Vision Model Name
)
```

Default models are `gpt-3.5-turbo-0125` and `gpt-4-turbo` for text and vision respectively.

**For Initializing Gemini**

```python
# Initializing Gemini Model
gemini_model = GeminiLingua(
    api_key="YOUR_GEMINI_API_KEY", # Your Gemini API Key
    model_name='gemini-1.0-pro-latest', # Text Model Name
    vision_model_name='models/gemini-1.5-pro-latest' # Vision Model Name
)
```
Default models are `gemini-1.0-pro-latest` and `gemini-1.5-pro-latest` for text and vision respectively.

**For Initializing AnyScale**

```python
# Initializing AnyScale Model
anyscale_model = AnyScaleLingua(
    api_key="YOUR_ANY_SCALE_API_KEY", # Your AnyScale API Key
    model_name="meta-llama/Llama-3-70b-chat-hf" # Text Model Name
)
```

Default model is `meta-llama/Llama-3-70b-chat-hf`.

## Supported LLMs

For `OpenAILingua`, all text and vision models are supported as available on the [OpenAI Platform](https://platform.openai.com/docs/models).

For `GeminiLingua`, all text and vision models are supported as available on the [Gemini Platform](https://ai.google.dev/gemini-api/docs/models/gemini).

A complete list of the open-source models supported by `AnyScaleLingua` is available on the [AnyScale Platform](https://docs.endpoints.anyscale.com/pricing).

Default models are:

Source | Text Model | Vision Model | Embedding Model
--- | --- | --- | ---
OpenAI | `gpt-3.5-turbo-0125` | `gpt-4-turbo` | `text-embedding-3-large`
Gemini | `gemini-1.0-pro-latest` | `gemini-1.5-pro-latest` | `models/embedding-001`
AnyScale | `meta-llama/Llama-3-70b-chat-hf` | `meta-llama/Llama-3-70b-chat-hf` | `thenlper/gte-large`



## Usage

The library provides a wide range of functions for linguistic tasks, a few of which are shown below. You can use our [AI-powered documentation](https://ai-powered-basiclingua-documentation.streamlit.app/) to learn more about the functions the library provides.

**Entity extraction** is crucial for transforming unstructured text into structured data, enabling efficient analysis and automation in fields like finance, healthcare, and cybersecurity. However, it can be challenging due to the complexity and ambiguity of language, which often requires intricate regex or NLP techniques.

Regex-based entity extraction is time-consuming because every pattern has to be defined in detail. With BasicLingua, you only name the patterns you want, and the model extracts the matching entities. Here's an example of extracting ICD (International Classification of Diseases) codes and other clinical entities from a clinical note:

```python
# Clinical note with complex structure and formatting
user_input = """Patient John, last name: Doe; 45 yrs
                Symptoms: fatigue + frequent urination (possible diabetes); dizziness
                Diagnosis - Type 2 Diabetes (E11.9), Hypertension (I10)
                Prescribed: Metformin @ 500mg/day; Amlodipine, twice a day
                Allergic: PCN (penicillin)
                Family history of diabetes and HBP (high blood pressure)
                Additional notes: testing for cholesterol and kidney function
                Patient was advised to monitor blood sugar levels regularly.
                Mentioned: Father - Type 2 Diabetes; Mother - Hypertension
                Description - T2 Diabetes without complications; Essential Hypertension."""

# Define the patterns to extract
patterns = "ICD-10 Codes, Diseases, Medications, Allergies, Symptoms, Family History, Descriptions"

# Using OpenAI to extract entities
openai_entities = openai_model.extract_patterns(user_input, patterns=patterns)

# Displaying the extracted entities
print(openai_entities)

######## Output ########
{
  "ICD-10 Codes": ["E11.9", "I10"],
  "Diseases": ["Type 2 Diabetes", "Hypertension"],
  "Medications": ["Metformin", "Amlodipine"],
  "Allergies": ["Penicillin"],
  "Symptoms": ["fatigue", "frequent urination", "dizziness"],
  "Family History": ["Father with Type 2 Diabetes", "Mother with Hypertension"],
  "Descriptions": [
    "Type 2 Diabetes without complications",
    "Essential (primary) hypertension",
    "testing for cholesterol and kidney function"
  ]
}
######## Output ########
```
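For comparison with the regex-based approach mentioned above, here is what a hand-written pattern for just one of the seven categories (ICD-10 codes) might look like. It is a simplified sketch: it ignores codes with a letter in the third position and does not check codes against the official code list, and every other category (medications, allergies, family history, and so on) would need its own rules.

```python
import re

# Simplified ICD-10 shape: a capital letter, two digits, then an optional
# dot followed by 1-4 letters/digits. It is not a full ICD-10 validator.
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.[0-9A-Z]{1,4})?\b")

note = (
    "Diagnosis - Type 2 Diabetes (E11.9), Hypertension (I10)\n"
    "Prescribed: Metformin @ 500mg/day; Amlodipine, twice a day"
)
print(ICD10_PATTERN.findall(note))
# ['E11.9', 'I10']
```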
**Coreference resolution** is similarly difficult: it involves working out which words in a sentence or text refer to the same person or thing, which requires a deep understanding of context and of how language connects different parts of a text.

Here's an example of how `BasicLingua` can help you resolve coreferences in a text:

```python
# User input with complex co-references
user_input = """
Jane and her colleague Tom were preparing for the upcoming meeting with the new clients. She had worked on the presentation slides, while he focused on the data analysis. 
When the day of the meeting arrived, Jane noticed that the projector was not working properly, so she asked Tom to check it out. 
He found that it needed a new cable, but they had none in the office. They had to improvise with a laptop. During the presentation, Jane felt nervous because the setup wasn't ideal, but Tom reassured her that everything would be fine. 
The clients appreciated their efforts, and both Jane and Tom were relieved when the meeting concluded successfully. As they left, Jane told Tom that she was grateful for his support.
"""

# Using AnyScale model to resolve coreferences
anyscale_coref = anyscale_model.text_coreference(user_input)

# Displaying the resolved coreferences
print("AnyScale Coreference:", anyscale_coref)

######## Output ########
{
  "she": "Jane",
  "he": "Tom",
  "they": ["Jane", "Tom"],
  "her": "Jane",
  "him": "Tom"
}
######## Output ########
```
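The coreference output above maps each pronoun to either a single name or a list of names. Assuming `text_coreference` returns a dictionary shaped like that printed output (if it returns a JSON string, parse it with `json.loads` first), a small helper can normalize the values and look up every pronoun that refers to a given person. Both helpers are hypothetical, not part of BasicLingua.

```python
from typing import Dict, List, Union

CorefMap = Dict[str, Union[str, List[str]]]


def normalize_coreferences(mapping: CorefMap) -> Dict[str, List[str]]:
    """Make every value a list so single and plural referents are handled alike."""
    return {
        pronoun: [ref] if isinstance(ref, str) else list(ref)
        for pronoun, ref in mapping.items()
    }


def pronouns_for(mapping: CorefMap, entity: str) -> List[str]:
    """List the pronouns that the mapping resolves to `entity`."""
    return [
        pronoun
        for pronoun, refs in normalize_coreferences(mapping).items()
        if entity in refs
    ]


# Mapping copied from the printed output of the coreference example.
coref = {"she": "Jane", "he": "Tom", "they": ["Jane", "Tom"], "her": "Jane", "him": "Tom"}
print(pronouns_for(coref, "Tom"))
# ['he', 'they', 'him']
```

A pronoun-keyed dictionary holds one referent per pronoun form, so this lookup is only reliable when each pronoun consistently refers to the same entity, as it does in the example text.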

The library provides many other functions for linguistic tasks. Our [AI-powered documentation](https://ai-powered-basiclingua-documentation.streamlit.app/) explains what each one does and how to use it effectively.

## Features of the library

The library provides more than **20** functions. Because they apply across many different domains, we created AI-powered documentation to help you explore them in more depth. A few of them are listed below:

| Function Name           | Python Function Name  | Parameters                                 | Returns                                                                  |
|-------------------------|-----------------------|--------------------------------------------|--------------------------------------------------------------------------|
| Extract Patterns        | `extract_patterns`      | `user_input`, `patterns`                      | A `JSON` of extracted patterns from the input sentence                     |
| Detect NER              | `detect_ner`            | `user_input`, `ner_tags`                      | A `JSON` of detected Named Entity Recognition (NER) entities              |
| Text Intent             | `text_intent`           | `user_input`                                 | A `list` of identified intents from the input sentence                     |
| **. . .**          | **. . .**             | **. . .**                                   | **. . .**                                                                  |
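The table lists the return type as `JSON`, but does not say whether that means a JSON string or an already-parsed dictionary. The defensive sketch below accepts either. `as_python` and `entity_counts` are hypothetical helpers, not part of BasicLingua, and the sample reply reuses values from the entity-extraction example in the Usage section.

```python
import json
from typing import Any, Dict, Union


def as_python(result: Union[str, Dict[str, Any], list]) -> Any:
    """Return parsed Python data whether `result` is a JSON string or already a dict/list."""
    if isinstance(result, (dict, list)):
        return result
    if isinstance(result, str):
        return json.loads(result)
    raise TypeError(f"Unexpected result type: {type(result).__name__}")


def entity_counts(entities: Dict[str, Any]) -> Dict[str, int]:
    """Count the extracted items under each label, checking that every value is a list."""
    counts: Dict[str, int] = {}
    for label, items in entities.items():
        if not isinstance(items, list):
            raise ValueError(f"Expected a list for {label!r}, got {type(items).__name__}")
        counts[label] = len(items)
    return counts


# Hypothetical reply reusing values from the entity-extraction example.
raw = '{"ICD-10 Codes": ["E11.9", "I10"], "Medications": ["Metformin", "Amlodipine"]}'
print(entity_counts(as_python(raw)))
# {'ICD-10 Codes': 2, 'Medications': 2}
```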

You can explore more by chatting with our [Documentation Chatbot](https://ai-powered-basiclingua-documentation.streamlit.app/) to get a better understanding of the functionalities provided by the library.

## Playground

Since this library is available under the MIT License, you can use it in your own projects. You can also contribute by adding new functionality or improving existing features. All of the backend code is available in the **backend-engineering folder**.

## Acknowledgements

- Anil, R., et al. (2024, April). **Gemini: A Family of Highly Capable Multimodal Models**. *arXiv*. [https://doi.org/10.48550/arXiv.2312.11805](https://doi.org/10.48550/arXiv.2312.11805)
- OpenAI Team. (2024). **OpenAI GPT-3.5: The Next Evolution of Language Models**. *OpenAI Blog*. [https://openai.com/blog/chatgpt](https://openai.com/blog/chatgpt)
- Meta AI. (2024, April 18). **Introducing Meta Llama 3: The most capable openly available LLM to date**. *Meta AI Blog*. [https://ai.meta.com/blog/meta-llama-3](https://ai.meta.com/blog/meta-llama-3)
- Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2024). **Prompt engineering a prompt engineer**. *arXiv*. [https://doi.org/10.48550/arXiv.2311.05661](https://doi.org/10.48550/arXiv.2311.05661)


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "basiclingua",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "python, NLP, Natural Language Processing, Linguistics, Gemini LLM, Google Gemini LLM",
    "author": "Fareed Hassan Khan",
    "author_email": "<fareedhassankhan12@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/85/99/00cbb949a423d691e964a07b2e8748b43b41004a8b20dc9d629316dd4875/basiclingua-2.0.0.tar.gz",
    "platform": null,
    "description": "<!-- omit in toc -->\r\n# BasicLINGUA\r\n\r\n[![Documentation](https://img.shields.io/badge/AI%20Powered%20Documentation-Link-yellow)](https://ai-powered-basiclingua-documentation.streamlit.app/) [![License](https://img.shields.io/badge/License-MIT-yellow)](https://opensource.org/licenses/MIT) [![Python](https://img.shields.io/badge/Python-3.9%2B-green)](https://www.python.org/downloads/) [![Version](https://img.shields.io/badge/BasicLingua%20Version-2.0-green)]()\r\n\r\n![Basic-Lingua Logo](https://i.ibb.co/KzvQPW9/Group-1000006159-2.png)\r\n\r\nBasiclingua is a LLM based Python library that provides functionalities for linguistic tasks such as pattern extractions, intent recognition and many others, (**Imagination is the limit**).\r\n\r\n## Why Building this Project?\r\n\r\nThe problem that I plan to tackle is the increasing complexity and difficulty of handling text data as its size and complexity increase. NLP libraries that offer solutions are either limited in their ability to solve the required problem or require a great deal of human intervention to handle text data. We have used Multiple Language Models (Both Open-Source and Closed Source), which have demonstrated promising results in dealing with text data, to address the complexity that no NLP library has yet solved. As a result, we will be able to handle text-related tasks with minimal human intervention. We have created a powerful NLP library capable of solving any type of human text-related task, producing accurate results.\r\n\r\n## BasicLingua Architecture\r\n\r\n\r\n![Basic-Lingua Architecture](https://i.ibb.co/DwrDFbR/mainnnn-1-1.png)\r\n\r\n\r\nThe BasicLingua architecture takes a straightforward and efficient approach: a user inputs text and then selects a model of their choice, either open-source or closed-source. Next, they choose the feature that meets their needs, with each feature designed to be cost-effective to ensure full utilization at a minimal cost. 
The output can vary\u2014it might be a dictionary, a list, or another format depending on the selected feature.\r\n\r\nOne of the biggest challenges in building this architecture was ensuring the cost associated with using the features was minimal. To address this, we use effective `prompt engineering` techniques to optimize the performance of the models and reduce the cost of using the library.\r\n\r\nHere is one of our spellcheck feature that uses OpenAI model to correct spelling mistakes in text, the backend engineering is shown below:\r\n\r\n![Prompt-Engineering-Guide](https://i.ibb.co/GVJJdxb/Group-1000006159-4.png)\r\n\r\nThe spellcheck feature takes the user input and passes it to the selected model, which then corrects the spelling mistakes in the text but only returns the corrected words which are then replaced in the original text. This approach ensures that only the necessary corrections are made, reducing the cost of using the feature.\r\n\r\n\r\n## Evaluation Metrics\r\n\r\nOur library has been tested on a wide range of text data, including clinical notes, documents, and general text.\r\n\r\n| Dataset Category | Dataset Name | Description | Size | Source |\r\n| --- | --- | --- | --- | --- |\r\n| Clinical Notes | MIMIC-IV | Medical Information Mart for Intensive Care | Random 3000 notes | [MIMIC-IV](https://mimic-iv.mit.edu/) |\r\n| Documents | Wikipedia Articles | Wikipedia articles on various topics | Random 1000 articles | [Wikipedia](https://www.wikipedia.org/) |\r\n| General Text | AI generated text | Text generated by OpenAI and Gemini models | Random 5000 text samples | [OpenAI](https://platform.openai.com/), [Gemini](https://ai.google.dev/gemini-api/docs/models/gemini) |\r\n\r\nWe evaluated the library for different features, such as entity extraction, text summarization and more, below are the results:\r\n\r\n| Task Name | Task Description | Evaluation Metric | OpenAI<br> (GPT-3.5) | Gemini<br> (Gemini-1.0) | AnyScale<br> 
(Llama-3-70b) |\r\n| --- | --- | --- | --- | --- | --- |\r\n| Entity Extraction | Extracting entities from text diseases, person names,  locations etc. | F1 Score | 0.85 | 0.61 | 0.76 |\r\n| Text Summarization | Summarizing long text into a shorter version | ROUGE Score | 0.78 | 0.68 | 0.72 |\r\n| Text Classification | Classifying text into predefined categories | Accuracy | 0.92 | 0.80 | 0.89 |\r\n| Text Sentiment Analysis | Analyzing the sentiment of text (positive, negative, neutral) | Accuracy | 0.88 | 0.75 | 0.82 |\r\n| Text Coreference Resolution | Resolving coreferences in text | F1 Score | 0.89 | 0.73 | 0.83 |\r\n| Text Intent Recognition | Recognizing the intent of text (e.g., booking a flight, ordering food) | Accuracy | 0.90 | 0.78 | 0.85 |\r\n| Text OCR | Extracting text from images and scanned documents | Accuracy | 0.85 | 0.68 | 0.78 |\r\n| Text Anomaly Detection | Detecting anomalies in text data | F1 Score | 0.80 | 0.65 | 0.72 |\r\n| Text Sense Disambiguation | Disambiguating word senses in text | Accuracy | 0.90 | 0.78 | 0.85 |\r\n| Text Spellcheck | Correcting spelling mistakes in text | Accuracy | 0.85 | 0.68 | 0.78 |\r\n\r\nThe evaluation metrics show that the library performs well across different tasks, with OpenAI achieving the highest scores in most tasks. However, Gemini and AnyScale also demonstrate strong performance, making them suitable for various text-related tasks. Apart from the evaluation metrics, we have also focused on the **cost-effectiveness** of the library, ensuring that users can access the features at a minimal cost.\r\n\r\n## AI Powered Documentation\r\n\r\n![AI-Powered-Documentation](https://i.ibb.co/BzLdKBM/Group-1000006158-2.png)\r\n\r\nGiven that our NLP library can handle a wide range of domain-related tasks, it is crucial to provide an **AI-powered documentation** for **BasicLingua**. 
This documentation allows developers to ask questions related to the library, and it responds promptly by offering the exact features they need, along with examples. Additionally, it can answer other common queries, such as how to get started with the library and much more.\r\n\r\nGiven that our NLP library is based on LLMs, it is crucial to provide an **AI-powered documentation** for **BasicLingua**. Makes the library more efficient to use and understand. \r\n\r\n\r\nAI Documentation Webapp is available at -  [![GitHub](https://img.shields.io/badge/AI%20Powered%20Documentation%20Link-blue?logo=Meta)](https://ai-powered-basiclingua-documentation.streamlit.app/)\r\n\r\n\r\n## Updates\r\n- **`2024/4/20`** We have released the second version of the library. The new version includes additional features and new LLMs for text and vision tasks. Our library now supports `OpenAI`, `Gemini`, and `AnyScale` models for various linguistic tasks. We have also improved the performance of the library and added more functionalities to make it more versatile and user-friendly.\r\n- **`2024/4/16`** We have added the AI-powered documentation for the library. The documentation is now available for use. We are currently working on improving the documentation and adding more features to the library.\r\n- **`2024/3/3`** We have released the first version of the library. The library is now available for use. We are currently working on the documentation and the next version of the library. 
We are also working on the integration of the library with other LLMs.\r\n- **`2024/1/10`** We have released the baby version of this library containing limited number of pre-processing features.\r\n\r\n\r\n<!-- omit in toc -->\r\n## Table of Content\r\n- [Why Building this Project?](#why-building-this-project)\r\n- [BasicLingua Architecture](#basiclingua-architecture)\r\n- [Evaluation Metrics](#evaluation-metrics)\r\n- [AI Powered Documentation](#ai-powered-documentation)\r\n- [Updates](#updates)\r\n- [Installation / Updation](#installation--updation)\r\n- [Initialization](#initialization)\r\n- [Supported LLMs](#supported-llms)\r\n- [Usage](#usage)\r\n- [Features of the library](#features-of-the-library)\r\n- [Playground](#playground)\r\n- [Acknowledgements](#acknowledgements)\r\n\r\n## Installation / Updation\r\n\r\nBefore installing BasicLingua, ensure that you have Python installed on your system. BasicLingua tested on `Python 3.9` or greater. Earlier version may work but not guaranteed. To check your Python version, run the following command in your terminal:\r\n\r\n```bash\r\npython --version\r\n```\r\n\r\nIf you don't have Python installed or need to upgrade, visit the [official Python website](https://www.python.org/downloads/) to download and install the latest version.\r\n\r\nOnce Python is set up, you can install BasicLingua using pip:\r\n\r\n```bash\r\npip install basiclingua\r\n```\r\n\r\nor you can upgrade to the latest version using:\r\n\r\n```bash\r\npip install --upgrade basiclingua\r\n```\r\n\r\n## Initialization\r\n\r\nAfter installing BasicLingua, you need to import the models you want to use. 
You can choose to import specific models or all available models.\r\n\r\nImport a Specific Model\r\n\r\n```python\r\n# Importing OpenAI Model\r\nfrom basiclingua import OpenAILingua\r\n\r\n# Importing Google Gemini Model\r\nfrom basiclingua import GeminiLingua\r\n\r\n# Importing Anyscale Model\r\nfrom basiclingua import AnyScaleLingua\r\n```\r\n\r\nImport All Models at Once\r\n\r\n```python\r\nfrom basiclingua import OpenAILingua, GeminiLingua, AnyScaleLingua\r\n```\r\n\r\nBefore using any model, you must set the API key for that specific platform. This is a mandatory step. Each model class has a constructor that takes the API key and optional additional parameters, such as model names.\r\n\r\n**Gemini 1.0** Model is available for free within API limits, while AnyScale offer $10 free credit to get started.\r\n\r\n* Get your OpenAI API Key from [OpenAI Platform](https://platform.openai.com/api-keys)\r\n* Get your Gemini api key from [Gemini Platform](https://aistudio.google.com/app/apikey)\r\n* Get anyscale api key from [AnyScale Platform](https://app.endpoints.anyscale.com/credentials)\r\n\r\n**For Initializing OpenAI**\r\n```python\r\n# Initializing OpenAI Model\r\nopenai_model = OpenAILingua(\r\n    api_key=\"YOUR_OPENAI_API_KEY\", # Your OpenAI API Key\r\n    model_name='gpt-3.5-turbo-0125', # Text Model Name\r\n    vision_model_name='gpt-4-turbo' # Vision Model Name\r\n)\r\n```\r\n\r\nDefault models are `gpt-3.5-turbo-0125` and `gpt-4-turbo` for text and vision respectively.\r\n\r\n**For Initializing Gemini**\r\n\r\n```python\r\n# Initializing Gemini Model\r\ngemini_model = GeminiLingua(\r\n    api_key=\"YOUR_GEMINI_API_KEY\", # Your Gemini API Key\r\n    model_name='gemini-1.0-pro-latest', # Text Model Name\r\n    vision_model_name='models/gemini-1.5-pro-latest' # Vision Model Name\r\n)\r\n```\r\nDefault models are `gemini-1.0-pro-latest` and `gemini-1.5-pro-latest` for text and vision respectively.\r\n\r\n**For Initializing AnyScale**\r\n\r\n```python\r\n# 
Initializing AnyScale Model\r\nanyscale_model = AnyScaleLingua(\r\n    api_key=\"YOUR_ANY_SCALE_API_KEY\", # Your AnyScale API Key\r\n    model_name=\"meta-llama/Llama-3-70b-chat-hf\" # Text Model Name\r\n)\r\n```\r\n\r\nDefault model is `meta-llama/Llama-3-70b-chat-hf`.\r\n\r\n## Supported LLMs\r\n\r\nFor `OpenAILingua`, all text and vision models are supported as available on the [OpenAI Platform](https://platform.openai.com/docs/models).\r\n\r\nFor `GeminiLingua`, all text and vision models are supported as available on the [Gemini Platform](https://ai.google.dev/gemini-api/docs/models/gemini).\r\n\r\nA complete list of open-source supported models under `AnyScaleLingua` is available on the [AnyScale Platform](https://docs.endpoints.anyscale.com/pricing).\r\n\r\nDefault models are:\r\n\r\nSource | Text Model | Vision Model | Embedding Model\r\n--- | --- | --- | ---\r\nOpenAI | `gpt-3.5-turbo-0125` | `gpt-4-turbo` | `text-embedding-3-large`\r\nGemini | `gemini-1.0-pro-latest` | `gemini-1.5-pro-latest` | `models/embedding-001`\r\nAnyScale | `meta-llama/Llama-3-70b-chat-hf` | `meta-llama/Llama-3-70b-chat-hf` | `thenlper/gte-large`\r\n\r\n\r\n\r\n## Usage\r\n\r\nThe library provides a wide range of functionalities for linguistic tasks some of which are mentioned below. You can use our [AI-powered documentation](https://ai-powered-basiclingua-documentation.streamlit.app/)  to learn more about the functionalities provided by the library.\r\n\r\n**Entity extraction** is crucial for transforming unstructured text into structured data, enabling efficient analysis and automation in fields like finance, healthcare, and cybersecurity. However, it can be challenging due to the complexity and ambiguity of language, which often requires intricate regex or NLP techniques.\r\n\r\nRegex-based entity extraction is time-consuming due to its detailed pattern definitions, but with our approach, you only need to define the pattern name to extract entities with minimal effort. 
Here's an example that extracts ICD-10 (International Classification of Diseases) codes, along with other clinical entities, from a clinical note:\r\n\r\n```python\r\n# Clinical note with complex structure and formatting\r\nuser_input = \"\"\"Patient John, last name: Doe; 45 yrs\r\n                Symptoms: fatigue + frequent urination (possible diabetes); dizziness\r\n                Diagnosis - Type 2 Diabetes (E11.9), Hypertension (I10)\r\n                Prescribed: Metformin @ 500mg/day; Amlodipine, twice a day\r\n                Allergic: PCN (penicillin)\r\n                Family history of diabetes and HBP (high blood pressure)\r\n                Additional notes: testing for cholesterol and kidney function\r\n                Patient was advised to monitor blood sugar levels regularly.\r\n                Mentioned: Father - Type 2 Diabetes; Mother - Hypertension\r\n                Description - T2 Diabetes without complications; Essential Hypertension.\"\"\"\r\n\r\n# Define the patterns to extract\r\npatterns = \"ICD-10 Codes, Diseases, Medications, Allergies, Symptoms, Family History, Descriptions\"\r\n\r\n# Using OpenAI to extract entities\r\nopenai_entities = openai_model.extract_patterns(user_input, patterns=patterns)\r\n\r\n# Displaying the extracted entities\r\nprint(openai_entities)\r\n\r\n######## Output ########\r\n{\r\n  \"ICD-10 Codes\": [\"E11.9\", \"I10\"],\r\n  \"Diseases\": [\"Type 2 Diabetes\", \"Hypertension\"],\r\n  \"Medications\": [\"Metformin\", \"Amlodipine\"],\r\n  \"Allergies\": [\"Penicillin\"],\r\n  \"Symptoms\": [\"fatigue\", \"frequent urination\", \"dizziness\"],\r\n  \"Family History\": [\"Father with Type 2 Diabetes\", \"Mother with Hypertension\"],\r\n  \"Descriptions\": [\r\n    \"Type 2 Diabetes without complications\",\r\n    \"Essential (primary) hypertension\",\r\n    \"testing for cholesterol and kidney function\"\r\n  ]\r\n}\r\n######## Output ########\r\n```\r\n\r\nSimilarly, **text coreference** resolution is difficult because it involves working out which words refer to the same person or 
thing in a sentence or passage. It requires a deep understanding of context and of how language links different parts of a text.\r\n\r\nHere's an example of how `BasicLingua` can help you resolve coreferences in a text:\r\n\r\n```python\r\n# User input with complex coreferences\r\nuser_input = \"\"\"\r\nJane and her colleague Tom were preparing for the upcoming meeting with the new clients. She had worked on the presentation slides, while he focused on the data analysis. \r\nWhen the day of the meeting arrived, Jane noticed that the projector was not working properly, so she asked Tom to check it out. \r\nHe found that it needed a new cable, but they had none in the office. They had to improvise with a laptop. During the presentation, Jane felt nervous because the setup wasn't ideal, but Tom reassured her that everything would be fine. \r\nThe clients appreciated their efforts, and both Jane and Tom were relieved when the meeting concluded successfully. As they left, Jane told Tom that she was grateful for his support.\r\n\"\"\"\r\n\r\n# Using the AnyScale model to resolve coreferences\r\nanyscale_coref = anyscale_model.text_coreference(user_input)\r\n\r\n# Displaying the resolved coreferences\r\nprint(anyscale_coref)\r\n\r\n######## Output ########\r\n{\r\n  \"she\": \"Jane\",\r\n  \"he\": \"Tom\",\r\n  \"they\": [\"Jane\", \"Tom\"],\r\n  \"her\": \"Jane\",\r\n  \"him\": \"Tom\"\r\n}\r\n######## Output ########\r\n```\r\n\r\nThe library offers many other functions for a variety of linguistic tasks. Refer to our [AI-powered documentation](https://ai-powered-basiclingua-documentation.streamlit.app/) to learn what each function does and how to use it effectively.\r\n\r\n## Features of the library\r\n\r\nThe library provides more than **20** functions. 
Because they are effective across many different domains, we have built AI-powered documentation that explains them in more depth.\r\n\r\n| Function Name    | Python Function Name | Parameters               | Returns                                                      |\r\n|------------------|----------------------|--------------------------|--------------------------------------------------------------|\r\n| Extract Patterns | `extract_patterns`   | `user_input`, `patterns` | A `JSON` of extracted patterns from the input sentence       |\r\n| Detect NER       | `detect_ner`         | `user_input`, `ner_tags` | A `JSON` of detected Named Entity Recognition (NER) entities |\r\n| Text Intent      | `text_intent`        | `user_input`             | A `list` of identified intents from the input sentence       |\r\n| **. . .**        | **. . .**            | **. . .**                | **. . .**                                                    |\r\n\r\nTo explore further, chat with our [Documentation Chatbot](https://ai-powered-basiclingua-documentation.streamlit.app/) for a better understanding of what the library provides.\r\n\r\n## Playground\r\n\r\nSince this library is released under the MIT license, you are free to use it in your own projects. You can also contribute by adding new functions or improving existing ones. All the backend code is available in the **backend-engineering** folder.\r\n\r\n## Acknowledgements\r\n\r\n- Anil, R., et al. (2024, April). **Gemini: A Family of Highly Capable Multimodal Models**. *arXiv*. [https://doi.org/10.48550/arXiv.2312.11805](https://doi.org/10.48550/arXiv.2312.11805)\r\n- OpenAI Team. (2024). 
**OpenAI GPT-3.5: The Next Evolution of Language Models**. *OpenAI Blog*. [https://openai.com/blog/chatgpt](https://openai.com/blog/chatgpt)\r\n- Meta AI. (2024, April 18). **Introducing Meta Llama 3: The most capable openly available LLM to date**. *Meta AI Blog*. [https://ai.meta.com/blog/meta-llama-3](https://ai.meta.com/blog/meta-llama-3)\r\n- Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2024). **Prompt engineering a prompt engineer**. *arXiv*. [https://doi.org/10.48550/arXiv.2311.05661](https://doi.org/10.48550/arXiv.2311.05661)\r\n\r\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library based on various LLMs to perform basic and advanced natural language processing (NLP) tasks",
    "version": "2.0.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        " nlp",
        " natural language processing",
        " linguistics",
        " gemini llm",
        " google gemini llm"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e2943b60ae662ac44fd14d7173c2a41582405b5526e4eaa597f641377cc791d2",
                "md5": "b84c6bbe009e544ee59c93b9f7e877a9",
                "sha256": "648a277622973f8d0f4745db59f4d41865abed23a0dc1182df3ee0bcea3bfad0"
            },
            "downloads": -1,
            "filename": "basiclingua-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b84c6bbe009e544ee59c93b9f7e877a9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 28757,
            "upload_time": "2024-04-29T13:35:34",
            "upload_time_iso_8601": "2024-04-29T13:35:34.732740Z",
            "url": "https://files.pythonhosted.org/packages/e2/94/3b60ae662ac44fd14d7173c2a41582405b5526e4eaa597f641377cc791d2/basiclingua-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "859900cbb949a423d691e964a07b2e8748b43b41004a8b20dc9d629316dd4875",
                "md5": "7e18074f41d35b11ffbfe3c1c1d299d8",
                "sha256": "df55612143610154e70c5c2457125ab7b71338204c76961018e6df4241915ad5"
            },
            "downloads": -1,
            "filename": "basiclingua-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7e18074f41d35b11ffbfe3c1c1d299d8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 36080,
            "upload_time": "2024-04-29T13:35:40",
            "upload_time_iso_8601": "2024-04-29T13:35:40.385149Z",
            "url": "https://files.pythonhosted.org/packages/85/99/00cbb949a423d691e964a07b2e8748b43b41004a8b20dc9d629316dd4875/basiclingua-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-29 13:35:40",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "basiclingua"
}
        