| Name | ozonetel-ai |
| --- | --- |
| Version | 0.0.14 |
| home_page | None |
| Summary | The Ozonetel AI project is designed to provide a user-friendly interface for software development using Ozonetel's in-house AI libraries, models, and software solutions. It offers seamless integration with Ozonetel's advanced AI capabilities, allowing developers to harness the power of AI to enhance their applications. |
| upload_time | 2024-05-06 05:53:51 |
| maintainer | None |
| docs_url | None |
| author | Biswajit Satapathy |
| requires_python | None |
| license | MIT License |
| keywords | machine learning, artificial intelligence, neural network, indexing, searching |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Ozonetel AI
## Overview
The Ozonetel AI project is designed to provide a user-friendly interface for software development using Ozonetel's in-house AI libraries, models, and software solutions. It offers seamless integration with Ozonetel's advanced AI capabilities, allowing developers to harness the power of AI to enhance their applications.
## Features
- Text Embedding (Binary Embeddings): The Ozonetel AI project currently offers text embedding functionality, allowing users to convert text into high-dimensional bit vectors for various natural language processing tasks.
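Conceptually (and purely as an illustrative sketch, not the library's actual algorithm), a binary embedding can be pictured as thresholding each dimension of a dense vector to a single bit and packing the result, which is what makes storage compact and bitwise comparison fast. The NumPy snippet below uses a made-up random vector as a stand-in for a model embedding:
```python
import numpy as np

# Illustration only: threshold a dense float vector at zero to get one bit per
# dimension, then pack the bits into uint8 for compact storage.
dense = np.random.randn(1024).astype(np.float32)  # stand-in for a model embedding
bits = (dense > 0).astype(np.uint8)               # 1024 values of 0/1
packed = np.packbits(bits)                        # 1024 bits -> 128 bytes
print(bits[:16], packed[:2])
```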
## Getting Started
To get started with the Ozonetel AI project, follow the steps below:
1. Usage
1. Installation
```bash
pip install ozonetel-ai
```
2. Compute binary embeddings
```python
from ozoneai.embeddings import BinarizeEmbedding
from ozoneai.utils import bit_sim
from sentence_transformers import SentenceTransformer
credential = {"username": "", "bearer_token": ""}
# Base sentence encoder and the Ozonetel binarizer
model = SentenceTransformer("BAAI/bge-m3")
binary_encoder = BinarizeEmbedding(endcoder_modelid="BAAI/bge-m3", credential=credential)
# Encode text with the base model, then request the binarized embeddings
emb = model.encode(["Try me Out"])
emb_binarized = binary_encoder.get_binary_embeddings(emb, model="sieve-bge-m3-en-aug-v1")
```
1. Set Credentials:
Before using the text embedding feature, set your credentials by importing the os module and setting the `OZAI_API_CREDENTIALS` environment variable to point to your credentials file.
Example:
```python
import os
os.environ["OZAI_API_CREDENTIALS"] = "./cred.json"
```
or, pass the credentials directly as a dictionary:
```python
credential = {"username":"", "bearer_token":""}
```
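The exact schema of the credentials file is not documented in this README; the sketch below assumes it simply carries the same fields as the `credential` dictionary shown above, written once and then referenced through `OZAI_API_CREDENTIALS`:
```python
import json
import os

# Assumed layout of cred.json: the same keys as the `credential` dict above.
with open("cred.json", "w") as f:
    json.dump({"username": "<your-username>", "bearer_token": "<your-token>"}, f)

# Point the SDK at the file instead of passing `credential` explicitly.
os.environ["OZAI_API_CREDENTIALS"] = "./cred.json"
```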
2. Text Embedding Extraction
Text embedding converts textual data into numerical representations that support natural language processing tasks. By capturing semantic meaning, embeddings improve sentiment analysis, document classification, and named entity recognition, and because they are compact and transferable they also make downstream computation faster. `BinarizeSentenceEmbedding` binarizes base embeddings and represents them as bit vectors.
Example:
- Binary Embedding extraction
```python
# Import `BinarizeSentenceEmbedding` class from the `ozoneai.embeddings` module.
from ozoneai.embeddings import BinarizeSentenceEmbedding
# Extract embeddings: encode the texts, then call `get_binary_embeddings` to obtain binarized embeddings.
# Supported encoder models are `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` and `BAAI/bge-m3`.
# Alternatively, if these models are stored in a local directory, pass a path such as
# `/path/to/paraphrase-multilingual-mpnet-base-v2` or `/path/to/bge-m3`.
# Note: the safest option is to set `OZAI_API_CREDENTIALS`; in that case the `credential` argument can be omitted.
# If `credential` is not provided, `OZAI_API_CREDENTIALS` is used instead.
binary_encoder = BinarizeSentenceEmbedding(endcoder_modelid="BAAI/bge-m3", credential=credential)
emb = binary_encoder.encode(["Try me Out"])
emb_binarized = binary_encoder.get_binary_embeddings(emb, model="sieve-bge-m3-en-aug-v1")  # maximum of 20 vectors per request
# Access embedding attributes: the returned object exposes attributes such as bits, unsigned binary, and signed binary.
# Get binary representation
embeddings = emb_binarized.embedding
```
- List the supported models
```python
from ozoneai.embeddings import list_models
list_models()
```
- Document Similarity
```python
from ozoneai.embeddings import BinarizeEmbedding
from ozoneai.utils import bit_sim
from sentence_transformers import SentenceTransformer
credential = {"username":"", "bearer_token":""}
model = SentenceTransformer("BAAI/bge-m3")
binary_encoder = BinarizeEmbedding(endcoder_modelid="BAAI/bge-m3",credential=credential)
# Compute embedding for both lists
query = ["What is BGE M3?", "Define BM25"]
document = ["BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
"BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]
query_embeddings = binary_encoder.get_binary_embeddings(model.encode(query), model="sieve-bge-m3-en-aug-v1").embedding
doc_embeddings = binary_encoder.get_binary_embeddings(model.encode(document), model="sieve-bge-m3-en-aug-v1").embedding
# Compute similarities
scores = bit_sim(query_embeddings, doc_embeddings)
for i, qs in enumerate(scores):
    for j, ds in enumerate(qs):
        print(f"query {i} vs doc {j} : similarity - {round(float(ds), 3)}")
```
```
output:
query 0 vs doc 0 : similarity - 0.627
query 0 vs doc 1 : similarity - 0.537
query 1 vs doc 0 : similarity - 0.528
query 1 vs doc 1 : similarity - 0.634
```
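The README does not state how `bit_sim` scores bit vectors. A common convention for binary embeddings is normalized Hamming agreement (the fraction of matching bits), which the 0–1 output range above is consistent with; the NumPy sketch below illustrates that idea on hypothetical 0/1 arrays and is not the actual `ozoneai.utils.bit_sim` implementation.
```python
import numpy as np

def hamming_similarity(q_bits: np.ndarray, d_bits: np.ndarray) -> np.ndarray:
    """Fraction of matching bits for every query/document pair.

    q_bits: (n_queries, n_bits) array of 0/1 values
    d_bits: (n_docs, n_bits) array of 0/1 values
    """
    # XOR-style comparison counts mismatches; 1 - mismatch rate is the agreement score.
    mismatches = (q_bits[:, None, :] != d_bits[None, :, :]).sum(axis=-1)
    return 1.0 - mismatches / q_bits.shape[1]

# Hypothetical toy bit vectors, just to show the shape of the result.
rng = np.random.default_rng(0)
q = rng.integers(0, 2, size=(2, 64))
d = rng.integers(0, 2, size=(2, 64))
print(hamming_similarity(q, d))  # (2, 2) matrix of pairwise scores
```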
## Sample Applications
### Search And Index:
```python
import os
import numpy as np
from ozoneai.embeddings import BinarizeEmbedding
from ozoneai.utils import bit_sim
from sentence_transformers import SentenceTransformer
# define credentials
# os.environ["OZAI_API_CREDENTIALS"] = "./cred.json"
credential = {"username":"", "bearer_token":""}
base_model = SentenceTransformer("BAAI/bge-m3")
binary_encoder = BinarizeEmbedding(endcoder_modelid="BAAI/bge-m3",credential=credential)
# Documents to be Indexed
docs = [
"The cat sits outside",
"A man is playing guitar",
"I love pasta",
"The new movie is awesome",
"The cat plays in the garden",
"A woman watches TV",
"The new movie is so great",
"Do you like pizza?",
"Artifical intelligence is all about enabling computers to simulate human capabilities.",
"Artifical intelligence is there to help human",
"Language moodel has lot of impact on artifical intelligence",
"Artifical intelligence is reigning in industry",
"ROI is one the useful measure for any investors",
"Understand before you invest"
]
# Extract Embeddings for documents to be indexed
docs_emb = base_model.encode(docs)
# Binarise embeddings
docs_emb_binarized = binary_encoder.get_binary_embeddings(docs_emb, model="sieve-bge-m3-en-aug-v1")
# Searching
# query_text = "the girl was sitting in the park"
query_text = "I love Artifical Intellegence"
# query_text = "Artifical intellegence helps optimising software"
# query_text = "what is machine learning"
# Extract embeddings for the query
# Compute the sentence embedding using SentenceTransformers (e.g. bge-m3 or paraphrase-multilingual-mpnet-base-v2)
query_emb = base_model.encode([query_text])
# Binarise embeddings
query_emb_binarized = binary_encoder.get_binary_embeddings(query_emb, model="sieve-bge-m3-en-aug-v1")
scores = bit_sim(query_emb_binarized.embedding, docs_emb_binarized.embedding).flatten()
topn = 5
topn_indices = np.argsort(-scores)[:topn]
# Print search result in order
for i, doci in enumerate(topn_indices):
    print(f"{i}. {docs[doci]} [{doci}] ({scores[doci]})")
```
[Try more...](https://github.com/ozonetelgit/ozonetel-ai-sdk/blob/main/examples/search-index/)
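For larger corpora, the brute-force `bit_sim` scan above can be replaced with a dedicated binary index. The sketch below shows one common pattern using FAISS's `IndexBinaryFlat`; the random 0/1 arrays stand in for binarized embeddings because the exact layout of the `.embedding` attribute is not documented here, so treat the packing step as an assumption.
```python
import faiss
import numpy as np

# Hypothetical 0/1 bit vectors standing in for binarized embeddings;
# the real `.embedding` layout may differ and would need packing accordingly.
n_docs, n_bits = 1000, 1024            # n_bits must be a multiple of 8 for FAISS
rng = np.random.default_rng(0)
doc_bits = rng.integers(0, 2, size=(n_docs, n_bits), dtype=np.uint8)
query_bits = rng.integers(0, 2, size=(1, n_bits), dtype=np.uint8)

# FAISS binary indexes take bit vectors packed into uint8 (8 bits per byte).
index = faiss.IndexBinaryFlat(n_bits)
index.add(np.packbits(doc_bits, axis=1))

# Search returns Hamming distances (lower = more similar) and document ids.
distances, ids = index.search(np.packbits(query_bits, axis=1), 5)
print(ids[0], distances[0])
```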
## Benchmarks
### Classification
| S.N | Dataset | paraphrase-multilingual-mpnet-base-v2 | siv-sentence-bitnet-pmbv2-wikid-small | bge-m3 | sieve-bge-m3-en-aug-v0 | sieve-bge-m3-en-aug-v1 |
| --- | ----------------------------------------- | ------------------------------------- | ------------------------------------- | ------ | ---------------------- | ---------------------- |
| 1 | Amazon Counterfactual Classification(en) | 75.81 | 79.06 | 75.63 | 79.23 | 78.28 |
| 2 | Amazon Polarity Classification | 76.41 | 70.19 | 91.01 | 86.81 | 85.69 |
| 3 | Amazon Reviews Classification | 38.51 | 34.29 | 46.99 | 43.51 | 43 |
| 4 | Banking 77 Classification | 81.07 | 75.89 | 81.93 | 82.06 | 82.75 |
| 5 | Emotion Classification | 45.83 | 40.26 | 50.16 | 42.34 | 42.4 |
| 6 | Imdb Classification | 64.57 | 61.14 | 87.84 | 85.06 | 84.44 |
| 7 | Massive Intent Classification (en) | 69.32 | 65.6 | 71.08 | 68.9 | 70.22 |
| 8 | Massive Scenario Classification (en) | 75.35 | 70.37 | 76.64 | 71.29 | 72.68 |
| 9 | MTOP Domain Classification (en) | 89.24 | 87.22 | 93.36 | 87.56 | 88.37 |
| 10 | MTOP Intent Classification (en) | 68.69 | 69.45 | 66.58 | 74.11 | 74.02 |
| 11 | Toxic Conversations Classification | 71.02 | 70.26 | 72.6 | 68 | 68.59 |
| 12 | Tweet Sentiment Extraction Classification | 59.03 | 54.49 | 63.71 | 56.96 | 56.91 |
### STS
| S.N | Dataset | paraphrase-multilingual-mpnet-base-v2 | siv-sentence-bitnet-pmbv2-wikid-small | bge-m3 | sieve-bge-m3-en-aug-v0 | sieve-bge-m3-en-aug-v1 |
| --- | ------------- | ------------------------------------- | ------------------------------------- | ------ | ---------------------- | ---------------------- |
| 1 | BIOSSES | 76.27 | 65.29 | 83.38 | 82.91 | 83.97 |
| 2 | SICK-R | 79.62 | 76.01 | 79.91 | 75.26 | 76.5 |
| 3 | STS12 | 77.9 | 71.25 | 78.73 | 66.95 | 68.88 |
| 4 | STS13 | 85.11 | 78.4 | 79.6 | 64 | 69.09 |
| 5 | STS14 | 80.81 | 74.23 | 79 | 62.83 | 67.74 |
| 6 | STS15 | 87.48 | 81.41 | 87.81 | 80 | 82.08 |
| 7 | STS16 | 83.2 | 79.13 | 85.4 | 77.87 | 79.31 |
| 8 | STS17(en-en) | 86.99 | 85.4 | 87.13 | 84.1 | 85.32 |
| 9 | STS Benchmark | 86.82 | 81.34 | 84.85 | 74.13 | 77.19 |
## License
The Ozonetel AI project is licensed under the MIT License. Please refer to the LICENSE file for more information.
## Raw data
```json
{
"_id": null,
"home_page": null,
"name": "ozonetel-ai",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Machine Learning, Artificial Intellegence, Neural Network, Indexing, Searching",
"author": "Biswajit Satapathy",
"author_email": "biswajit@ozonetel.com",
"download_url": "https://files.pythonhosted.org/packages/0b/1f/8fd556331f76eba99dc9af99face26bc5699a0cbb8e9e975e9364d3e7e6e/ozonetel-ai-0.0.14.tar.gz",
"platform": null,
"description": "# Ozonetel AI\n## Overview\nThe Ozonetel AI project is designed to provide a user-friendly interface for software development using Ozonetel's in-house AI libraries, models, and software solutions. It offers seamless integration with Ozonetel's advanced AI capabilities, allowing developers to harness the power of AI to enhance their applications.\n\n## Features\n- Text Embedding (Binary Embeddings): The Ozonetel AI project currently offers text embedding functionality, allowing users to convert text into high-dimensional bit vectors for various natural language processing tasks.\n\n## Getting Started\nTo get started with the Ozonetel AI project, follow the steps below:\n\n1. Usage\n 1. Installation\n \n ```bash\n pip install ozonetel-ai\n ```\n\n 2. compute binary embedding \n\n ```python\n from ozoneai.embeddings import BinarizeEmbedding\n from ozoneai.utils import bit_sim\n\n from sentence_transformers import SentenceTransformer\n\n credential = {\"username\":\"\", \"bearer_token\":\"\"}\n\n model = SentenceTransformer(\"BAAI/bge-m3\")\n binary_encoder = BinarizeEmbedding(endcoder_modelid=\"BAAI/bge-m3\",credential=credential)\n ```\n1. Set Credentials:\n Before using the text embedding feature, set your credentials by importing the os module and setting the `OZAI_API_CREDENTIALS` environment variable to point to your credentials file.\n \n Example:\n \n ```python\n import os\n os.environ[\"OZAI_API_CREDENTIALS\"] = \"./cred.json\"\n ```\n or,\n ``` python\n credential = {\"username\":\"\", \"bearer_token\":\"\"}\n ```\n2. Text Embedding Extraction\n Text embedding converts textual data into numerical representations, aiding natural language processing tasks. By capturing semantic meaning, it enhances sentiment analysis, document classification, and named entity recognition. Efficient and transferable, embeddings facilitate faster computation and enable machine learning models to better understand and process text. `BinarizeSentenceEmbedding` binarizes base embeddings and represents in bits.\n\n Example:\n - Binary Embedding extraction\n ```python\n # Import `BinarizeSentenceEmbedding` class from the `ozoneai.embeddings` module.\n from ozoneai.embeddings import BinarizeSentenceEmbedding\n \n # Extract Embeddings: Use the `binarize` method to obtain binarized embeddings for given texts .\n # Supported models encoders are `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` and `BAAI/bge-m3`\n # Alternatively if you have stored these models in local directory you can use like `/path/to/paraphrase-multilingual-mpnet-base-v2` or `/path/to/bge-m3`\n \n # Note: for safest use `OZAI_API_CREDENTIALS`. In that case `credential` argument can be ignored. 
if `credential` is not defined `OZAI_API_CREDENTIALS` will be considered.\n binary_encoder = BinarizeSentenceEmbedding(endcoder_modelid=\"BAAI/bge-m3\", credential = credential)\n\n emb = binary_encoder.encode([\"Try me Out\"])\n emb_binarized = binary_encoder.get_binary_embeddings(emb, model=\"sieve-bge-m3-en-aug-v1\") # max limit 20 vectors per request\n \n # Access Embedding Attributes: Retrieve various attributes of the embedding object, such as bits, unsigned binary, and signed binary.\n # Get binary representation\n embeddings = emb_binarized.embedding\n\n ```\n\n - List the supported models\n \n ```python\n from ozoneai.embeddings import list_models\n\n list_models()\n ```\n\n - Document Similarity\n\n ```python\n from ozoneai.embeddings import BinarizeEmbedding\n from ozoneai.utils import bit_sim\n\n from sentence_transformers import SentenceTransformer\n\n credential = {\"username\":\"\", \"bearer_token\":\"\"}\n\n model = SentenceTransformer(\"BAAI/bge-m3\")\n binary_encoder = BinarizeEmbedding(endcoder_modelid=\"BAAI/bge-m3\",credential=credential)\n\n # Compute embedding for both lists\n query = [\"What is BGE M3?\", \"Define BM25\"]\n document = [\"BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.\", \n \"BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document\"]\n\n query_embeddings = binary_encoder.get_binary_embeddings(model.encode(query), model=\"sieve-bge-m3-en-aug-v1\").embedding\n doc_embeddings = binary_encoder.get_binary_embeddings(model.encode(document), model=\"sieve-bge-m3-en-aug-v1\").embedding\n\n # Compute similarities\n scores = bit_sim(query_embeddings, doc_embeddings)\n\n for i, qs in enumerate(scores):\n for j, ds in enumerate(qs):\n print(f\"query {i} vs doc {j} : similarity - {round(float(ds), 3)}\")\n \n ```\n ```\n output:\n\n query 0 vs doc 0 : similarity - 0.627\n query 0 vs doc 1 : similarity - 0.537\n query 1 vs doc 0 : similarity - 0.528\n query 1 vs doc 1 : similarity - 0.634\n ```\n\n\n## Sample Applications\n\n### Search And Index:\n\n```python\nimport numpy as np, os\nfrom ozoneai.embeddings import BinarizeEmbedding\n\nfrom sentence_transformers import SentenceTransformer\n\n# define credentials\n# os.environ[\"OZAI_API_CREDENTIALS\"] = \"./cred.json\"\ncredential = {\"username\":\"\", \"bearer_token\":\"\"}\n\nbase_model = SentenceTransformer(\"BAAI/bge-m3\")\nbinary_encoder = BinarizeEmbedding(endcoder_modelid=\"BAAI/bge-m3\",credential=credential)\n\n# Documents to be Indexed\ndocs = [\n \"The cat sits outside\",\n \"A man is playing guitar\",\n \"I love pasta\",\n \"The new movie is awesome\",\n \"The cat plays in the garden\",\n \"A woman watches TV\",\n \"The new movie is so great\",\n \"Do you like pizza?\",\n \"Artifical intelligence is all about enabling computers to simulate human capabilities.\",\n \"Artifical intelligence is there to help human\",\n \"Language moodel has lot of impact on artifical intelligence\",\n \"Artifical intelligence is reigning in industry\",\n \"ROI is one the useful measure for any investors\",\n \"Understand before you invest\"\n]\n\n# Extract Embeddings for documents to be indexed\ndocs_emb = base_model.encode(docs)\n\n# Binarise embeedings\ndocs_emb_binarized = binary_encoder.get_binary_embeddings(docs_emb, model=\"sieve-bge-m3-en-aug-v1\")\n\n# Searching\n\n# query_text = \"the girl was sitting in the park\"\nquery_text = \"I love Artifical Intellegence\"\n# query_text = \"Artifical 
intellegence helps optimising software\"\n# query_text = \"what is machine learning\"\n\n# Extract Embeddings query\n\n# Compute sentence embedding using SentenceTransformers (i.e. bge-m3 or paraphrase-multilingual-mpnet-base-v2)\nquery_emb = base_model.encode([query_text])\n\n# Binarise embeedings\nquery_emb_binarized = binary_encoder.get_binary_embeddings(query_emb, model=\"sieve-bge-m3-en-aug-v1\")\n\nscores = bit_sim(query_emb_binarized.embedding, docs_emb_binarized.embedding).flatten()\n\ntopn = 5\ntopn_indices = np.argsort(-scores)[:topn]\n\n# Print search result in order\nfor i, doci in enumerate(topn_indices):\n print(f\"{i}. {docs[doci]} [{doci}] ({scores[doci]})\")\n```\n \n[try more..](https://github.com/ozonetelgit/ozonetel-ai-sdk/blob/main/examples/search-index/)\n\n## Benchmarks\n### Classification\n| S.N | Dataset | paraphrase-multilingual-mpnet-base-v2 | siv-sentence-bitnet-pmbv2-wikid-small | bge-m3 | sieve-bge-m3-en-aug-v0 | sieve-bge-m3-en-aug-v1 |\n| --- | ----------------------------------------- | ------------------------------------- | ------------------------------------- | ------ | ---------------------- | ---------------------- |\n| 1 | Amazon Counterfactual Classification(en) | 75.81 | 79.06 | 75.63 | 79.23 | 78.28 |\n| 2 | Amazon Polarity Classification | 76.41 | 70.19 | 91.01 | 86.81 | 85.69 |\n| 3 | Amazon Reviews Classification | 38.51 | 34.29 | 46.99 | 43.51 | 43 |\n| 4 | Banking 77 Classification | 81.07 | 75.89 | 81.93 | 82.06 | 82.75 |\n| 5 | Emotion Classification | 45.83 | 40.26 | 50.16 | 42.34 | 42.4 |\n| 6 | Imdb Classification | 64.57 | 61.14 | 87.84 | 85.06 | 84.44 |\n| 7 | Massive Intent Classification (en) | 69.32 | 65.6 | 71.08 | 68.9 | 70.22 |\n| 8 | Massive Scenario Classification (en) | 75.35 | 70.37 | 76.64 | 71.29 | 72.68 |\n| 9 | MTOP Domain Classification (en) | 89.24 | 87.22 | 93.36 | 87.56 | 88.37 |\n| 10 | MTOP Intent Classification (en) | 68.69 | 69.45 | 66.58 | 74.11 | 74.02 |\n| 11 | Toxic Conversations Classification | 71.02 | 70.26 | 72.6 | 68 | 68.59 |\n| 12 | Tweet Sentiment Extraction Classification | 59.03 | 54.49 | 63.71 | 56.96 | 56.91 |\n\n### STS\n| S.N | Dataset | paraphrase-multilingual-mpnet-base-v2 | siv-sentence-bitnet-pmbv2-wikid-small | bge-m3 | sieve-bge-m3-en-aug-v0 | sieve-bge-m3-en-aug-v1 |\n| --- | ------------- | ------------------------------------- | ------------------------------------- | ------ | ---------------------- | ---------------------- |\n| 1 | BIOSSES | 76.27 | 65.29 | 83.38 | 82.91 | 83.97 |\n| 2 | SICK-R | 79.62 | 76.01 | 79.91 | 75.26 | 76.5 |\n| 3 | STS12 | 77.9 | 71.25 | 78.73 | 66.95 | 68.88 |\n| 4 | STS13 | 85.11 | 78.4 | 79.6 | 64 | 69.09 |\n| 5 | STS14 | 80.81 | 74.23 | 79 | 62.83 | 67.74 |\n| 6 | STS15 | 87.48 | 81.41 | 87.81 | 80 | 82.08 |\n| 7 | STS16 | 83.2 | 79.13 | 85.4 | 77.87 | 79.31 |\n| 8 | STS17(en-en) | 86.99 | 85.4 | 87.13 | 84.1 | 85.32 |\n| 9 | STS Benchmark | 86.82 | 81.34 | 84.85 | 74.13 | 77.19 |\n\n\n## License\nThe Ozonetel AI project is licensed under the MIT License. Please refer to the LICENSE file for more information.\n\n",
"bugtrack_url": null,
"license": "MIT License",
"summary": "The Ozonetel AI project is designed to provide a user-friendly interface for software development using Ozonetel's in-house AI libraries, models, and software solutions. It offers seamless integration with Ozonetel's advanced AI capabilities, allowing developers to harness the power of AI to enhance their applications.",
"version": "0.0.14",
"project_urls": null,
"split_keywords": [
"machine learning",
" artificial intellegence",
" neural network",
" indexing",
" searching"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0b1f8fd556331f76eba99dc9af99face26bc5699a0cbb8e9e975e9364d3e7e6e",
"md5": "7cd0f08f91f44c5c17bca02d0e75de1b",
"sha256": "4c6bef0b0a45cf3b10a6e841451388f2efaf17ec526e1c30bab1f9d1e8251255"
},
"downloads": -1,
"filename": "ozonetel-ai-0.0.14.tar.gz",
"has_sig": false,
"md5_digest": "7cd0f08f91f44c5c17bca02d0e75de1b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12142,
"upload_time": "2024-05-06T05:53:51",
"upload_time_iso_8601": "2024-05-06T05:53:51.222436Z",
"url": "https://files.pythonhosted.org/packages/0b/1f/8fd556331f76eba99dc9af99face26bc5699a0cbb8e9e975e9364d3e7e6e/ozonetel-ai-0.0.14.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-06 05:53:51",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "ozonetel-ai"
}
```