azure-sql-vector-search

Name	azure-sql-vector-search
Version	0.8.3
home_page	https://github.com/projectAcetylcholine/sql_vector_search
Summary	Azure SQL Vector Search Clients
upload_time	2024-05-17 02:09:04
maintainer	None
docs_url	None
author	Microsoft Corporation
requires_python	None
license	MIT License
keywords	azure sql vector search langchain
requirements	No requirements were recorded.

## Azure SQL Vector Search

This project contains SDKs that allow developers to use Azure SQL Database to build AI applications that need vector search.

### Vectors in SQL Server
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process of turning data into a vector is called vectorization.
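
As a minimal illustration of the text example above, the following sketch turns a string into a vector of character codes. The function name `text_to_ascii_vector` is hypothetical and is not part of this SDK; real applications would normally use embeddings rather than raw character codes.

```python
def text_to_ascii_vector(text: str) -> list[float]:
    """Return one float per character, using the character's code point."""
    return [float(ord(ch)) for ch in text]


vector = text_to_ascii_vector("SQL")
print(vector)  # [83.0, 81.0, 76.0]
```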

### Embeddings
Embeddings are vectors that represent important features of data. Embeddings are often learned by using a deep learning model, and machine learning and AI models utilize them as features. Embeddings can also capture semantic similarity between similar concepts. For example, in generating an embedding for the words person and human, we would expect their embeddings (vector representation) to be similar in value since the words are also semantically similar.
Azure OpenAI provides models that create embeddings from text data. The service breaks text into tokens and generates embeddings using models pretrained by OpenAI. To learn more, see [Creating embeddings with Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python-new).
Once embeddings are generated, they can be stored in a SQL Server database. This allows you to store the embeddings alongside the data they represent, and to perform vector search queries to find similar data points.

### Vector Search

Vector search refers to the process of finding all vectors in a dataset that are similar to a specific query vector. For example, a query vector for the word **human** is compared against the entire dataset to find similar vectors, and thus similar words; here it should find the word **person** as a close match. This closeness is measured using a distance metric such as cosine distance: the smaller the distance between two vectors, the more similar they are.
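
The idea can be sketched as an exhaustive search in plain Python. The three-dimensional "embeddings" below are made-up values chosen for illustration, not output from a real model, and the helper `cosine_distance` is defined here rather than taken from the SDK.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 minus the cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
dataset = {
    "person": [0.9, 0.1, 0.0],
    "banana": [0.0, 0.2, 0.9],
    "robot": [0.5, 0.8, 0.1],
}
query = [1.0, 0.0, 0.1]  # stands in for the embedding of "human"

# Compare the query against every stored vector and sort by distance.
ranked = sorted(dataset, key=lambda word: cosine_distance(query, dataset[word]))
print(ranked)  # ['person', 'robot', 'banana']
```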

Consider a scenario where you run a query over millions of documents to find the most similar documents in your data. You can create embeddings for your data and query documents using Azure OpenAI. Then, you can perform a vector search to find the most similar documents from your dataset. However, while performing a vector search across a few examples is trivial, performing the same search across thousands, or millions, of data points becomes challenging. There are also trade-offs between exhaustive search and approximate nearest neighbor (ANN) search methods, including latency, throughput, accuracy, and cost, all of which depend on the requirements of your application.

Because embeddings can be efficiently stored and queried in Azure SQL Database thanks to columnstore index support, which allows exact nearest neighbor search with great performance, you don't have to decide between accuracy and speed: you can have both. Storing vector embeddings alongside the data in an integrated solution minimizes the need to manage data synchronization and accelerates your time-to-market for AI application development.

Similarity enables applications such as:
- Search (where items are ranked by relevance to a query string)
- Clustering (where items are grouped by similarity)
- Recommendations (where related items are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where items are classified by their most similar label)

### Classic and Native Vector Support in Azure SQL Database

Until recently, Azure SQL Database did not have a native vector type. However, a vector is nothing more than an ordered tuple, and relational databases are great at managing tuples. You can think of a tuple as the formal term for a row in a table.

Native vector support is currently in private preview and is expected to become available in the coming months.

### Native Vector Support in Azure SQL and SQL Server 

The first wave of vector support will introduce specialized vector functions to create vectors from JSON arrays, as they are the most common way to represent a vector, and to calculate the Euclidean distance, the cosine distance, and the dot product between two vectors.

Vectors are stored in an efficient binary format that also enables usage of dedicated CPU vector processing extensions like SIMD and AVX. 

To have the broadest compatibility with any language and platform, in the first wave vectors will use the existing VARBINARY data type to store the vector binary format. Specialized functions will allow developers to transform stored vector data back into JSON arrays and to check and enforce vector dimensionality.
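
To make the JSON-to-binary round trip concrete, here is a rough sketch in Python. The byte layout used here (little-endian 32-bit floats with no header) is purely an assumption for illustration; it is not the actual binary format used by Azure SQL, and the function names are hypothetical.

```python
import json
import struct


def json_array_to_binary(json_text: str) -> bytes:
    """Pack a JSON array of numbers into bytes (assumed layout: little-endian float32)."""
    values = json.loads(json_text)
    return struct.pack(f"<{len(values)}f", *values)


def binary_to_json_array(blob: bytes) -> str:
    """Unpack bytes produced by json_array_to_binary back into a JSON array."""
    count = len(blob) // 4  # each float32 occupies 4 bytes
    return json.dumps(list(struct.unpack(f"<{count}f", blob)))


def dimensions(blob: bytes) -> int:
    """Return the number of components stored in the blob."""
    return len(blob) // 4


blob = json_array_to_binary("[0.5, -0.25, 1.0, 0.0]")
print(dimensions(blob))            # 4
print(binary_to_json_array(blob))  # [0.5, -0.25, 1.0, 0.0]
```

The values in this example are exactly representable as 32-bit floats, so the round trip is lossless; other values may come back with small rounding differences.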

Embeddings can be efficiently stored and queried thanks to columnstore index support, allowing exact nearest neighbor search with great performance.

### Vector Search Modes within this SDK

There are 2 modes available:

- Classic Vector Search
- Native Vector Search

The classic vector search lets you store vectors using a traditional clustered columnstore index, which makes it easy to retrieve the vectors and the associated metadata later.

The native vector search uses newly available built-in functions to store and query the data for faster performance.

The native vector search is currently in private preview, and you will need access to this feature to use this mode.

### Distance Strategies

There are three distance strategies for comparing a query vector with the vectors of the records in the table:
- COSINE_SIMILARITY
- EUCLIDEAN_DISTANCE
- DOT_PRODUCT

Cosine Similarity, Inner Product (aka Dot Product), and Euclidean Distance are all measures used to analyze and compute relationships between vectors, but they each serve different purposes and are used in different contexts:

### 1. Cosine Similarity
- **Definition**: Cosine similarity measures the cosine of the angle between two vectors. It is calculated as the dot product of the two vectors after each has been normalized to length 1 (equivalently, the dot product divided by the product of the vectors' lengths).
- **Range**: It ranges from -1 to 1. A value of 1 means the vectors are parallel (same direction), 0 means they are orthogonal (no correlation), and -1 means they are anti-parallel (opposite directions).
- **Usage**: It is widely used in text analysis and other areas where the magnitude of the vectors is not as important as the orientation. It's especially useful in measuring similarity in high-dimensional spaces.
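
The definition and range above can be checked with a small sketch in plain Python. The function below is written for illustration and is not the SDK's implementation; note that cosine similarity is undefined for a zero vector, so the sketch raises an error in that case.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-4.0, 0.0]))  # -1.0 (opposite direction)
```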

### 2. Inner Product (Dot Product)
- **Definition**: The inner product of two vectors is the sum of the products of their corresponding components. This is equivalent to projecting one vector onto another and scaling it by the length of the other vector.
- **Result Type**: The result is a scalar. Positive values indicate a certain degree of alignment between the vectors, zero suggests orthogonality, and negative values indicate a degree of opposition.
- **Usage**: Inner product is fundamental in geometry for defining projections, in physics for work calculations, and in machine learning for various linear algebra operations.

When the vectors are normalized (scaled so that their magnitude is 1), cosine similarity and the inner product (dot product) compute to the same value. The inner product skips the normalization step, so it is the faster operation and is usually preferred for large datasets of already-normalized vectors.
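
A short sketch can confirm this equivalence numerically. The helper names below are defined for this example only and are not SDK functions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector so that its length becomes 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

print(dot(a, b))  # 24.0, which depends on the vectors' lengths

# After normalization, the plain dot product matches the cosine similarity.
same = math.isclose(dot(normalize(a), normalize(b)), cosine_similarity(a, b))
print(same)  # True
```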

### 3. Euclidean Distance
- **Definition**: Euclidean distance is the straight-line distance between two points in Euclidean space, calculated by taking the square root of the sum of the squared differences between corresponding elements of the vectors.
- **Range**: It ranges from 0 to infinity. A distance of 0 means the points are identical, and larger values indicate points that are further apart.
- **Usage**: It's commonly used in clustering and classification tasks to measure the actual 'distance' between samples. It's the basis for many algorithms, including K-means clustering and K-nearest neighbors.
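
The formula in the definition above translates directly into Python. This is an illustrative sketch rather than the SDK's implementation; the standard library's `math.dist` (Python 3.8+) computes the same quantity.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Square root of the sum of squared component-wise differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(euclidean_distance([1.0, 2.0], [1.0, 2.0]))  # 0.0 (identical points)
```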

### Key Differences
- **Context of Use**: Cosine similarity is used when the magnitude of the vectors is not important, such as in text similarity calculations. The inner product is often used in contexts where alignment or opposition between vectors is critical, and Euclidean distance is used when the actual spatial distance is needed.
- **Sensitivity to Magnitude**: The inner product and Euclidean distance are affected by the magnitude (length) of the vectors, whereas cosine similarity only depends on the angle between vectors, making it magnitude-invariant.
- **Computation**: While both inner product and cosine similarity involve dot products, cosine similarity includes an additional step of normalizing the vectors, and Euclidean distance involves computing the square root of the sum of squared differences.

Each of these metrics applies best to specific situations and can yield very different insights depending on the application.
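
The magnitude sensitivity described above can be seen by scaling one vector and recomputing all three metrics. The sketch below uses plain Python helpers written for this example, with arbitrary sample vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [1.0, 1.0]
b_scaled = [10.0, 10.0]  # same direction as b, ten times longer

# Cosine similarity ignores the change in length.
print(math.isclose(cosine_similarity(a, b), cosine_similarity(a, b_scaled)))  # True

# The inner product and the Euclidean distance both change.
print(dot(a, b), dot(a, b_scaled))  # 1.0 10.0
print(euclidean_distance(a, b) < euclidean_distance(a, b_scaled))  # True
```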

## Pre-Requisites 

You will need to download the ODBC driver and install the Python library via pip to get started.

Visit [this page](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16) for instructions on how to install the ODBC driver for your environment.

Binaries are available for Windows, Linux, and Mac OS X.

Once that is done, set up the connection string as an environment variable. The examples below read it from the following environment variable:

`AZURE_SQL_CONNECTION_STRING`

### Linux and Mac OS X

If you are running on Mac OS X or Linux you can set the environment variable as follows:

````bash
export AZURE_SQL_CONNECTION_STRING='your_connection_string_here'
````
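### Windows

If you are running on Windows, you can set the environment variable in the Command Prompt as follows:

````bat
set AZURE_SQL_CONNECTION_STRING=your_connection_string_here
````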
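
Whichever platform you use, it can help to fail early with a clear message when the variable is missing, since `os.environ.get` silently returns `None` in that case. The helper below is a hypothetical convenience function, not part of the SDK; it accepts any mapping so that it can be exercised without touching the real environment.

```python
import os
from collections.abc import Mapping

ENV_VAR = "AZURE_SQL_CONNECTION_STRING"


def get_connection_string(environ: Mapping[str, str] = os.environ) -> str:
    """Return the connection string, or raise if it is missing or blank."""
    value = environ.get(ENV_VAR, "").strip()
    if not value:
        raise RuntimeError(f"Set the {ENV_VAR} environment variable before connecting.")
    return value


# Demonstration with a stand-in mapping instead of the real environment.
print(get_connection_string({ENV_VAR: "your_connection_string_here"}))
```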

### Installing the Python SDK via pip

Use the following command to install the Python SDK for Azure SQL Vector Search:

````bash
pip install azure-sql-vector-search
````

## Examples of How to Insert Data using the Vector Search Client

In this section, we are going to cover the following:

- How to Insert Data with Classic Client
- How to Insert Data with Native Vector Client

#### Using AzureSQLClassicVectorSearchClient

````python 
import os
from azure_sql_vector_search.classic_vector_search import AzureSQLClassicVectorSearchClient

connection_string = os.environ.get("AZURE_SQL_CONNECTION_STRING")

embeddings: list[list[float]] = [
    [0.5, 0.5, 0.5, 0.5],
    [-0.5, -0.5, -0.5, -0.5],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1, 0, 0],
    [0, 0, 0, 1],
    [0.8, 0.6, 0, 0],
    [0, 0, 0.6, 0.8],
    [0, 0.8, 0.6, 0],
    [0, 0, 0.8, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]

active_options = [True, False]
code_options = ["python", "typescript", "", None]
age_options = [40, 50, 60, 65, 75]

vector_search = AzureSQLClassicVectorSearchClient(connection_string, "products")

for i in range(len(embeddings)):
    embedding = embeddings[i]
    name = "Izzy {}".format(i)
    content = "Israel Ekpo {} {}".format(i, embedding)
    active = active_options[i % 2]
    code = code_options[i % 4]
    age = age_options[i % 5]

    # metadata containing integer, float, string, boolean and null types
    metadata = {"content": content, "name": name, "active": active, "code": code, "age": age, "mass": i / age}

    print("processing id {} embedding -> {}, metadata -> {}".format(i, embedding, metadata))
    result = vector_search.insert_row(content, metadata, embedding)

````

#### Using AzureSQLNativeVectorSearchClient

````python 
import os
from azure_sql_vector_search.native_vector_search import AzureSQLNativeVectorSearchClient

connection_string = os.environ.get("AZURE_SQL_CONNECTION_STRING")

embeddings: list[list[float]] = [
    [0.5, 0.5, 0.5, 0.5],
    [-0.5, -0.5, -0.5, -0.5],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1, 0, 0],
    [0, 0, 0, 1],
    [0.8, 0.6, 0, 0],
    [0, 0, 0.6, 0.8],
    [0, 0.8, 0.6, 0],
    [0, 0, 0.8, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]

active_options = [True, False]
code_options = ["python", "typescript", "", None]
age_options = [40, 50, 60, 65, 75]

vector_search = AzureSQLNativeVectorSearchClient(connection_string, "n_products")

for i in range(len(embeddings)):
    embedding = embeddings[i]
    name = "Izzy {}".format(i)
    content = "Israel Ekpo {} {}".format(i, embedding)
    active = active_options[i % 2]
    code = code_options[i % 4]
    age = age_options[i % 5]

    # metadata containing integer, float, string, boolean and null types
    metadata = {"content": content, "name": name, "active": active, "code": code, "age": age, "mass": i / age}
    result = vector_search.insert_row(content, metadata, embedding)
    print(i, "->", embedding)

````
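
All embeddings in the examples above have four dimensions. Distances between vectors of different lengths are not meaningful, so it may be worth validating the data before calling `insert_row`. The following is a sketch of such a check; `check_dimensions` is a hypothetical helper and is not provided by the SDK.

```python
def check_dimensions(embeddings: list[list[float]], expected: int) -> None:
    """Raise ValueError if any embedding does not have the expected length."""
    for index, embedding in enumerate(embeddings):
        if len(embedding) != expected:
            raise ValueError(
                f"embedding {index} has {len(embedding)} dimensions, expected {expected}"
            )


sample = [
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]
check_dimensions(sample, expected=4)  # passes silently
print("all embeddings have 4 dimensions")
```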
## Examples of How to Query the Vector Database using the Search Client

In this section, we are going to cover the following examples using the Classic and Native clients:

- Query with default values for COSINE, INNER PRODUCT AND EUCLIDEAN
- Query with k
- Query with filters
- Query with k and filters
- Query with distance strategy method


#### Using AzureSQLClassicVectorSearchClient

````python 
import os
from azure_sql_vector_search import AzureSQLClassicVectorSearchClient, DistanceMetric

connection_string = os.environ.get("AZURE_SQL_CONNECTION_STRING")

embedding = [0.5, 0.5, 0.5, 0.5]
k = 40
filters = {"active": True, "age": 40}

vector_search = AzureSQLClassicVectorSearchClient(connection_string, "products")

# Using custom value for top k results and filters
print("Cosine Similarity for ", embedding)
results_1 = vector_search.compute_similarity(embedding, DistanceMetric.COSINE_SIMILARITY, k, filters=filters)
print(results_1)

# No filters but showing the top 40 results
print("Inner Product Distance for ", embedding)
results_2 = vector_search.compute_similarity(embedding, DistanceMetric.DOT_PRODUCT, k)
print(results_2)

# Using no filters with the default top 4 results for k
print("Euclidean Distance for ", embedding)
results_3 = vector_search.compute_similarity(embedding, DistanceMetric.EUCLIDEAN_DISTANCE)
print(results_3)

# Using filters to pick top 40 matches for Cosine Similarity
print("Cosine Similarity for ", embedding)
results_4 = vector_search.cosine_similarity(embedding, k=k, filters=filters)
print(results_4)

# Using filters to pick top 40 matches for Dot Product
print("Inner Product Distance for ", embedding)
results_5 = vector_search.inner_product(embedding, k=k, filters=filters)
print(results_5)

# Using filters to pick top 40 matches for Euclidean Distance
print("Euclidean Distance for ", embedding)
results_6 = vector_search.euclidean_distance(embedding, k=k, filters=filters)
print(results_6)
````

#### Using AzureSQLNativeVectorSearchClient

````python 

import os
from azure_sql_vector_search import AzureSQLNativeVectorSearchClient, DistanceMetric

connection_string = os.environ.get("AZURE_SQL_CONNECTION_STRING")

embedding = [0.5, 0.5, 0.5, 0.5]
k = 40
filters = {"active": True, "age": 40}

vector_search = AzureSQLNativeVectorSearchClient(connection_string, "n_products")

# Using custom value for top k results and filters
print("Cosine Similarity for ", embedding)
results_1 = vector_search.compute_similarity(embedding, DistanceMetric.COSINE_SIMILARITY, k, filters=filters)
print(results_1)

# No filters but showing the top 40 results
print("Inner Product Distance for ", embedding)
results_2 = vector_search.compute_similarity(embedding, DistanceMetric.DOT_PRODUCT, k)
print(results_2)

# Using no filters with the default top 4 results for k
print("Euclidean Distance for ", embedding)
results_3 = vector_search.compute_similarity(embedding, DistanceMetric.EUCLIDEAN_DISTANCE)
print(results_3)

# Using filters to pick top 40 matches for Cosine Similarity
print("Cosine Similarity for ", embedding)
results_4 = vector_search.cosine_similarity(embedding, k=k, filters=filters)
print(results_4)

# Using filters to pick top 40 matches for Dot Product
print("Inner Product Distance for ", embedding)
results_5 = vector_search.inner_product(embedding, k=k, filters=filters)
print(results_5)

# Using filters to pick top 40 matches for Euclidean Distance
print("Euclidean Distance for ", embedding)
results_6 = vector_search.euclidean_distance(embedding, k=k, filters=filters)
print(results_6)


````
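
To build intuition for how `k` and `filters` interact, the sketch below imitates a filtered top-k search in plain Python. It is not the SDK's implementation: it assumes that filters are exact-match conditions on metadata keys combined with AND, which the documentation above does not spell out, and the `top_k` function and row layout are hypothetical.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(rows, query, k=4, filters=None):
    """Keep rows whose metadata matches every filter, then return the k nearest."""
    filters = filters or {}
    candidates = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda row: euclidean_distance(query, row["embedding"]))
    return candidates[:k]


rows = [
    {"embedding": [0.5, 0.5, 0.5, 0.5], "metadata": {"active": True, "age": 40}},
    {"embedding": [1.0, 0.0, 0.0, 0.0], "metadata": {"active": False, "age": 65}},
    {"embedding": [0.0, 0.0, 0.0, 1.0], "metadata": {"active": True, "age": 40}},
    {"embedding": [0.8, 0.6, 0.0, 0.0], "metadata": {"active": True, "age": 50}},
]

matches = top_k(rows, [0.5, 0.5, 0.5, 0.5], k=40, filters={"active": True, "age": 40})
print([match["embedding"] for match in matches])
# [[0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 1.0]]
```

Only two rows satisfy both filter conditions, so fewer than `k` results are returned even though `k` is 40.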
## How to Reach Out with Questions and Feedback

If you have any questions, please reach out to us at ***vectorsqlintegration at service dot microsoft dot com***

Happy vector search!

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/projectAcetylcholine/sql_vector_search",
    "name": "azure-sql-vector-search",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "azure sql vector search langchain",
    "author": "Microsoft Corporation",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/11/1b/bf35461207cd53be766cf59aae75e3404d560ff4fca3b4974d337500a7ad/azure_sql_vector_search-0.8.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Azure SQL Vector Search Clients",
    "version": "0.8.3",
    "project_urls": {
        "Homepage": "https://github.com/projectAcetylcholine/sql_vector_search"
    },
    "split_keywords": [
        "azure",
        "sql",
        "vector",
        "search",
        "langchain"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "064076663732c12e0c2893320298d12b8f79db13cb41d821cde292caef6f732e",
                "md5": "11c912afaaf79ddfbbed51d577304c21",
                "sha256": "75f9f1ea93f0def0cc728b299dabad931380ab0be7947f091568657ce6d03f36"
            },
            "downloads": -1,
            "filename": "azure_sql_vector_search-0.8.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "11c912afaaf79ddfbbed51d577304c21",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 14334,
            "upload_time": "2024-05-17T02:09:02",
            "upload_time_iso_8601": "2024-05-17T02:09:02.586585Z",
            "url": "https://files.pythonhosted.org/packages/06/40/76663732c12e0c2893320298d12b8f79db13cb41d821cde292caef6f732e/azure_sql_vector_search-0.8.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "111bbf35461207cd53be766cf59aae75e3404d560ff4fca3b4974d337500a7ad",
                "md5": "f8d8fef94093687181f5122e1b93450d",
                "sha256": "e4599124ac91a954d94553ecc7ee649d3a6e36c9fdff0cea5cef326cf2d2b3ca"
            },
            "downloads": -1,
            "filename": "azure_sql_vector_search-0.8.3.tar.gz",
            "has_sig": false,
            "md5_digest": "f8d8fef94093687181f5122e1b93450d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 16282,
            "upload_time": "2024-05-17T02:09:04",
            "upload_time_iso_8601": "2024-05-17T02:09:04.256182Z",
            "url": "https://files.pythonhosted.org/packages/11/1b/bf35461207cd53be766cf59aae75e3404d560ff4fca3b4974d337500a7ad/azure_sql_vector_search-0.8.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-17 02:09:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "projectAcetylcholine",
    "github_project": "sql_vector_search",
    "github_not_found": true,
    "lcname": "azure-sql-vector-search"
}

Microsoft Corporation