openaivec

Name: openaivec
Version: 0.12.1
Summary: Generative mutation for tabular calculation
Upload time: 2025-08-06 03:28:36
Requires Python: >=3.10
License: MIT
Keywords: llm, openai, openai-api, openai-python, pandas, pyspark
Homepage: https://microsoft.github.io/openaivec/
Repository: https://github.com/microsoft/openaivec
# openaivec

**Transform your data analysis with AI-powered text processing at scale.**

**openaivec** enables data analysts to seamlessly integrate OpenAI's language models into their pandas and Spark workflows. Process thousands of text records with natural language instructions, turning unstructured data into actionable insights with just a few lines of code.

## 🚀 Quick Start: From Text to Insights in Seconds

Imagine analyzing 10,000 customer reviews. Instead of manual work, just write:

```python
import pandas as pd
from openaivec import pandas_ext

# Your data
reviews = pd.DataFrame({
    "review": ["Great product, fast delivery!", "Terrible quality, very disappointed", ...]
})

# AI-powered analysis in one line
results = reviews.assign(
    sentiment=lambda df: df.review.ai.responses("Classify sentiment: positive/negative/neutral"),
    issues=lambda df: df.review.ai.responses("Extract main issues or compliments"),
    priority=lambda df: df.review.ai.responses("Priority for follow-up: low/medium/high")
)
```

**Result**: Thousands of reviews classified and analyzed in minutes, not days.

📓 **[Try it yourself →](https://microsoft.github.io/openaivec/examples/pandas/)**

## 💡 Real-World Impact

### Customer Feedback Analysis
```python
# Process 50,000 support tickets automatically
tickets.assign(
    category=lambda df: df.description.ai.responses("Categorize: billing/technical/feature_request"),
    urgency=lambda df: df.description.ai.responses("Urgency level: low/medium/high/critical"),
    solution_type=lambda df: df.description.ai.responses("Best resolution approach")
)
```

### Market Research at Scale
```python
# Analyze multilingual social media data
social_data.assign(
    english_text=lambda df: df.post.ai.responses("Translate to English"),
    brand_mention=lambda df: df.english_text.ai.responses("Extract brand mentions and sentiment"),
    market_trend=lambda df: df.english_text.ai.responses("Identify emerging trends or concerns")
)
```

### Survey Data Transformation
```python
# Convert free-text responses to structured data
from pydantic import BaseModel

class Demographics(BaseModel):
    age_group: str
    location: str
    interests: list[str]

survey_responses.assign(
    structured=lambda df: df.response.ai.responses(
        "Extract demographics as structured data", 
        response_format=Demographics
    )
).ai.extract("structured")  # Auto-expands to columns
```

📓 **[See more examples →](https://microsoft.github.io/openaivec/examples/)**

# Overview

This package provides a vectorized interface for the OpenAI API, enabling you to process multiple inputs with a single
API call instead of sending requests one by one.
This approach helps reduce latency and simplifies your code.

Additionally, it integrates effortlessly with Pandas DataFrames and Apache Spark UDFs, making it easy to incorporate
into your data processing pipelines.
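
As a minimal sketch of the difference (using the `BatchResponses` client described under Basic Usage below), three inputs travel in one request instead of three:

```python
from openai import OpenAI
from openaivec import BatchResponses

inputs = ["panda", "rabbit", "koala"]

# One-by-one processing would cost one API round trip per input.
# Vectorized processing groups the inputs into a single request:
batch = BatchResponses(
    client=OpenAI(),
    model_name="gpt-4.1-mini",
    system_message="Answer with the animal's family.",
)
families = batch.parse(inputs, batch_size=32)  # one API call for all three
```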

## Features

- Vectorized API requests for processing multiple inputs at once.
- Seamless integration with Pandas DataFrames.
- A UDF builder for Apache Spark.
- Compatibility with multiple OpenAI clients, including Azure OpenAI.

## Key Benefits

- **🚀 Performance**: Vectorized processing handles thousands of records in minutes, not hours
- **💰 Cost Efficiency**: Automatic deduplication significantly reduces API costs on typical datasets  
- **🔗 Integration**: Works within existing pandas/Spark workflows without architectural changes
- **📈 Scalability**: Same API scales from exploratory analysis (100s of records) to production systems (millions of records)
- **🎯 Pre-configured Tasks**: Ready-to-use task library with optimized prompts for common use cases
- **🏢 Enterprise Ready**: Microsoft Fabric integration, Apache Spark UDFs, Azure OpenAI compatibility
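
To make the deduplication point concrete, here is a minimal sketch using the pandas accessor introduced below; the exact savings depend on your data, but identical inputs are collapsed before any request is sent:

```python
import pandas as pd
from openaivec import pandas_ext  # registers the .ai accessor

# 9,999 rows but only 3 distinct values: with automatic deduplication,
# roughly 3 unique inputs reach the API instead of 9,999.
colors = pd.Series(["red", "green", "blue"] * 3333)
labels = colors.ai.responses("Name this color's family in one word")
```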

## Requirements

- Python 3.10 or higher

## Installation

Install the package with:

```bash
pip install openaivec
```

If you want to uninstall the package, you can do so with:

```bash
pip uninstall openaivec
```

## Basic Usage

### Direct API Usage

For maximum control over batch processing:

```python
import os
from openai import OpenAI
from openaivec import BatchResponses

# Initialize the batch client
client = BatchResponses(
    client=OpenAI(),
    model_name="gpt-4.1-mini",
    system_message="Please answer only with 'xx family' and do not output anything else."
)

result = client.parse(["panda", "rabbit", "koala"], batch_size=32)
print(result)  # Expected output: ['bear family', 'rabbit family', 'koala family']
```

📓 **[Complete tutorial →](https://microsoft.github.io/openaivec/examples/pandas/)**

### Pandas Integration (Recommended)

The easiest way to get started with your DataFrames:

```python
import pandas as pd
from openaivec import pandas_ext

# Setup (optional - uses OPENAI_API_KEY environment variable by default)
pandas_ext.responses_model("gpt-4.1-mini")

# Create your data
df = pd.DataFrame({"name": ["panda", "rabbit", "koala"]})

# Add AI-powered columns
result = df.assign(
    family=lambda df: df.name.ai.responses("What animal family? Answer with 'X family'"),
    habitat=lambda df: df.name.ai.responses("Primary habitat in one word"),
    fun_fact=lambda df: df.name.ai.responses("One interesting fact in 10 words or less")
)
```

| name   | family        | habitat | fun_fact                    |
|--------|---------------|---------|-----------------------------|
| panda  | bear family   | forest  | Eats bamboo 14 hours daily  |
| rabbit | rabbit family | meadow  | Can see nearly 360 degrees  |
| koala  | marsupial family | tree   | Sleeps 22 hours per day    |

📓 **[Interactive pandas examples →](https://microsoft.github.io/openaivec/examples/pandas/)**

### Using Pre-configured Tasks

For common text processing operations, openaivec provides ready-to-use tasks that eliminate the need to write custom prompts:

```python
from openaivec.task import nlp, customer_support

# Text analysis with pre-configured tasks
text_df = pd.DataFrame({
    "text": [
        "Great product, fast delivery!",
        "Need help with billing issue",
        "How do I reset my password?"
    ]
})

# Use pre-configured tasks for consistent, optimized results
results = text_df.assign(
    sentiment=lambda df: df.text.ai.task(nlp.SENTIMENT_ANALYSIS),
    entities=lambda df: df.text.ai.task(nlp.NAMED_ENTITY_RECOGNITION),
    intent=lambda df: df.text.ai.task(customer_support.INTENT_ANALYSIS),
    urgency=lambda df: df.text.ai.task(customer_support.URGENCY_ANALYSIS)
)

# Extract structured results into separate columns (one at a time)
extracted_results = (results
    .ai.extract("sentiment")
    .ai.extract("entities") 
    .ai.extract("intent")
    .ai.extract("urgency")
)
```
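
Each pre-configured task returns a structured result rather than plain text: for instance, `nlp.SENTIMENT_ANALYSIS` includes fields such as `sentiment` and `confidence`, and `customer_support.INTENT_ANALYSIS` includes `primary_intent` and `action_required` (the same fields queried in the Spark SQL example later in this document); `ai.extract` expands these into their own columns.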

**Available Task Categories:**
- **Text Analysis**: `nlp.SENTIMENT_ANALYSIS`, `nlp.TRANSLATION`, `nlp.NAMED_ENTITY_RECOGNITION`, `nlp.KEYWORD_EXTRACTION`
- **Content Classification**: `customer_support.INTENT_ANALYSIS`, `customer_support.URGENCY_ANALYSIS`, `customer_support.INQUIRY_CLASSIFICATION`

**Benefits of Pre-configured Tasks:**
- Optimized prompts tested across diverse datasets
- Consistent structured outputs with Pydantic validation
- Multilingual support with standardized categorical fields
- Extensible framework for adding domain-specific tasks
- Direct compatibility with Spark UDFs

### Asynchronous Processing with `.aio`

For high-performance concurrent processing, use the `.aio` accessor which provides asynchronous versions of all AI operations:

```python
import asyncio
import pandas as pd
from openaivec import pandas_ext

# Setup (same as synchronous version)
pandas_ext.responses_model("gpt-4.1-mini")

df = pd.DataFrame({"text": [
    "This product is amazing!",
    "Terrible customer service",
    "Good value for money",
    "Not what I expected"
] * 250})  # 1000 rows for demonstration

async def process_data():
    # Asynchronous processing with fine-tuned concurrency control
    results = await df["text"].aio.responses(
        "Analyze sentiment and classify as positive/negative/neutral",
        batch_size=64,        # Process 64 items per API request
        max_concurrency=12    # Allow up to 12 concurrent requests
    )
    return results

# Run the async operation
sentiments = asyncio.run(process_data())
```
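
Note that `asyncio.run` assumes no event loop is already running. In environments that manage their own loop, such as Jupyter notebooks, `await process_data()` directly instead.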

**Key Parameters for Performance Tuning:**

- **`batch_size`** (default: 128): Controls how many inputs are grouped into a single API request. Higher values reduce API call overhead but increase memory usage and request processing time.
- **`max_concurrency`** (default: 8): Limits the number of concurrent API requests. Higher values increase throughput but may hit rate limits or overwhelm the API.
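
For a concrete sense of scale: with the settings above, 1,000 unique inputs at `batch_size=64` split into roughly 16 API requests, and `max_concurrency=12` keeps about 12 of them in flight at once, so the job finishes in roughly two waves of requests rather than 1,000 sequential calls.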

**Performance Benefits:**
- Process thousands of records in parallel
- Automatic request batching and deduplication
- Built-in rate limiting and error handling
- Memory-efficient streaming for large datasets

## Using with Apache Spark UDFs

Scale to enterprise datasets with distributed processing:

📓 **[Complete Spark tutorial →](https://microsoft.github.io/openaivec/examples/spark/)**

First, obtain a Spark session and configure authentication:

```python
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Configure authentication via SparkContext environment variables
# Option 1: Using OpenAI
sc.environment["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY")

# Option 2: Using Azure OpenAI
# sc.environment["AZURE_OPENAI_API_KEY"] = os.environ.get("AZURE_OPENAI_API_KEY")
# sc.environment["AZURE_OPENAI_API_ENDPOINT"] = os.environ.get("AZURE_OPENAI_API_ENDPOINT")
# sc.environment["AZURE_OPENAI_API_VERSION"] = os.environ.get("AZURE_OPENAI_API_VERSION")
```

Next, create and register UDFs using the provided functions:

```python
from openaivec.spark import responses_udf, responses_udf_from_task, embeddings_udf, count_tokens_udf
from pydantic import BaseModel

# --- Register Responses UDF (String Output) ---
spark.udf.register(
    "parse_flavor",
    responses_udf(
        instructions="Extract flavor-related information. Return only the concise flavor name.",
        response_format=str,        # Specify string output
        model_name="gpt-4.1-mini",  # Optional, defaults to gpt-4.1-mini
        batch_size=64,              # Optimize for Spark partition sizes
        max_concurrency=4           # Conservative for distributed processing
    )
)

# --- Register Responses UDF (Structured Output with Pydantic) ---
class Translation(BaseModel):
    en: str
    fr: str
    ja: str

spark.udf.register(
    "translate_struct",
    responses_udf(
        instructions="Translate the text to English, French, and Japanese.",
        response_format=Translation,    # Specify Pydantic model for structured output
        model_name="gpt-4.1-mini",      # Optional, defaults to gpt-4.1-mini
        batch_size=32,                  # Smaller batches for complex structured outputs
        max_concurrency=6               # Concurrent requests PER EXECUTOR
    )
)

# --- Register Embeddings UDF ---
spark.udf.register(
    "embed_text",
    embeddings_udf(
        model_name="text-embedding-3-small",  # Optional, defaults to text-embedding-3-small
        batch_size=128,                       # Larger batches for embeddings
        max_concurrency=8                     # Concurrent requests PER EXECUTOR
    )
)

# --- Register Token Counting UDF ---
spark.udf.register("count_tokens", count_tokens_udf("gpt-4o"))

# --- Register UDFs with Pre-configured Tasks ---
from openaivec.task import nlp, customer_support

spark.udf.register(
    "analyze_sentiment",
    responses_udf_from_task(
        task=nlp.SENTIMENT_ANALYSIS,
        model_name="gpt-4.1-mini",  # Optional, defaults to gpt-4.1-mini
        batch_size=64,
        max_concurrency=8           # Concurrent requests PER EXECUTOR
    )
)

spark.udf.register(
    "classify_intent",
    responses_udf_from_task(
        task=customer_support.INTENT_ANALYSIS,
        model_name="gpt-4.1-mini",  # Optional, defaults to gpt-4.1-mini
        batch_size=32,              # Smaller batches for complex analysis
        max_concurrency=6           # Conservative for customer support tasks
    )
)

```
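
Registered UDFs can also be invoked from the DataFrame API using plain PySpark (standard Spark functionality, nothing openaivec-specific). A sketch assuming a DataFrame `df` with a `product_name` column:

```python
from pyspark.sql import functions as F

# Call the SQL-registered UDF via an expression column
df_with_flavor = df.withColumn("flavor", F.expr("parse_flavor(product_name)"))
```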

You can now use these UDFs in Spark SQL:

```sql
-- Create a sample table (replace with your actual table)
CREATE OR REPLACE TEMP VIEW product_names AS SELECT * FROM VALUES
  ('4414732714624', 'Cafe Mocha Smoothie (Trial Size)'),
  ('4200162318339', 'Dark Chocolate Tea (New Product)'),
  ('4920122084098', 'Uji Matcha Tea (New Product)')
AS product_names(id, product_name);

-- Use the registered UDFs (including pre-configured tasks)
SELECT
    id,
    product_name,
    parse_flavor(product_name) AS flavor,
    translate_struct(product_name) AS translation,
    analyze_sentiment(product_name).sentiment AS sentiment,
    analyze_sentiment(product_name).confidence AS sentiment_confidence,
    classify_intent(product_name).primary_intent AS intent,
    classify_intent(product_name).action_required AS action_required,
    embed_text(product_name) AS embedding,
    count_tokens(product_name) AS token_count
FROM product_names;
```

Example Output (structure might vary slightly):

| id            | product_name                      | flavor    | translation              | sentiment | sentiment_confidence | intent      | action_required    | embedding           | token_count |
|---------------|-----------------------------------|-----------|--------------------------|-----------|---------------------|-------------|--------------------|---------------------|-------------|
| 4414732714624 | Cafe Mocha Smoothie (Trial Size)  | Mocha     | {en: ..., fr: ..., ja: ...} | positive  | 0.92               | seek_information | provide_information | [0.1, -0.2, ..., 0.5] | 8           |
| 4200162318339 | Dark Chocolate Tea (New Product)  | Chocolate | {en: ..., fr: ..., ja: ...} | neutral   | 0.87               | seek_information | provide_information | [-0.3, 0.1, ..., -0.1] | 7           |
| 4920122084098 | Uji Matcha Tea (New Product)      | Matcha    | {en: ..., fr: ..., ja: ...} | positive  | 0.89               | seek_information | provide_information | [0.0, 0.4, ..., 0.2] | 8           |

### Spark Performance Tuning

When using openaivec with Spark, proper configuration of `batch_size` and `max_concurrency` is crucial for optimal performance:

**`batch_size`** (default: 128):
- Controls how many rows are processed together in each API request within a partition
- **Larger values**: Fewer API calls per partition, reduced overhead
- **Smaller values**: More granular processing, better memory management
- **Recommendation**: 32-128 depending on data complexity and partition size

**`max_concurrency`** (default: 8):
- **Important**: This is the number of concurrent API requests **PER EXECUTOR**
- Total cluster concurrency = `max_concurrency × number_of_executors`
- **Higher values**: Faster processing but may overwhelm API rate limits
- **Lower values**: More conservative, better for shared API quotas
- **Recommendation**: 4-12 per executor, considering your OpenAI tier limits

**Example for a 10-executor cluster:**
```python
# With max_concurrency=8, total cluster concurrency = 8 × 10 = 80 concurrent requests
spark.udf.register(
    "analyze_sentiment",
    responses_udf(
        instructions="Analyze sentiment as positive/negative/neutral",
        batch_size=64,        # Good balance for most use cases
        max_concurrency=8     # 80 total concurrent requests across cluster
    )
)
```

**Monitoring and Scaling:**
- Monitor OpenAI API rate limits and adjust `max_concurrency` accordingly
- Use Spark UI to optimize partition sizes and executor configurations
- Consider your OpenAI tier limits when scaling clusters

## Building Prompts

Building prompts is a crucial step in using LLMs.
In particular, providing a few examples in a prompt can significantly improve an LLM’s performance,
a technique known as "few-shot learning." Typically, a few-shot prompt consists of a purpose, cautions,
and examples.

📓 **[Advanced prompting techniques →](https://microsoft.github.io/openaivec/examples/prompt/)**

The `FewShotPromptBuilder` helps you create structured, high-quality prompts with examples, cautions, and automatic improvement.

### Basic Usage

`FewShotPromptBuilder` needs only a purpose, cautions, and examples; the `build` method then returns the
rendered prompt in XML format.

Here is an example:

```python
from openaivec.prompt import FewShotPromptBuilder

prompt: str = (
    FewShotPromptBuilder()
    .purpose("Return the smallest category that includes the given word")
    .caution("Never use proper nouns as categories")
    .example("Apple", "Fruit")
    .example("Car", "Vehicle")
    .example("Tokyo", "City")
    .example("Keiichi Sogabe", "Musician")
    .example("America", "Country")
    .build()
)
print(prompt)
```

The output will be:

```xml
<Prompt>
    <Purpose>Return the smallest category that includes the given word</Purpose>
    <Cautions>
        <Caution>Never use proper nouns as categories</Caution>
    </Cautions>
    <Examples>
        <Example>
            <Input>Apple</Input>
            <Output>Fruit</Output>
        </Example>
        <Example>
            <Input>Car</Input>
            <Output>Vehicle</Output>
        </Example>
        <Example>
            <Input>Tokyo</Input>
            <Output>City</Output>
        </Example>
        <Example>
            <Input>Keiichi Sogabe</Input>
            <Output>Musician</Output>
        </Example>
        <Example>
            <Input>America</Input>
            <Output>Country</Output>
        </Example>
    </Examples>
</Prompt>
```

### Improve with OpenAI

For most users, it can be challenging to write a prompt entirely free of contradictions, ambiguities, or
redundancies.
`FewShotPromptBuilder` provides an `improve` method to refine your prompt using OpenAI's API.

The `improve` method attempts to eliminate contradictions, ambiguities, and redundancies in the prompt using OpenAI's API,
iterating the process up to `max_iter` times.

Here is an example:

```python
from openai import OpenAI
from openaivec.prompt import FewShotPromptBuilder

client = OpenAI(...)
model_name = "<your-model-name>"
improved_prompt: str = (
    FewShotPromptBuilder()
    .purpose("Return the smallest category that includes the given word")
    .caution("Never use proper nouns as categories")
    # Examples that contain contradictions, ambiguities, or redundancies
    .example("Apple", "Fruit")
    .example("Apple", "Technology")
    .example("Apple", "Company")
    .example("Apple", "Color")
    .example("Apple", "Animal")
    # improve the prompt with OpenAI's API
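    # (an optional max_iter argument caps the number of refinement iterations)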
    .improve(client, model_name)
    .build()
)
print(improved_prompt)
```

Then we get an improved prompt with a refined purpose, additional cautions, and extra examples:

```xml
<Prompt>
    <Purpose>Classify a given word into its most relevant category by considering its context and potential meanings.
        The input is a word accompanied by context, and the output is the appropriate category based on that context.
        This is useful for disambiguating words with multiple meanings, ensuring accurate understanding and
        categorization.
    </Purpose>
    <Cautions>
        <Caution>Ensure the context of the word is clear to avoid incorrect categorization.</Caution>
        <Caution>Be aware of words with multiple meanings and provide the most relevant category.</Caution>
        <Caution>Consider the possibility of new or uncommon contexts that may not fit traditional categories.</Caution>
    </Cautions>
    <Examples>
        <Example>
            <Input>Apple (as a fruit)</Input>
            <Output>Fruit</Output>
        </Example>
        <Example>
            <Input>Apple (as a tech company)</Input>
            <Output>Technology</Output>
        </Example>
        <Example>
            <Input>Java (as a programming language)</Input>
            <Output>Technology</Output>
        </Example>
        <Example>
            <Input>Java (as an island)</Input>
            <Output>Geography</Output>
        </Example>
        <Example>
            <Input>Mercury (as a planet)</Input>
            <Output>Astronomy</Output>
        </Example>
        <Example>
            <Input>Mercury (as an element)</Input>
            <Output>Chemistry</Output>
        </Example>
        <Example>
            <Input>Bark (as a sound made by a dog)</Input>
            <Output>Animal Behavior</Output>
        </Example>
        <Example>
            <Input>Bark (as the outer covering of a tree)</Input>
            <Output>Botany</Output>
        </Example>
        <Example>
            <Input>Bass (as a type of fish)</Input>
            <Output>Aquatic Life</Output>
        </Example>
        <Example>
            <Input>Bass (as a low-frequency sound)</Input>
            <Output>Music</Output>
        </Example>
    </Examples>
</Prompt>
```

## Using with Microsoft Fabric

[Microsoft Fabric](https://www.microsoft.com/en-us/microsoft-fabric/) is a unified, cloud-based analytics platform that
seamlessly integrates data engineering, warehousing, and business intelligence to simplify the journey from raw data to
actionable insights.

This section provides instructions on how to integrate and use `openaivec` within Microsoft Fabric. Follow these
steps:

1. **Create an Environment in Microsoft Fabric:**

   - In Microsoft Fabric, click on **New item** in your workspace.
   - Select **Environment** to create a new environment for Apache Spark.
   - Choose an environment name, e.g. `openai-environment`.
   - ![image](https://github.com/user-attachments/assets/bd1754ef-2f58-46b4-83ed-b335b64aaa1c)
     _Figure: Creating a new Environment in Microsoft Fabric._

2. **Add `openaivec` to the Environment from Public Library**

   - Once your environment is set up, go to the **Public Library** section within that environment.
   - Click on **Add from PyPI** and search for the latest version of `openaivec`.
   - Save and publish to reflect the changes.
   - ![image](https://github.com/user-attachments/assets/7b6320db-d9d6-4b89-a49d-e55b1489d1ae)
     _Figure: Add `openaivec` from PyPI to Public Library_

3. **Use the Environment from a Notebook:**

   - Open a notebook within Microsoft Fabric.
   - Select the environment you created in the previous steps.
   - ![image](https://github.com/user-attachments/assets/2457c078-1691-461b-b66e-accc3989e419)
     _Figure: Using custom environment from a notebook._
   - In the notebook, import and use `openaivec.spark` functions as you normally would. For example:

     ```python
     import os
     from pyspark.sql import SparkSession
     from openaivec.spark import responses_udf, embeddings_udf
     
     spark = SparkSession.builder.getOrCreate()
     sc = spark.sparkContext
     
     # Configure Azure OpenAI authentication
     sc.environment["AZURE_OPENAI_API_KEY"] = "<your-api-key>"
     sc.environment["AZURE_OPENAI_API_ENDPOINT"] = "https://<your-resource-name>.openai.azure.com"
     sc.environment["AZURE_OPENAI_API_VERSION"] = "2024-10-21"
     
     # Register UDFs
     spark.udf.register(
         "analyze_text",
         responses_udf(
             instructions="Analyze the sentiment of the text",
             model_name="<your-deployment-name>"
         )
     )
     ```

Following these steps allows you to successfully integrate and use `openaivec` within Microsoft Fabric.

## Contributing

We welcome contributions to this project! If you would like to contribute, please follow these guidelines:

1. Fork the repository and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. Ensure the test suite passes.
4. Make sure your code lints.

### Installing Dependencies

To install the necessary dependencies for development, run:

```bash
uv sync --all-extras --dev
```
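
### Running Tests

To run the test suite (assuming it uses pytest):

```bash
uv run pytest
```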

### Linting and Formatting

To lint the code and automatically fix issues, run:

```bash
uv run ruff check . --fix
```

## Additional Resources

📓 **[Customer feedback analysis →](https://microsoft.github.io/openaivec/examples/customer_analysis/)** - Sentiment analysis & prioritization  
📓 **[Survey data transformation →](https://microsoft.github.io/openaivec/examples/survey_transformation/)** - Unstructured to structured data  
📓 **[Asynchronous processing examples →](https://microsoft.github.io/openaivec/examples/aio/)** - High-performance async workflows  
📓 **[Auto-generate FAQs from documents →](https://microsoft.github.io/openaivec/examples/generate_faq/)** - Create FAQs using AI  
📓 **[All examples →](https://microsoft.github.io/openaivec/examples/)** - Complete collection of tutorials and use cases

## Community

Join our Discord community for developers: https://discord.gg/vbb83Pgn
