azure-genai-utils

Name: azure-genai-utils
Version: 0.0.2.16
Home page: https://github.com/daekeun-ml/azure-genai-utils
Summary: Utility functions for Azure GenAI
Upload time: 2025-01-21 00:55:09
Requires Python: >=3.8
License: Apache License 2.0
Keywords: azure, genai, azure-genai-utils, azure-genai, langchain, microsoft, bing, langgraph

# Azure GenAI Utils

This repository contains a set of Python utilities for working with Azure GenAI. They are designed for hackathons, workshops, and other events where you need to get started with Azure GenAI quickly.

## Requirements
- Azure Subscription
- Azure AI Foundry
- Bing Search API Key
- Python 3.8 or later
- `.env` file: copy or rename `.env.sample` to `.env` and update the values to match your account (a minimal loading check is shown after this list):
    ```bash
    AZURE_OPENAI_ENDPOINT=xxxxx
    AZURE_OPENAI_API_KEY=xxxxx
    OPENAI_API_VERSION=2024-12-01-preview
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini

    # Optional, but required for LangSmith tracing
    LANGCHAIN_TRACING_V2=false
    LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
    LANGCHAIN_API_KEY=xxxxx
    LANGCHAIN_PROJECT="YOUR-PROJECT"
    ```
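
Once the file is in place, the examples below pick the values up via `python-dotenv`. A minimal sketch for checking that the required Azure OpenAI settings are actually loaded (the variable names mirror the sample above):

```python
import os
from dotenv import load_dotenv

# Read the variables from .env into the process environment
load_dotenv()

# Fail early if any required Azure OpenAI setting is missing
required = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "OPENAI_API_VERSION"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {missing}")
```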

## Installation

### PyPI
- `pip install azure-genai-utils`

### From Source
- `python setup.py install`

## Usage 

### Azure OpenAI Test
<details markdown="block">
<summary>Expand</summary>

```python
from azure_genai_utils.aoai import AOAI

# Assumes the Azure OpenAI settings in your .env file are configured
aoai = AOAI()
aoai.test_api_call()
```
</details>
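
If you want to verify the same settings without the helper class, a roughly equivalent sanity check with `langchain_openai` (the client also used in the LangGraph example below) is sketched here; passing the deployment name explicitly is an assumption of this sketch, not something the package requires:

```python
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI

load_dotenv()

# AzureChatOpenAI reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and
# OPENAI_API_VERSION from the environment; the deployment name is passed in.
llm = AzureChatOpenAI(azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"))
print(llm.invoke("Say hello in one short sentence.").content)
```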

### PDF RAG Chain
<details markdown="block">
<summary>Expand</summary>

```python
from azure_genai_utils.rag.pdf import PDFRetrievalChain

pdf_path = "[YOUR-PDF-PATH]"

# Build a retriever and a QA chain over the PDF
pdf = PDFRetrievalChain(
    source_uri=[pdf_path],
    loader_type="PDFPlumber",
    model_name="gpt-4o-mini",
    embedding_name="text-embedding-3-large",
    chunk_size=500,
    chunk_overlap=50,
).create_chain()

# Retrieve relevant chunks, then generate an answer grounded in them
question = "[YOUR-QUESTION]"
docs = pdf.retriever.invoke(question)
results = pdf.chain.invoke({"chat_history": "", "question": question, "context": docs})
```
</details>
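
Continuing from the block above, the retriever and chain can be reused across multiple questions. A short sketch, assuming the chain's return value is the answer text:

```python
questions = ["[QUESTION-1]", "[QUESTION-2]"]
for question in questions:
    # Reuse the retriever and chain built above for each question
    docs = pdf.retriever.invoke(question)
    answer = pdf.chain.invoke(
        {"chat_history": "", "question": question, "context": docs}
    )
    print(f"Q: {question}\nA: {answer}\n")
```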

### Bing Search
Please make sure to set the following environment variable in your `.env` file:
```bash
BING_SUBSCRIPTION_KEY=xxxxx
```

<details markdown="block">
<summary>Expand</summary>

```python
from azure_genai_utils.tools import BingSearch
from dotenv import load_dotenv

# You need to add BING_SUBSCRIPTION_KEY=xxxx in .env file
load_dotenv()

# Basic usage
bing = BingSearch(max_results=2, locale="ko-KR")
results = bing.invoke("Microsoft AutoGen")
print(results)

# Include news search results and format the output
bing = BingSearch(
    max_results=2,
    locale="ko-KR",
    include_news=True,
    include_entity=False,
    format_output=True,
)
results = bing.invoke("Microsoft AutoGen")
print(results)
```
</details>

### LangGraph Example (Bing Search + Azure GenAI)
<details markdown="block">
<summary>Expand</summary>

```python
from typing import Annotated
from typing_extensions import TypedDict
from langchain_openai import AzureChatOpenAI
from langgraph.graph.message import add_messages
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from azure_genai_utils.tools import BingSearch
from dotenv import load_dotenv

load_dotenv()

class State(TypedDict):
    messages: Annotated[list, add_messages]

llm = AzureChatOpenAI(model="gpt-4o-mini")
tool = BingSearch(max_results=3, format_output=False)
tools = [tool]
llm_with_tools = llm.bind_tools(tools)

def chatbot(state: State):
    # Let the LLM (with Bing Search bound as a tool) respond to the conversation
    answer = llm_with_tools.invoke(state["messages"])
    return {"messages": [answer]}

def route_tools(state: State):
    """Route to the tools node if the last AI message requested a tool call."""
    if messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to tool_edge: {state}")

    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"

    return END

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot)
tool_node = ToolNode(tools=[tool])
graph_builder.add_node("tools", tool_node)

graph_builder.add_conditional_edges(
    source="chatbot",
    path=route_tools,
    path_map={"tools": "tools", END: END},
)

graph_builder.add_edge("tools", "chatbot")
graph_builder.add_edge(START, "chatbot")
graph = graph_builder.compile()

# Test
inputs = {"messages": "Microsoft AutoGen"}

for event in graph.stream(inputs, stream_mode="values"):
    for key, value in event.items():
        print(f"\n==============\nSTEP: {key}\n==============\n")
        print(value[-1])
```
</details>
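
If you only need the final answer rather than the streamed intermediate steps, the compiled graph can also be called with the standard LangGraph `invoke` API (continuing from the graph defined above):

```python
final_state = graph.invoke({"messages": "Microsoft AutoGen"})
# The last message in the final state is the model's answer
print(final_state["messages"][-1].content)
```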

### Synthetic Data Generation
<details markdown="block">
<summary>Expand</summary>

```python
from azure_genai_utils.synthetic import (
    QADataGenerator,
    CustomQADataGenerator,
    QAType,
    generate_qas,
)

input_batch = [
    "The quick brown fox jumps over the lazy dog.",
    "What is the capital of France?",
]

model_config = {
    "deployment": "gpt-4o-mini",
    "model": "gpt-4o-mini",
    "max_tokens": 256,
}

try:
    qa_generator = QADataGenerator(model_config=model_config)
    # qa_generator = CustomQADataGenerator(
    #     model_config=model_config, templates_dir=f"./azure_genai_utils/synthetic/prompt_templates/ko"
    # )
    task = generate_qas(
        input_texts=input_batch,
        qa_generator=qa_generator,
        qa_type=QAType.LONG_ANSWER,
        num_questions=2,
        concurrency=3,
    )
except Exception as e:
    print(f"Error generating QAs: {e}")
```
</details>
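
For larger corpora you may want to build `input_texts` from a file instead of an inline list. This is a generic sketch in plain Python; the `passages.txt` filename is an assumption, not part of the package API:

```python
from pathlib import Path

# One passage per line; blank lines are skipped
input_batch = [
    line.strip()
    for line in Path("passages.txt").read_text(encoding="utf-8").splitlines()
    if line.strip()
]
```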

### Azure Custom Speech
Please make sure to set the following environment variables in your `.env` file:
```bash
AZURE_AI_SPEECH_REGION=xxxxx
AZURE_AI_SPEECH_API_KEY=xxxxx
```
<details markdown="block">
<summary>Expand</summary>

```python
from azure_genai_utils.stt.stt_generator import CustomSpeechToTextGenerator

# Initialize the CustomSpeechToTextGenerator
stt = CustomSpeechToTextGenerator(
    custom_speech_lang="Korean",
    synthetic_text_file="cc_support_expressions.jsonl",
    train_output_dir="synthetic_data_train",
    train_output_dir_aug="synthetic_data_train_aug",
    eval_output_dir="synthetic_data_eval",
)

### Training set
# Generate synthetic text
topic = "Call center QnA related expected spoken utterances"
content = stt.generate_synthetic_text(
    topic=topic, num_samples=2, model_name="gpt-4o-mini"
)
stt.save_synthetic_text(output_dir="plain_text")

# Generate synthetic wav files for training
train_tts_voice_list = [
    "ko-KR-InJoonNeural",
    "zh-CN-XiaoxiaoMultilingualNeural",
    "en-GB-AdaMultilingualNeural",
]
stt.generate_synthetic_wav(
    mode="train", tts_voice_list=train_tts_voice_list, delete_old_data=True
)

# Augment the train data (Optional)
stt.augment_wav_files(num_augments=4)
# Package the train data to be used in the training pipeline
stt.package_trainset(use_augmented_data=True)

### Evaluation set
# Generate synthetic wav files for evaluation
eval_tts_voice_list = ["ko-KR-YuJinNeural"]
stt.generate_synthetic_wav(
    mode="eval", tts_voice_list=eval_tts_voice_list, delete_old_data=True
)
# Package the eval data to be used in the evaluation pipeline
stt.package_evalset(eval_dataset_dir="eval_dataset")
```
</details>

## License Summary
This sample code is provided under the Apache 2.0 license. See the LICENSE file.

            
