gat-llm

Name	gat-llm JSON
Version	0.1.10 JSON
	download
home_page	None
Summary	Generation Augmented by Tools in LLMs - Agentic AI
upload_time	2025-08-08 02:28:08
maintainer	None
docs_url	None
author	DCA
requires_python	>=3.9
license	MIT License Copyright (c) 2024 Douglas Coimbra de Andrade Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	large language models llms rag tool use ai agents
VCS
bugtrack_url
requirements	boto3 pandas lxml duckdb markitdown pypdf pdf2image requests beautifulsoup4 tabulate gradio sympy qrcode ffmpeg matplotlib pillow graphviz pydot black pre-commit tqdm ipywidgets jupyterlab jupyterlab-lsp python-lsp-server openai anthropic beautifulsoup4 pytest pytest-timeout coverage
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # Self Testing GATs (Generation Augmented by Tools)

This project focuses on designing and self-testing GAT LLMs (Language Learning Models) that can effectively use a variety of tools to accomplish tasks.

Demonstration (will take you to YouTube):

[![GAT in action](https://img.youtube.com/vi/U1oxouaOf5g/0.jpg)](https://www.youtube.com/watch?v=U1oxouaOf5g)

**Paper pre-print**: in the folder `paper`

## Table of Contents
1. [Project Overview](#project-overview)
2. [Using this Code](#using-this-code)
3. [Inspecting the Tools and LLMs](#inspecting-the-tools-and-llms)
4. [Changing the Code](#changing-the-code)
5. [Self-assessment](#self-assessment)

## Project Overview

This project implements a flexible framework for:
- Integrating various tools with LLMs
- Generating test cases to evaluate LLM performance in tool selection and usage
- Performing self-tests on different LLM models
- Analyzing the results of these tests

The system supports multiple LLM providers (including OpenAI, Anthropic, and AWS Bedrock) and a wide range of tools for tasks such as date calculations, web scraping, plotting, file operations, and more.

### Current benchmarks

With the current prompts, tools, descriptions and native tool configuration use settings, this is the performance of LLMs in GAT tasks.

**Note: this is not a leaderboard or general evaluation of quality. It only refers to this test setting as a simulation of an industrial LLM GAT implementation.**

|                                           |   ('n_invented_tools', 'sum') |   ('accuracy', '%') |   ('score', '%') |   ('USD / 1M tokens', 'Input') |   ('USD / 1M tokens', 'Output') |
|:------------------------------------------|------------------------------:|--------------------:|-----------------:|-------------------------------:|--------------------------------:|
| ('DeepSeekV3 Chat - DeepSeek', False)     |                             1 |                79.4 |             89.6 |                          0.27  |                            1.1  |
| ('Claude 3.5 Sonnet - Anthropic', False)  |                             0 |                78   |             89.5 |                          3     |                           15    |
| ('GPT 4o - OpenAI', True)                 |                             1 |                79.9 |             89.4 |                          5     |                           15    |
| ('GPT 4o mini - OpenAI', True)            |                             3 |                79.9 |             89   |                          0.15  |                            0.6  |
| ('GPT 4.1 - OpenAI', True)                |                             1 |                78.6 |             89   |                          2     |                            8    |
| ('Claude 3.5 Haiku - Anthropic', True)    |                             2 |                76.6 |             89   |                          1     |                            5    |
| ('Amazon Nova Pro 1.0 - Bedrock', True)   |                             1 |                78   |             88.7 |                          0.8   |                            3.2  |
| ('Claude 3.5 Sonnet - Anthropic', True)   |                             0 |                76.6 |             88.7 |                          3     |                           15    |
| ('Claude 3 Haiku - Bedrock', True)        |                             2 |                77.5 |             88.6 |                          0.25  |                            1.25 |
| ('Claude 3.5 Haiku - Anthropic', False)   |                             9 |                73.9 |             87.9 |                          1     |                            5    |
| ('GPT 4o - OpenAI', False)                |                             4 |                76.6 |             87.7 |                          5     |                           15    |
| ('Llama3_1 405b instruct', False)         |                             3 |                75.5 |             87   |                          5.32  |                           16    |
| ('Claude 3.7 Sonnet - Anthropic', True)   |                             2 |                74.7 |             86.9 |                          3     |                           15    |
| ('Mistral Large v1', False)               |                             1 |                74.7 |             86.8 |                          4     |                           12    |
| ('GPT 4o mini - OpenAI', False)           |                             3 |                73.1 |             85.1 |                          0.15  |                            0.6  |
| ('GPT 5 - OpenAI', True)                  |                             3 |                69.5 |             84.3 |                          1.25  |                           10    |
| ('Command RPlus - Bedrock', False)        |                             4 |                72.8 |             83.8 |                          3     |                           15    |
| ('Claude 3 Haiku - Bedrock', False)       |                             3 |                70.6 |             83.3 |                          0.25  |                            1.25 |
| ('Sabia3 - Maritaca', True)               |                             6 |                70.6 |             83.2 |                          0.95  |                            1.9  |
| ('GPT 5 mini - OpenAI', True)             |                            16 |                69   |             82.1 |                          0.25  |                            2    |
| ('Amazon Nova Lite 1.0 - Bedrock', True)  |                             2 |                66.2 |             80.2 |                          0.06  |                            0.24 |
| ('Llama3_1 70b instruct', False)          |                            11 |                70   |             79.6 |                          2.65  |                            3.5  |
| ('GPT 5 nano - OpenAI', True)             |                            21 |                63.5 |             78.9 |                          0.25  |                            2    |
| ('GPT 3.5 - OpenAI', False)               |                             2 |                65.4 |             78.6 |                          0.5   |                            1.5  |
| ('GPT 3.5 - OpenAI', True)                |                            18 |                66.4 |             76.9 |                          0.5   |                            1.5  |
| ('OpenAI GPT OSS 20b - Ollama', True)     |                            17 |                60.7 |             76.7 |                          0     |                            0    |
| ('Sabia3 - Maritaca', False)              |                            14 |                61.8 |             75.7 |                          0.95  |                            1.9  |
| ('Mistral Mixtral 8x7B', False)           |                           156 |                50.1 |             67.5 |                          0.45  |                            0.7  |
| ('Amazon Nova Micro 1.0 - Bedrock', True) |                           145 |                52.5 |             66.5 |                          0.035 |                            0.14 |
| ('Command R - Bedrock', False)            |                           117 |                49.7 |             65.4 |                          0.5   |                            1.5  |
| ('Llama3 8b instruct', False)             |                            39 |                22.3 |             38.1 |                          0.3   |                            0.6  |
| ('Llama3 70b instruct', False)            |                            29 |                29.1 |             36.1 |                          2.65  |                            3.5  |
| ('Llama3_1 8b instruct', False)           |                            34 |                23.9 |             33.7 |                          0.3   |                            0.6  |
| ('Grok2Vision - Grok', True)              |                             1 |                25   |             29   |                          2     |                           10    |

## Using this Code

To use this code and run the implemented tools, follow these steps:

### With PIP

1. `pip install gat_llm`
2. (Optional) Install optional dependencies for MarkItDown with `pip install markitdown[all]` (this is used to open .DOCX, .XLSX, etc)
3. (Optional) Install `poppler` (this is used to convert PDF pages to images when PDF pages need OCR or to be handled as images). If using conda, `conda install pdf2image` should handle everything
4. Set up your API keys (depending on what tools and LLM providers you need):
   - For Linux:
     ```
     export AWS_ACCESS_KEY_ID=your_aws_access_key
     export AWS_SECRET_ACCESS_KEY=your_aws_secret_key
     export ANTHROPIC_API_KEY=your_anthropic_key
     export OPENAI_API_KEY=your_openai_key
	 export MARITACA_API_KEY=your_maritaca_key
     ```
   - For Windows:
     ```
     set AWS_ACCESS_KEY_ID=your_aws_access_key
     set AWS_SECRET_ACCESS_KEY=your_aws_secret_key
     set ANTHROPIC_API_KEY=your_anthropic_key
     set OPENAI_API_KEY=your_openai_key
	 set MARITACA_API_KEY=your_maritaca_key
     ```
5. Create a test file `test_gat.py` to check if the tools are being called correctly:
```
# Imports
import boto3
import botocore

import gat_llm.llm_invoker as inv
from gat_llm.tools.base import LLMTools
from gat_llm.prompts.prompt_generator import RAGPromptGenerator

use_native_LLM_tools = True

# pick one depending on which API key you want to use
llm_name = "GPT 4o - OpenAI"
llm_name = 'Claude 3.5 Sonnet - Bedrock'
llm_name = 'Claude 3.5 Sonnet - Anthropic'

config = botocore.client.Config(connect_timeout=9000, read_timeout=9000, region_name="us-west-2")  # us-east-1  us-west-2
bedrock_client = boto3.client(service_name='bedrock-runtime', config=config)

llm = inv.LLM_Provider.get_llm(bedrock_client, llm_name)
query_llm = inv.LLM_Provider.get_llm(bedrock_client, llm_name)

print("Testing LLM invoke")
ans = llm("and at night? Enclose your answer within <my_ans></my_ans> tags. Then explain further.",
          chat_history=[["What color is the sky?", "Blue"]],
          system_prompt="You are a very knowledgeable truck driver. Use a strong truck driver's language and make sure to mention your name is Jack.",
         )
prev = ""
for x in ans:
    cur_ans = x
    print('.', end='')
print('\n')
print(x)

# Test tool use
print("Testing GAT - LLM tool use")
lt = LLMTools(query_llm=query_llm)
tool_descriptions = lt.get_tool_descriptions()
rpg = RAGPromptGenerator(use_native_tools=use_native_LLM_tools)
system_prompt = rpg.prompt.replace('{{TOOLS}}', tool_descriptions)

cur_tools = [x.tool_description for x in lt.tools]

ans = llm(
    "What date will it be 10 days from now? Today is June 4, 2024. Use your tool do_date_math. Before calling any tools, explain your thoughts. Then, make a plot of y=x^2.",
    chat_history=[["I need to do some date math.", "Sure. I will help."]],
    system_prompt="You are a helpful assistant. Prefer to use tools when possible. Never mention tool names in the answer.",
    tools=cur_tools,
    tool_invoker_fn=lt.invoke_tool,
)

prev = ""
for x in ans:
    cur_ans = x
    print('.', end='')
print(cur_ans)
```
4. Run `python test_gat.py`. You should see a response like:
```
Testing LLM invoke
..................................

<my_ans>Black as the inside of my trailer, with little white dots all over it</my_ans>

Hey there, Jack here. Been drivin' rigs for over 20 years now, and let me tell ya, when you're haulin' freight through the night, that sky turns darker than a pot of truck stop coffee. You got them stars scattered all over like chrome bits on a custom Peterbilt, and sometimes that moon hangs up there like a big ol' headlight in the sky.

When you're cruisin' down them highways at 3 AM, with nothin' but your high beams and them stars above, it's one hell of a sight. Makes ya feel pretty damn small in your rig, if ya know what I mean. Course, sometimes you get them city lights polluting the view, but out in the boonies, man, that night sky is somethin' else.

Shoot, reminds me of this one haul I did through Montana - clearest dang night sky you'll ever see. But I better wrap this up, my 30-minute break is almost over, and I got another 400 miles to cover before sunrise.

Testing GAT - LLM tool use

In 10 days from June 4, 2024, it will be June 14, 2024 (Friday). I've also generated a plot showing the quadratic function y = x².
```

### From the repository

1. Clone this repository and `cd` to the repository folder.

2. Set up the environment:
   - If using conda, create the environment:
     ```
     conda env create -f environment.yml
     ```
   - Alternatively, install the requirements directly from `requirements.txt`
   - Activate the environment with `conda activate llm_gat_env`

3. Set up your API keys (depending on what tools and LLM providers you need):
   - For Linux:
     ```
     export AWS_ACCESS_KEY_ID=your_aws_access_key
     export AWS_SECRET_ACCESS_KEY=your_aws_secret_key
     export ANTHROPIC_API_KEY=your_anthropic_key
     export OPENAI_API_KEY=your_openai_key
	 export GROK_API_KEY=your_grok_key
	 export MARITACA_API_KEY=your_maritaca_key
     ```
   - For Windows:
     ```
     set AWS_ACCESS_KEY_ID=your_aws_access_key
     set AWS_SECRET_ACCESS_KEY=your_aws_secret_key
     set ANTHROPIC_API_KEY=your_anthropic_key
     set OPENAI_API_KEY=your_openai_key
	 set GROK_API_KEY=your_grok_key
	 set MARITACA_API_KEY=your_maritaca_key
     ```

4. Open and run `GAT-demo.ipynb` to launch the Gradio demo

5. Access the demo:
   - Click the `localhost` interface
   - To share the demo with a public Gradio link, set `share=True` in the launch command:
     ```python
     demo.queue().launch(show_api=False, share=True, inline=False)
     ```

## Inspecting the Tools and LLMs

The Jupyter Notebook (`GAT-demo.ipynb`) provides a convenient interface for inspecting:
- Direct tool call results
- Prompts used for LLM interactions
- Other relevant information about the system's operation

Refer to the comments in the notebook for detailed explanations of each section.

## Changing the Code

### Implementing a New Tool

To add a new tool to the system:

1. Create a new Python file in the `tools` folder (e.g., `new_tool.py`)
2. Define a new class for your tool (e.g., `ToolNewTool`)
3. Implement the following methods:
   - `__init__`: Initialize the tool, set its name and description
   - `__call__`: Implement the tool's functionality
4. Add the tool description in the `tool_description` attribute, following the format used in other tools
5. In `tools/base.py`, import your new tool and add it to the `get_all_tools` method in the `LLMTools` class

Example structure for a new tool:

```python
class ToolNewTool:
    def __init__(self):
        self.name = "new_tool_name"
        self.tool_description = {
            "name": self.name,
            "description": "Description of what the tool does",
            "input_schema": {
                "type": "object",
                "properties": {
                    "param1": {"type": "string", "description": "Description of param1"},
                    # Add more parameters as needed
                },
                "required": ["param1"]
            }
        }

    def __call__(self, param1, **kwargs):
        # Implement tool functionality here
        result = # ... your code ...
        return result
```

### Removing Tools

To remove a tool from the system:

1. Delete the tool's Python file from the `tools` folder
2. Remove the tool's import and reference from `tools/base.py`
3. Update any test cases or documentation that reference the removed tool

### Adding LLMs

To add support for a new LLM:

1. Create a new file in the `llm_providers` folder (e.g., `new_llm_provider.py`)
2. Implement a class for the new LLM, following the interface used by existing LLM classes
3. In `llm_invoker.py`, import your new LLM class and add it to the `allowed_llms` list in the `LLM_Provider` class
4. Implement the necessary logic in the `get_llm` method of `LLM_Provider` to instantiate your new LLM

## Self-assessment

The project includes a comprehensive self-assessment system for evaluating LLM performance in tool selection and usage. All test cases self-generated and the test results of each LLM are stored in the folder `self_tests`.

### Self-generating Test Cases

The `SelfTestGenerator` class in `self_tests/self_test_generator.py` is responsible for creating test cases. It supports three strategies for test case generation:

1. `use_all`: Generates test cases for all tools in a single prompt
2. `only_selected`: Generates test cases for each tool individually
3. `selected_with_dummies`: Generates test cases for specific tools while providing all tools as options

To generate test cases:

1. Instantiate a `SelfTestGenerator` with the desired LLM
2. Call the `gen_test_cases` method with the number of test cases and the desired strategy

### Using the Test Cases to Evaluate LLMs

The `SelfTestPerformer` class in `self_tests/self_test_performer.py` executes the generated test cases to evaluate LLM performance.

To run self-tests:

1. Prepare test case files (JSON format) using the `SelfTestGenerator`
2. Instantiate a `SelfTestPerformer` with the LLM you want to test
3. Call the `test_tool_use` method with the test cases

The results are saved in CSV format, allowing for easy analysis and comparison of different LLM models and configurations.

Use the utility functions in `self_tests/self_test_utils.py` to analyze the test results, including functions to detect invented tools, check for correct tool selection, and calculate performance scores.

# Changelog

## v0.1.4

- Added Grok as LLM
- Added caching to Claude Bedrock models (Haiku 3.5 and Sonnet 3.7)

## v0.1.5

- Changed the UI to show thinking / tools
- Fixed a bug in `test_llm_tools.py` when no tools were selected

## v0.1.6

- Add GPT 4.1 LLM
- Add GPT 4.1 image generator
- Add Claude 4 (Anthropic and Bedrock)

## v0.1.7

- Enable multiple images per user message

## v0.1.8

- Include Ollama as a local LLM provider
- Update `read_local_file` tool to read a much wider array of files
- Include Grok4 from xAI

## v0.1.9

- Include smaller qwen3 models

## v0.1.10

- Include qwen3-coder:30b from Ollama
- Include GPT OSS 20b and 120b from Ollama
- Include GPT 5, 5mini, 5nani

## TBD

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "gat-llm",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "large language models, LLMs, RAG, tool use, AI agents",
    "author": "DCA",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/20/23/5955fd208a857ddd06cce89e32461a3eaaedacd81db2808b449d86a14c15/gat_llm-0.1.10.tar.gz",
    "platform": null,
    "description": "# Self Testing GATs (Generation Augmented by Tools)\r\n\r\nThis project focuses on designing and self-testing GAT LLMs (Language Learning Models) that can effectively use a variety of tools to accomplish tasks.\r\n\r\nDemonstration (will take you to YouTube):\r\n\r\n[![GAT in action](https://img.youtube.com/vi/U1oxouaOf5g/0.jpg)](https://www.youtube.com/watch?v=U1oxouaOf5g)\r\n\r\n**Paper pre-print**: in the folder `paper`\r\n\r\n## Table of Contents\r\n1. [Project Overview](#project-overview)\r\n2. [Using this Code](#using-this-code)\r\n3. [Inspecting the Tools and LLMs](#inspecting-the-tools-and-llms)\r\n4. [Changing the Code](#changing-the-code)\r\n5. [Self-assessment](#self-assessment)\r\n\r\n## Project Overview\r\n\r\nThis project implements a flexible framework for:\r\n- Integrating various tools with LLMs\r\n- Generating test cases to evaluate LLM performance in tool selection and usage\r\n- Performing self-tests on different LLM models\r\n- Analyzing the results of these tests\r\n\r\nThe system supports multiple LLM providers (including OpenAI, Anthropic, and AWS Bedrock) and a wide range of tools for tasks such as date calculations, web scraping, plotting, file operations, and more.\r\n\r\n### Current benchmarks\r\n\r\nWith the current prompts, tools, descriptions and native tool configuration use settings, this is the performance of LLMs in GAT tasks.\r\n\r\n**Note: this is not a leaderboard or general evaluation of quality. It only refers to this test setting as a simulation of an industrial LLM GAT implementation.**\r\n\r\n|                                           |   ('n_invented_tools', 'sum') |   ('accuracy', '%') |   ('score', '%') |   ('USD / 1M tokens', 'Input') |   ('USD / 1M tokens', 'Output') |\r\n|:------------------------------------------|------------------------------:|--------------------:|-----------------:|-------------------------------:|--------------------------------:|\r\n| ('DeepSeekV3 Chat - DeepSeek', False)     |                             1 |                79.4 |             89.6 |                          0.27  |                            1.1  |\r\n| ('Claude 3.5 Sonnet - Anthropic', False)  |                             0 |                78   |             89.5 |                          3     |                           15    |\r\n| ('GPT 4o - OpenAI', True)                 |                             1 |                79.9 |             89.4 |                          5     |                           15    |\r\n| ('GPT 4o mini - OpenAI', True)            |                             3 |                79.9 |             89   |                          0.15  |                            0.6  |\r\n| ('GPT 4.1 - OpenAI', True)                |                             1 |                78.6 |             89   |                          2     |                            8    |\r\n| ('Claude 3.5 Haiku - Anthropic', True)    |                             2 |                76.6 |             89   |                          1     |                            5    |\r\n| ('Amazon Nova Pro 1.0 - Bedrock', True)   |                             1 |                78   |             88.7 |                          0.8   |                            3.2  |\r\n| ('Claude 3.5 Sonnet - Anthropic', True)   |                             0 |                76.6 |             88.7 |                          3     |                           15    |\r\n| ('Claude 3 Haiku - Bedrock', True)        |                             2 |                77.5 |             88.6 |                          0.25  |                            1.25 |\r\n| ('Claude 3.5 Haiku - Anthropic', False)   |                             9 |                73.9 |             87.9 |                          1     |                            5    |\r\n| ('GPT 4o - OpenAI', False)                |                             4 |                76.6 |             87.7 |                          5     |                           15    |\r\n| ('Llama3_1 405b instruct', False)         |                             3 |                75.5 |             87   |                          5.32  |                           16    |\r\n| ('Claude 3.7 Sonnet - Anthropic', True)   |                             2 |                74.7 |             86.9 |                          3     |                           15    |\r\n| ('Mistral Large v1', False)               |                             1 |                74.7 |             86.8 |                          4     |                           12    |\r\n| ('GPT 4o mini - OpenAI', False)           |                             3 |                73.1 |             85.1 |                          0.15  |                            0.6  |\r\n| ('GPT 5 - OpenAI', True)                  |                             3 |                69.5 |             84.3 |                          1.25  |                           10    |\r\n| ('Command RPlus - Bedrock', False)        |                             4 |                72.8 |             83.8 |                          3     |                           15    |\r\n| ('Claude 3 Haiku - Bedrock', False)       |                             3 |                70.6 |             83.3 |                          0.25  |                            1.25 |\r\n| ('Sabia3 - Maritaca', True)               |                             6 |                70.6 |             83.2 |                          0.95  |                            1.9  |\r\n| ('GPT 5 mini - OpenAI', True)             |                            16 |                69   |             82.1 |                          0.25  |                            2    |\r\n| ('Amazon Nova Lite 1.0 - Bedrock', True)  |                             2 |                66.2 |             80.2 |                          0.06  |                            0.24 |\r\n| ('Llama3_1 70b instruct', False)          |                            11 |                70   |             79.6 |                          2.65  |                            3.5  |\r\n| ('GPT 5 nano - OpenAI', True)             |                            21 |                63.5 |             78.9 |                          0.25  |                            2    |\r\n| ('GPT 3.5 - OpenAI', False)               |                             2 |                65.4 |             78.6 |                          0.5   |                            1.5  |\r\n| ('GPT 3.5 - OpenAI', True)                |                            18 |                66.4 |             76.9 |                          0.5   |                            1.5  |\r\n| ('OpenAI GPT OSS 20b - Ollama', True)     |                            17 |                60.7 |             76.7 |                          0     |                            0    |\r\n| ('Sabia3 - Maritaca', False)              |                            14 |                61.8 |             75.7 |                          0.95  |                            1.9  |\r\n| ('Mistral Mixtral 8x7B', False)           |                           156 |                50.1 |             67.5 |                          0.45  |                            0.7  |\r\n| ('Amazon Nova Micro 1.0 - Bedrock', True) |                           145 |                52.5 |             66.5 |                          0.035 |                            0.14 |\r\n| ('Command R - Bedrock', False)            |                           117 |                49.7 |             65.4 |                          0.5   |                            1.5  |\r\n| ('Llama3 8b instruct', False)             |                            39 |                22.3 |             38.1 |                          0.3   |                            0.6  |\r\n| ('Llama3 70b instruct', False)            |                            29 |                29.1 |             36.1 |                          2.65  |                            3.5  |\r\n| ('Llama3_1 8b instruct', False)           |                            34 |                23.9 |             33.7 |                          0.3   |                            0.6  |\r\n| ('Grok2Vision - Grok', True)              |                             1 |                25   |             29   |                          2     |                           10    |\r\n\r\n## Using this Code\r\n\r\nTo use this code and run the implemented tools, follow these steps:\r\n\r\n### With PIP\r\n\r\n1. `pip install gat_llm`\r\n2. (Optional) Install optional dependencies for MarkItDown with `pip install markitdown[all]` (this is used to open .DOCX, .XLSX, etc)\r\n3. (Optional) Install `poppler` (this is used to convert PDF pages to images when PDF pages need OCR or to be handled as images). If using conda, `conda install pdf2image` should handle everything\r\n4. Set up your API keys (depending on what tools and LLM providers you need):\r\n   - For Linux:\r\n     ```\r\n     export AWS_ACCESS_KEY_ID=your_aws_access_key\r\n     export AWS_SECRET_ACCESS_KEY=your_aws_secret_key\r\n     export ANTHROPIC_API_KEY=your_anthropic_key\r\n     export OPENAI_API_KEY=your_openai_key\r\n\t export MARITACA_API_KEY=your_maritaca_key\r\n     ```\r\n   - For Windows:\r\n     ```\r\n     set AWS_ACCESS_KEY_ID=your_aws_access_key\r\n     set AWS_SECRET_ACCESS_KEY=your_aws_secret_key\r\n     set ANTHROPIC_API_KEY=your_anthropic_key\r\n     set OPENAI_API_KEY=your_openai_key\r\n\t set MARITACA_API_KEY=your_maritaca_key\r\n     ```\r\n5. Create a test file `test_gat.py` to check if the tools are being called correctly:\r\n```\r\n# Imports\r\nimport boto3\r\nimport botocore\r\n\r\nimport gat_llm.llm_invoker as inv\r\nfrom gat_llm.tools.base import LLMTools\r\nfrom gat_llm.prompts.prompt_generator import RAGPromptGenerator\r\n\r\nuse_native_LLM_tools = True\r\n\r\n# pick one depending on which API key you want to use\r\nllm_name = \"GPT 4o - OpenAI\"\r\nllm_name = 'Claude 3.5 Sonnet - Bedrock'\r\nllm_name = 'Claude 3.5 Sonnet - Anthropic'\r\n\r\nconfig = botocore.client.Config(connect_timeout=9000, read_timeout=9000, region_name=\"us-west-2\")  # us-east-1  us-west-2\r\nbedrock_client = boto3.client(service_name='bedrock-runtime', config=config)\r\n\r\nllm = inv.LLM_Provider.get_llm(bedrock_client, llm_name)\r\nquery_llm = inv.LLM_Provider.get_llm(bedrock_client, llm_name)\r\n\r\nprint(\"Testing LLM invoke\")\r\nans = llm(\"and at night? Enclose your answer within <my_ans></my_ans> tags. Then explain further.\",\r\n          chat_history=[[\"What color is the sky?\", \"Blue\"]],\r\n          system_prompt=\"You are a very knowledgeable truck driver. Use a strong truck driver's language and make sure to mention your name is Jack.\",\r\n         )\r\nprev = \"\"\r\nfor x in ans:\r\n    cur_ans = x\r\n    print('.', end='')\r\nprint('\\n')\r\nprint(x)\r\n\r\n# Test tool use\r\nprint(\"Testing GAT - LLM tool use\")\r\nlt = LLMTools(query_llm=query_llm)\r\ntool_descriptions = lt.get_tool_descriptions()\r\nrpg = RAGPromptGenerator(use_native_tools=use_native_LLM_tools)\r\nsystem_prompt = rpg.prompt.replace('{{TOOLS}}', tool_descriptions)\r\n\r\ncur_tools = [x.tool_description for x in lt.tools]\r\n\r\nans = llm(\r\n    \"What date will it be 10 days from now? Today is June 4, 2024. Use your tool do_date_math. Before calling any tools, explain your thoughts. Then, make a plot of y=x^2.\",\r\n    chat_history=[[\"I need to do some date math.\", \"Sure. I will help.\"]],\r\n    system_prompt=\"You are a helpful assistant. Prefer to use tools when possible. Never mention tool names in the answer.\",\r\n    tools=cur_tools,\r\n    tool_invoker_fn=lt.invoke_tool,\r\n)\r\n\r\nprev = \"\"\r\nfor x in ans:\r\n    cur_ans = x\r\n    print('.', end='')\r\nprint(cur_ans)\r\n```\r\n4. Run `python test_gat.py`. You should see a response like:\r\n```\r\nTesting LLM invoke\r\n..................................\r\n\r\n<my_ans>Black as the inside of my trailer, with little white dots all over it</my_ans>\r\n\r\nHey there, Jack here. Been drivin' rigs for over 20 years now, and let me tell ya, when you're haulin' freight through the night, that sky turns darker than a pot of truck stop coffee. You got them stars scattered all over like chrome bits on a custom Peterbilt, and sometimes that moon hangs up there like a big ol' headlight in the sky.\r\n\r\nWhen you're cruisin' down them highways at 3 AM, with nothin' but your high beams and them stars above, it's one hell of a sight. Makes ya feel pretty damn small in your rig, if ya know what I mean. Course, sometimes you get them city lights polluting the view, but out in the boonies, man, that night sky is somethin' else.\r\n\r\nShoot, reminds me of this one haul I did through Montana - clearest dang night sky you'll ever see. But I better wrap this up, my 30-minute break is almost over, and I got another 400 miles to cover before sunrise.\r\n\r\nTesting GAT - LLM tool use\r\n\r\nIn 10 days from June 4, 2024, it will be June 14, 2024 (Friday). I've also generated a plot showing the quadratic function y = x\u00b2.\r\n```\r\n\r\n### From the repository\r\n\r\n1. Clone this repository and `cd` to the repository folder.\r\n\r\n2. Set up the environment:\r\n   - If using conda, create the environment:\r\n     ```\r\n     conda env create -f environment.yml\r\n     ```\r\n   - Alternatively, install the requirements directly from `requirements.txt`\r\n   - Activate the environment with `conda activate llm_gat_env`\r\n\r\n3. Set up your API keys (depending on what tools and LLM providers you need):\r\n   - For Linux:\r\n     ```\r\n     export AWS_ACCESS_KEY_ID=your_aws_access_key\r\n     export AWS_SECRET_ACCESS_KEY=your_aws_secret_key\r\n     export ANTHROPIC_API_KEY=your_anthropic_key\r\n     export OPENAI_API_KEY=your_openai_key\r\n\t export GROK_API_KEY=your_grok_key\r\n\t export MARITACA_API_KEY=your_maritaca_key\r\n     ```\r\n   - For Windows:\r\n     ```\r\n     set AWS_ACCESS_KEY_ID=your_aws_access_key\r\n     set AWS_SECRET_ACCESS_KEY=your_aws_secret_key\r\n     set ANTHROPIC_API_KEY=your_anthropic_key\r\n     set OPENAI_API_KEY=your_openai_key\r\n\t set GROK_API_KEY=your_grok_key\r\n\t set MARITACA_API_KEY=your_maritaca_key\r\n     ```\r\n\r\n4. Open and run `GAT-demo.ipynb` to launch the Gradio demo\r\n\r\n5. Access the demo:\r\n   - Click the `localhost` interface\r\n   - To share the demo with a public Gradio link, set `share=True` in the launch command:\r\n     ```python\r\n     demo.queue().launch(show_api=False, share=True, inline=False)\r\n     ```\r\n\r\n## Inspecting the Tools and LLMs\r\n\r\nThe Jupyter Notebook (`GAT-demo.ipynb`) provides a convenient interface for inspecting:\r\n- Direct tool call results\r\n- Prompts used for LLM interactions\r\n- Other relevant information about the system's operation\r\n\r\nRefer to the comments in the notebook for detailed explanations of each section.\r\n\r\n## Changing the Code\r\n\r\n### Implementing a New Tool\r\n\r\nTo add a new tool to the system:\r\n\r\n1. Create a new Python file in the `tools` folder (e.g., `new_tool.py`)\r\n2. Define a new class for your tool (e.g., `ToolNewTool`)\r\n3. Implement the following methods:\r\n   - `__init__`: Initialize the tool, set its name and description\r\n   - `__call__`: Implement the tool's functionality\r\n4. Add the tool description in the `tool_description` attribute, following the format used in other tools\r\n5. In `tools/base.py`, import your new tool and add it to the `get_all_tools` method in the `LLMTools` class\r\n\r\nExample structure for a new tool:\r\n\r\n```python\r\nclass ToolNewTool:\r\n    def __init__(self):\r\n        self.name = \"new_tool_name\"\r\n        self.tool_description = {\r\n            \"name\": self.name,\r\n            \"description\": \"Description of what the tool does\",\r\n            \"input_schema\": {\r\n                \"type\": \"object\",\r\n                \"properties\": {\r\n                    \"param1\": {\"type\": \"string\", \"description\": \"Description of param1\"},\r\n                    # Add more parameters as needed\r\n                },\r\n                \"required\": [\"param1\"]\r\n            }\r\n        }\r\n\r\n    def __call__(self, param1, **kwargs):\r\n        # Implement tool functionality here\r\n        result = # ... your code ...\r\n        return result\r\n```\r\n\r\n### Removing Tools\r\n\r\nTo remove a tool from the system:\r\n\r\n1. Delete the tool's Python file from the `tools` folder\r\n2. Remove the tool's import and reference from `tools/base.py`\r\n3. Update any test cases or documentation that reference the removed tool\r\n\r\n### Adding LLMs\r\n\r\nTo add support for a new LLM:\r\n\r\n1. Create a new file in the `llm_providers` folder (e.g., `new_llm_provider.py`)\r\n2. Implement a class for the new LLM, following the interface used by existing LLM classes\r\n3. In `llm_invoker.py`, import your new LLM class and add it to the `allowed_llms` list in the `LLM_Provider` class\r\n4. Implement the necessary logic in the `get_llm` method of `LLM_Provider` to instantiate your new LLM\r\n\r\n## Self-assessment\r\n\r\nThe project includes a comprehensive self-assessment system for evaluating LLM performance in tool selection and usage. All test cases self-generated and the test results of each LLM are stored in the folder `self_tests`.\r\n\r\n### Self-generating Test Cases\r\n\r\nThe `SelfTestGenerator` class in `self_tests/self_test_generator.py` is responsible for creating test cases. It supports three strategies for test case generation:\r\n\r\n1. `use_all`: Generates test cases for all tools in a single prompt\r\n2. `only_selected`: Generates test cases for each tool individually\r\n3. `selected_with_dummies`: Generates test cases for specific tools while providing all tools as options\r\n\r\nTo generate test cases:\r\n\r\n1. Instantiate a `SelfTestGenerator` with the desired LLM\r\n2. Call the `gen_test_cases` method with the number of test cases and the desired strategy\r\n\r\n### Using the Test Cases to Evaluate LLMs\r\n\r\nThe `SelfTestPerformer` class in `self_tests/self_test_performer.py` executes the generated test cases to evaluate LLM performance.\r\n\r\nTo run self-tests:\r\n\r\n1. Prepare test case files (JSON format) using the `SelfTestGenerator`\r\n2. Instantiate a `SelfTestPerformer` with the LLM you want to test\r\n3. Call the `test_tool_use` method with the test cases\r\n\r\nThe results are saved in CSV format, allowing for easy analysis and comparison of different LLM models and configurations.\r\n\r\nUse the utility functions in `self_tests/self_test_utils.py` to analyze the test results, including functions to detect invented tools, check for correct tool selection, and calculate performance scores.\r\n\r\n# Changelog\r\n\r\n## v0.1.4\r\n\r\n- Added Grok as LLM\r\n- Added caching to Claude Bedrock models (Haiku 3.5 and Sonnet 3.7)\r\n\r\n## v0.1.5\r\n\r\n- Changed the UI to show thinking / tools\r\n- Fixed a bug in `test_llm_tools.py` when no tools were selected\r\n\r\n## v0.1.6\r\n\r\n- Add GPT 4.1 LLM\r\n- Add GPT 4.1 image generator\r\n- Add Claude 4 (Anthropic and Bedrock)\r\n\r\n## v0.1.7\r\n\r\n- Enable multiple images per user message\r\n\r\n## v0.1.8\r\n\r\n- Include Ollama as a local LLM provider\r\n- Update `read_local_file` tool to read a much wider array of files\r\n- Include Grok4 from xAI\r\n\r\n## v0.1.9\r\n\r\n- Include smaller qwen3 models\r\n\r\n## v0.1.10\r\n\r\n- Include qwen3-coder:30b from Ollama\r\n- Include GPT OSS 20b and 120b from Ollama\r\n- Include GPT 5, 5mini, 5nani\r\n\r\n## TBD\r\n",
    "bugtrack_url": null,
    "license": "MIT License\r\n        \r\n        Copyright (c) 2024 Douglas Coimbra de Andrade\r\n        \r\n        Permission is hereby granted, free of charge, to any person obtaining a copy\r\n        of this software and associated documentation files (the \"Software\"), to deal\r\n        in the Software without restriction, including without limitation the rights\r\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n        copies of the Software, and to permit persons to whom the Software is\r\n        furnished to do so, subject to the following conditions:\r\n        \r\n        The above copyright notice and this permission notice shall be included in all\r\n        copies or substantial portions of the Software.\r\n        \r\n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n        SOFTWARE.\r\n        ",
    "summary": "Generation Augmented by Tools in LLMs - Agentic AI",
    "version": "0.1.10",
    "project_urls": {
        "Homepage": "https://github.com/douglas125/SelfTestingGAT_LLM"
    },
    "split_keywords": [
        "large language models",
        " llms",
        " rag",
        " tool use",
        " ai agents"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1cf1dd7c5f816b4cf0b124bdd369788171bd9992e57588775e82051420204f1e",
                "md5": "133cafbfa8e30e42fea7fd7e9466ae85",
                "sha256": "4a068a7d4495d4e7f10e6a1feda0602a3df6afe098264aff9b44f3291a1fc897"
            },
            "downloads": -1,
            "filename": "gat_llm-0.1.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "133cafbfa8e30e42fea7fd7e9466ae85",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 156852,
            "upload_time": "2025-08-08T02:28:05",
            "upload_time_iso_8601": "2025-08-08T02:28:05.823851Z",
            "url": "https://files.pythonhosted.org/packages/1c/f1/dd7c5f816b4cf0b124bdd369788171bd9992e57588775e82051420204f1e/gat_llm-0.1.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "20235955fd208a857ddd06cce89e32461a3eaaedacd81db2808b449d86a14c15",
                "md5": "98ff16be2977ed056845dceed8b47441",
                "sha256": "9ea4d7e19539b8c9be4e4fe7f803f3a4a388bfaf2054770cf5d8d81d14bf7bb4"
            },
            "downloads": -1,
            "filename": "gat_llm-0.1.10.tar.gz",
            "has_sig": false,
            "md5_digest": "98ff16be2977ed056845dceed8b47441",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 145344,
            "upload_time": "2025-08-08T02:28:08",
            "upload_time_iso_8601": "2025-08-08T02:28:08.257948Z",
            "url": "https://files.pythonhosted.org/packages/20/23/5955fd208a857ddd06cce89e32461a3eaaedacd81db2808b449d86a14c15/gat_llm-0.1.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-08 02:28:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "douglas125",
    "github_project": "SelfTestingGAT_LLM",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "boto3",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "lxml",
            "specs": []
        },
        {
            "name": "duckdb",
            "specs": []
        },
        {
            "name": "markitdown",
            "specs": []
        },
        {
            "name": "pypdf",
            "specs": []
        },
        {
            "name": "pdf2image",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "beautifulsoup4",
            "specs": []
        },
        {
            "name": "tabulate",
            "specs": []
        },
        {
            "name": "gradio",
            "specs": []
        },
        {
            "name": "sympy",
            "specs": []
        },
        {
            "name": "qrcode",
            "specs": []
        },
        {
            "name": "ffmpeg",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "pillow",
            "specs": []
        },
        {
            "name": "graphviz",
            "specs": []
        },
        {
            "name": "pydot",
            "specs": []
        },
        {
            "name": "black",
            "specs": []
        },
        {
            "name": "pre-commit",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "ipywidgets",
            "specs": []
        },
        {
            "name": "jupyterlab",
            "specs": []
        },
        {
            "name": "jupyterlab-lsp",
            "specs": []
        },
        {
            "name": "python-lsp-server",
            "specs": []
        },
        {
            "name": "openai",
            "specs": []
        },
        {
            "name": "anthropic",
            "specs": []
        },
        {
            "name": "beautifulsoup4",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "pytest-timeout",
            "specs": []
        },
        {
            "name": "coverage",
            "specs": []
        }
    ],
    "lcname": "gat-llm"
}

DCA