talktollm

- Name: talktollm
- Version: 0.4.1
- Summary: A Python utility for interacting with large language models (LLMs) via web automation
- Home page: https://github.com/AMAMazing/talktollm
- Author: Alex M
- Requires Python: >=3.6
- License: MIT
- Keywords: llm, automation, gui, pyautogui, gemini, deepseek, clipboard
- Uploaded: 2025-07-11 06:51:42
- Requirements: none recorded
            # talktollm

[![PyPI version](https://badge.fury.io/py/talktollm.svg)](https://badge.fury.io/py/talktollm)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python utility for interacting with large language models (LLMs) via web automation. It uses image recognition to drive LLM web interfaces, enabling automated conversations and task execution from Python.

## Features

-   **Simple Python API:** Provides a single `talkto` function for interacting with LLM web interfaces programmatically.
-   **Automated Image Recognition:** Employs image recognition techniques (via `optimisewait`) to identify and interact with elements on the LLM interface. Includes fallback if `optimisewait` is not installed.
-   **Multi-LLM Support:** Currently supports DeepSeek, Gemini, and Google AI Studio.
-   **Automated Conversations:** Facilitates automated conversations and task execution by simulating user interactions.
-   **Image Support:** Allows sending images (base64 encoded) to the LLM.
-   **Robust Clipboard Handling:** Includes configurable retry mechanisms (default 5 retries) for setting text/images to the clipboard and reading text from the clipboard to handle access errors and timing issues.
-   **Dynamic Image Path Management:** Copies necessary recognition images to a temporary directory, ensuring they are accessible and up-to-date.
-   **Easy to use:** Designed for simple setup and usage.

## Core Functionality

The core function is `talkto(llm, prompt, imagedata=None, debug=False, tabswitch=True, read_retries=5, read_delay=0.3)`.

**Arguments:**

-   `llm` (str): The LLM name ('deepseek', 'gemini', or 'aistudio').
-   `prompt` (str): The text prompt.
-   `imagedata` (list[str] | None): Optional list of base64 encoded image strings (e.g., "data:image/png;base64,...").
-   `debug` (bool): Enable detailed console output. Defaults to `False`.
-   `tabswitch` (bool): Switch focus back to the previous window after closing the LLM tab. Defaults to `True`.
-   `read_retries` (int): Number of attempts to read the final response from the clipboard. Defaults to 5.
-   `read_delay` (float): Delay in seconds between clipboard read attempts. Defaults to 0.3.
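For reference, a call that spells out every argument explicitly might look like this (the prompt is illustrative; `imagedata` is left as `None` here):

```python
import talktollm

response = talktollm.talkto(
    'gemini',
    "Explain the difference between threads and processes.",
    imagedata=None,      # or a list of "data:image/png;base64,..." strings
    debug=False,         # set True for detailed console output
    tabswitch=True,      # return focus to the previous window afterwards
    read_retries=5,      # clipboard read attempts
    read_delay=0.3,      # seconds between read attempts
)
print(response)
```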

**Steps:**

1.  Validates the LLM name.
2.  Sets up image paths for `optimisewait` using `set_image_path`.
3.  Opens the LLM's website in a new browser tab.
4.  Waits and clicks the message input area using `optimiseWait('message', clicks=2)`.
5.  If `imagedata` is provided:
    -   Iterates through images.
    -   Sets each image to the clipboard using `set_clipboard_image` (with retries).
    -   Pastes the image (`Ctrl+V`).
    -   Waits for potential upload (`sleep(7)`).
6.  Sets the `prompt` text to the clipboard using `set_clipboard` (with retries).
7.  Pastes the prompt (`Ctrl+V`).
8.  Waits and clicks the 'run' button using `optimiseWait('run')`.
9.  Waits for the response generation, using `optimiseWait('copy')` as an indicator that the response is ready and the copy button is visible.
10. Waits briefly (`sleep(0.5)`) after `optimiseWait('copy')` clicks the copy button.
11. Closes the browser tab (`Ctrl+W`).
12. Switches focus back if `tabswitch` is `True` (`Alt+Tab`).
13. Attempts to read the LLM's response from the clipboard with retry logic (`read_retries`, `read_delay`).
14. Returns the retrieved text response, or an empty string if reading fails.
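The sequence above reduces to a browser launch, clipboard pastes, and a few hotkeys. Below is a simplified sketch of that flow, omitting the optional image step and the final clipboard read, and assuming `optimiseWait` and the `set_clipboard` helper are importable as named in this README; it is an illustration, not the package's actual code.

```python
import webbrowser
from time import sleep

import pyautogui
# Import paths below are assumptions based on this README, not verified APIs.
from optimisewait import optimiseWait
from talktollm import set_clipboard


def send_prompt_sketch(url: str, prompt: str) -> None:
    """Illustrative outline of the automation flow."""
    webbrowser.open_new_tab(url)       # step 3: open the LLM site in a new tab
    optimiseWait('message', clicks=2)  # step 4: wait for and click the message box
    set_clipboard(prompt)              # step 6: put the prompt on the clipboard (with retries)
    pyautogui.hotkey('ctrl', 'v')      # step 7: paste the prompt
    optimiseWait('run')                # step 8: click the run button
    optimiseWait('copy')               # step 9: response is ready once the copy button appears
    sleep(0.5)                         # step 10: brief pause after the copy click
    pyautogui.hotkey('ctrl', 'w')      # step 11: close the tab
    pyautogui.hotkey('alt', 'tab')     # step 12: switch focus back
```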

## Helper Functions

**Clipboard Handling:**

-   `set_clipboard(text: str, retries: int = 5, delay: float = 0.2)`: Sets text to the clipboard, handling `CF_UNICODETEXT`. Retries on common access errors (`winerror 5` or `1418`).
-   `set_clipboard_image(image_data: str, retries: int = 5, delay: float = 0.2)`: Sets a base64 encoded image to the clipboard (`CF_DIB` format). Decodes, converts to BMP, and retries on common access errors.
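The retry pattern these helpers describe can be illustrated with `pywin32` directly. This is a minimal sketch of the approach, not the package's exact implementation:

```python
import time

import pywintypes
import win32clipboard
import win32con


def set_clipboard_sketch(text: str, retries: int = 5, delay: float = 0.2) -> None:
    """Set Unicode text on the Windows clipboard, retrying on access errors."""
    for attempt in range(retries):
        try:
            win32clipboard.OpenClipboard()
            try:
                win32clipboard.EmptyClipboard()
                win32clipboard.SetClipboardData(win32con.CF_UNICODETEXT, text)
            finally:
                win32clipboard.CloseClipboard()
            return
        except pywintypes.error as e:
            # winerror 5 (access denied) and 1418 (clipboard not open) usually mean
            # another process is holding the clipboard; wait briefly and retry.
            if e.winerror in (5, 1418) and attempt < retries - 1:
                time.sleep(delay)
            else:
                raise
```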

**Image Path Management:**

-   `set_image_path(llm: str, debug: bool = False)`: Orchestrates copying images.
-   `copy_images_to_temp(llm: str, debug: bool = False)`: Copies the necessary `.png` images for the specified `llm` from the package's `images/<llm>` directory to a temporary location (`%TEMP%\talktollm_images\<llm>`). Creates the temporary directory if needed and only copies if the source file is newer or the destination doesn't exist. Sets the `optimisewait` autopath. Includes error handling for missing package resources.
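A rough sketch of that copy-to-temp behaviour using the standard library is shown below. The `images/<llm>` resource layout is taken from the description above, the function name is hypothetical, and the real helper additionally points `optimisewait` at the destination directory:

```python
import shutil
import tempfile
from importlib import resources  # requires Python 3.9+ for resources.files
from pathlib import Path


def copy_images_sketch(llm: str) -> Path:
    """Copy the recognition images for `llm` into %TEMP%\\talktollm_images\\<llm>."""
    dest = Path(tempfile.gettempdir()) / "talktollm_images" / llm
    dest.mkdir(parents=True, exist_ok=True)
    # Resource layout assumed: images/<llm>/*.png inside the installed package.
    src_dir = resources.files("talktollm") / "images" / llm
    for src in src_dir.iterdir():
        if not src.name.endswith(".png"):
            continue
        target = dest / src.name
        with resources.as_file(src) as src_path:
            # Copy only when the destination is missing or the packaged file is newer.
            if not target.exists() or src_path.stat().st_mtime > target.stat().st_mtime:
                shutil.copy2(src_path, target)
    return dest
```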

## Installation

```
pip install talktollm
```

*Note: `optimisewait` is recommended for image-based waiting and clicking; a fallback is used if it is not installed. Install it separately if needed (`pip install optimisewait`).*

## Usage

Here are some examples of how to use `talktollm`.

**Example 1: Simple Text Prompt**

Send a basic text prompt to Gemini.

```python
import talktollm

prompt_text = "Explain quantum entanglement in simple terms."
response = talktollm.talkto('gemini', prompt_text)
print("--- Simple Gemini Response ---")
print(response)
```

**Example 2: Text Prompt with Debugging**

Send a text prompt and enable debugging output to see more details about the process.

```python
import talktollm

prompt_text = "What are the main features of Python 3.12?"
response = talktollm.talkto('deepseek', prompt_text, debug=True)
print("--- DeepSeek Debug Response ---")
print(response)
```

**Example 3: Preparing Image Data**

Load an image file, encode it in base64, and format it correctly for the `imagedata` argument.

```python
import base64

# Load your image (replace 'path/to/your/image.png' with the actual path)
try:
    with open("path/to/your/image.png", "rb") as image_file:
        # Encode to base64
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        # Format as a data URI
        image_data_uri = f"data:image/png;base64,{encoded_string}"
        print("Image prepared successfully!")
        # You can now pass [image_data_uri] to the imagedata parameter
except FileNotFoundError:
    print("Error: Image file not found. Please check the path.")
    image_data_uri = None
except Exception as e:
    print(f"Error processing image: {e}")
    image_data_uri = None

# This 'image_data_uri' variable holds the string needed for the next example
```

**Example 4: Text and Image Prompt**

Send a text prompt along with a prepared image to Gemini. (Assumes `image_data_uri` was successfully created in Example 3).

```python
import talktollm

# Assuming image_data_uri is available from the previous example
if image_data_uri:
    prompt_text = "Describe the main subject of this image."
    response = talktollm.talkto(
        'gemini',
        prompt_text,
        imagedata=[image_data_uri], # Pass the image data as a list
        debug=True
    )
    print("--- Gemini Image Response ---")
    print(response)
else:
    print("Skipping image example because image data is not available.")
```

## Dependencies

-   `pywin32`: For Windows API access (clipboard).
-   `pyautogui`: For GUI automation (keystrokes, potentially mouse if `optimisewait` fails).
-   `Pillow`: For image processing (opening, converting for clipboard).
-   `optimisewait` (Optional but Recommended): For robust image-based waiting and clicking.
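
Because the PyPI metadata for this release records no install requirements, these packages may need to be installed manually, for example:

```
pip install pywin32 pyautogui Pillow optimisewait
```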

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

MIT

            
