talktollm

- Name: talktollm
- Version: 0.4.1
- Summary: A Python utility for interacting with large language models (LLMs) via web automation
- Home page: https://github.com/AMAMazing/talktollm
- Author: Alex M
- Requires Python: >=3.6
- License: MIT
- Keywords: llm, automation, gui, pyautogui, gemini, deepseek, clipboard
- Uploaded: 2025-07-11 06:51:42
- Requirements: none recorded
            # talktollm

[![PyPI version](https://badge.fury.io/py/talktollm.svg)](https://badge.fury.io/py/talktollm)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python utility for interacting with large language models (LLMs) via web automation. It uses image recognition to drive LLM web interfaces, enabling automated conversations and task execution from Python.

## Features

-   **Simple Python API:** Provides a single `talkto` function for interacting with LLM web interfaces programmatically.
-   **Automated Image Recognition:** Employs image recognition techniques (via `optimisewait`) to identify and interact with elements on the LLM interface. Includes fallback if `optimisewait` is not installed.
-   **Multi-LLM Support:** Currently supports DeepSeek, Gemini, and Google AI Studio.
-   **Automated Conversations:** Facilitates automated conversations and task execution by simulating user interactions.
-   **Image Support:** Allows sending images (base64 encoded) to the LLM.
-   **Robust Clipboard Handling:** Includes configurable retry mechanisms (default 5 retries) for setting text/images to the clipboard and reading text from the clipboard to handle access errors and timing issues.
-   **Dynamic Image Path Management:** Copies necessary recognition images to a temporary directory, ensuring they are accessible and up-to-date.
-   **Easy to use:** Designed for simple setup and usage.

## Core Functionality

The core function is `talkto(llm, prompt, imagedata=None, debug=False, tabswitch=True, read_retries=5, read_delay=0.3)`.

**Arguments:**

-   `llm` (str): The LLM name ('deepseek', 'gemini', or 'aistudio').
-   `prompt` (str): The text prompt.
-   `imagedata` (list[str] | None): Optional list of base64 encoded image strings (e.g., "data:image/png;base64,...").
-   `debug` (bool): Enable detailed console output. Defaults to `False`.
-   `tabswitch` (bool): Switch focus back to the previous window after closing the LLM tab. Defaults to `True`.
-   `read_retries` (int): Number of attempts to read the final response from the clipboard. Defaults to 5.
-   `read_delay` (float): Delay in seconds between clipboard read attempts. Defaults to 0.3.
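For reference, a call that spells out every argument explicitly might look like this (the prompt is illustrative; `imagedata` is left as `None` here):

```python
import talktollm

response = talktollm.talkto(
    'gemini',
    "Explain the difference between threads and processes.",
    imagedata=None,      # or a list of "data:image/png;base64,..." strings
    debug=False,         # set True for detailed console output
    tabswitch=True,      # return focus to the previous window afterwards
    read_retries=5,      # clipboard read attempts
    read_delay=0.3,      # seconds between read attempts
)
print(response)
```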

**Steps:**

1.  Validates the LLM name.
2.  Sets up image paths for `optimisewait` using `set_image_path`.
3.  Opens the LLM's website in a new browser tab.
4.  Waits and clicks the message input area using `optimiseWait('message', clicks=2)`.
5.  If `imagedata` is provided:
    -   Iterates through images.
    -   Sets each image to the clipboard using `set_clipboard_image` (with retries).
    -   Pastes the image (`Ctrl+V`).
    -   Waits for potential upload (`sleep(7)`).
6.  Sets the `prompt` text to the clipboard using `set_clipboard` (with retries).
7.  Pastes the prompt (`Ctrl+V`).
8.  Waits and clicks the 'run' button using `optimiseWait('run')`.
9.  Waits for the response generation, using `optimiseWait('copy')` as an indicator that the response is ready and the copy button is visible.
10. Waits briefly (`sleep(0.5)`) after `optimiseWait('copy')` clicks the copy button.
11. Closes the browser tab (`Ctrl+W`).
12. Switches focus back if `tabswitch` is `True` (`Alt+Tab`).
13. Attempts to read the LLM's response from the clipboard with retry logic (`read_retries`, `read_delay`).
14. Returns the retrieved text response, or an empty string if reading fails.
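The sequence above reduces to a browser launch, clipboard pastes, and a few hotkeys. Below is a simplified sketch of that flow, omitting the optional image step and the final clipboard read, and assuming `optimiseWait` and the `set_clipboard` helper are importable as named in this README; it is an illustration, not the package's actual code.

```python
import webbrowser
from time import sleep

import pyautogui
# Import paths below are assumptions based on this README, not verified APIs.
from optimisewait import optimiseWait
from talktollm import set_clipboard


def send_prompt_sketch(url: str, prompt: str) -> None:
    """Illustrative outline of the automation flow."""
    webbrowser.open_new_tab(url)       # step 3: open the LLM site in a new tab
    optimiseWait('message', clicks=2)  # step 4: wait for and click the message box
    set_clipboard(prompt)              # step 6: put the prompt on the clipboard (with retries)
    pyautogui.hotkey('ctrl', 'v')      # step 7: paste the prompt
    optimiseWait('run')                # step 8: click the run button
    optimiseWait('copy')               # step 9: response is ready once the copy button appears
    sleep(0.5)                         # step 10: brief pause after the copy click
    pyautogui.hotkey('ctrl', 'w')      # step 11: close the tab
    pyautogui.hotkey('alt', 'tab')     # step 12: switch focus back
```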

## Helper Functions

**Clipboard Handling:**

-   `set_clipboard(text: str, retries: int = 5, delay: float = 0.2)`: Sets text to the clipboard, handling `CF_UNICODETEXT`. Retries on common access errors (`winerror 5` or `1418`).
-   `set_clipboard_image(image_data: str, retries: int = 5, delay: float = 0.2)`: Sets a base64 encoded image to the clipboard (`CF_DIB` format). Decodes, converts to BMP, and retries on common access errors.
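The retry pattern these helpers describe can be illustrated with `pywin32` directly. This is a minimal sketch of the approach, not the package's exact implementation:

```python
import time

import pywintypes
import win32clipboard
import win32con


def set_clipboard_sketch(text: str, retries: int = 5, delay: float = 0.2) -> None:
    """Set Unicode text on the Windows clipboard, retrying on access errors."""
    for attempt in range(retries):
        try:
            win32clipboard.OpenClipboard()
            try:
                win32clipboard.EmptyClipboard()
                win32clipboard.SetClipboardData(win32con.CF_UNICODETEXT, text)
            finally:
                win32clipboard.CloseClipboard()
            return
        except pywintypes.error as e:
            # winerror 5 (access denied) and 1418 (clipboard not open) usually mean
            # another process is holding the clipboard; wait briefly and retry.
            if e.winerror in (5, 1418) and attempt < retries - 1:
                time.sleep(delay)
            else:
                raise
```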

**Image Path Management:**

-   `set_image_path(llm: str, debug: bool = False)`: Orchestrates copying images.
-   `copy_images_to_temp(llm: str, debug: bool = False)`: Copies the necessary `.png` images for the specified `llm` from the package's `images/<llm>` directory to a temporary location (`%TEMP%\talktollm_images\<llm>`). Creates the temporary directory if needed and only copies if the source file is newer or the destination doesn't exist. Sets the `optimisewait` autopath. Includes error handling for missing package resources.
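A rough sketch of that copy-to-temp behaviour using the standard library is shown below. The `images/<llm>` resource layout is taken from the description above, the function name is hypothetical, and the real helper additionally points `optimisewait` at the destination directory:

```python
import shutil
import tempfile
from importlib import resources  # requires Python 3.9+ for resources.files
from pathlib import Path


def copy_images_sketch(llm: str) -> Path:
    """Copy the recognition images for `llm` into %TEMP%\\talktollm_images\\<llm>."""
    dest = Path(tempfile.gettempdir()) / "talktollm_images" / llm
    dest.mkdir(parents=True, exist_ok=True)
    # Resource layout assumed: images/<llm>/*.png inside the installed package.
    src_dir = resources.files("talktollm") / "images" / llm
    for src in src_dir.iterdir():
        if not src.name.endswith(".png"):
            continue
        target = dest / src.name
        with resources.as_file(src) as src_path:
            # Copy only when the destination is missing or the packaged file is newer.
            if not target.exists() or src_path.stat().st_mtime > target.stat().st_mtime:
                shutil.copy2(src_path, target)
    return dest
```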

## Installation

```
pip install talktollm
```

*Note: `optimisewait` is recommended for image-based waiting and clicking; a fallback is used if it is not installed. Install it separately if needed (`pip install optimisewait`).*

## Usage

Here are some examples of how to use `talktollm`.

**Example 1: Simple Text Prompt**

Send a basic text prompt to Gemini.

```python
import talktollm

prompt_text = "Explain quantum entanglement in simple terms."
response = talktollm.talkto('gemini', prompt_text)
print("--- Simple Gemini Response ---")
print(response)
```

**Example 2: Text Prompt with Debugging**

Send a text prompt and enable debugging output to see more details about the process.

```python
import talktollm

prompt_text = "What are the main features of Python 3.12?"
response = talktollm.talkto('deepseek', prompt_text, debug=True)
print("--- DeepSeek Debug Response ---")
print(response)
```

**Example 3: Preparing Image Data**

Load an image file, encode it in base64, and format it correctly for the `imagedata` argument.

```python
import base64

# Load your image (replace 'path/to/your/image.png' with the actual path)
try:
    with open("path/to/your/image.png", "rb") as image_file:
        # Encode to base64
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        # Format as a data URI
        image_data_uri = f"data:image/png;base64,{encoded_string}"
        print("Image prepared successfully!")
        # You can now pass [image_data_uri] to the imagedata parameter
except FileNotFoundError:
    print("Error: Image file not found. Please check the path.")
    image_data_uri = None
except Exception as e:
    print(f"Error processing image: {e}")
    image_data_uri = None

# This 'image_data_uri' variable holds the string needed for the next example
```

**Example 4: Text and Image Prompt**

Send a text prompt along with a prepared image to Gemini. (Assumes `image_data_uri` was successfully created in Example 3).

```python
import talktollm

# Assuming image_data_uri is available from the previous example
if image_data_uri:
    prompt_text = "Describe the main subject of this image."
    response = talktollm.talkto(
        'gemini',
        prompt_text,
        imagedata=[image_data_uri], # Pass the image data as a list
        debug=True
    )
    print("--- Gemini Image Response ---")
    print(response)
else:
    print("Skipping image example because image data is not available.")
```

## Dependencies

-   `pywin32`: For Windows API access (clipboard).
-   `pyautogui`: For GUI automation (keystrokes, potentially mouse if `optimisewait` fails).
-   `Pillow`: For image processing (opening, converting for clipboard).
-   `optimisewait` (Optional but Recommended): For robust image-based waiting and clicking.
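
Because the PyPI metadata for this release records no install requirements, these packages may need to be installed manually, for example:

```
pip install pywin32 pyautogui Pillow optimisewait
```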

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License

MIT

            
