# talktollm
[![PyPI version](https://badge.fury.io/py/talktollm.svg)](https://badge.fury.io/py/talktollm)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
A Python utility for interacting with large language models (LLMs) by automating their web interfaces. It leverages image recognition to locate and operate interface elements, enabling seamless conversations and task execution.
## Features
- **Simple Python Interface:** A single function call (`talkto`) drives the entire interaction with an LLM.
- **Automated Image Recognition:** Employs image recognition techniques (via `optimisewait`) to identify and interact with elements on the LLM interface. Includes fallback if `optimisewait` is not installed.
- **Multi-LLM Support:** Currently supports DeepSeek, Gemini, and Google AI Studio.
- **Automated Conversations:** Facilitates automated conversations and task execution by simulating user interactions.
- **Image Support:** Allows sending images (base64 encoded) to the LLM.
- **Robust Clipboard Handling:** Includes configurable retry mechanisms (default 5 retries) for setting text/images to the clipboard and reading text from the clipboard to handle access errors and timing issues.
- **Dynamic Image Path Management:** Copies necessary recognition images to a temporary directory, ensuring they are accessible and up-to-date.
- **Easy to use:** Designed for simple setup and usage.
## Core Functionality
The core function is `talkto(llm, prompt, imagedata=None, debug=False, tabswitch=True, read_retries=5, read_delay=0.3)`.
**Arguments:**
- `llm` (str): The LLM name (`'deepseek'`, `'gemini'`, or `'aistudio'`).
- `prompt` (str): The text prompt.
- `imagedata` (list[str] | None): Optional list of base64 encoded image strings (e.g., "data:image/png;base64,...").
- `debug` (bool): Enable detailed console output. Defaults to `False`.
- `tabswitch` (bool): Switch focus back to the previous window after closing the LLM tab. Defaults to `True`.
- `read_retries` (int): Number of attempts to read the final response from the clipboard. Defaults to 5.
- `read_delay` (float): Delay in seconds between clipboard read attempts. Defaults to 0.3.
**Steps:**
1. Validates the LLM name.
2. Sets up image paths for `optimisewait` using `set_image_path`.
3. Opens the LLM's website in a new browser tab.
4. Waits and clicks the message input area using `optimiseWait('message', clicks=2)`.
5. If `imagedata` is provided:
- Iterates through images.
- Sets each image to the clipboard using `set_clipboard_image` (with retries).
- Pastes the image (`Ctrl+V`).
- Waits for potential upload (`sleep(7)`).
6. Sets the `prompt` text to the clipboard using `set_clipboard` (with retries).
7. Pastes the prompt (`Ctrl+V`).
8. Waits and clicks the 'run' button using `optimiseWait('run')`.
9. Waits for the response generation, using `optimiseWait('copy')` as an indicator that the response is ready and the copy button is visible.
10. Waits briefly (`sleep(0.5)`) after `optimiseWait('copy')` clicks the copy button.
11. Closes the browser tab (`Ctrl+W`).
12. Switches focus back if `tabswitch` is `True` (`Alt+Tab`).
13. Attempts to read the LLM's response from the clipboard with retry logic (`read_retries`, `read_delay`).
14. Returns the retrieved text response, or an empty string if reading fails.
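Steps 13 and 14 amount to a simple retry loop around the clipboard. A minimal sketch, with a hypothetical `read_fn` standing in for the actual clipboard reader:

```python
import time

def read_with_retries(read_fn, retries=5, delay=0.3):
    """Call read_fn up to `retries` times, sleeping `delay` seconds between attempts."""
    for _ in range(retries):
        text = read_fn()
        if text:  # non-empty response retrieved
            return text
        time.sleep(delay)
    return ""  # all attempts failed: fall back to an empty string, as talkto does
```

With this shape, a reader that succeeds immediately returns on the first attempt, and a reader that never succeeds costs at most `retries * delay` seconds before the empty-string fallback.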
## Helper Functions
**Clipboard Handling:**
- `set_clipboard(text: str, retries: int = 5, delay: float = 0.2)`: Sets text to the clipboard, handling `CF_UNICODETEXT`. Retries on common access errors (`winerror 5` or `1418`).
- `set_clipboard_image(image_data: str, retries: int = 5, delay: float = 0.2)`: Sets a base64 encoded image to the clipboard (`CF_DIB` format). Decodes, converts to BMP, and retries on common access errors.
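For illustration, the decoding half of `set_clipboard_image` can be sketched with the standard library alone (the BMP/`CF_DIB` conversion itself requires Pillow and the Windows clipboard API, so only the data-URI parsing is shown; the helper name is hypothetical):

```python
import base64

def decode_data_uri(image_data: str) -> bytes:
    """Strip a 'data:image/...;base64,' prefix if present and return the raw image bytes."""
    if image_data.startswith("data:") and "," in image_data:
        # keep only the payload after the first comma
        image_data = image_data.split(",", 1)[1]
    return base64.b64decode(image_data)
```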
**Image Path Management:**
- `set_image_path(llm: str, debug: bool = False)`: Orchestrates copying images.
- `copy_images_to_temp(llm: str, debug: bool = False)`: Copies the necessary `.png` images for the specified `llm` from the package's `images/<llm>` directory to a temporary location (`%TEMP%\talktollm_images\<llm>`). Creates the temporary directory if needed and only copies if the source file is newer or the destination doesn't exist. Sets the `optimisewait` autopath. Includes error handling for missing package resources.
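The copy-if-newer logic can be sketched as follows (names are hypothetical; the real function also wires up the `optimisewait` autopath and reads images from the installed package):

```python
import shutil
from pathlib import Path

def copy_if_newer(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir only if the destination is missing or older than src."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if not dest.exists() or src.stat().st_mtime > dest.stat().st_mtime:
        shutil.copy2(src, dest)  # copy2 preserves timestamps, so a repeat call is a no-op
    return dest
```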
## Installation
```bash
pip install talktollm
```
*Note: `optimisewait` is strongly recommended for image recognition (a fallback exists if it is absent). Install it separately if needed (`pip install optimisewait`).*
## Usage
Here are some examples of how to use `talktollm`.
**Example 1: Simple Text Prompt**
Send a basic text prompt to Gemini.
```python
import talktollm
prompt_text = "Explain quantum entanglement in simple terms."
response = talktollm.talkto('gemini', prompt_text)
print("--- Simple Gemini Response ---")
print(response)
```
**Example 2: Text Prompt with Debugging**
Send a text prompt and enable debugging output to see more details about the process.
```python
import talktollm
prompt_text = "What are the main features of Python 3.12?"
response = talktollm.talkto('deepseek', prompt_text, debug=True)
print("--- DeepSeek Debug Response ---")
print(response)
```
**Example 3: Preparing Image Data**
Load an image file, encode it in base64, and format it correctly for the `imagedata` argument.
```python
import base64

# Load your image (replace 'path/to/your/image.png' with the actual path)
try:
    with open("path/to/your/image.png", "rb") as image_file:
        # Encode to base64
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    # Format as a data URI
    image_data_uri = f"data:image/png;base64,{encoded_string}"
    print("Image prepared successfully!")
    # You can now pass [image_data_uri] to the imagedata parameter
except FileNotFoundError:
    print("Error: Image file not found. Please check the path.")
    image_data_uri = None
except Exception as e:
    print(f"Error processing image: {e}")
    image_data_uri = None

# This 'image_data_uri' variable holds the string needed for the next example
```
**Example 4: Text and Image Prompt**
Send a text prompt along with a prepared image to Gemini. (Assumes `image_data_uri` was successfully created in Example 3).
```python
import talktollm

# Assuming image_data_uri is available from the previous example
if image_data_uri:
    prompt_text = "Describe the main subject of this image."
    response = talktollm.talkto(
        'gemini',
        prompt_text,
        imagedata=[image_data_uri],  # Pass the image data as a list
        debug=True
    )
    print("--- Gemini Image Response ---")
    print(response)
else:
    print("Skipping image example because image data is not available.")
```
## Dependencies
- `pywin32`: For Windows API access (clipboard).
- `pyautogui`: For GUI automation (keystrokes, potentially mouse if `optimisewait` fails).
- `Pillow`: For image processing (opening, converting for clipboard).
- `optimisewait` (Optional but Recommended): For robust image-based waiting and clicking.
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## License
MIT