# MimeFilesReader
Reads and processes various MIME type files (images, PDFs, audio, etc.) and YouTube videos using Google's Gemini models capable of multi-modal understanding. Provides both a command-line interface (CLI) and a Model Context Protocol (MCP) server interface.
## Features
* Leverages Google's Generative AI (Gemini) for understanding content within various file types and YouTube videos.
* Supports common MIME types like images (PNG, JPEG), PDFs, audio (MP3, WAV - depending on model support), etc.
* **NEW: YouTube URL Support** - Analyze YouTube videos directly by providing YouTube URLs (youtube.com/watch, youtu.be, youtube.com/embed formats).
* Provides a simple CLI for quick queries about local files and YouTube videos.
* Offers an MCP server interface for integration with AI Agent frameworks (LangChain, LlamaIndex, AutoGen, custom agents).
* Handles file uploads to the Google AI backend and optional automatic cleanup.
* Seamlessly mix local files and YouTube URLs in the same query for comprehensive analysis.
## Example use cases
### Use case 1: PDF and Document Analysis
Using Gemini's OCR and document-understanding capability, agents can use the tool to extract information from MIME content such as PDFs, images, and audio.

```bash
# Try with
mime-reader \
-m "gemini-2.5-pro" \
-q "Get the full text out of the PDF paper" \
-f tests/test_data/1916_The_Foundation_of_the_General_Theory_of_Relativity_Einstein_part1.pdf
```
### Use case 2: Screenshot and Image Analysis
Reading images lets agents work with headless browsers, which helps when debugging front-end code or browsing the web.

```bash
# Try with
mime-reader \
-m "gemini-2.5-pro" \
-q "The picture is a screenshot of a headless browser result. Describe in detail what is on it. Highlight any broken parts or incorrect content" \
-f tests/test_data/test_screenshot.png
```
### Use case 3: YouTube Video Analysis
Analyze YouTube videos directly by providing YouTube URLs. Supports multiple URL formats including youtube.com/watch, youtu.be, and youtube.com/embed.
```bash
# Analyze a single YouTube video
mime-reader \
-m "gemini-2.5-pro" \
-q "Summarize the main points covered in this educational video" \
-f "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Using youtu.be short format
mime-reader \
-m "gemini-2.5-pro" \
-q "What programming concepts are explained in this tutorial?" \
-f "https://youtu.be/dQw4w9WgXcQ"
```
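To make the three supported URL shapes concrete, here is a minimal sketch of how an input could be classified as a YouTube URL or a local path. The helper name `youtube_video_id` is hypothetical and the logic is an assumption for illustration; the package's actual detection code is not shown in this README and may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(candidate: str) -> Optional[str]:
    """Return the video ID if `candidate` looks like a supported YouTube URL, else None.

    Hypothetical helper: covers youtube.com/watch, youtu.be and youtube.com/embed only.
    """
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https"):
        return None  # plain file paths have no http(s) scheme
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        # Short format: the ID is the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com":
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if parsed.path.startswith("/embed/"):
            return parsed.path[len("/embed/"):].split("/")[0] or None
    return None


for item in ("https://youtu.be/dQw4w9WgXcQ", "research_paper.pdf"):
    kind = "YouTube URL" if youtube_video_id(item) else "local file"
    print(f"{item}: {kind}")
```

A check like this is useful on the caller's side for deciding whether a path needs to exist on disk before it is passed with `-f`.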
### Use case 4: Mixed Analysis (Local Files + YouTube URLs)
Seamlessly combine local files and YouTube URLs in the same analysis for comprehensive insights.
```bash
# Compare a PDF document with a YouTube video on the same topic
mime-reader \
-m "gemini-2.5-pro" \
-q "Compare the information in this PDF with the content of the YouTube video. What are the similarities and differences?" \
-f "research_paper.pdf" \
-f "https://www.youtube.com/watch?v=educational_video_id"
# Analyze multiple YouTube videos together
mime-reader \
-m "gemini-2.5-pro" \
-q "What are the common themes across these educational videos?" \
-f "https://www.youtube.com/watch?v=video1" \
-f "https://youtu.be/video2" \
-f "https://www.youtube.com/embed/video3"
```
## Installation
### For End Users (Standard Installation)
Install the package directly from PyPI with only runtime dependencies:
```bash
# Create and activate a virtual environment (Recommended)
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install the package
pip install mime-files-reader
```
### For Developers (Development Installation)
If you're contributing to the project or need testing capabilities:
```bash
# Clone the repository
git clone <repository-url>
cd mime-files-reader
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install with testing dependencies
pip install -e .[test]
# Or install from requirements files
pip install -r requirements-dev.txt
```
### Requirements Files
- `requirements.txt`: Contains only runtime dependencies needed for normal package usage
- `requirements-dev.txt`: Contains both runtime and testing dependencies for development
### Environment Configuration
You need a Google Generative AI API key. Create a `.env` file in the project root or export the variable:
```dotenv
# .env
GEMINI_API_KEY="YOUR_GOOGLE_AI_API_KEY"
# Optional: Specify a model, defaults to gemini-2.0-flash
# GEMINI_MODEL_NAME="gemini-2.0-flash"
```
For complex tasks, consider using `gemini-2.5-pro`, though the quota is currently low.
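The configuration rules above (a required key, an optional model name with a default) can be sketched in a few lines. This is an illustrative assumption, not the package's own loading code: the function name `resolve_gemini_settings` is hypothetical, and the default is taken from the `.env` example above.

```python
import os
from typing import Mapping, Tuple

DEFAULT_MODEL = "gemini-2.0-flash"  # default stated in the .env example; assumed here


def resolve_gemini_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, model_name) from an environment-like mapping.

    Hypothetical helper: raises if GEMINI_API_KEY is missing or blank and
    falls back to DEFAULT_MODEL when GEMINI_MODEL_NAME is not set.
    """
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    model_name = env.get("GEMINI_MODEL_NAME", "").strip() or DEFAULT_MODEL
    return api_key, model_name


# Example with an explicit mapping instead of the real environment
print(resolve_gemini_settings({"GEMINI_API_KEY": "YOUR_GOOGLE_AI_API_KEY"}))
```

Values stored in a `.env` file only become visible to `os.environ` after something loads them; `python-dotenv`, which appears in the package requirements, is one common way to do that.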
## Command-Line Interface (CLI) Usage
The CLI provides a direct way to ask questions about local files and YouTube videos.
### Local File Examples
```bash
mime-reader -q "Describe the main subject of this image." -f /path/to/your/image.jpg
```
```bash
mime-reader --question "Summarize the first page of this document." --files /path/to/document.pdf --output summary.txt
```
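Before sending a local file, it can help to check which MIME type its extension maps to. The sketch below uses Python's standard `mimetypes` module; it is only a client-side sanity check under the assumption that the file extension is accurate, and it says nothing about how the package or the Gemini backend determines the type.

```python
import mimetypes
from typing import Optional


def guess_mime_type(path: str) -> Optional[str]:
    """Guess a MIME type from the file extension; returns None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type


for candidate in ("/path/to/document.pdf", "/path/to/your/image.jpg", "notes.unknownext"):
    print(candidate, "->", guess_mime_type(candidate))
```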
### YouTube Video Examples
```bash
# Analyze a YouTube video using youtube.com/watch format
mime-reader -q "What are the key points discussed in this video?" -f "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
```bash
# Analyze a YouTube video using youtu.be format
mime-reader -q "Summarize this educational content." -f "https://youtu.be/dQw4w9WgXcQ" --output video_summary.txt
```
### Mixed Analysis Examples
```bash
# Combine local files and YouTube URLs in one analysis
mime-reader -q "Compare the content in this PDF with the YouTube video explanation." \
-f "research_paper.pdf" \
-f "https://www.youtube.com/watch?v=educational_video_id"
```
**Options:**
* `-q`, `--question`: (Required) The question to ask the AI model.
* `-f`, `--files`: (Required) One or more file paths or YouTube URLs. Use the option multiple times for multiple inputs (`-f file1.pdf -f image.png -f "https://youtu.be/video_id"`).
* `-o`, `--output`: (Optional) Path to save the text response to a file.
* `--model`: (Optional) Specify the Google AI model name (overrides `GEMINI_MODEL_NAME` env var).
* `--no-cleanup`: (Optional) Prevent automatic deletion of files uploaded to Google AI backend (YouTube URLs don't require cleanup).
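When the CLI is driven from a script, the options listed above map naturally onto an argument list. The following is a minimal sketch; `build_mime_reader_command` is a hypothetical helper, and it uses only the flags documented in the options list.

```python
import shlex
from typing import List, Optional, Sequence


def build_mime_reader_command(
    question: str,
    inputs: Sequence[str],
    output: Optional[str] = None,
    model: Optional[str] = None,
    cleanup: bool = True,
) -> List[str]:
    """Build an argv list for `mime-reader` (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one file path or YouTube URL is required")
    command = ["mime-reader", "-q", question]
    for item in inputs:
        command += ["-f", item]  # repeat -f once per input
    if output:
        command += ["-o", output]
    if model:
        command += ["--model", model]
    if not cleanup:
        command.append("--no-cleanup")
    return command


command = build_mime_reader_command(
    "Compare the content in this PDF with the YouTube video explanation.",
    ["research_paper.pdf", "https://youtu.be/video_id"],
    output="comparison.txt",
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with questions that contain quotes or spaces.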
## MCP Server Interface
This package now includes an MCP server, allowing AI agents or other applications to use the multi-modal file reading capabilities as a tool.
**Prerequisites:**
* The environment where the server runs **must** have the `GEMINI_API_KEY` environment variable set.
* Optionally, set `GEMINI_MODEL_NAME` to use a specific model (defaults to `gemini-1.5-flash-latest`).
**Running the Server:**
After installing the package (`pip install -e .`), you can run the server using the registered console script:
```bash
# Make sure GEMINI_API_KEY is set in this shell
export GEMINI_API_KEY="YOUR_GOOGLE_AI_API_KEY"
# export GEMINI_MODEL_NAME="gemini-1.5-pro-latest" # Optional
mime-reader-mcp-server
```
By default, the server listens for MCP messages over **standard input/output (stdio)**.
**Exposed Tool:**
The server exposes a single tool:
* **Name:** `read_files`
* **Description:** Processes a list of local file paths and YouTube URLs with a question using a Generative AI model (like Gemini) capable of understanding various MIME types (images, PDFs, audio, etc.) and YouTube video content. Returns the generated text response.
* **Arguments:**
    * `question` (string, required): The question to ask the AI model about the provided files.
    * `files` (list of strings, required): A list of file paths or YouTube URLs. **File paths must be accessible from the environment where the `mime-reader-mcp-server` process is running.** YouTube URLs support formats: youtube.com/watch, youtu.be, and youtube.com/embed.
    * `output` (string, optional): A local path (relative to the server's working directory, or absolute) where the AI's response should be saved. If provided, the tool returns a confirmation message instead of the full response.
    * `auto_cleanup` (boolean, optional, default: `true`): Whether to automatically delete files uploaded to the Google AI service backend after processing.
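The argument schema above can be illustrated by building the arguments dictionary and wrapping it in the JSON-RPC 2.0 `tools/call` request that MCP uses to invoke a tool. This is a sketch for understanding the wire format; both helper names are hypothetical, and in practice an MCP client library constructs the request for you.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_read_files_arguments(
    question: str,
    files: Sequence[str],
    output: Optional[str] = None,
    auto_cleanup: bool = True,
) -> Dict[str, Any]:
    """Build the arguments for the `read_files` tool (hypothetical helper)."""
    if not question.strip():
        raise ValueError("question is required")
    if not files:
        raise ValueError("files must contain at least one path or YouTube URL")
    arguments: Dict[str, Any] = {
        "question": question,
        "files": list(files),
        "auto_cleanup": auto_cleanup,
    }
    if output is not None:
        arguments["output"] = output  # optional: omit to get the full response back
    return arguments


def build_tools_call_request(request_id: int, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for `read_files`."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_files", "arguments": arguments},
    })


args = build_read_files_arguments(
    "What is the main topic of this video?",
    ["https://youtu.be/dQw4w9WgXcQ"],
)
print(build_tools_call_request(1, args))
```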
**MCP Client Example (Python):**
```python
import asyncio
import os
import shutil
import sys

from mcp import types as mcp_types  # Import types for clarity
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

# --- Configuration ---
SERVER_EXECUTABLE = shutil.which("mime-reader-mcp-server")
if not SERVER_EXECUTABLE:
    print("Error: Cannot find 'mime-reader-mcp-server'. Is the package installed?", file=sys.stderr)
    sys.exit(1)

API_KEY = os.environ.get("GEMINI_API_KEY")
if not API_KEY:
    print("Error: GEMINI_API_KEY needed for server environment.", file=sys.stderr)
    sys.exit(1)


def print_tool_result(label: str, result: mcp_types.CallToolResult) -> None:
    """Print the first content item of a tool result, or explain what went wrong."""
    if getattr(result, "isError", False):
        print(f"\n{label}: server returned an error flag.")
    elif getattr(result, "content", None):
        first = result.content[0]
        if isinstance(first, mcp_types.TextContent):
            print(f"\n{label}:\n{first.text}")
        else:
            print(f"\n{label}: received non-text content: {first}")
    else:
        print(f"\n{label}: received unexpected response structure: {result}")


# --- Example Usage ---
async def run_mcp_client():
    server_params = StdioServerParameters(
        command=SERVER_EXECUTABLE,
        args=[],  # No extra args needed for entry point
        env={  # Pass necessary env vars to the server process
            **os.environ,
            "GEMINI_API_KEY": API_KEY,
            "PYTHONUNBUFFERED": "1",
            # Optionally pass GEMINI_MODEL_NAME here too
        },
    )

    image_path = "/path/to/your/local/image.png"  # Replace with a real path accessible by server
    youtube_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"  # Example YouTube URL

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print("Session Initialized.")

            # Optional: List tools
            list_resp = await session.list_tools()
            print(f"Available Tools: {list_resp.tools}")

            # Example 1: Call the read_files tool with local file
            tool_args = {
                "question": "What objects are prominent in this image?",
                "files": [image_path],
            }
            print(f"\nCalling 'read_files' with args: {tool_args}")
            call_resp = await session.call_tool("read_files", tool_args)
            print_tool_result("Server Response", call_resp)

            # Example 2: Call the read_files tool with YouTube URL
            youtube_args = {
                "question": "What is the main topic of this video?",
                "files": [youtube_url],
            }
            print(f"\nCalling 'read_files' with YouTube URL: {youtube_args}")
            youtube_resp = await session.call_tool("read_files", youtube_args)
            print_tool_result("YouTube Analysis Response", youtube_resp)

            # Example 3: Mixed analysis with both local file and YouTube URL
            mixed_args = {
                "question": "Compare the content in this image with the YouTube video content.",
                "files": [image_path, youtube_url],
            }
            print(f"\nCalling 'read_files' with mixed inputs: {mixed_args}")
            mixed_resp = await session.call_tool("read_files", mixed_args)
            print_tool_result("Mixed Analysis Response", mixed_resp)


if __name__ == "__main__":
    # Make sure image_path above points to a real file
    # Make sure API_KEY is set
    try:
        asyncio.run(run_mcp_client())
    except Exception as e:
        print(f"Client error: {e}")
```
## Why Use This with AI Agents (MCP Integration)?
AI agent frameworks (like LangChain, LlamaIndex, AutoGen, CrewAI, or custom implementations) often orchestrate complex tasks by allowing a central Language Model (LLM) to delegate specific capabilities to external "tools." Integrating `MimeFilesReader` as an MCP tool provides several advantages:
1. **Offloading Powerful Multi-modal Understanding:** Instead of requiring the primary agent LLM to have potentially expensive or less optimized built-in multi-modal capabilities, the agent can delegate tasks involving vision, document understanding, etc., to `MimeFilesReader`. This tool specifically utilizes Google's powerful Gemini models, which excel at these tasks. The agent simply needs to provide the file paths and a relevant question.
2. **Secure Access to Local Files:** Agents, especially when deployed as services, should operate with least privilege. Giving the core agent direct filesystem access can be a security risk. The MCP server acts as a secure bridge. The agent instructs the *locally running* `mime-reader-mcp-server` (which has legitimate access) to process specific files, rather than accessing them directly.
3. **Enabling Complex Workflows:**
    * **PDF/Document Analysis:** An agent can ask the tool to "Extract the key financial figures from `/path/to/report.pdf`" or "Summarize the main arguments in `/path/to/research_paper.pdf`." The tool handles the PDF complexity, and Gemini provides the understanding.
    * **Web Scraping + Vision:** An agent might use one tool (e.g., a Puppeteer/Selenium based MCP tool) to navigate a website and take a screenshot, saving it locally. It can then pass the screenshot's path to the `mime-reader-mcp-server` tool and ask, "What is the price shown for the product in `/tmp/screenshot.png`?" or "Describe the layout of this webpage section based on `/tmp/screenshot.png`."
    * **Analyzing User Uploads:** In a chatbot application, if a user uploads an image or document, the backend can save it to a temporary location and use the `read_files` tool to understand its content based on the user's query (e.g., "What kind of flower is shown in the image I uploaded?").
4. **Modularity and Reusability:** The `mime-reader-mcp-server` can be run as a standalone service. Multiple different agents or applications can then connect to it via MCP to leverage its file-reading capabilities without each needing to implement the Google AI interaction logic themselves.
5. **YouTube Video Analysis:** Agents can analyze YouTube videos directly by providing URLs instead of requiring local video downloads. For example, an agent could process "Summarize the key points from this educational video at https://www.youtube.com/watch?v=abc123" or "What programming concepts are explained in https://youtu.be/def456?" This enables agents to incorporate video content analysis into their workflows seamlessly.
6. **Cross-Media Content Analysis:** Agents can compare and analyze information across different media types in a single tool call, such as "Compare the research findings in this PDF report with the explanations in this YouTube lecture" by providing both local PDF paths and YouTube URLs to the same `read_files` call.

By providing this functionality via MCP, `MimeFilesReader` becomes a powerful, reusable component in the growing ecosystem of AI agents and tools.
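The "Analyzing User Uploads" workflow described above can be sketched as follows: the backend writes the uploaded bytes to a temporary file and passes that path to `read_files`. Both function names are hypothetical, and the sketch assumes the MCP server runs on the same machine so that the temporary path is readable from its environment.

```python
import os
import tempfile
from typing import Any, Dict


def stage_upload(data: bytes, suffix: str) -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def read_files_arguments_for_upload(data: bytes, suffix: str, user_question: str) -> Dict[str, Any]:
    """Stage an upload and build `read_files` arguments that point at it."""
    return {"question": user_question, "files": [stage_upload(data, suffix)]}


arguments = read_files_arguments_for_upload(
    b"placeholder bytes standing in for a real image",
    ".png",
    "What kind of flower is shown in the image I uploaded?",
)
print(arguments)
os.remove(arguments["files"][0])  # delete the staged file once the tool call is done
```

Keeping the suffix of the original upload preserves the file extension, which is one of the hints available for identifying the file type.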
## Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Raw data
* Name: `mime-files-reader`
* Version: 0.2.1
* Summary: A tool to process various file types (images, PDFs, audio) using Google Generative AI
* Homepage: https://github.com/anges-ai/mime-files-reader
* Author: Anges (me@anges.ai)
* Requires Python: >=3.8
* Keywords: generative ai, gemini, google genai, file processing, multimodal
* Requirements: `google.genai`, `python-dotenv`, `httpx`, `mcp`, `click`, `anyio`, `pytest`, `mcp-client`
* Files:
    * `mime_files_reader-0.2.1-py3-none-any.whl` (bdist_wheel, 17810 bytes, uploaded 2025-07-21T00:01:58)
    * `mime_files_reader-0.2.1.tar.gz` (sdist, 29811 bytes, uploaded 2025-07-21T00:01:59)