# Attachments – the Python funnel for LLM context
### Turn *any* file into model-ready text ＋ images, in one line
Most users will not have to learn anything more than: `Attachments("path/to/file.pdf")`
## 🎬 Demo

> **TL;DR**
> ```bash
> pip install attachments
> ```
> ```python
> from attachments import Attachments
> ctx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample.pdf",
>                   "https://github.com/MaximeRivest/attachments/raw/refs/heads/main/src/attachments/data/sample_multipage.pptx")
> llm_ready_text = str(ctx)       # all extracted text, already "prompt-engineered"
> llm_ready_images = ctx.images   # list[str] – base64 PNGs
> ```
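`ctx.images` holds base64-encoded PNG strings. A minimal stdlib sketch of decoding one back into raw bytes; the data-URL prefix handling here is an illustration for robustness, not a documented behaviour of the library:

```python
import base64

def b64_png_to_bytes(b64_string: str) -> bytes:
    """Decode a base64 PNG string (optionally a data URL) into raw bytes."""
    if b64_string.startswith("data:"):
        # Strip a "data:image/png;base64," prefix if one is present
        b64_string = b64_string.split(",", 1)[1]
    return base64.b64decode(b64_string)

# Round-trip the 8-byte PNG signature as a sanity check
signature = b"\x89PNG\r\n\x1a\n"
decoded = b64_png_to_bytes(base64.b64encode(signature).decode("ascii"))
```

The decoded bytes can be written straight to a `.png` file or handed to an image library.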
Attachments aims to be **the** community funnel from *file → text + base64 images* for LLMs.
Stop re-writing that plumbing in every project – contribute your *loader / modifier / presenter / refiner / adapter* plugin instead!
## Quick-start ⚡
```bash
pip install attachments
```
### Try it now with sample files
```python
from attachments import Attachments
from attachments.data import get_sample_path
# Option 1: Use included sample files (works offline)
pdf_path = get_sample_path("sample.pdf")
txt_path = get_sample_path("sample.txt")
ctx = Attachments(pdf_path, txt_path)
print(str(ctx)) # Pretty text view
print(len(ctx.images)) # Number of extracted images
# Try different file types
docx_path = get_sample_path("test_document.docx")
csv_path = get_sample_path("test.csv")
json_path = get_sample_path("sample.json")
ctx = Attachments(docx_path, csv_path, json_path)
print(f"Processed {len(ctx)} files: Word doc, CSV data, and JSON")
# Option 2: Use URLs (same API, works with any URL)
ctx = Attachments(
    "https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample.pdf",
    "https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx"
)
print(str(ctx)) # Pretty text view
print(len(ctx.images)) # Number of extracted images
```
### Advanced usage with DSL
```python
from attachments import Attachments
a = Attachments(
    "https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/"
    "sample_multipage.pptx[3-5]"
)
print(a) # pretty text view
len(a.images) # 👉 base64 PNG list
```
### Send to OpenAI
```bash
pip install openai
```
```python
from openai import OpenAI
from attachments import Attachments
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=pptx.openai_chat("Analyse the following document:")
)
print(resp.choices[0].message.content)
```
or with the Responses API:
```python
from openai import OpenAI
from attachments import Attachments
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")
client = OpenAI()
resp = client.responses.create(
    input=pptx.openai_responses("Analyse the following document:"),
    model="gpt-4.1-nano"
)
print(resp.output[0].content[0].text)
```
### Send to Anthropic / Claude
```bash
pip install anthropic
```
```python
import anthropic
from attachments import Attachments
pptx = Attachments("https://github.com/MaximeRivest/attachments/raw/main/src/attachments/data/sample_multipage.pptx[3-5]")
msg = anthropic.Anthropic().messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=8_192,
    messages=pptx.claude("Analyse the slides:")
)
print(msg.content)
```
### DSPy Integration
A dedicated `dspy` module lets you use Attachments directly with DSPy.
```bash
pip install dspy-ai
```
```python
from attachments.dspy import Attachments # Automatic type registration!
import dspy
# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4.1-nano'))
# Both approaches work seamlessly:
# 1. Class-based signatures (recommended)
class DocumentAnalyzer(dspy.Signature):
    """Analyze document content and extract insights."""
    document: Attachments = dspy.InputField()
    insights: str = dspy.OutputField()
# 2. String-based signatures (works automatically!)
analyzer = dspy.Signature("document: Attachments -> insights: str")
# Use with any file type
doc = Attachments("report.pdf")
result = dspy.ChainOfThought(DocumentAnalyzer)(document=doc)
print(result.insights)
```
**Key Features:**
- 🎯 **Automatic Type Registration**: Import from `attachments.dspy` and use `Attachments` in string signatures immediately
- 🔄 **Seamless Serialization**: Handles complex multimodal content automatically
- 🖼️ **Image Support**: Base64 images work perfectly with vision models
- 📝 **Rich Text**: Preserves formatting and structure
- 🧩 **Full Compatibility**: Works with all DSPy signatures and programs
### Optional: CSS Selector Highlighting 🎯
For advanced web scraping with visual element highlighting in screenshots:
```bash
# Install Playwright for CSS selector highlighting
pip install playwright
playwright install chromium
# Or with uv
uv add playwright
uv run playwright install chromium
# Or install with browser extras
pip install attachments[browser]
playwright install chromium
```
**What this enables:**
- 🎯 Visual highlighting of selected elements with animations
- 📸 High-quality screenshots with JavaScript rendering
- 🎨 Professional styling with glowing borders and badges
- 🔍 Perfect for extracting specific page elements
```python
# CSS selector highlighting examples
title = Attachments("https://example.com[select:h1]") # Highlights H1 elements
content = Attachments("https://example.com[select:.content]") # Highlights .content class
main = Attachments("https://example.com[select:#main]") # Highlights #main ID
# Multiple elements with counters and different colors
multi = Attachments("https://example.com[select:h1, .important][viewport:1920x1080]")
```
*Note: Without Playwright, CSS selectors still work for text extraction, but no visual highlighting screenshots are generated.*
### Optional: Microsoft Office Support 📄
For dedicated Microsoft Office format processing:
```bash
# Install just Office format support
pip install attachments[office]
# Or with uv
uv add attachments[office]
```
**What this enables:**
- 📊 PowerPoint (.pptx) slide extraction and processing
- 📝 Word (.docx) document text and formatting extraction
- 📈 Excel (.xlsx) spreadsheet data analysis
- 🎯 Lightweight installation for Office-only workflows
```python
# Office format examples
presentation = Attachments("slides.pptx[1-5]") # Extract specific slides
document = Attachments("report.docx") # Word document processing
spreadsheet = Attachments("data.xlsx[summary:true]") # Excel with summary
```
*Note: Office formats are also included in the `common` and `all` dependency groups.*
### Advanced Pipeline Processing
For power users, use the full grammar system with composable pipelines:
```python
from attachments import attach, load, modify, present, refine, adapt
# Custom processing pipeline
result = (attach("document.pdf[pages:1-5]")
          | load.pdf_to_pdfplumber
          | modify.pages
          | present.markdown + present.images
          | refine.add_headers | refine.truncate
          | adapt.claude("Analyze this content"))

# Web scraping pipeline
title = (attach("https://en.wikipedia.org/wiki/Llama[select:title]")
         | load.url_to_bs4
         | modify.select
         | present.text)

# Reusable processors
csv_analyzer = (load.csv_to_pandas
                | modify.limit
                | present.head + present.summary + present.metadata
                | refine.add_headers)
# Use as function
result = csv_analyzer("data.csv[limit:1000]")
analysis = result.claude("What patterns do you see?")
```
---
## DSL cheatsheet 📝
| Piece | Example | Notes |
| ------------------------- | ------------------------- | --------------------------------------------- |
| **Select pages / slides** | `report.pdf[1,3-5,-1]` | Supports ranges, negative indices, `N` = last |
| **Image transforms** | `photo.jpg[rotate:90]` | Any token implemented by a `Transform` plugin |
| **Data-frame summary** | `table.csv[summary:true]` | Ships with a quick `df.describe()` renderer |
| **Web content selection** | `url[select:title]` | CSS selectors for web scraping |
| **Web element highlighting** | `url[select:h1][viewport:1920x1080]` | Visual highlighting in screenshots |
| **Image processing** | `image.jpg[crop:100,100,400,300][rotate:45]` | Chain multiple transformations |
| **Content filtering** | `doc.pdf[format:plain][images:false]` | Control text/image extraction |
| **Repository processing** | `repo[files:false][ignore:standard]` | Smart codebase analysis |
| **Content Control** | `doc.pdf[truncate:5000]` | *Explicit* truncation when needed (user choice) |
| **Repository Filtering** | `repo[max_files:100]` | Limit file processing (performance, not content) |
| **Processing Limits** | `data.csv[limit:1000]` | Row limits for large datasets (explicit) |
> 🔒 **Default Philosophy**: All content preserved unless you explicitly request limits
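The page-selection syntax in the first row can be made concrete with a toy parser. This is a hypothetical sketch for illustration only, not the library's actual implementation; it assumes 1-based pages, negative indices counted from the end, and `N` as an alias for the last page:

```python
def parse_selection(spec: str, total: int) -> list[int]:
    """Expand a selection like '1,3-5,-1' into 1-based page numbers."""
    pages: list[int] = []
    for token in spec.split(","):
        token = token.strip().replace("N", str(total))
        if token.startswith("-"):      # negative index: -1 is the last page
            pages.append(total + int(token) + 1)
        elif "-" in token:             # inclusive range like 3-5
            start, end = token.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:                          # single page
            pages.append(int(token))
    return pages

parse_selection("1,3-5,-1", total=10)  # [1, 3, 4, 5, 10]
```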
---
## Supported formats (out of the box)
* **Docs**: PDF, PowerPoint (`.pptx`), CSV, TXT, Markdown, HTML
* **Images**: PNG, JPEG, BMP, GIF, WEBP, HEIC/HEIF, …
* **Web**: URLs with BeautifulSoup parsing and CSS selection
* **Archives**: ZIP files → image collections with tiling
* **Repositories**: Git repos with smart ignore patterns
* **Data**: CSV with pandas, JSON
---
## Advanced Examples 🧩
### **Multimodal Document Processing**
```python
# PDF with image tiling and analysis
result = Attachments("report.pdf[tile:2x3][resize_images:400]")
analysis = result.claude("Analyze both text and visual elements")
# Multiple file types in one context
ctx = Attachments("report.pdf", "data.csv", "chart.png")
comparison = ctx.openai("Compare insights across all documents")
```
### **Repository Analysis**
```python
# Codebase structure only
structure = Attachments("./my-project[mode:structure]")
# Full codebase analysis with smart filtering
codebase = Attachments("./my-project[ignore:standard]")
review = codebase.claude("Review this code for best practices")
# Custom ignore patterns
filtered = Attachments("./app[ignore:.env,*.log,node_modules]")
```
### **Web Scraping with CSS Selectors**
```python
# Extract specific content from web pages
title = Attachments("https://example.com[select:h1]")
paragraphs = Attachments("https://example.com[select:p]")
# Visual highlighting in screenshots with animations
highlighted = Attachments("https://example.com[select:h1][viewport:1920x1080]")
# Creates screenshot with animated highlighting of h1 elements
# Multiple element highlighting with counters
multi_select = Attachments("https://example.com[select:h1, .important][fullpage:true]")
# Shows "H1 (1/3)", "DIV (2/3)", etc. with different colors for multiple selections
# Pipeline approach for complex scraping
content = (attach("https://en.wikipedia.org/wiki/Llama[select:p]")
           | load.url_to_bs4
           | modify.select
           | present.text
           | refine.truncate)
```
### **Image Processing Chains**
```python
# HEIC support with transformations
processed = Attachments("IMG_2160.HEIC[crop:100,100,400,300][rotate:90]")
# Batch image processing with tiling
collage = Attachments("photos.zip[tile:3x2][resize_images:800]")
description = collage.claude("Describe this image collage")
```
### **Data Analysis Workflows**
```python
# Rich data presentation
data_summary = Attachments("sales_data.csv[limit:1000][summary:true]")
# Pipeline for complex data processing
result = (attach("data.csv[limit:500]")
          | load.csv_to_pandas
          | modify.limit
          | present.head + present.summary + present.metadata
          | refine.add_headers
          | adapt.claude("What trends do you see?"))
```
---
## Extending 🧩
```python
# my_ocr_presenter.py
from attachments.core import Attachment, presenter
@presenter
def ocr_text(att: Attachment, pil_image: 'PIL.Image.Image') -> Attachment:
    """Extract text from images using OCR."""
    try:
        import pytesseract

        # Extract text using OCR
        extracted_text = pytesseract.image_to_string(pil_image)

        # Add OCR text to attachment
        att.text += f"\n## OCR Extracted Text\n\n{extracted_text}\n"

        # Add metadata
        att.metadata['ocr_extracted'] = True
        att.metadata['ocr_text_length'] = len(extracted_text)

        return att
    except ImportError:
        att.text += "\n## OCR Not Available\n\nInstall pytesseract: pip install pytesseract\n"
        return att
```
**How it works:**
1. **Save the file** anywhere in your project
2. **Import it** before using attachments: `import my_ocr_presenter`
3. **Use automatically**: `Attachments("scanned_document.png")` will now include OCR text
**Other extension points:**
- `@loader` - Add support for new file formats
- `@modifier` - Add new transformations (crop, rotate, etc.)
- `@presenter` - Add new content extraction methods
- `@refiner` - Add post-processing steps
- `@adapter` - Add new API format outputs
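These decorators follow a plain registry pattern. A stdlib-only sketch of the mechanism, for intuition; the real `attachments.core` decorators also handle type-based dispatch, which is omitted here, and the `dict`-based attachment stand-in is purely hypothetical:

```python
PRESENTERS = {}

def presenter(fn):
    """Register a presenter under its function name for later lookup."""
    PRESENTERS[fn.__name__] = fn
    return fn

@presenter
def ocr_text(att: dict) -> dict:
    # Toy stand-in for an Attachment: append a section to its text
    att["text"] = att.get("text", "") + "\n## OCR Extracted Text\n"
    return att

# Importing the module runs the decorator, so the function
# is now discoverable by name without any explicit wiring
found = "ocr_text" in PRESENTERS
```

This is why step 2 above matters: the registration side effect only happens once your module is imported.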
---
## API reference (essentials)
| Object / method | Description |
| ----------------------- | --------------------------------------------------------------- |
| `Attachments(*sources)` | Many `Attachment` objects flattened into one container |
| `Attachments.text` | All text joined with blank lines |
| `Attachments.images` | Flat list of base64 PNGs |
| `.claude(prompt="")` | Claude API format with image support |
| `.openai_chat(prompt="")` | OpenAI Chat Completions API format |
| `.openai_responses(prompt="")` | OpenAI Responses API format (different structure) |
| `.openai(prompt="")` | Alias for openai_chat (backwards compatibility) |
| `.dspy()` | DSPy BaseType-compatible objects |
### Grammar System (Advanced)
| Namespace | Purpose | Examples |
|-----------|---------|----------|
| `load.*` | File format → objects | `pdf_to_pdfplumber`, `csv_to_pandas`, `url_to_bs4` |
| `modify.*` | Transform objects | `pages`, `limit`, `select`, `crop`, `rotate` |
| `present.*` | Extract content | `text`, `images`, `markdown`, `summary` |
| `refine.*` | Post-process | `truncate`, `add_headers`, `tile_images` |
| `adapt.*` | Format for APIs | `claude`, `openai`, `dspy` |
**Operators**: `|` (sequential), `+` (additive)
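The two operators can be sketched with ordinary operator overloading: `__or__` chains steps sequentially, while `__add__` runs both steps and joins their results. A conceptual stdlib example over strings, not the actual attachments implementation (which composes over `Attachment` objects):

```python
class Step:
    """A composable pipeline step wrapping a one-argument function."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

    def __or__(self, other):   # a | b  ->  run a, then feed its result to b
        return Step(lambda x: other(self(x)))

    def __add__(self, other):  # a + b  ->  run both on the input, join results
        return Step(lambda x: self(x) + other(x))

upper = Step(str.upper)
exclaim = Step(lambda s: s + "!")
first_two = Step(lambda s: s[:2])

pipeline = (upper | exclaim) + first_two
pipeline("hi")  # "HI!hi"
```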
---
### Roadmap
- [ ] **Documentation**: Architecture, Grammar, How to extend, examples (at least 1 per pipeline), DSL, API reference
- [ ] **Test coverage**: 100% for pipelines, 100% for DSL.
- [ ] **More pipelines**: Google Suite, Google Drive, Email(!?), Youtube url, X link, ChatGPT url, Slack url (?), data (parquet, duckdb, arrow, sqlite), etc.
- [ ] **More adapters**: Bedrock, Azure, Openrouter, Ollama (?), Litellm, Langchain, vllm(?), sglang(?), cossette, claudette, etc.
- [ ] **Add .audio and .video**: and corresponding pipelines.
Join us – file an issue or open a PR! 🚀
## Star History
[](https://www.star-history.com/#maximerivest/attachments&Date)
## Installation
```bash
# Stable release (recommended for most users)
pip install attachments
# Alpha testing (latest features, may have bugs)
pip install attachments==0.13.0a1
# or
pip install --pre attachments
```
### 🧪 Alpha Testing
We're actively developing new features! If you want to test the latest capabilities:
**Install alpha version:**
```bash
pip install attachments==0.13.0a1
```
**What's new in alpha:**
- 🔍 Enhanced DSL cheatsheet with types, defaults, and allowable values
- 📊 Automatic DSL command discovery and documentation
- 🚀 Improved logging and verbosity system
- 🛠️ Better error messages and suggestions
**Feedback welcome:** [GitHub Issues](https://github.com/MaximeRivest/attachments/issues)
---