# ccontext
**ccontext (collect-context)** is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.
## 🚀 Demo: Witness ccontext in Action! 🎥
⚠️ Warning: You May Be Amazed! 🤯
https://github.com/user-attachments/assets/c0a98dbc-d971-41dc-abe1-dad4be42e1ee
## Features
- 🌟 **Easy Setup**: Quick installation and configuration.
- 🌍 **Cross-Platform Support**: Supports Windows, macOS, and Linux.
- 💾 **Binary File Support**: Handle various binary files including PDFs, Word documents, images, audio, and video files.
- 📄 **Markdown and PDF Generation**: Generate detailed Markdown and PDF files of the directory structure and file contents.
- 🌐 **Crawling of (documentation) Sites**: Crawl and gather data from multiple sites using a specified list of URLs.
- ✂️ **Tokenization and Chunking**: Automatically handles tokenization and chunking to stay within LLM token limits.
- 🔧 **Configurable Exclusions and Inclusions**: Flexibly specify which files and directories to include or exclude.
- 🗣️ **Verbose Output**: Optional verbose mode for detailed output and debugging.
- 📝 **Prompt Templates** (Upcoming): Create and use custom templates for different types of prompts.
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Binary File Handling](#binary-file-handling)
- [Document Crawling](#document-crawling)
- [Use Cases and Examples](#use-cases-and-examples)
- [Troubleshooting](#troubleshooting)
- [Development Guide](#development-guide)
## Installation
### Using pipx (Recommended)
We recommend installing ccontext using pipx. pipx is a tool that lets you install and run Python applications in isolated environments, ensuring clean installation and easy management of CLI applications.
1. First, install pipx if you haven't already:
```sh
# On macOS
brew install pipx
pipx ensurepath
# On Ubuntu/Debian
sudo apt install pipx
pipx ensurepath
# On Windows
python -m pip install --user pipx
python -m pipx ensurepath
# or read https://pipx.pypa.io/stable/installation/#on-windows
```
2. Install ccontext using pipx:
```sh
pipx install ccontext
```
Why use pipx?
- **Isolated Environment**: Each application runs in its own virtual environment
- **No Dependency Conflicts**: Avoids conflicts with other Python packages
- **Easy Updates**: Simple command to upgrade (`pipx upgrade ccontext`)
- **Clean Uninstallation**: Remove everything with one command (`pipx uninstall ccontext`)
- **Global Access**: Installed applications are available system-wide
### Alternative: Installing from Source
If you prefer to install from source:
1. Clone the repository:
```sh
git clone https://github.com/oxillix/ccontext.git
cd ccontext
```
2. Set up a virtual environment:
```sh
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
3. Install dependencies:
```sh
pip install -r requirements.txt
```
4. Install the package:
```sh
pip install .
```
## Usage
### Basic Usage
1. Run `ccontext` in the folder you want to collect, using the default settings defined in `~/.ccontext/config.json`:
```sh
ccontext
```
2. Specify a root path, exclusions, and inclusions:
```sh
ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"
```
### Command-Line Arguments
- `-h, --help`: Show help message.
- `-p, --root_path`: The root path to start the directory tree (default: current directory).
- `-e, --excludes`: Additional files or directories to exclude, separated by `|`, e.g., `node_modules|.git`.
- `-i, --includes`: Files or directories to include, separated by `|`, e.g., `important_file.txt|docs`.
- `-m, --max_tokens`: Maximum number of tokens allowed before chunking.
- `-c, --config`: Path to a custom configuration file.
- `-v, --verbose`: Enable verbose output to stdout.
- `-ig, --ignore_gitignore`: Ignore the `.gitignore` file for exclusions.
- `-g, --generate-pdf`: Generate a PDF of the directory tree and file contents.
- `-gm, --generate-md`: Generate a Markdown file of the directory tree and file contents.
- `--crawl`: Crawls the sites specified in the config.
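To make the flag list easier to picture, here is a minimal `argparse` sketch that mirrors the options documented above. It is an illustration only, not ccontext's actual parser: `build_parser` and `split_patterns` are hypothetical names, and every default except the current-directory root path is an assumption.

```python
import argparse
from typing import List


def split_patterns(value: str) -> List[str]:
    """Split a `|`-separated pattern string, dropping empty pieces."""
    return [part for part in value.split("|") if part]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (-h is added by argparse)."""
    parser = argparse.ArgumentParser(prog="ccontext")
    parser.add_argument("-p", "--root_path", default=".")
    parser.add_argument("-e", "--excludes", type=split_patterns, default=[])
    parser.add_argument("-i", "--includes", type=split_patterns, default=[])
    parser.add_argument("-m", "--max_tokens", type=int)
    parser.add_argument("-c", "--config")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-ig", "--ignore_gitignore", action="store_true")
    parser.add_argument("-g", "--generate-pdf", action="store_true")
    parser.add_argument("-gm", "--generate-md", action="store_true")
    parser.add_argument("--crawl", action="store_true")
    return parser


# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(["-p", "/home/user/project", "-e", ".git|build", "-gm"])
print(args.root_path, args.excludes, args.generate_md)
```

Note how the `|`-separated strings for `-e` and `-i` become plain lists of patterns, which is why the values should be quoted in the shell.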
### Example
```sh
ccontext -p /home/user/project -e ".git|build" -i "README.md|src"
```
## Configuration
### Configuration File Location
ccontext looks for configuration in the following order:
1. Custom config file specified via `-c` argument
2. `.ccontext-config.json` in the current directory
- If present, ccontext will automatically detect and use this local configuration file
- Create this file in the same directory where you run the ccontext command
3. `~/.ccontext/config.json` (default user configuration)
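The lookup order above can be sketched in a few lines of Python. This is a hedged illustration of the documented precedence, not the project's real code: `resolve_config_path` is a hypothetical helper, and passing `cwd` and `home` explicitly is a choice made here to keep the sketch easy to test.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(cli_config: Optional[str], cwd: Path, home: Path) -> Path:
    """Return the config file to use, following the documented order."""
    if cli_config:
        # 1. A file passed with -c always wins.
        return Path(cli_config)
    local = cwd / ".ccontext-config.json"
    if local.is_file():
        # 2. A local config in the directory where ccontext is run.
        return local
    # 3. Fall back to the default user configuration.
    return home / ".ccontext" / "config.json"


print(resolve_config_path(None, Path.cwd(), Path.home()))
```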
### Configuration Options
```json
{
"verbose": false, // Enable detailed output
"max_tokens": 115000, // Maximum tokens before chunking
"model_type": "gpt-4o", // LLM model type for tokenization
"buffer_size": 0.05, // Token buffer size (0-1)
// System prompt for LLM context
"context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents...",
// Web crawler configuration
"urls_to_crawl": [
{
"url": "https://www.django-rest-framework.org/",
"match": ["https://www.django-rest-framework.org/**"],
"exclude": ["https://www.django-rest-framework.org/community/**"],
"selector": "",
"maxPagesToCrawl": 100,
"outputFileName": "django-rest-framework.org.json",
"maxTokens": 10000000
}
],
// Files/folders to explicitly include
"included_folders_files": [],
// Files/folders to exclude (supports glob patterns)
"excluded_folders_files": [
"**/.git",
"**/bin",
"**/build",
"**/node_modules/**",
"**/venv",
"**/__pycache__",
"**/package-lock.json",
"**/ccontext.egg-info",
"**/dist",
"**/__tests__",
"**/coverage",
"**/.next",
"**/pnpm-lock.yaml",
"**/poetry.lock",
"**/ccontext-output.pdf",
"**/ccontext-output.md",
"**/*.phpstorm.meta.php",
"**/*.min.js",
"**/composer.lock",
"**/*.lock",
"**/vendor",
"**/laravel_access.log",
"**/gpt-crawler",
"**/*.DS_Store",
"**/*.tox"
],
// File extensions that can be uploaded to LLMs
"uploadable_extensions": [
// Documents
".pdf",
".doc",
".docx",
".xls",
".xlsx",
".ppt",
".pptx",
// Images
".jpg",
".jpeg",
".png",
".gif",
".bmp",
".tiff",
".webp",
".heic",
// Audio
".mp3",
".wav",
".ogg",
".flac",
".aac",
".m4a",
// Video
".mp4",
".mkv",
".avi",
".mov",
".wmv",
".webm",
// Archives
".zip",
".rar",
".7z",
".tar",
".gz",
// Binary/System
".exe",
".dll",
".iso",
".dmg",
".bin",
".dat",
".apk",
".img",
".so",
".swf",
".psd"
]
}
```
### Understanding Glob Patterns
ccontext uses the `wcmatch` library for glob pattern matching, which gives you powerful but easy-to-use file matching capabilities. Here's a simple guide to using glob patterns:
1. **Important Wildcards Explained:**
- `*` (single star): Matches anything in the current folder only
```
"*.txt" # Matches: a.txt, b.txt (in current folder)
"*.txt" # Won't match: sub/a.txt, deep/sub/b.txt
```
- `**` (double star): Matches any number of folders
```
"**/temp" # Matches: temp, sub/temp, deep/sub/temp
"**/temp" # Won't match: temp/file.txt
```
- `**/*` (double star slash star): Matches everything in all folders
```
"**/*.txt" # Matches: a.txt, sub/b.txt, very/deep/c.txt
"**/*" # Matches everything, everywhere
```
- `?`: Matches any single character
- A literal suffix such as `.txt` matches only that exact extension, so `*.txt` will not match `a.txt.bak`
2. **Simple Examples:**
```json
{
"excluded_folders_files": [
// Basic matching
"temp.txt", // Matches exact file temp.txt
"*.txt", // Matches all .txt files in root folder
"**/*.txt", // Matches all .txt files in any folder
// Folder matching
"temp/*", // Matches everything in temp folder
"**/temp", // Matches temp folder anywhere
"**/temp/**", // Matches everything in any temp folder
// Common use cases
"**/node_modules", // Matches node_modules folders anywhere
"**/__pycache__", // Matches Python cache folders
"**/*.pyc", // Matches Python compiled files
"build/*" // Matches everything in build folder
]
}
```
3. **Tips for Beginners:**
- Start simple! Use `*.ext` for file extensions
- Use `**/` when you want to match in any folder
- Test your patterns with a small folder first
- When in doubt, be more specific
- Remember, patterns are case-sensitive
The glob system is very forgiving - if you make a mistake, it usually just won't match anything rather than causing errors. Feel free to experiment!
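If the difference between `*` and `**` is still fuzzy, the small model below may help. It is a simplified sketch that only handles `*`, `**`, and `?`, written with the standard library; it is not how `wcmatch` is implemented, and `glob_to_regex` and `matches` are hypothetical names used purely for illustration.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a small glob subset (*, **, ?) into an anchored regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")           # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")        # anything except a path separator
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")         # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(path: str, pattern: str) -> bool:
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None


for path in ["a.txt", "sub/a.txt", "temp", "deep/sub/temp", "temp/file.txt"]:
    print(path, matches(path, "*.txt"), matches(path, "**/temp"), matches(path, "**/temp/**"))
```

Running it on a handful of paths from your own project is a quick way to sanity-check a pattern before adding it to the configuration.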
### Configuration Options Explained
| Option | Description | Default |
| ---------------------- | ------------------------------- | ------------- |
| verbose | Enable detailed output | false |
| max_tokens | Maximum tokens before chunking | 115000 |
| model_type | LLM model type for tokenization | "gpt-4o" |
| buffer_size | Token buffer size (0-1) | 0.05 |
| excluded_folders_files | Glob patterns for exclusion     | ["**/.git", ...] |
| included_folders_files | Glob patterns for inclusion | [] |
| uploadable_extensions | File extensions to upload | [".pdf", ...] |
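To show how `max_tokens` and `buffer_size` could interact, here is a rough sketch. The rule used here, reserving `buffer_size` as a fraction of `max_tokens` for headroom, is an assumption rather than documented behavior, and whitespace splitting stands in for a real tokenizer. `effective_budget` and `chunk_tokens` are hypothetical names.

```python
from typing import List


def effective_budget(max_tokens: int, buffer_size: float) -> int:
    """Assumed rule: keep `buffer_size` of the limit free as headroom."""
    return int(max_tokens * (1 - buffer_size))


def chunk_tokens(tokens: List[str], max_tokens: int, buffer_size: float) -> List[List[str]]:
    """Split a token list into consecutive chunks that fit the budget."""
    budget = effective_budget(max_tokens, buffer_size)
    if budget <= 0:
        raise ValueError("max_tokens and buffer_size leave no room for content")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


# Whitespace splitting is only a stand-in for real tokenization.
words = "one two three four five six seven".split()
for chunk in chunk_tokens(words, max_tokens=4, buffer_size=0.25):
    print(chunk)
```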
## Binary File Handling
ccontext supports handling binary files through the `uploadable_extensions` configuration.
### Supported Binary Files
- **Documents**: `.pdf`, `.doc`, `.docx`, `.xls`, `.xlsx`, `.ppt`, `.pptx`
- **Images**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.heic`
- **Audio**: `.mp3`, `.wav`, `.ogg`, `.flac`, `.aac`, `.m4a`
- **Video**: `.mp4`, `.mkv`, `.avi`, `.mov`, `.wmv`, `.webm`
- **Archives**: `.zip`, `.rar`, `.7z`, `.tar`, `.gz`
- **Binary/System**: `.exe`, `.dll`, `.iso`, `.dmg`, `.bin`, `.dat`, `.apk`, `.img`, `.so`, `.swf`, `.psd`
### Binary File Processing
- Binary files matching `uploadable_extensions` are prepared for upload to LLMs
- File references are automatically copied to clipboard
- Most LLM providers cap the number of binary files per prompt; check your provider's documentation for the exact limit
- Rate limits may apply based on your LLM provider
Example configuration for handling specific file types:
```json
{
"uploadable_extensions": [".pdf", ".jpg", ".png", ".xlsx"]
}
```
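As a rough picture of what the extension list does, the sketch below separates a set of paths into uploadable binaries and everything else. It is an assumption-laden illustration: `split_uploadable` is a hypothetical helper, and comparing only the final suffix, case-insensitively, is a guess at reasonable behavior rather than a description of ccontext's internals.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_uploadable(
    paths: Iterable[str], uploadable_extensions: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Separate paths into (uploadable binaries, everything else)."""
    extensions = {ext.lower() for ext in uploadable_extensions}
    uploadable: List[str] = []
    other: List[str] = []
    for path in paths:
        # Only the last suffix is compared, so "archive.tar.gz" is seen as ".gz".
        if Path(path).suffix.lower() in extensions:
            uploadable.append(path)
        else:
            other.append(path)
    return uploadable, other


files = ["report.PDF", "src/main.py", "logo.png", "archive.tar.gz"]
print(split_uploadable(files, [".pdf", ".jpg", ".png", ".xlsx"]))
```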
## Document Crawling
The crawling feature allows you to gather documentation from websites for context.
### Crawler Configuration
```json
{
"urls_to_crawl": [
{
"url": "https://docs.example.com",
"match": ["https://docs.example.com/**"],
"exclude": ["https://docs.example.com/internal/**"],
"selector": "",
"maxPagesToCrawl": 100,
"outputFileName": "docs.json",
"maxTokens": 2000000
}
]
}
```
### Crawler Options
- **url**: Starting URL for crawling
- **match**: Glob patterns for URLs to include
- **exclude**: Glob patterns for URLs to exclude
- **selector**: CSS selector for content extraction
- **maxPagesToCrawl**: Limit on pages to crawl
- **outputFileName**: Name of output file
- **maxTokens**: Maximum tokens to collect
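The interplay of `match`, `exclude`, and `maxPagesToCrawl` can be sketched as follows. This is a simplified model, not the crawler's real logic: it uses the standard library's `fnmatchcase`, where `*` also matches `/` (so `**` behaves the same as `*`), and `should_crawl` and `select_urls` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def should_crawl(url: str, entry: Dict) -> bool:
    """True if `url` matches any `match` glob and no `exclude` glob."""
    included = any(fnmatchcase(url, pattern) for pattern in entry["match"])
    excluded = any(fnmatchcase(url, pattern) for pattern in entry.get("exclude", []))
    return included and not excluded


def select_urls(candidates: List[str], entry: Dict) -> List[str]:
    """Keep crawlable URLs, stopping at the `maxPagesToCrawl` limit."""
    selected = [url for url in candidates if should_crawl(url, entry)]
    return selected[: entry["maxPagesToCrawl"]]


entry = {
    "url": "https://docs.example.com",
    "match": ["https://docs.example.com/**"],
    "exclude": ["https://docs.example.com/internal/**"],
    "maxPagesToCrawl": 100,
}
print(should_crawl("https://docs.example.com/guide/intro", entry))
print(should_crawl("https://docs.example.com/internal/notes", entry))
```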
### Best Practices
- Use specific `match` patterns
- Respect robots.txt and site policies
## Use Cases and Examples
### Common Usage Patterns
1. **Analyzing a Python Project**
```sh
ccontext -p /path/to/project -e "venv|__pycache__|*.pyc"
```
2. **Processing Documentation**
```sh
ccontext -p ./docs --crawl -gm
```
3. **Including Specific Files**
```sh
ccontext -i "README.md|docs/*|*.py"
```
4. **Generating PDF and Markdown**
```sh
ccontext -g -gm # Generates both PDF and Markdown
```
### Integration Examples
1. **With GitHub Copilot**
```sh
ccontext -p . -e "node_modules|dist" -i "src/**/*.ts"
```
2. **With ChatGPT (the web app has a 32k-token maximum)**
```sh
ccontext -p . --max_tokens 32000
```
## Troubleshooting
### Common Issues
1. **Clipboard Issues in SSH**
- Issue: Cannot copy to clipboard in SSH session
- Solution:
- Use SSH with X11 forwarding (`ssh -X user@host`), test using xeyes
- On Mac, install XQuartz (`brew install --cask xquartz`)
2. **Token Limit Exceeded**
- Issue: Content too large for LLM
- Solution: Adjust `max_tokens` or use chunking feature
3. **Binary File Handling**
- Issue: Binary files not being processed
- Solution: Check `uploadable_extensions` configuration
### Platform-Specific Issues
#### Windows: Use WSL if possible!
Otherwise:
- Issue: Path separators in configuration
- Solution: Use forward slashes or escaped backslashes
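The snippet below shows both options side by side using only the standard library. It is a small sketch for illustration: the path is made up, and whether a given pattern then matches on Windows still depends on how ccontext evaluates it.

```python
import json
from pathlib import PureWindowsPath

# A Windows path as a user might type it (hypothetical example path).
windows_path = PureWindowsPath(r"C:\Users\me\project\build")

# Option 1: forward slashes, which need no escaping in JSON.
forward = windows_path.as_posix()

# Option 2: let json.dumps double the backslashes for you.
escaped = json.dumps({"excluded_folders_files": [str(windows_path)]})

print(forward)
print(escaped)
```

Writing a single, unescaped backslash by hand in a JSON string is the usual cause of parse errors here, so generating the value with `json.dumps` or sticking to forward slashes avoids the problem.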
#### Linux
- Issue: X11 clipboard support
- Solution: Install xclip or xsel
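A quick way to check whether either helper is already installed is to look for it on `PATH`. The sketch below uses only the standard library; `available_clipboard_tools` is a hypothetical helper and is not part of ccontext.

```python
import shutil
from typing import List, Sequence


def available_clipboard_tools(candidates: Sequence[str] = ("xclip", "xsel")) -> List[str]:
    """Return the clipboard helpers that can be found on PATH."""
    return [tool for tool in candidates if shutil.which(tool)]


found = available_clipboard_tools()
if found:
    print("Clipboard helper(s) found:", ", ".join(found))
else:
    print("No xclip or xsel on PATH; install one of them.")
```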
#### macOS
- Issue: Clipboard permissions
- Solution: Grant terminal app accessibility permissions
## Development Guide
### Project Structure
```
ccontext/
├── ccontext/           # Main package directory
│   ├── __init__.py
│   ├── main.py         # Entry point
│   ├── file_tree.py    # Tree operations
│   └── ...
├── tests/              # Test directory
├── docs/               # Documentation
└── examples/           # Example configurations
```
### Development Setup
1. Clone the repository
2. Create a virtual environment
3. Install development dependencies
4. Run tests
```sh
git clone https://github.com/oxillix/ccontext.git
# or
git clone git@github.com:NicolasArnouts/ccontext.git
cd ccontext
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -e .
```
### Contributing Guidelines
1. Fork the repository
2. Create a feature branch
3. Write tests for new features
4. Submit a pull request
### Code Style
- Follow PEP 8 guidelines
- Use isort and black
- Use type hints
- Keep functions focused and small
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Thanks to all contributors! 😊
- Inspired by the need for better context handling in AI interactions.
- Built with love and passion for the developer community! 💖
---
Feel free to raise issues or contribute to the project. We appreciate your support!
Happy coding adventures! 🚀
**Nicolas Arnouts**
Looking for a skilled freelancer? I'm available for hire!
Let's collaborate. Reach out to me at:
arnouts.software@gmail.com
---
### Badges
[![PyPI version](https://badge.fury.io/py/ccontext.svg)](https://badge.fury.io/py/ccontext)
[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/NicolasArnouts/ccontext/blob/main/LICENSE)
[![Platform](https://img.shields.io/badge/platform-Windows%20|%20macOS%20|%20Linux-lightgrey.svg)]()
Raw data
{
"_id": null,
"home_page": null,
"name": "ccontext",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.8",
"maintainer_email": null,
"keywords": "context, ccontenxt, collect context, llm, chatgpt",
"author": "Nicolas Arnouts",
"author_email": "arnouts.software@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/09/d1/cc32391326263b46788fbe821429218fa8c2bae3a2c75d82ea65026d89ca/ccontext-0.3.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "collect-context: Makes the process of collecting and sending context to an LLM like ChatGPT-4o as easy as possible.",
"version": "0.3.5",
"project_urls": {
"Documentation": "https://github.com/NicolasArnouts/ccontext",
"Homepage": "https://github.com/NicolasArnouts/ccontext",
"Issues": "https://github.com/NicolasArnouts/ccontext/issues",
"Repository": "https://github.com/NicolasArnouts/ccontext"
},
"split_keywords": [
"context",
" ccontenxt",
" collect context",
" llm",
" chatgpt"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0973a9074c32cdd33237b0168e9f135a5392825b6ba9a647db3e8b9bb9ac1d7b",
"md5": "1e111156d3081efdfbdf4b459050c541",
"sha256": "4d7f93a90c032defa06d695052e44d1d3c618bf1a36cc2d27b3c2a5b59b75613"
},
"downloads": -1,
"filename": "ccontext-0.3.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1e111156d3081efdfbdf4b459050c541",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.8",
"size": 1417180,
"upload_time": "2025-01-12T19:10:04",
"upload_time_iso_8601": "2025-01-12T19:10:04.876402Z",
"url": "https://files.pythonhosted.org/packages/09/73/a9074c32cdd33237b0168e9f135a5392825b6ba9a647db3e8b9bb9ac1d7b/ccontext-0.3.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "09d1cc32391326263b46788fbe821429218fa8c2bae3a2c75d82ea65026d89ca",
"md5": "78acecc31e9fd258ed989dff7feaabdf",
"sha256": "2a1f135a00a33c72cd09ae4b33044e3954aa3d04e574241c9acae5e75c6c820a"
},
"downloads": -1,
"filename": "ccontext-0.3.5.tar.gz",
"has_sig": false,
"md5_digest": "78acecc31e9fd258ed989dff7feaabdf",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4,>=3.8",
"size": 1415160,
"upload_time": "2025-01-12T19:10:06",
"upload_time_iso_8601": "2025-01-12T19:10:06.823312Z",
"url": "https://files.pythonhosted.org/packages/09/d1/cc32391326263b46788fbe821429218fa8c2bae3a2c75d82ea65026d89ca/ccontext-0.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-12 19:10:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NicolasArnouts",
"github_project": "ccontext",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "tiktoken",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "colorama",
"specs": [
[
"==",
"0.4.6"
]
]
},
{
"name": "pyperclip",
"specs": [
[
"==",
"1.9.0"
]
]
},
{
"name": "pathspec",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "reportlab",
"specs": [
[
"==",
"4.2.0"
]
]
},
{
"name": "poetry",
"specs": [
[
"==",
"1.8.3"
]
]
},
{
"name": "wcmatch",
"specs": [
[
"==",
"9.0"
]
]
},
{
"name": "pypdf",
"specs": [
[
"==",
"4.3.1"
]
]
},
{
"name": "mammoth",
"specs": [
[
"==",
"1.9.0"
]
]
}
],
"lcname": "ccontext"
}