ccontext

Name	ccontext
Version	0.3.9
home_page	None
Summary	collect-context: Makes the process of collecting and sending context to an LLM like ChatGPT-4o as easy as possible.
upload_time	2025-07-14 09:08:40
maintainer	None
docs_url	None
author	Nicolas Arnouts
requires_python	<4,>=3.9
license	MIT
keywords	context ccontenxt collect context llm chatgpt
VCS
bugtrack_url
requirements	tiktoken colorama pyperclip pathspec reportlab poetry wcmatch pypdf mammoth

# ccontext

**ccontext (collect-context)** is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.

## 🚀 Demo: Witness ccontext in Action! 🎥

⚠️ Warning: You May Be Amazed! 🤯

https://github.com/user-attachments/assets/c0a98dbc-d971-41dc-abe1-dad4be42e1ee

## Features

- 🌟 **Easy Setup**: Quick installation and configuration.
- 🌍 **Cross-Platform Support**: Supports Windows, macOS, and Linux.
- 💾 **Binary File Support**: Handle various binary files including PDFs, Word documents, images, audio, and video files.
- 📄 **Markdown and PDF Generation**: Generate detailed Markdown and PDF files of the directory structure and file contents.
- 🌐 **Crawling of (documentation) Sites**: Crawl and gather data from multiple sites using a specified list of URLs.
- ✂️ **Tokenization and Chunking**: Automatically handles tokenization and chunking to stay within LLM token limits.
- 🔧 **Configurable Exclusions and Inclusions**: Flexibly specify which files and directories to include or exclude.
- 🗣️ **Verbose Output**: Optional verbose mode for detailed output and debugging.
- 📝 **Prompt Templates** (Upcoming): Create and use custom templates for different types of prompts.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Binary File Handling](#binary-file-handling)
- [Document Crawling](#document-crawling)
- [Use Cases and Examples](#use-cases-and-examples)
- [Troubleshooting](#troubleshooting)
- [Development Guide](#development-guide)

## Installation

### Using pipx (Recommended)

We recommend installing ccontext using pipx. pipx is a tool that lets you install and run Python applications in isolated environments, ensuring clean installation and easy management of CLI applications.

1. First, install pipx if you haven't already:

   ```sh
   # On macOS
   brew install pipx
   pipx ensurepath

   # On Ubuntu/Debian
   sudo apt install pipx
   pipx ensurepath

   # On Windows
   python -m pip install --user pipx
   python -m pipx ensurepath
   # or read https://pipx.pypa.io/stable/installation/#on-windows
   ```

2. Install ccontext using pipx:

   ```sh
   pipx install ccontext
   ```

Why use pipx?

- **Isolated Environment**: Each application runs in its own virtual environment
- **No Dependency Conflicts**: Avoids conflicts with other Python packages
- **Easy Updates**: Simple command to upgrade (`pipx upgrade ccontext`)
- **Clean Uninstallation**: Remove everything with one command (`pipx uninstall ccontext`)
- **Global Access**: Installed applications are available system-wide

### Alternative: Installing from Source

If you prefer to install from source:

1. Clone the repository:

   ```sh
   git clone https://github.com/oxillix/ccontext.git
   cd ccontext
   ```

2. Set up a virtual environment:

   ```sh
   python3 -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install dependencies:

   ```sh
   pip install -r requirements.txt
   ```

4. Install the package:

   ```sh
   pip install .
   ```

## Usage

### Basic Usage

1. Run `ccontext` in the folder you want to collect, using the default settings defined in `~/.ccontext/config.json`:

   ```sh
   ccontext
   ```

2. Specify a root path, exclusions, and inclusions:

   ```sh
   ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"
   ```

### Command-Line Arguments

- `-h, --help`: Show help message.
- `-p, --root_path`: The root path to start the directory tree (default: current directory).
- `-e, --excludes`: Additional files or directories to exclude, separated by `|`, e.g., `node_modules|.git`.
- `-i, --includes`: Files or directories to include, separated by `|`, e.g., `important_file.txt|docs`.
- `-m, --max_tokens`: Maximum number of tokens allowed before chunking.
- `-c, --config`: Path to a custom configuration file.
- `-v, --verbose`: Enable verbose output to stdout.
- `-ig, --ignore_gitignore`: Ignore the `.gitignore` file for exclusions.
- `-g, --generate-pdf`: Generate a PDF of the directory tree and file contents.
- `-gm, --generate-md`: Generate a Markdown file of the directory tree and file contents.
- `--crawl`: Crawls the sites specified in the config.
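
To make the flag list concrete, here is a minimal `argparse` sketch that mirrors the options above. It is an illustration only and not ccontext's actual parser: the `build_parser` and `split_patterns` names are hypothetical, and only the flag spellings are taken from the list.

```python
import argparse


def split_patterns(value: str) -> list[str]:
    """Split a `|`-separated pattern string, dropping empty entries."""
    return [part for part in value.split("|") if part]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the documented flags.
    parser = argparse.ArgumentParser(prog="ccontext")
    parser.add_argument("-p", "--root_path", default=".")
    parser.add_argument("-e", "--excludes", type=split_patterns, default=[])
    parser.add_argument("-i", "--includes", type=split_patterns, default=[])
    parser.add_argument("-m", "--max_tokens", type=int)
    parser.add_argument("-c", "--config")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-ig", "--ignore_gitignore", action="store_true")
    parser.add_argument("-g", "--generate-pdf", action="store_true")
    parser.add_argument("-gm", "--generate-md", action="store_true")
    parser.add_argument("--crawl", action="store_true")
    return parser


args = build_parser().parse_args(["-p", "/home/user/project", "-e", ".git|build", "-gm"])
print(args.root_path, args.excludes, args.generate_md)
```

Splitting on `|` at parse time means the rest of a program would only ever see lists of patterns, which is one reasonable way to handle the separator described above.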

### Example

```sh
ccontext -p /home/user/project -e ".git|build" -i "README.md|src"
```

## Configuration

### Configuration File Location

ccontext looks for configuration in the following order:

1. Custom config file specified via `-c` argument
2. `.ccontext-config.json` in the current directory
   - If present, ccontext will automatically detect and use this local configuration file
   - Create this file in the same directory where you run the ccontext command
3. `~/.ccontext/config.json` (default user configuration)
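
The lookup order can be sketched in a few lines of Python. This is a hedged illustration, not the tool's own code: `resolve_config_path` is a hypothetical helper, and falling through to the next candidate when a `-c` path does not exist is an assumption (the real tool may report an error instead).

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(cli_path: Optional[str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first configuration file that exists, in the documented order."""
    candidates = []
    if cli_path:
        candidates.append(Path(cli_path))                   # 1. -c / --config
    candidates.append(cwd / ".ccontext-config.json")        # 2. local project file
    candidates.append(home / ".ccontext" / "config.json")   # 3. user default
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Example: resolve against the real working and home directories.
print(resolve_config_path(None, Path.cwd(), Path.home()))
```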

### Configuration Options

```json
{
  "verbose": false, // Enable detailed output
  "max_tokens": 115000, // Maximum tokens before chunking
  "model_type": "gpt-4o", // LLM model type for tokenization
  "buffer_size": 0.05, // Token buffer size (0-1)

  // System prompt for LLM context
  "context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents...",

  // Web crawler configuration
  "urls_to_crawl": [
    {
      "url": "https://www.django-rest-framework.org/",
      "match": ["https://www.django-rest-framework.org/**"],
      "exclude": ["https://www.django-rest-framework.org/community/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "django-rest-framework.org.json",
      "maxTokens": 10000000
    }
  ],

  // Files/folders to explicitly include
  "included_folders_files": [],

  // Files/folders to exclude (supports glob patterns)
  "excluded_folders_files": [
    "**/.git",
    "**/bin",
    "**/build",
    "**/node_modules/**",
    "**/venv",
    "**/__pycache__",
    "**/package-lock.json",
    "**/ccontext.egg-info",
    "**/dist",
    "**/__tests__",
    "**/coverage",
    "**/.next",
    "**/pnpm-lock.yaml",
    "**/poetry.lock",
    "**/ccontext-output.pdf",
    "**/ccontext-output.md",
    "**/*.phpstorm.meta.php",
    "**/*.min.js",
    "**/composer.lock",
    "**/*.lock",
    "**/vendor",
    "**/laravel_access.log",
    "**/*.DS_Store",
    "**/*.tox"
  ],

  // File extensions that can be uploaded to LLMs
  "uploadable_extensions": [
    // Documents
    ".pdf",
    ".doc",
    ".docx",
    ".xls",
    ".xlsx",
    ".ppt",
    ".pptx",

    // Images
    ".jpg",
    ".jpeg",
    ".png",
    ".gif",
    ".bmp",
    ".tiff",
    ".webp",
    ".heic",

    // Audio
    ".mp3",
    ".wav",
    ".ogg",
    ".flac",
    ".aac",
    ".m4a",

    // Video
    ".mp4",
    ".mkv",
    ".avi",
    ".mov",
    ".wmv",
    ".webm",

    // Archives
    ".zip",
    ".rar",
    ".7z",
    ".tar",
    ".gz",

    // Binary/System
    ".exe",
    ".dll",
    ".iso",
    ".dmg",
    ".bin",
    ".dat",
    ".apk",
    ".img",
    ".so",
    ".swf",
    ".psd"
  ]
}
```

### Understanding Glob Patterns

ccontext uses the `wcmatch` library for glob pattern matching, which gives you powerful but easy-to-use file matching capabilities. Here's a simple guide to using glob patterns:

1. **Important Wildcards Explained:**

   - `*` (single star): Matches anything in the current folder only

     ```
     "*.txt"      # Matches: a.txt, b.txt  (in current folder)
     "*.txt"      # Won't match: sub/a.txt, deep/sub/b.txt
     ```

   - `**` (double star): Matches any number of folders

     ```
     "**/temp"    # Matches: temp, sub/temp, deep/sub/temp
     "**/temp"    # Won't match: temp/file.txt
     ```

   - `**/*` (double star slash star): Matches everything in all folders

     ```
     "**/*.txt"   # Matches: a.txt, sub/b.txt, very/deep/c.txt
     "**/*"       # Matches everything, everywhere
     ```

   - `?`: Matches any single character
   - A literal suffix such as `.txt` matches only that exact text, so `*.txt` matches files whose names end in `.txt`

2. **Simple Examples:**

   ```json
   {
     "excluded_folders_files": [
       // Basic matching
       "temp.txt", // Matches exact file temp.txt
       "*.txt", // Matches all .txt files in root folder
       "**/*.txt", // Matches all .txt files in any folder

       // Folder matching
       "temp/*", // Matches everything in temp folder
       "**/temp", // Matches temp folder anywhere
       "**/temp/**", // Matches everything in any temp folder

       // Common use cases
       "**/node_modules", // Matches node_modules folders anywhere
       "**/__pycache__", // Matches Python cache folders
       "**/*.pyc", // Matches Python compiled files
       "build/*" // Matches everything in build folder
     ]
   }
   ```

3. **Tips for Beginners:**
   - Start simple! Use `*.ext` for file extensions
   - Use `**/` when you want to match in any folder
   - Test your patterns with a small folder first
   - When in doubt, be more specific
   - Remember, patterns are case-sensitive

The glob system is very forgiving - if you make a mistake, it usually just won't match anything rather than causing errors. Feel free to experiment!
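
To check your intuition for `*`, `**`, and `?`, the rules above can be modelled with a small glob-to-regex translator. This is a simplified stand-in written for illustration, not `wcmatch`'s implementation: `glob_to_regex` and `matches` are hypothetical names, and details such as dotfile handling and character classes are deliberately left out.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob (supports *, **, ?) into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")           # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")        # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")         # exactly one non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(out))


def matches(path: str, pattern: str) -> bool:
    """True if the whole path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


for path in ["a.txt", "sub/a.txt", "temp", "deep/sub/temp", "temp/file.txt"]:
    print(path, matches(path, "*.txt"), matches(path, "**/temp"))
```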

### Configuration Options Explained

| Option                 | Description                     | Default          |
| ---------------------- | ------------------------------- | ---------------- |
| verbose                | Enable detailed output          | false            |
| max_tokens             | Maximum tokens before chunking  | 115000           |
| model_type             | LLM model type for tokenization | "gpt-4o"         |
| buffer_size            | Token buffer size (0-1)         | 0.05             |
| excluded_folders_files | Glob patterns for exclusion     | ["**/.git", ...] |
| included_folders_files | Glob patterns for inclusion     | []               |
| uploadable_extensions  | File extensions to upload       | [".pdf", ...]    |
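
One plausible reading of how `max_tokens` and `buffer_size` interact is sketched below. Treat it as an assumption rather than a description of ccontext's internals: it supposes that `buffer_size` reserves a fraction of `max_tokens` as headroom, and it uses whitespace splitting as a stand-in for a real model tokenizer. The `effective_budget` and `chunk_tokens` names are hypothetical.

```python
def effective_budget(max_tokens: int, buffer_size: float) -> int:
    """Tokens usable per chunk after reserving the buffer fraction (assumed meaning)."""
    return int(max_tokens * (1 - buffer_size))


def chunk_tokens(tokens: list[str], max_tokens: int, buffer_size: float) -> list[list[str]]:
    """Split a token list into consecutive chunks that fit the effective budget."""
    budget = effective_budget(max_tokens, buffer_size)
    if budget <= 0:
        raise ValueError("max_tokens and buffer_size leave no room for content")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


# Stand-in tokenizer: whitespace splitting instead of a model-specific tokenizer.
tokens = "one two three four five six seven".split()
for chunk in chunk_tokens(tokens, max_tokens=4, buffer_size=0.5):
    print(chunk)
```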

## Binary File Handling

ccontext supports handling binary files through the `uploadable_extensions` configuration.

### Supported Binary Files

- **Documents**: `.pdf`, `.doc`, `.docx`, `.xls`, `.xlsx`, `.ppt`, `.pptx`
- **Images**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.heic`
- **Audio**: `.mp3`, `.wav`, `.ogg`, `.flac`, `.aac`, `.m4a`
- **Video**: `.mp4`, `.mkv`, `.avi`, `.mov`, `.wmv`, `.webm`
- **Archives**: `.zip`, `.rar`, `.7z`, `.tar`, `.gz`
- **Binary/System**: `.exe`, `.dll`, `.iso`, `.dmg`, `.bin`, `.dat`, `.apk`, `.img`, `.so`, `.swf`, `.psd`

### Binary File Processing

- Binary files matching `uploadable_extensions` are prepared for upload to LLMs.
- File references are automatically copied to the clipboard.
- Most LLM providers cap the number of binary files per prompt; check your provider's documentation for the exact limit.
- Rate limits may apply, depending on your LLM provider.

Example configuration for handling specific file types:

```json
{
  "uploadable_extensions": [".pdf", ".jpg", ".png", ".xlsx"]
}
```
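
As a rough illustration of what an extension list like this implies, the sketch below partitions file paths into uploadable binaries and everything else. It is not ccontext's code: `split_by_uploadable` is a hypothetical helper, and comparing extensions case-insensitively is an assumption.

```python
from pathlib import Path

UPLOADABLE = {".pdf", ".jpg", ".png", ".xlsx"}  # same list as the example config


def split_by_uploadable(paths, extensions=UPLOADABLE):
    """Partition paths into (uploadable, other) by file extension."""
    uploadable, other = [], []
    for p in paths:
        # Assumption: extensions are compared case-insensitively.
        target = uploadable if Path(p).suffix.lower() in extensions else other
        target.append(p)
    return uploadable, other


print(split_by_uploadable(["report.PDF", "src/main.py", "img/logo.png"]))
```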

## Document Crawling

The crawling feature allows you to gather documentation from websites for context.

### Crawler Configuration

```json
{
  "urls_to_crawl": [
    {
      "url": "https://docs.example.com",
      "match": ["https://docs.example.com/**"],
      "exclude": ["https://docs.example.com/internal/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "docs.json",
      "maxTokens": 2000000
    }
  ]
}
```

### Crawler Options

- **url**: Starting URL for crawling
- **match**: Glob patterns for URLs to include
- **exclude**: Glob patterns for URLs to exclude
- **selector**: CSS selector for content extraction
- **maxPagesToCrawl**: Limit on pages to crawl
- **outputFileName**: Name of output file
- **maxTokens**: Maximum tokens to collect
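
The interplay of `match`, `exclude`, and `maxPagesToCrawl` can be sketched as a simple URL filter. This is an illustration under stated assumptions, not the crawler's implementation: it uses the standard library's `fnmatch` as a stand-in for the crawler's glob matching, it assumes that `exclude` takes precedence over `match`, and `should_crawl` and `select_pages` are hypothetical names.

```python
from fnmatch import fnmatchcase


def should_crawl(url: str, match: list[str], exclude: list[str]) -> bool:
    """True if the URL matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False  # assumption: exclusions win over inclusions
    return any(fnmatchcase(url, pattern) for pattern in match)


def select_pages(urls: list[str], match: list[str], exclude: list[str], max_pages: int) -> list[str]:
    """Keep the URLs that pass the filter, capped at max_pages."""
    return [u for u in urls if should_crawl(u, match, exclude)][:max_pages]


urls = [
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/internal/secret",
    "https://other.example.com/",
]
print(select_pages(
    urls,
    match=["https://docs.example.com/**"],
    exclude=["https://docs.example.com/internal/**"],
    max_pages=100,
))
```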

### Best Practices

- Use specific `match` patterns
- Respect robots.txt and site policies

## Use Cases and Examples

### Common Usage Patterns

1. **Analyzing a Python Project**

```sh
ccontext -p /path/to/project -e "venv|__pycache__|*.pyc"
```

2. **Processing Documentation**

```sh
ccontext -p ./docs --crawl -gm
```

3. **Including Specific Files**

```sh
ccontext -i "README.md|docs/*|*.py"
```

4. **Generating PDF and Markdown**

```sh
ccontext -g -gm  # Generates both PDF and Markdown
```
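
For a sense of what a Markdown rendering of a directory tree can look like, here is a minimal sketch. It is not how ccontext builds its output: `tree_markdown` is a hypothetical helper, it lists names only (no file contents), and it applies no exclusion patterns.

```python
from pathlib import Path


def tree_markdown(root: Path) -> str:
    """Render the directory tree under root as a nested Markdown list."""
    lines = [f"# {root.name}", ""]
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        indent = "  " * (len(rel.parts) - 1)    # two spaces per nesting level
        suffix = "/" if path.is_dir() else ""   # mark directories with a slash
        lines.append(f"{indent}- {rel.name}{suffix}")
    return "\n".join(lines)


print(tree_markdown(Path.cwd()))
```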

### Integration Examples

1. **With GitHub Copilot**

```sh
ccontext -p . -e "node_modules|dist" -i "src/**/*.ts"
```

2. **With ChatGPT (web app has a 32k-token maximum)**

```sh
ccontext -p . --max_tokens 32000
```

## Troubleshooting

### Common Issues

1. **Clipboard Issues in SSH**

   - Issue: Cannot copy to the clipboard in an SSH session
   - Solution:
     - Use SSH with X11 forwarding (`ssh -X user@host`); run `xeyes` to check that forwarding works
     - On macOS, install XQuartz (`brew install --cask xquartz`)

2. **Token Limit Exceeded**

   - Issue: Content too large for LLM
   - Solution: Set `max_tokens` below your LLM's limit so that larger output is split into chunks

3. **Binary File Handling**
   - Issue: Binary files not being processed
   - Solution: Check `uploadable_extensions` configuration

### Platform-Specific Issues

#### Windows: Use WSL if possible!

Otherwise:

- Issue: Path separators in configuration
- Solution: Use forward slashes or escaped backslashes
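
Both spellings decode to the same path once the JSON is parsed, which the short sketch below demonstrates with the standard library. The `root` key and the `normalize` helper are hypothetical and exist only for this example; they are not part of ccontext's configuration schema.

```python
import json
from pathlib import PureWindowsPath

# Two ways of writing the same Windows path inside a JSON value.
escaped = json.loads(r'{"root": "C:\\Users\\me\\project"}')["root"]
forward = json.loads('{"root": "C:/Users/me/project"}')["root"]


def normalize(path: str) -> str:
    """Return a Windows path written with forward slashes."""
    return PureWindowsPath(path).as_posix()


print(normalize(escaped))
print(normalize(forward))
```

A single unescaped backslash such as `"C:\Users"` is not valid JSON, which is why one of the two forms above is needed.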

#### Linux

- Issue: X11 clipboard support
- Solution: Install xclip or xsel
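
A quick way to check whether either tool is installed is to look it up on `PATH`. The helper below is a hypothetical convenience for diagnosing this issue and is not part of ccontext; it relies only on `shutil.which` from the standard library.

```python
import shutil
from typing import Callable, Optional


def find_clipboard_tool(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first of xclip/xsel found on PATH, or None if neither is installed."""
    for tool in ("xclip", "xsel"):
        if which(tool):
            return tool
    return None


print(find_clipboard_tool() or "no clipboard tool found; install xclip or xsel")
```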

#### macOS

- Issue: Clipboard permissions
- Solution: Grant terminal app accessibility permissions

## Development Guide

### Project Structure

```
ccontext/
├── ccontext/           # Main package directory
│   ├── __init__.py
│   ├── main.py         # Entry point
│   ├── file_tree.py    # Tree operations
│   └── ...
├── tests/              # Test directory
├── docs/               # Documentation
└── examples/           # Example configurations
```

### Development Setup

1. Clone the repository
2. Create a virtual environment
3. Install development dependencies
4. Run tests

```sh
git clone https://github.com/oxillix/ccontext.git
# or
git clone git@github.com:NicolasArnouts/ccontext.git
cd ccontext
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -e .
```

### Contributing Guidelines

1. Fork the repository
2. Create a feature branch
3. Write tests for new features
4. Submit a pull request

### Code Style

- Follow PEP 8 guidelines
- Use isort and black
- Use type hints
- Keep functions focused and small

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Thanks to all contributors! 😊
- Inspired by the need for better context handling in AI interactions.
- Built with love and passion for the developer community! 💖

---

Feel free to raise issues or contribute to the project. We appreciate your support!

Happy coding adventures! 🚀
**Nicolas Arnouts**

Looking for a skilled freelancer? I'm available for hire!
Let's collaborate — reach out to me at:
arnouts.software@gmail.com

---

### Badges

[![PyPI version](https://badge.fury.io/py/ccontext.svg)](https://badge.fury.io/py/ccontext)
[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/NicolasArnouts/ccontext/blob/main/LICENSE)
[![Platform](https://img.shields.io/badge/platform-Windows%20|%20macOS%20|%20Linux-lightgrey.svg)]()

## Using in WSL2 Environment

When using ccontext's web crawling feature in WSL2, you can use crawl4ai for reliable web content extraction:

1. First, set up the WSL2 environment:

```bash
python -m ccontext.fix_wsl
```

2. Then run the crawler as usual:

```bash
python -m ccontext --crawl
```

The crawler will automatically detect WSL2 and configure the environment appropriately. If you prefer to use the crawler directly:

```bash
python -m ccontext.run_crawlers --url https://example.com --output example.md
```
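
For readers wondering how a program can tell that it is running under WSL, one common heuristic is to look for "microsoft" in the kernel release string. The sketch below shows that heuristic only; whether ccontext detects WSL2 this way is an assumption, `is_wsl` is a hypothetical name, and the check does not distinguish WSL1 from WSL2.

```python
import platform


def is_wsl(release: str = "") -> bool:
    """Heuristic: WSL kernels report a release string containing 'microsoft'."""
    release = release or platform.uname().release
    return "microsoft" in release.lower()


print(is_wsl())
```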

### WSL2 Troubleshooting

If you encounter issues with the crawler in WSL2:

1. Ensure Python and dependencies are properly installed
2. Try running with explicit parameters:
   ```bash
   python -m ccontext.run_crawlers --url https://example.com --output example.md --max-pages 10
   ```
3. Check that any security software isn't blocking the network connections
4. For more detailed logging, add the `--verbose` flag

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "ccontext",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.9",
    "maintainer_email": null,
    "keywords": "context, ccontenxt, collect context, llm, chatgpt",
    "author": "Nicolas Arnouts",
    "author_email": "arnouts.software@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/9c/ad/e73cdedbc23af36d2fe8e589d93681a7419384cc19fd3ee69d06531e023a/ccontext-0.3.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "collect-context: Makes the process of collecting and sending context to an LLM like ChatGPT-4o as easy as possible.",
    "version": "0.3.9",
    "project_urls": {
        "Documentation": "https://github.com/NicolasArnouts/ccontext",
        "Homepage": "https://github.com/NicolasArnouts/ccontext",
        "Issues": "https://github.com/NicolasArnouts/ccontext/issues",
        "Repository": "https://github.com/NicolasArnouts/ccontext"
    },
    "split_keywords": [
        "context",
        " ccontenxt",
        " collect context",
        " llm",
        " chatgpt"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "20af37212d886da1a5d6f964b12710cda71b6c2b8852c719089872b51330668f",
                "md5": "ca5f65d37a95f6a12bb57d02cd01daf6",
                "sha256": "972c46364508833e7029b1b420d792093c44863e6c0560b9bdd20493d87f8719"
            },
            "downloads": -1,
            "filename": "ccontext-0.3.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ca5f65d37a95f6a12bb57d02cd01daf6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.9",
            "size": 1429978,
            "upload_time": "2025-07-14T09:08:38",
            "upload_time_iso_8601": "2025-07-14T09:08:38.801195Z",
            "url": "https://files.pythonhosted.org/packages/20/af/37212d886da1a5d6f964b12710cda71b6c2b8852c719089872b51330668f/ccontext-0.3.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9cade73cdedbc23af36d2fe8e589d93681a7419384cc19fd3ee69d06531e023a",
                "md5": "2513fd243106a94345b2550a41bf802c",
                "sha256": "0226894ea99ad2e65721a73553131388c4b3c834b2b0491c117289bb52e8de15"
            },
            "downloads": -1,
            "filename": "ccontext-0.3.9.tar.gz",
            "has_sig": false,
            "md5_digest": "2513fd243106a94345b2550a41bf802c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.9",
            "size": 1426843,
            "upload_time": "2025-07-14T09:08:40",
            "upload_time_iso_8601": "2025-07-14T09:08:40.676514Z",
            "url": "https://files.pythonhosted.org/packages/9c/ad/e73cdedbc23af36d2fe8e589d93681a7419384cc19fd3ee69d06531e023a/ccontext-0.3.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 09:08:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NicolasArnouts",
    "github_project": "ccontext",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "tiktoken",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "pyperclip",
            "specs": [
                [
                    "==",
                    "1.9.0"
                ]
            ]
        },
        {
            "name": "pathspec",
            "specs": [
                [
                    "==",
                    "0.12.1"
                ]
            ]
        },
        {
            "name": "reportlab",
            "specs": [
                [
                    "==",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "poetry",
            "specs": [
                [
                    "==",
                    "1.8.3"
                ]
            ]
        },
        {
            "name": "wcmatch",
            "specs": [
                [
                    "==",
                    "9.0"
                ]
            ]
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    "==",
                    "4.3.1"
                ]
            ]
        },
        {
            "name": "mammoth",
            "specs": [
                [
                    "==",
                    "1.9.0"
                ]
            ]
        }
    ],
    "lcname": "ccontext"
}
