vexy-glob

- Name: vexy-glob
- Version: 1.0.9
- Summary: Vexy Glob fast file finding
- Upload time: 2025-08-04 23:39:59
- Requires Python: >=3.8
- License: MIT
- Keywords: filesystem, find, glob, parallel, rust, search

# vexy_glob - Path Accelerated Finding in Rust

[![PyPI version](https://badge.fury.io/py/vexy_glob.svg)](https://badge.fury.io/py/vexy_glob) [![CI](https://github.com/vexyart/vexy-glob/actions/workflows/ci.yml/badge.svg)](https://github.com/vexyart/vexy-glob/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/vexyart/vexy-glob/branch/main/graph/badge.svg)](https://codecov.io/gh/vexyart/vexy-glob)

**`vexy_glob`** is a high-performance Python extension for file system traversal and content searching, built with Rust. It provides a faster and more feature-rich alternative to Python's built-in `glob` (up to 6x faster) and `pathlib` (up to 12x faster) modules.

## TL;DR

**Installation:**

```bash
pip install vexy_glob
```

**Quick Start:**

Find all Python files in the current directory and its subdirectories:

```python
import vexy_glob

for path in vexy_glob.find("**/*.py"):
    print(path)
```

Find all files containing the text "import asyncio":

```python
for match in vexy_glob.find("**/*.py", content="import asyncio"):
    print(f"{match.path}:{match.line_number}: {match.line_text}")
```

## What is `vexy_glob`?

`vexy_glob` is a Python library that provides a powerful and efficient way to find files and search for content within them. It's built on top of the excellent Rust crates `ignore` (for file traversal) and `grep-searcher` (for content searching), which are the same engines powering tools like `fd` and `ripgrep`.

This means you get the speed and efficiency of Rust, with the convenience and ease of use of Python.

### Architecture Overview

```
┌─────────────────────┐
│   Python API Layer  │  ← Your Python code calls vexy_glob.find()
├─────────────────────┤
│    PyO3 Bindings    │  ← Zero-copy conversions between Python/Rust
├─────────────────────┤
│  Rust Core Engine   │  ← GIL released for true parallelism
│  ┌───────────────┐  │
│  │ ignore crate  │  │  ← Parallel directory traversal
│  │ (from fd)     │  │     Respects .gitignore files
│  └───────────────┘  │
│  ┌───────────────┐  │
│  │ grep-searcher │  │  ← High-speed content search
│  │ (from ripgrep)│  │     SIMD-accelerated regex
│  └───────────────┘  │
├─────────────────────┤
│ Streaming Channel   │  ← Results yielded as found
│ (crossbeam-channel) │     No memory accumulation
└─────────────────────┘
```

## Key Features

- **🚀 Blazing Fast:** 10-100x faster than Python's `glob` and `pathlib` for many use cases.
- **⚡ Streaming Results:** Get the first results in milliseconds, without waiting for the entire file system scan to complete.
- **💾 Memory Efficient:** `vexy_glob` uses constant memory, regardless of the number of files or results.
- **🔥 Parallel Execution:** Utilizes all your CPU cores to get the job done as quickly as possible.
- **🔍 Content Searching:** Ripgrep-style content searching with regex support.
- **🎯 Rich Filtering:** Filter files by size, modification time, and more.
- **🧠 Smart Defaults:** Automatically respects `.gitignore` files and skips hidden files and directories.
- **🌍 Cross-Platform:** Works on Linux, macOS, and Windows.

### Feature Comparison

| Feature | `glob.glob()` | `pathlib` | `vexy_glob` |
| --- | --- | --- | --- |
| Pattern matching | ✅ Basic | ✅ Basic | ✅ Advanced |
| Recursive search | ✅ Slow | ✅ Slow | ✅ Fast |
| Streaming results | ❌ | ❌ | ✅ |
| Content search | ❌ | ❌ | ✅ |
| .gitignore respect | ❌ | ❌ | ✅ |
| Parallel execution | ❌ | ❌ | ✅ |
| Size filtering | ❌ | ❌ | ✅ |
| Time filtering | ❌ | ❌ | ✅ |
| Memory efficiency | ❌ | ❌ | ✅ |

## How it Works

`vexy_glob` uses a Rust-powered backend to perform the heavy lifting of file system traversal and content searching. The Rust extension releases Python's Global Interpreter Lock (GIL), allowing for true parallelism and a significant performance boost.

Results are streamed back to Python as they are found, using a producer-consumer architecture with crossbeam channels. This means you can start processing results immediately, without having to wait for the entire search to finish.
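
The producer-consumer flow described above can be sketched in pure Python. This is only an illustration of the pattern: it uses the standard library's `threading` and `queue` in place of Rust threads and crossbeam channels, and `stream_paths` is a hypothetical name that is not part of `vexy_glob`.

```python
import os
import queue
import threading
from typing import Iterator

_DONE = object()  # sentinel that marks the end of the stream


def stream_paths(root: str, suffix: str, buffer_size: int = 1024) -> Iterator[str]:
    """Yield matching paths while a background thread is still walking `root`."""
    channel: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        # Producer: walk the tree and push each match into the bounded channel.
        try:
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    if name.endswith(suffix):
                        channel.put(os.path.join(dirpath, name))
        finally:
            channel.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    # Consumer: hand each result to the caller as soon as it arrives.
    while True:
        item = channel.get()
        if item is _DONE:
            return
        yield item
```

Because the channel is bounded, the producer pauses when the consumer falls behind, which is one way to keep memory use flat regardless of how many files match.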

## Why use `vexy_glob`?

If you find yourself writing scripts that need to find files based on patterns, or search for content within files, `vexy_glob` can be a game-changer. It's particularly useful for:

- **Large codebases:** Quickly find files or code snippets in large projects.
- **Log file analysis:** Search through gigabytes of logs in seconds.
- **Data processing pipelines:** Efficiently find and process files based on various criteria.
- **Build systems:** Fast dependency scanning and file collection.
- **Data science:** Quickly locate and process data files.
- **DevOps:** Log analysis, configuration management, deployment scripts.
- **Testing:** Find test files, fixtures, and coverage reports.
- **Anywhere you need to find files fast!**

### When to Use vexy_glob vs Alternatives

| Use Case | Best Tool | Why |
| --- | --- | --- |
| Simple pattern in small directory | `glob.glob()` | Built-in, no dependencies |
| Large directory, need first result fast | `vexy_glob` | Streaming results |
| Search file contents | `vexy_glob` | Integrated content search |
| Complex filtering (size, time, etc.) | `vexy_glob` | Rich filtering API |
| Cross-platform scripts | `vexy_glob` | Consistent behavior |
| Git-aware file finding | `vexy_glob` | Respects .gitignore |
| Memory-constrained environment | `vexy_glob` | Constant memory usage |

## Installation and Usage

### Python Library

Install `vexy_glob` using pip:

```bash
pip install vexy_glob
```

Then use it in your Python code:

```python
import vexy_glob

# Find all Python files
for path in vexy_glob.find("**/*.py"):
    print(path)
```

### Command-Line Interface

`vexy_glob` also provides a powerful command-line interface for finding files and searching content directly from your terminal.

#### Finding Files

Use `vexy_glob find` to locate files matching glob patterns:

```bash
# Find all Python files
vexy_glob find "**/*.py"

# Find all markdown files larger than 10KB
vexy_glob find "**/*.md" --min-size 10k

# Find all log files modified in the last 2 days
vexy_glob find "*.log" --mtime-after -2d

# Find only directories
vexy_glob find "*" --type d

# Include hidden files
vexy_glob find "*" --hidden

# Limit search depth
vexy_glob find "**/*.txt" --depth 2
```

#### Searching Content

Use `vexy_glob search` to find content within files:

```bash
# Search for "import asyncio" in Python files
vexy_glob search "**/*.py" "import asyncio"

# Search for function definitions using regex
vexy_glob search "src/**/*.rs" "fn\\s+\\w+"

# Search without color output (for piping)
vexy_glob search "**/*.md" "TODO|FIXME" --no-color

# Case-sensitive search
vexy_glob search "*.txt" "Error" --case-sensitive

# Search with size filters
vexy_glob search "**/*.log" "ERROR" --min-size 1M --max-size 100M

# Search recent files only
vexy_glob search "**/*.py" "TODO" --mtime-after -7d

# Complex search with multiple filters
vexy_glob search "src/**/*.{py,js}" "console\.log|print\(" \
    --exclude "*test*" \
    --mtime-after -30d \
    --max-size 50k
```

#### Command-Line Options Reference

**Common options for both `find` and `search`:**

| Option | Type | Description | Example |
| --- | --- | --- | --- |
| `--root` | PATH | Root directory to start search | `--root /home/user/projects` |
| `--min-size` | SIZE | Minimum file size | `--min-size 10k` |
| `--max-size` | SIZE | Maximum file size | `--max-size 5M` |
| `--mtime-after` | TIME | Modified after this time | `--mtime-after -7d` |
| `--mtime-before` | TIME | Modified before this time | `--mtime-before 2024-01-01` |
| `--atime-after` | TIME | Accessed after this time | `--atime-after -1h` |
| `--atime-before` | TIME | Accessed before this time | `--atime-before -30d` |
| `--ctime-after` | TIME | Created after this time | `--ctime-after -1w` |
| `--ctime-before` | TIME | Created before this time | `--ctime-before -1y` |
| `--no-gitignore` | FLAG | Don't respect .gitignore | `--no-gitignore` |
| `--hidden` | FLAG | Include hidden files | `--hidden` |
| `--case-sensitive` | FLAG | Force case sensitivity | `--case-sensitive` |
| `--type` | CHAR | File type (f/d/l) | `--type f` |
| `--extension` | STR | File extension(s) | `--extension py` |
| `--exclude` | PATTERN | Exclude patterns | `--exclude "*test*"` |
| `--depth` | INT | Maximum directory depth | `--depth 3` |
| `--follow-symlinks` | FLAG | Follow symbolic links | `--follow-symlinks` |

**Additional options for `search`:**

| Option | Type | Description | Example |
| --- | --- | --- | --- |
| `--no-color` | FLAG | Disable colored output | `--no-color` |

**Size format examples:**
- Bytes: `1024` or `"1024"`
- Kilobytes: `10k`, `10K`, `10kb`, `10KB`
- Megabytes: `5m`, `5M`, `5mb`, `5MB`
- Gigabytes: `2g`, `2G`, `2gb`, `2GB`
- With decimals: `1.5M`, `2.7G`, `0.5K`

**Time format examples:**
- Relative: `-30s`, `-5m`, `-2h`, `-7d`, `-2w`, `-1mo`, `-1y`
- ISO date: `2024-01-01`, `2024-01-01T10:30:00`
- Natural: `yesterday`, `today` (converted to ISO dates)

#### Unix Pipeline Integration

`vexy_glob` works seamlessly with Unix pipelines:

```bash
# Count Python files
vexy_glob find "**/*.py" | wc -l

# Find Python files containing "async" and edit them
vexy_glob search "**/*.py" "async" --no-color | cut -d: -f1 | sort -u | xargs $EDITOR

# Find large log files and show their sizes
vexy_glob find "*.log" --min-size 100M | xargs ls -lh

# Search for TODOs and format as tasks
vexy_glob search "**/*.py" "TODO" --no-color | awk -F: '{print "- [ ] " $1 ":" $2 ": " $3}'

# Find duplicate file names
vexy_glob find "**/*" --type f | xargs -n1 basename | sort | uniq -d

# Create archive of recent changes
vexy_glob find "**/*" --mtime-after -7d --type f | tar -czf recent_changes.tar.gz -T -

# Find and replace across files (GNU sed; on macOS/BSD use `sed -i ''`)
vexy_glob search "**/*.py" "OldClassName" --no-color | cut -d: -f1 | sort -u | xargs sed -i 's/OldClassName/NewClassName/g'

# Generate ctags for Python files
vexy_glob find "**/*.py" | ctags -L -

# Find empty directories
vexy_glob find "**" --type d | while IFS= read -r dir; do [ -z "$(ls -A "$dir")" ] && echo "$dir"; done

# Calculate total size of Python files (GNU stat; on macOS/BSD use `stat -f %z`)
vexy_glob find "**/*.py" --type f | xargs stat -c %s | awk '{s+=$1} END {print s}' | numfmt --to=iec
```

#### Advanced CLI Patterns

```bash
# Monitor for file changes (poor man's watch)
while true; do
    clear
    echo "Files modified in last minute:"
    vexy_glob find "**/*" --mtime-after -1m --type f
    sleep 10
done

# Parallel processing with GNU parallel
vexy_glob find "**/*.jpg" | parallel -j4 convert {} {.}_thumb.jpg

# Create a file manifest with checksums
vexy_glob find "**/*" --type f | while read -r file; do
    echo "$(sha256sum "$file" | cut -d' ' -f1) $file"
done > manifest.txt

# Find files by content and show context
vexy_glob search "**/*.py" "class.*Error" --no-color | while IFS=: read -r file line rest; do
    printf '\n=== %s:%s ===\n' "$file" "$line"
    start=$(( line > 2 ? line - 2 : 1 ))  # sed rejects line addresses below 1
    sed -n "${start},$((line+2))p" "$file"
done
```

## Detailed Python API Reference

### Core Functions

#### `vexy_glob.find()`

The main function for finding files and searching content.

##### Basic Syntax

```python
def find(
    pattern: str = "*",
    root: Union[str, Path] = ".",
    *,
    content: Optional[str] = None,
    file_type: Optional[str] = None,
    extension: Optional[Union[str, List[str]]] = None,
    max_depth: Optional[int] = None,
    min_depth: int = 0,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
    mtime_after: Optional[Union[float, int, str, datetime]] = None,
    mtime_before: Optional[Union[float, int, str, datetime]] = None,
    atime_after: Optional[Union[float, int, str, datetime]] = None,
    atime_before: Optional[Union[float, int, str, datetime]] = None,
    ctime_after: Optional[Union[float, int, str, datetime]] = None,
    ctime_before: Optional[Union[float, int, str, datetime]] = None,
    hidden: bool = False,
    ignore_git: bool = False,
    case_sensitive: Optional[bool] = None,
    follow_symlinks: bool = False,
    threads: Optional[int] = None,
    as_path: bool = False,
    as_list: bool = False,
    exclude: Optional[Union[str, List[str]]] = None,
) -> Union[Iterator[Union[str, Path, SearchResult]], List[Union[str, Path, SearchResult]]]:
    """Find files matching pattern with optional content search.
    
    Args:
        pattern: Glob pattern to match files (e.g., "**/*.py", "src/*.js")
        root: Root directory to start search from
        content: Regex pattern to search within files
        file_type: Filter by type - 'f' (file), 'd' (directory), 'l' (symlink)
        extension: File extension(s) to filter by (e.g., "py" or ["py", "pyi"])
        max_depth: Maximum directory depth to search
        min_depth: Minimum directory depth to search
        min_size: Minimum file size in bytes (or use parse_size())
        max_size: Maximum file size in bytes
        mtime_after: Files modified after this time
        mtime_before: Files modified before this time
        atime_after: Files accessed after this time
        atime_before: Files accessed before this time
        ctime_after: Files created after this time
        ctime_before: Files created before this time
        hidden: Include hidden files and directories
        ignore_git: Don't respect .gitignore files
        case_sensitive: Case sensitivity (None = smart case)
        follow_symlinks: Follow symbolic links
        threads: Number of threads (None = auto)
        as_path: Return Path objects instead of strings
        as_list: Return list instead of iterator
        exclude: Patterns to exclude from results
    
    Returns:
        Iterator or list of file paths (or SearchResult if content is specified)
    """
```

##### Basic Examples

```python
import vexy_glob

# Find all Python files
for path in vexy_glob.find("**/*.py"):
    print(path)

# Find all files in the 'src' directory
for path in vexy_glob.find("src/**/*"):
    print(path)

# Get results as a list instead of iterator
python_files = vexy_glob.find("**/*.py", as_list=True)
print(f"Found {len(python_files)} Python files")

# Get results as Path objects
from pathlib import Path
for path in vexy_glob.find("**/*.md", as_path=True):
    print(path.stem)  # Path object methods available
```

### Content Searching

To search for content within files, use the `content` parameter. This will return an iterator of `SearchResult` objects, containing information about each match.

```python
import vexy_glob

for match in vexy_glob.find("*.py", content="import requests"):
    print(f"Found a match in {match.path} on line {match.line_number}:")
    print(f"  {match.line_text.strip()}")
```

#### SearchResult Object

The `SearchResult` object has the following attributes:

- `path`: The path to the file containing the match.
- `line_number`: The line number of the match (1-indexed).
- `line_text`: The text of the line containing the match.
- `matches`: A list of matched strings on the line.
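
To make the shape of these results concrete, here is a minimal stand-in built only on the standard library. `SearchResultSketch` and `search_file` are hypothetical names used for illustration; they mirror the four documented attributes but are not the library's implementation, and stripping the trailing newline from `line_text` is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SearchResultSketch:
    """Illustrative stand-in carrying the four documented attributes."""
    path: str
    line_number: int  # 1-indexed, as documented above
    line_text: str
    matches: List[str] = field(default_factory=list)


def search_file(path: str, pattern: str) -> Iterator[SearchResultSketch]:
    """Yield one result per line of `path` that contains a regex match."""
    regex = re.compile(pattern)
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        for number, text in enumerate(handle, start=1):
            found = [m.group(0) for m in regex.finditer(text)]
            if found:
                # Assumption: the trailing newline is dropped from line_text.
                yield SearchResultSketch(path, number, text.rstrip("\n"), found)
```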

#### Content Search Examples

```python
# Simple text search
for match in vexy_glob.find("**/*.py", content="TODO"):
    print(f"{match.path}:{match.line_number}: {match.line_text.strip()}")

# Regex pattern search
for match in vexy_glob.find("**/*.py", content=r"def\s+\w+\(.*\):"):
    print(f"Function at {match.path}:{match.line_number}")

# Case-insensitive search
for match in vexy_glob.find("**/*.md", content="python", case_sensitive=False):
    print(match.path)

# Multiple pattern search with OR
for match in vexy_glob.find("**/*.py", content="import (os|sys|pathlib)"):
    print(f"{match.path}: imports {match.matches}")
```

### Filtering Options

#### Size Filtering

`vexy_glob` supports human-readable size formats:

```python
import vexy_glob

# Using parse_size() for readable formats
min_size = vexy_glob.parse_size("10K")   # 10 kilobytes
max_size = vexy_glob.parse_size("5.5M")  # 5.5 megabytes

for path in vexy_glob.find("**/*", min_size=min_size, max_size=max_size):
    print(path)

# Supported formats:
# - Bytes: "1024" or 1024
# - Kilobytes: "10K", "10KB", "10k", "10kb"
# - Megabytes: "5M", "5MB", "5m", "5mb"
# - Gigabytes: "2G", "2GB", "2g", "2gb"
# - Decimal: "1.5M", "2.7G"
```
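
For readers curious how such size strings map to byte counts, the following is a rough sketch of a parser for the formats listed above. `parse_size_sketch` is a hypothetical helper, not `vexy_glob.parse_size()` itself, and it assumes binary multiples (1K = 1024 bytes); the library may round or scale differently.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes); the real library may differ.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kmg]?)b?\s*$", re.IGNORECASE)


def parse_size_sketch(text: str) -> int:
    """Convert strings such as "10K", "5mb" or "1.5M" to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    # Fractional results are truncated towards zero.
    return int(float(number) * _UNITS[unit.lower()])
```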

#### Time Filtering

`vexy_glob` accepts multiple time formats:

```python
import vexy_glob
from datetime import datetime, timedelta

# 1. Relative time formats
for path in vexy_glob.find("**/*.log", mtime_after="-1d"):     # Last 24 hours
    print(path)

# Supported relative formats:
# - Seconds: "-30s" or "-30"
# - Minutes: "-5m"
# - Hours: "-2h"
# - Days: "-7d"
# - Weeks: "-2w"
# - Months: "-1mo" (30 days)
# - Years: "-1y" (365 days)

# 2. ISO date formats
for path in vexy_glob.find("**/*", mtime_after="2024-01-01"):
    print(path)

# Supported ISO formats:
# - Date: "2024-01-01"
# - DateTime: "2024-01-01T10:30:00"
# - With timezone: "2024-01-01T10:30:00Z"

# 3. Python datetime objects
week_ago = datetime.now() - timedelta(weeks=1)
for path in vexy_glob.find("**/*", mtime_after=week_ago):
    print(path)

# 4. Unix timestamps
import time
hour_ago = time.time() - 3600
for path in vexy_glob.find("**/*", mtime_after=hour_ago):
    print(path)

# Combining time filters
for path in vexy_glob.find(
    "**/*.py",
    mtime_after="-30d",      # Modified within 30 days
    mtime_before="-1d"       # But not in the last 24 hours
):
    print(path)
```
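
The accepted time formats can all be reduced to a Unix timestamp. The sketch below shows one way to do that with the standard library; `to_timestamp_sketch` is a hypothetical helper rather than the library's own parser, it uses the 30-day month and 365-day year approximations noted above, and it assumes ISO strings without a timezone are interpreted as local time.

```python
import re
import time
from datetime import datetime
from typing import Optional, Union

# Unit lengths in seconds; months and years use the approximations listed above.
_UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600, "d": 86400,
    "w": 7 * 86400, "mo": 30 * 86400, "y": 365 * 86400,
}
# "mo" is listed before "m" so that months are not read as minutes.
_RELATIVE = re.compile(r"^-(\d+)(s|mo|m|h|d|w|y)?$")

TimeSpec = Union[float, int, str, datetime]


def to_timestamp_sketch(value: TimeSpec, now: Optional[float] = None) -> float:
    """Normalize a datetime, Unix timestamp, relative or ISO string to seconds."""
    if isinstance(value, datetime):
        return value.timestamp()
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip()
    match = _RELATIVE.match(text)
    if match:
        amount, unit = match.groups()
        base = time.time() if now is None else now
        return base - int(amount) * _UNIT_SECONDS[unit or "s"]
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text).timestamp()
```

The `now` argument exists only to make the relative formats deterministic in tests.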

#### Type and Extension Filtering

```python
import vexy_glob

# Filter by file type
for path in vexy_glob.find("**/*", file_type="d"):  # Directories only
    print(f"Directory: {path}")

# File types:
# - "f": Regular files
# - "d": Directories
# - "l": Symbolic links

# Filter by extension
for path in vexy_glob.find("**/*", extension="py"):
    print(path)

# Multiple extensions
for path in vexy_glob.find("**/*", extension=["py", "pyi", "pyx"]):
    print(path)
```

#### Exclusion Patterns

```python
import vexy_glob

# Exclude single pattern
for path in vexy_glob.find("**/*.py", exclude="*test*"):
    print(path)

# Exclude multiple patterns
exclusions = [
    "**/__pycache__/**",
    "**/node_modules/**",
    "**/.git/**",
    "**/build/**",
    "**/dist/**"
]
for path in vexy_glob.find("**/*", exclude=exclusions):
    print(path)

# Exclude specific files
for path in vexy_glob.find(
    "**/*.py",
    exclude=["setup.py", "**/conftest.py", "**/*_test.py"]
):
    print(path)
```

### Pattern Matching Guide

#### Glob Pattern Syntax

| Pattern | Matches | Example |
| --- | --- | --- |
| `*` | Any characters (except `/`) | `*.py` matches `test.py` |
| `**` | Any characters including `/` | `**/*.py` matches `src/lib/test.py` |
| `?` | Single character | `test?.py` matches `test1.py` |
| `[seq]` | Character in sequence | `test[123].py` matches `test2.py` |
| `[!seq]` | Character not in sequence | `test[!0].py` matches `test1.py` |
| `{a,b}` | Either pattern a or b | `*.{py,js}` matches `.py` and `.js` files |

#### Smart Case Detection

By default, `vexy_glob` uses smart case detection:
- If pattern contains uppercase → case-sensitive
- If pattern is all lowercase → case-insensitive

```python
# Case-insensitive (finds README.md, readme.md, etc.)
vexy_glob.find("readme.md")

# Case-sensitive (only finds README.md)
vexy_glob.find("README.md")

# Force case sensitivity
vexy_glob.find("readme.md", case_sensitive=True)
```
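
The smart-case rule is small enough to express in a few lines. This is a sketch of the rule as described above, with a hypothetical function name; it is not taken from the library's source.

```python
from typing import Optional


def resolve_case_sensitivity(pattern: str, case_sensitive: Optional[bool] = None) -> bool:
    """Return True when matching should be case-sensitive."""
    if case_sensitive is not None:
        # An explicit setting always wins over smart-case detection.
        return case_sensitive
    # Smart case: any uppercase character makes the match case-sensitive.
    return any(ch.isupper() for ch in pattern)
```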

### Drop-in Replacements

`vexy_glob` provides drop-in replacements for standard library functions:

```python
# Replace glob.glob()
import vexy_glob
files = vexy_glob.glob("**/*.py", recursive=True)

# Replace glob.iglob()
for path in vexy_glob.iglob("**/*.py", recursive=True):
    print(path)

# Migration from standard library
# OLD:
import glob
files = glob.glob("**/*.py", recursive=True)

# NEW: Just change the import!
import vexy_glob as glob
files = glob.glob("**/*.py", recursive=True)  # 10-100x faster!
```

## Performance

### Benchmark Results

Benchmarks on a directory with 100,000 files:

| Operation            | `glob.glob()` | `pathlib` | `vexy_glob` | Speedup vs `glob.glob()` |
| -------------------- | ------------- | --------- | ----------- | ------------------------ |
| Find all `.py` files | 15.2s         | 18.1s     | 0.2s        | 76x                      |
| Time to first result | 15.2s         | 18.1s     | 0.005s      | 3040x                    |
| Memory usage         | 1.2GB         | 1.5GB     | 45MB        | 27x less                 |
| With .gitignore      | N/A           | N/A       | 0.15s       | N/A                      |
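
Numbers like these depend heavily on hardware and directory layout, so it can be worth measuring on your own tree. The helper below is a small, library-agnostic sketch (`time_search` is a hypothetical name) that records time to first result and total time for any iterable of results.

```python
import time
from typing import Any, Callable, Dict, Iterable


def time_search(make_iter: Callable[[], Iterable[Any]]) -> Dict[str, Any]:
    """Measure time to first result, total time, and result count."""
    start = time.perf_counter()
    first_result_s = None
    count = 0
    for _ in make_iter():
        if first_result_s is None:
            first_result_s = time.perf_counter() - start
        count += 1
    total_s = time.perf_counter() - start
    return {"first_result_s": first_result_s, "total_s": total_s, "count": count}
```

For example, `time_search(lambda: vexy_glob.find("**/*.py"))` and `time_search(lambda: glob.iglob("**/*.py", recursive=True))` can be compared side by side.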

### Performance Characteristics

- **Linear scaling:** Performance scales linearly with file count
- **I/O bound:** SSD vs HDD makes a significant difference
- **Cache friendly:** Repeated searches benefit from OS file cache
- **Memory constant:** Uses ~45MB regardless of result count

### Performance Tips

1. **Use specific patterns:** `src/**/*.py` is faster than `**/*.py`
2. **Limit depth:** Use `max_depth` when you know the structure
3. **Exclude early:** Use `exclude` patterns to skip large directories
4. **Leverage .gitignore:** Default behavior skips ignored files

## Cookbook - Real-World Examples

### Working with Git Repositories

```python
import vexy_glob

# Find all Python files, respecting .gitignore (default behavior)
for path in vexy_glob.find("**/*.py"):
    print(path)

# Include files that are gitignored
for path in vexy_glob.find("**/*.py", ignore_git=True):
    print(path)
```

### Finding Large Log Files

```python
import os
import vexy_glob

# Find log files larger than 100MB
for path in vexy_glob.find("**/*.log", min_size=vexy_glob.parse_size("100M")):
    size_mb = os.path.getsize(path) / 1024 / 1024
    print(f"{path}: {size_mb:.1f}MB")

# Find log files between 10MB and 1GB
for path in vexy_glob.find(
    "**/*.log",
    min_size=vexy_glob.parse_size("10M"),
    max_size=vexy_glob.parse_size("1G")
):
    print(path)
```

### Finding Recently Modified Files

```python
import vexy_glob
from datetime import datetime, timedelta

# Files modified in the last 24 hours
for path in vexy_glob.find("**/*", mtime_after="-1d"):
    print(path)

# Files modified between 1 and 7 days ago
for path in vexy_glob.find(
    "**/*",
    mtime_after="-7d",
    mtime_before="-1d"
):
    print(path)

# Files modified after a specific date
for path in vexy_glob.find("**/*", mtime_after="2024-01-01"):
    print(path)
```

### Code Search - Finding TODOs and FIXMEs

```python
import vexy_glob

# Find all TODO comments in Python files
for match in vexy_glob.find("**/*.py", content=r"TODO|FIXME"):
    print(f"{match.path}:{match.line_number}: {match.line_text.strip()}")

# Find specific function definitions
for match in vexy_glob.find("**/*.py", content=r"def\s+process_data"):
    print(f"Found function at {match.path}:{match.line_number}")
```

### Finding Duplicate Files by Size

```python
import os
import vexy_glob
from collections import defaultdict

# Group files by size to find potential duplicates
size_groups = defaultdict(list)

for path in vexy_glob.find("**/*", file_type="f"):
    size = os.path.getsize(path)
    if size > 0:  # Skip empty files
        size_groups[size].append(path)

# Print potential duplicates
for size, paths in size_groups.items():
    if len(paths) > 1:
        print(f"\nPotential duplicates ({size} bytes):")
        for path in paths:
            print(f"  {path}")
```
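
Equal size only suggests a duplicate. A possible follow-up step, sketched here with the standard library's `hashlib`, is to hash the files in each size group and keep only those whose contents really match. The function names are illustrative.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def confirm_duplicates(paths: Iterable[str]) -> List[List[str]]:
    """Group paths by content digest and keep groups with more than one file."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Calling `confirm_duplicates(paths)` on each list in `size_groups` with more than one entry narrows the candidates to byte-identical files.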

### Cleaning Build Artifacts

```python
import vexy_glob
import os
import shutil

# Find and remove Python cache files
cache_patterns = [
    "**/__pycache__/**",
    "**/*.pyc",
    "**/*.pyo",
    "**/.pytest_cache/**",
    "**/.mypy_cache/**"
]

for pattern in cache_patterns:
    # Cache files are often hidden or listed in .gitignore,
    # so both default filters are disabled here
    for path in vexy_glob.find(pattern, hidden=True, ignore_git=True):
        if os.path.isfile(path):
            os.remove(path)
            print(f"Removed: {path}")
        elif os.path.isdir(path):
            shutil.rmtree(path)
            print(f"Removed directory: {path}")

### Project Statistics

```python
import vexy_glob
from collections import Counter
import os

# Count files by extension
extension_counts = Counter()

for path in vexy_glob.find("**/*", file_type="f"):
    ext = os.path.splitext(path)[1].lower()
    if ext:
        extension_counts[ext] += 1

# Print top 10 file types
print("Top 10 file types in project:")
for ext, count in extension_counts.most_common(10):
    print(f"  {ext}: {count} files")

# Advanced statistics
total_size = 0
file_count = 0
largest_file = None
largest_size = 0

for path in vexy_glob.find("**/*", file_type="f"):
    size = os.path.getsize(path)
    total_size += size
    file_count += 1
    if size > largest_size:
        largest_size = size
        largest_file = path

print("\nProject Statistics:")
print(f"Total files: {file_count:,}")
print(f"Total size: {total_size / 1024 / 1024:.1f} MB")
if file_count:  # Avoid dividing by zero when no files were found
    print(f"Average file size: {total_size / file_count / 1024:.1f} KB")
    print(f"Largest file: {largest_file} ({largest_size / 1024 / 1024:.1f} MB)")
```

### Integration with pandas

```python
import vexy_glob
import pandas as pd
import os

# Create a DataFrame of all Python files with metadata
file_data = []

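for path in vexy_glob.find("**/*.py"):
    stat = os.stat(path)
    # Count lines inside a context manager so the file handle is closed promptly
    with open(path, 'r', errors='ignore') as f:
        line_count = sum(1 for _ in f)
    file_data.append({
        'path': path,
        'size': stat.st_size,
        'modified': pd.Timestamp(stat.st_mtime, unit='s'),
        'lines': line_count
    })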

df = pd.DataFrame(file_data)

# Analyze the data
print(f"Total Python files: {len(df)}")
print(f"Total lines of code: {df['lines'].sum():,}")
print(f"Average file size: {df['size'].mean():.0f} bytes")
print(f"\nLargest files:")
print(df.nlargest(5, 'size')[['path', 'size', 'lines']])
```

### Parallel Processing Found Files

```python
import vexy_glob
from concurrent.futures import ProcessPoolExecutor

def process_file(path):
    """Process a single file (e.g., count lines)"""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return path, sum(1 for _ in f)
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-UTF-8 files count as zero lines
        return path, 0

def main():
    # Get all files as a list
    files = vexy_glob.find("**/*.py", as_list=True)

    # Process all Python files in parallel
    with ProcessPoolExecutor() as executor:
        results = executor.map(process_file, files)

        # Collect results
        total_lines = 0
        for path, lines in results:
            total_lines += lines
            if lines > 1000:
                print(f"Large file: {path} ({lines} lines)")

    print(f"\nTotal lines of code: {total_lines:,}")

# The guard is required where worker processes are spawned (Windows, macOS)
if __name__ == "__main__":
    main()
```

## Migration Guide

### Migrating from `glob`

```python
# OLD: Using glob
import glob
import os

# Find all Python files
files = glob.glob("**/*.py", recursive=True)

# Filter by size manually
large_files = []
for f in files:
    if os.path.getsize(f) > 1024 * 1024:  # 1MB
        large_files.append(f)

# NEW: Using vexy_glob
import vexy_glob

# Find large Python files directly
large_files = vexy_glob.find("**/*.py", min_size=1024*1024, as_list=True)
```

### Migrating from `pathlib`

```python
# OLD: Using pathlib
from pathlib import Path

# Find all Python files
files = list(Path(".").rglob("*.py"))

# Filter by modification time manually
import datetime
recent = []
for f in files:
    if f.stat().st_mtime > (datetime.datetime.now() - datetime.timedelta(days=7)).timestamp():
        recent.append(f)

# NEW: Using vexy_glob
import vexy_glob

# Find recent Python files directly
recent = vexy_glob.find("**/*.py", mtime_after="-7d", as_path=True, as_list=True)
```

### Migrating from `os.walk`

```python
# OLD: Using os.walk
import os

# Find all .txt files
txt_files = []
for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".txt"):
            txt_files.append(os.path.join(root, file))

# NEW: Using vexy_glob
import vexy_glob

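# Much simpler and faster! Unlike os.walk, hidden and .gitignored files are
# skipped by default; pass hidden=True and ignore_git=True to include them.
txt_files = vexy_glob.find("**/*.txt", as_list=True)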
```

## Development

This project is built with `maturin` - a tool for building and publishing Rust-based Python extensions.

### Prerequisites

- Python 3.8 or later
- Rust toolchain (install from [rustup.rs](https://rustup.rs/))
- `uv` for fast Python package management (optional but recommended)

### Setting Up Development Environment

```bash
# Clone the repository
git clone https://github.com/vexyart/vexy-glob.git
cd vexy-glob

# Set up a virtual environment (using uv for faster installation)
pip install uv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
uv sync

# Build the Rust extension in development mode
python sync_version.py  # Sync version from git tags to Cargo.toml
maturin develop

# Run tests
pytest tests/

# Run benchmarks
pytest tests/test_benchmarks.py -v --benchmark-only
```

### Building Release Artifacts

The project uses a streamlined build system with automatic versioning from git tags.

#### Quick Build

```bash
# Build both wheel and source distribution
./build.sh
```

This script will:
1. Sync the version from git tags to `Cargo.toml`
2. Build an optimized wheel for your platform
3. Build a source distribution (sdist)
4. Place all artifacts in the `dist/` directory

#### Manual Build

```bash
# Ensure you have the latest tags
git fetch --tags

# Sync version to Cargo.toml
python sync_version.py

# Build wheel (platform-specific)
python -m maturin build --release -o dist/

# Build source distribution
python -m maturin sdist -o dist/
```

### Build System Details

The project uses:
- **maturin** as the build backend for creating Python wheels from Rust code
- **setuptools-scm** for automatic versioning based on git tags
- **sync_version.py** to synchronize versions between git tags and `Cargo.toml`

Key files:
- `pyproject.toml` - Python project configuration with maturin as build backend
- `Cargo.toml` - Rust project configuration
- `sync_version.py` - Version synchronization script
- `build.sh` - Convenience build script

### Versioning

Versions are managed through git tags:

```bash
# Create a new version tag
git tag v1.0.4
git push origin v1.0.4

# Build with the new version
./build.sh
```

The version will be automatically detected and used for both the Python package and Rust crate.

### Project Structure

```
vexy-glob/
├── src/                    # Rust source code
│   ├── lib.rs             # Main Rust library with PyO3 bindings
│   └── ...
├── vexy_glob/             # Python package
│   ├── __init__.py        # Python API wrapper
│   ├── __main__.py        # CLI implementation
│   └── ...
├── tests/                 # Python tests
│   ├── test_*.py          # Unit and integration tests
│   └── test_benchmarks.py # Performance benchmarks
├── Cargo.toml             # Rust project configuration
├── pyproject.toml         # Python project configuration
├── sync_version.py        # Version synchronization script
└── build.sh               # Build automation script
```

### CI/CD

The project uses GitHub Actions for continuous integration:
- Testing on Linux, macOS, and Windows
- Python versions 3.8 through 3.12
- Automatic wheel building for releases
- Cross-platform compatibility testing

## Exceptions and Error Handling

### Exception Hierarchy

```
VexyGlobError(Exception)
├── PatternError(VexyGlobError, ValueError)
│   └── Raised for invalid glob patterns
├── SearchError(VexyGlobError, IOError)
│   └── Raised for I/O or permission errors
└── TraversalNotSupportedError(VexyGlobError, NotImplementedError)
    └── Raised for unsupported operations
```
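
One practical consequence of this layout is that each error can also be caught through its built-in base class. The sketch below re-declares the tree locally to show that; the class names follow the tree above, while the bodies and the `describe` helper are illustrative and not copied from the library.

```python
class VexyGlobError(Exception):
    """Base class for all errors in this sketch."""


class PatternError(VexyGlobError, ValueError):
    """Invalid glob pattern."""


class SearchError(VexyGlobError, IOError):
    """I/O or permission error."""


class TraversalNotSupportedError(VexyGlobError, NotImplementedError):
    """Unsupported operation."""


def describe(exc: Exception) -> str:
    """Report which built-in handler would catch `exc` first."""
    try:
        raise exc
    except ValueError:
        return "ValueError"
    except OSError:  # IOError is an alias of OSError in Python 3
        return "OSError"
    except NotImplementedError:
        return "NotImplementedError"
    except VexyGlobError:
        return "VexyGlobError"
```

Existing code that already handles `ValueError` or `OSError` would therefore keep working, while new code can catch `VexyGlobError` to cover everything at once.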

### Error Handling Examples

```python
import vexy_glob
from vexy_glob import VexyGlobError, PatternError, SearchError

try:
    # Invalid pattern
    for path in vexy_glob.find("[invalid"):
        print(path)
except PatternError as e:
    print(f"Invalid pattern: {e}")

try:
    # Permission denied or I/O error
    for path in vexy_glob.find("**/*", root="/root"):
        print(path)
except SearchError as e:
    print(f"Search failed: {e}")

# Handle any vexy_glob error (as_list=True runs the search inside the try block)
try:
    results = vexy_glob.find("**/*.py", content="[invalid regex", as_list=True)
except VexyGlobError as e:
    print(f"Operation failed: {e}")
```

## Platform-Specific Considerations

### Windows

- Use forward slashes `/` in patterns (separators are converted automatically)
- Hidden files: files with the hidden attribute are included when `hidden=True`
- Case sensitivity: Windows is case-insensitive by default

```python
# Windows-specific examples
import vexy_glob

# These are equivalent on Windows
vexy_glob.find("C:/Users/*/Documents/*.docx")
vexy_glob.find("C:\\Users\\*\\Documents\\*.docx")  # Also works

# Find hidden files on Windows
for path in vexy_glob.find("**/*", hidden=True):
    print(path)
```
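
To see what separator conversion amounts to, the sketch below normalizes a backslash pattern with the standard library's `pathlib.PureWindowsPath`. It is an illustration only: `to_forward_slashes` is a hypothetical helper, and `vexy_glob` performs its own conversion internally.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(pattern: str) -> str:
    """Rewrite a Windows-style pattern using forward slashes."""
    # PureWindowsPath works on any platform and never touches the file system.
    return PureWindowsPath(pattern).as_posix()


print(to_forward_slashes("C:\\Users\\*\\Documents\\*.docx"))
```

Normalizing patterns up front can be handy when the same pattern strings are shared between Windows and Unix scripts.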

### macOS

- `.DS_Store` files are skipped by default because they are hidden files (many projects also list them in `.gitignore`)
- Case sensitivity depends on the file system (usually case-insensitive)

```python
# macOS-specific examples
import vexy_glob

# Include hidden files, but still exclude .DS_Store and other macOS metadata
for path in vexy_glob.find(
    "**/*",
    hidden=True,
    exclude=["**/.DS_Store", "**/.Spotlight-V100", "**/.Trashes"],
):
    print(path)
```

### Linux

- File systems are typically case-sensitive
- Hidden files start with `.`
- Respects standard Unix permissions

```python
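# Linux-specific examples
import os
import vexy_glob

# Find config files under ~/.config.
# "~" is normally expanded by the shell, not by glob patterns, so expand it explicitly.
config_root = os.path.expanduser("~/.config")
for path in vexy_glob.find("**/*.conf", root=config_root, hidden=True):
    print(path)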
```

## Troubleshooting

### Common Issues

#### 1. No results found

```python
import vexy_glob

# Check if you need hidden files
results = list(vexy_glob.find("*"))
if not results:
    # Try with hidden files
    results = list(vexy_glob.find("*", hidden=True))

# Check if .gitignore is excluding files
results = list(vexy_glob.find("**/*.py", ignore_git=True))
```

#### 2. Pattern not matching expected files

```python
# Debug pattern matching
import vexy_glob

# Too specific?
print(list(vexy_glob.find("src/lib/test.py")))  # Only exact match

# Use wildcards
print(list(vexy_glob.find("src/**/test.py")))   # Any depth
print(list(vexy_glob.find("src/*/test.py")))    # One level only
```
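
The difference between `*` and `**` can be checked without touching the file system. The sketch below translates a small subset of glob syntax into a regular expression. It is a simplified model for debugging patterns, not the matcher `vexy_glob` uses: `glob_to_regex` is a hypothetical helper, and it ignores `[seq]` and `{a,b}`.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a subset of glob syntax (*, **, ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything except "/"
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")  # one character except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    return glob_to_regex(pattern).match(path) is not None


for candidate in ["src/test.py", "src/lib/test.py", "src/a/b/test.py"]:
    print(candidate, matches("src/**/test.py", candidate), matches("src/*/test.py", candidate))
```

Running candidate paths through both patterns makes it easy to spot when a single `*` is stopping at a directory boundary.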

#### 3. Content search not finding matches

```python
# Check regex syntax
import vexy_glob

# Wrong: glob-style braces are not regex alternation
results = vexy_glob.find("**/*.py", content=r"import\s+{re,os}")

# Correct: regex alternation with a group
results = vexy_glob.find("**/*.py", content=r"import\s+(re|os)")

# Case sensitivity
results = vexy_glob.find("**/*.py", content="TODO", case_sensitive=False)
```
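
The brace-versus-group difference can be tried out quickly on a sample line. The sketch below uses Python's `re` module as a stand-in. That is an assumption for illustration only: `vexy_glob` compiles patterns with a Rust regex engine, which may reject or treat the brace form differently, but in neither engine do braces mean alternation.

```python
import re

line = "import os"

brace_pattern = re.compile(r"import\s+{re,os}")  # braces are taken literally here
group_pattern = re.compile(r"import\s+(re|os)")  # regex alternation

print(bool(brace_pattern.search(line)))  # no match on "import os"
print(bool(group_pattern.search(line)))  # matches "import os"
```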

#### 4. Performance issues

```python
# Optimize your search
import vexy_glob

# Slow: Searching everything
for match in vexy_glob.find("**/*.py", content="import"):
    print(match.path)

# Fast: Limit scope
for match in vexy_glob.find("src/**/*.py", content="import", max_depth=3):
    print(match.path)

# Use exclusions
for path in vexy_glob.find(
    "**/*.py",
    exclude=["**/node_modules/**", "**/.venv/**", "**/build/**"]
):
    print(path)
```

### Build Issues

If you encounter build issues:

1. **Rust not found**: Install Rust from [rustup.rs](https://rustup.rs/)
2. **maturin not found**: Run `pip install maturin`
3. **Version mismatch**: Run `python sync_version.py` to sync versions
4. **Import errors**: Ensure you've run `maturin develop` after changes
5. **Build fails**: Check that you have the latest Rust stable toolchain
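
The first two items in this list can be checked from Python before starting a build. This is a minimal sketch using only `shutil.which` from the standard library; `locate_build_tools` is a hypothetical helper and not part of the project.

```python
import shutil
from typing import Dict, Optional, Sequence


def locate_build_tools(
    tools: Sequence[str] = ("rustc", "cargo", "maturin"),
) -> Dict[str, Optional[str]]:
    """Map each tool name to its path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in locate_build_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```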

### Debug Mode

```python
import vexy_glob
import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# This will show internal operations
for path in vexy_glob.find("**/*.py"):
    print(path)
```

## FAQ

**Q: Why is vexy_glob so much faster than glob?**

A: vexy_glob uses Rust's parallel directory traversal, releases Python's GIL, and streams results as they're found instead of collecting everything first.

**Q: Does vexy_glob follow symbolic links?**

A: By default, no. Use `follow_symlinks=True` to enable. Loop detection is built-in.
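
Loop detection matters because a symlink that points at one of its own ancestors would otherwise make a traversal run forever. The sketch below shows one common technique: remembering the `(device, inode)` pair of every visited directory. It is a pure-Python illustration and not the implementation inside `vexy_glob`; `walk_following_symlinks` is a hypothetical helper, and creating symlinks may need extra privileges on Windows.

```python
import os
import tempfile
from typing import Iterator, Set, Tuple


def walk_following_symlinks(root: str) -> Iterator[str]:
    """Yield file paths under root, following symlinks but never re-entering a directory."""
    seen: Set[Tuple[int, int]] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            info = os.stat(current)  # follows symlinks
        except OSError:
            continue
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # already visited: a symlink loop or a second link to the same directory
        seen.add(key)
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=True):
                        stack.append(entry.path)
                    else:
                        yield entry.path
        except OSError:
            continue


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    open(os.path.join(tmp, "a", "f.txt"), "w").close()
    # a/loop points back at the top-level directory, creating a cycle.
    os.symlink(tmp, os.path.join(tmp, "a", "loop"), target_is_directory=True)
    found = list(walk_following_symlinks(tmp))
    print(len(found))
```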

**Q: Can I use vexy_glob with async/await?**

A: Yes! Use it with `asyncio.to_thread()` (available in Python 3.9+):
```python
import asyncio
import vexy_glob

async def find_files():
    return await asyncio.to_thread(
        vexy_glob.find, "**/*.py", as_list=True
    )
```
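
`asyncio.to_thread()` was added in Python 3.9, while the package metadata lists `requires_python >=3.8`. On 3.8 the same effect can be had with `loop.run_in_executor()`. The sketch below is a generic wrapper; `run_blocking` and `fake_find` are hypothetical names, and `fake_find` only stands in for `vexy_glob.find` so the example runs without the library installed.

```python
import asyncio
import functools
from typing import Any, Callable, List


async def run_blocking(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a blocking callable in the default thread pool (works on Python 3.8+)."""
    loop = asyncio.get_running_loop()
    # run_in_executor does not accept keyword arguments, so bind them with partial.
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))


def fake_find(pattern: str, as_list: bool = False) -> List[str]:
    """Stand-in for vexy_glob.find(); returns canned results."""
    return ["a.py", "b.py"]


print(asyncio.run(run_blocking(fake_find, "**/*.py", as_list=True)))
```

With the library installed, the call would presumably become `await run_blocking(vexy_glob.find, "**/*.py", as_list=True)`.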

**Q: How do I search in multiple directories?**

A: Call `find()` multiple times or use a common parent:
```python
import vexy_glob

# Option 1: Multiple calls
results = []
for root in ["src", "tests", "docs"]:
    results.extend(vexy_glob.find("**/*.py", root=root, as_list=True))

# Option 2: Common parent with specific patterns
results = vexy_glob.find("{src,tests,docs}/**/*.py", as_list=True)
```
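
Option 1 collects every result into a list. If streaming matters, the per-root iterators can be chained lazily instead. This is a minimal sketch: `find_in_roots` is a hypothetical helper that takes the finder as a parameter, and `fake_find` stands in for `vexy_glob.find` so the example runs on its own.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator


def find_in_roots(
    finder: Callable[..., Iterable[str]], pattern: str, roots: Iterable[str]
) -> Iterator[str]:
    """Lazily yield results from each root in turn, without building a list."""
    return chain.from_iterable(finder(pattern, root=root) for root in roots)


def fake_find(pattern: str, root: str = ".") -> Iterator[str]:
    """Stand-in for vexy_glob.find(); yields one canned path per root."""
    yield f"{root}/main.py"


for path in find_in_roots(fake_find, "**/*.py", ["src", "tests", "docs"]):
    print(path)
```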

**Q: Is the content search as powerful as ripgrep?**

A: Yes! It uses the same grep-searcher crate that powers ripgrep, including SIMD optimizations.

## Advanced Configuration

#### Custom Ignore Files

```python
import vexy_glob

# By default, respects .gitignore
for path in vexy_glob.find("**/*.py"):
    print(path)

# Also respects .ignore and .fdignore files
# Create .ignore in your project root:
# echo "test_*.py" > .ignore

# Now test files will be excluded
for path in vexy_glob.find("**/*.py"):
    print(path)  # test_*.py files excluded
```

#### Thread Configuration

```python
import vexy_glob
import os

# Auto-detect (default)
for path in vexy_glob.find("**/*.py"):
    pass

# Limit threads for CPU-bound operations
for match in vexy_glob.find("**/*.py", content="TODO", threads=2):
    pass

# Max parallelism for I/O-bound operations
cpu_count = os.cpu_count() or 4
for path in vexy_glob.find("**/*", threads=cpu_count * 2):
    pass
```

## Contributing

We welcome contributions! Here's how to get started:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature-name`)
3. Make your changes
4. Run tests (`pytest tests/`)
5. Format code (`cargo fmt` for Rust, `ruff format` for Python)
6. Commit with descriptive messages
7. Push and open a pull request

Before submitting:
- Ensure all tests pass
- Add tests for new functionality
- Update documentation as needed
- Follow existing code style

#### Running the Full Test Suite

```bash
# Python tests
pytest tests/ -v

# Python tests with coverage
pytest tests/ --cov=vexy_glob --cov-report=html

# Rust tests
cargo test

# Benchmarks
pytest tests/test_benchmarks.py -v --benchmark-only

# Linting
cargo clippy -- -D warnings
ruff check .
```

## API Stability and Versioning

vexy_glob follows [Semantic Versioning](https://semver.org/):

- **Major version (1.x.x)**: Breaking API changes
- **Minor version (x.1.x)**: New features, backwards compatible
- **Patch version (x.x.1)**: Bug fixes only
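
Under these rules, code written against one release should keep working with any later release that has the same major version. The sketch below expresses that check with plain tuples. `parse_version` and `is_compatible` are hypothetical helpers, and they assume simple `MAJOR.MINOR.PATCH` strings without pre-release suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(installed: str, minimum: str) -> bool:
    """True if `installed` shares the major version of `minimum` and is not older."""
    have, need = parse_version(installed), parse_version(minimum)
    return have[0] == need[0] and have >= need


print(is_compatible("1.0.9", "1.0.4"))
print(is_compatible("2.0.0", "1.0.4"))
```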

### Stable API Guarantees

The following are guaranteed stable in 1.x:

- `find()` function signature and basic parameters
- `glob()` and `iglob()` compatibility functions
- `SearchResult` object attributes
- Exception hierarchy
- CLI command structure

### Experimental Features

Features marked experimental may change:

- Thread count optimization algorithms
- Internal buffer size tuning
- Specific error message text

## Performance Tuning Guide

### For Maximum Speed

```python
import vexy_glob

# 1. Be specific with patterns
# Slow:
vexy_glob.find("**/*.py")
# Fast:
vexy_glob.find("src/**/*.py")

# 2. Use depth limits when possible
vexy_glob.find("**/*.py", max_depth=3)

# 3. Exclude unnecessary directories
vexy_glob.find(
    "**/*.py",
    exclude=["**/venv/**", "**/node_modules/**", "**/.git/**"]
)

# 4. Use file type filters
vexy_glob.find("**/*.py", file_type="f")  # Skip directories
```

### For Memory Efficiency

```python
import vexy_glob

def process(path):
    """Placeholder for your own per-file work."""
    print(path)

# Stream results instead of collecting
# Memory efficient:
for path in vexy_glob.find("**/*"):
    process(path)  # Process one at a time

# Memory intensive:
all_files = vexy_glob.find("**/*", as_list=True)  # Loads all in memory
```
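
Streaming also makes it cheap to stop early. The sketch below takes only the first few items from an iterator with `itertools.islice`. `fake_stream` is a hypothetical stand-in for the iterator returned by `vexy_glob.find()`, and it counts how many items were actually produced.

```python
from itertools import islice
from typing import Iterable, Iterator, List

stats = {"produced": 0}


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming result iterator; counts the items it produces."""
    for i in range(1_000_000):
        stats["produced"] += 1
        yield f"file_{i}.txt"


def first_n(results: Iterable[str], n: int) -> List[str]:
    """Take at most n results without consuming the rest of the iterator."""
    return list(islice(results, n))


sample = first_n(fake_stream(), 3)
print(sample)
print(stats["produced"])  # only as many items as were requested
```

The same pattern should apply to any lazy iterator, which is the main reason to avoid `as_list=True` when only a few results are needed.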

### For I/O Optimization

```python
# Optimize thread count based on storage type
import vexy_glob

# SSD: More threads help
for path in vexy_glob.find("**/*", threads=8):
    pass

# HDD: Fewer threads to avoid seek thrashing
for path in vexy_glob.find("**/*", threads=2):
    pass

# Network storage: Single thread might be best
for path in vexy_glob.find("**/*", threads=1):
    pass
```
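
The three cases above can be folded into one lookup so scripts do not hard-code thread counts. This is a sketch only: `threads_for` is a hypothetical helper, the numbers are copied from the examples above rather than measured, and returning `None` relies on the documented behavior that `threads=None` means auto-detect.

```python
from typing import Optional

# Starting points copied from the examples above; tune them for your own hardware.
_THREADS_BY_STORAGE = {"ssd": 8, "hdd": 2, "network": 1}


def threads_for(storage: str) -> Optional[int]:
    """Return a thread count for a storage kind, or None to let the library decide."""
    return _THREADS_BY_STORAGE.get(storage.strip().lower())


for kind in ["ssd", "hdd", "network", "unknown"]:
    print(kind, threads_for(kind))
```

The result can then be passed straight through, for example `vexy_glob.find("**/*", threads=threads_for("hdd"))`.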

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built on the excellent Rust crates:
  - [`ignore`](https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore) - Fast directory traversal
  - [`grep-searcher`](https://github.com/BurntSushi/ripgrep/tree/master/crates/grep-searcher) - High-performance text search
  - [`globset`](https://github.com/BurntSushi/ripgrep/tree/master/crates/globset) - Efficient glob matching
- Inspired by tools like [`fd`](https://github.com/sharkdp/fd) and [`ripgrep`](https://github.com/BurntSushi/ripgrep)
- Thanks to the PyO3 team for excellent Python-Rust bindings

## Related Projects

- [`fd`](https://github.com/sharkdp/fd) - A simple, fast alternative to `find`
- [`ripgrep`](https://github.com/BurntSushi/ripgrep) - Recursively search directories for a regex pattern
- [`os.walk`](https://github.com/python/cpython/blob/main/Lib/os.py) - Python's built-in directory traversal
- [`scandir`](https://github.com/benhoyt/scandir) - Better directory iteration for Python

---

**Happy fast file finding!** 🚀

If you find `vexy_glob` useful, please consider giving it a star on [GitHub](https://github.com/vexyart/vexy-glob)!


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "vexy-glob",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "filesystem, find, glob, parallel, rust, search",
    "author": null,
    "author_email": "Adam Twardoch <adam+github@twardoch.com>",
    "download_url": "https://files.pythonhosted.org/packages/25/ab/be754b19c7acea5ad55aa5311f4935ce96d38fb9b10b07ec799efefe6597/vexy_glob-1.0.9.tar.gz",
    "platform": null,
    "description": "# vexy_glob - Path Accelerated Finding in Rust\n\n[![PyPI version](https://badge.fury.io/py/vexy_glob.svg)](https://badge.fury.io/py/vexy_glob) [![CI](https://github.com/vexyart/vexy-glob/actions/workflows/ci.yml/badge.svg)](https://github.com/vexyart/vexy-glob/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/vexyart/vexy-glob/branch/main/graph/badge.svg)](https://codecov.io/gh/vexyart/vexy-glob)\n\n**`vexy_glob`** is a high-performance Python extension for file system traversal and content searching, built with Rust. It provides a faster and more feature-rich alternative to Python's built-in `glob` (up to 6x faster) and `pathlib` (up to 12x faster) modules.\n\n## TL;DR\n\n**Installation:**\n\n```bash\npip install vexy_glob\n```\n\n**Quick Start:**\n\nFind all Python files in the current directory and its subdirectories:\n\n```python\nimport vexy_glob\n\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n```\n\nFind all files containing the text \"import asyncio\":\n\n```python\nfor match in vexy_glob.find(\"**/*.py\", content=\"import asyncio\"):\n    print(f\"{match.path}:{match.line_number}: {match.line_text}\")\n```\n\n## What is `vexy_glob`?\n\n`vexy_glob` is a Python library that provides a powerful and efficient way to find files and search for content within them. 
It's built on top of the excellent Rust crates `ignore` (for file traversal) and `grep-searcher` (for content searching), which are the same engines powering tools like `fd` and `ripgrep`.\n\nThis means you get the speed and efficiency of Rust, with the convenience and ease of use of Python.\n\n### Architecture Overview\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502   Python API Layer  \u2502  \u2190 Your Python code calls vexy_glob.find()\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502    PyO3 Bindings    \u2502  \u2190 Zero-copy conversions between Python/Rust\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502  Rust Core Engine   \u2502  \u2190 GIL released for true parallelism\n\u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502\n\u2502  \u2502 ignore crate  \u2502  \u2502  \u2190 Parallel directory traversal\n\u2502  \u2502 (from fd)     \u2502  \u2502     Respects .gitignore files\n\u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502\n\u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502\n\u2502  \u2502 grep-searcher \u2502  \u2502  \u2190 High-speed content search\n\u2502  \u2502 (from ripgrep)\u2502  \u2502     SIMD-accelerated regex\n\u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 Streaming Channel   \u2502  \u2190 Results yielded as found\n\u2502 
(crossbeam-channel) \u2502     No memory accumulation\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n## Key Features\n\n- **\ud83d\ude80 Blazing Fast:** 10-100x faster than Python's `glob` and `pathlib` for many use cases.\n- **\u26a1 Streaming Results:** Get the first results in milliseconds, without waiting for the entire file system scan to complete.\n- **\ud83d\udcbe Memory Efficient:** `vexy_glob` uses constant memory, regardless of the number of files or results.\n- **\ud83d\udd25 Parallel Execution:** Utilizes all your CPU cores to get the job done as quickly as possible.\n- **\ud83d\udd0d Content Searching:** Ripgrep-style content searching with regex support.\n- **\ud83c\udfaf Rich Filtering:** Filter files by size, modification time, and more.\n- **\ud83e\udde0 Smart Defaults:** Automatically respects `.gitignore` files and skips hidden files and directories.\n- **\ud83c\udf0d Cross-Platform:** Works on Linux, macOS, and Windows.\n\n### Feature Comparison\n\n| Feature | `glob.glob()` | `pathlib` | `vexy_glob` |\n| --- | --- | --- | --- |\n| Pattern matching | \u2705 Basic | \u2705 Basic | \u2705 Advanced |\n| Recursive search | \u2705 Slow | \u2705 Slow | \u2705 Fast |\n| Streaming results | \u274c | \u274c | \u2705 |\n| Content search | \u274c | \u274c | \u2705 |\n| .gitignore respect | \u274c | \u274c | \u2705 |\n| Parallel execution | \u274c | \u274c | \u2705 |\n| Size filtering | \u274c | \u274c | \u2705 |\n| Time filtering | \u274c | \u274c | \u2705 |\n| Memory efficiency | \u274c | \u274c | \u2705 |\n\n## How it Works\n\n`vexy_glob` uses a Rust-powered backend to perform the heavy lifting of file system traversal and content searching. 
The Rust extension releases Python's Global Interpreter Lock (GIL), allowing for true parallelism and a significant performance boost.\n\nResults are streamed back to Python as they are found, using a producer-consumer architecture with crossbeam channels. This means you can start processing results immediately, without having to wait for the entire search to finish.\n\n## Why use `vexy_glob`?\n\nIf you find yourself writing scripts that need to find files based on patterns, or search for content within files, `vexy_glob` can be a game-changer. It's particularly useful for:\n\n- **Large codebases:** Quickly find files or code snippets in large projects.\n- **Log file analysis:** Search through gigabytes of logs in seconds.\n- **Data processing pipelines:** Efficiently find and process files based on various criteria.\n- **Build systems:** Fast dependency scanning and file collection.\n- **Data science:** Quickly locate and process data files.\n- **DevOps:** Log analysis, configuration management, deployment scripts.\n- **Testing:** Find test files, fixtures, and coverage reports.\n- **Anywhere you need to find files fast!**\n\n### When to Use vexy_glob vs Alternatives\n\n| Use Case | Best Tool | Why |\n| --- | --- | --- |\n| Simple pattern in small directory | `glob.glob()` | Built-in, no dependencies |\n| Large directory, need first result fast | `vexy_glob` | Streaming results |\n| Search file contents | `vexy_glob` | Integrated content search |\n| Complex filtering (size, time, etc.) 
| `vexy_glob` | Rich filtering API |\n| Cross-platform scripts | `vexy_glob` | Consistent behavior |\n| Git-aware file finding | `vexy_glob` | Respects .gitignore |\n| Memory-constrained environment | `vexy_glob` | Constant memory usage |\n\n## Installation and Usage\n\n### Python Library\n\nInstall `vexy_glob` using pip:\n\n```bash\npip install vexy_glob\n```\n\nThen use it in your Python code:\n\n```python\nimport vexy_glob\n\n# Find all Python files\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n```\n\n### Command-Line Interface\n\n`vexy_glob` also provides a powerful command-line interface for finding files and searching content directly from your terminal.\n\n#### Finding Files\n\nUse `vexy_glob find` to locate files matching glob patterns:\n\n```bash\n# Find all Python files\nvexy_glob find \"**/*.py\"\n\n# Find all markdown files larger than 10KB\nvexy_glob find \"**/*.md\" --min-size 10k\n\n# Find all log files modified in the last 2 days\nvexy_glob find \"*.log\" --mtime-after -2d\n\n# Find only directories\nvexy_glob find \"*\" --type d\n\n# Include hidden files\nvexy_glob find \"*\" --hidden\n\n# Limit search depth\nvexy_glob find \"**/*.txt\" --depth 2\n```\n\n#### Searching Content\n\nUse `vexy_glob search` to find content within files:\n\n```bash\n# Search for \"import asyncio\" in Python files\nvexy_glob search \"**/*.py\" \"import asyncio\"\n\n# Search for function definitions using regex\nvexy_glob search \"src/**/*.rs\" \"fn\\\\s+\\\\w+\"\n\n# Search without color output (for piping)\nvexy_glob search \"**/*.md\" \"TODO|FIXME\" --no-color\n\n# Case-sensitive search\nvexy_glob search \"*.txt\" \"Error\" --case-sensitive\n\n# Search with size filters\nvexy_glob search \"**/*.log\" \"ERROR\" --min-size 1M --max-size 100M\n\n# Search recent files only\nvexy_glob search \"**/*.py\" \"TODO\" --mtime-after -7d\n\n# Complex search with multiple filters\nvexy_glob search \"src/**/*.{py,js}\" \"console\\.log|print\\(\" \\\n    --exclude 
\"*test*\" \\\n    --mtime-after -30d \\\n    --max-size 50k\n```\n\n#### Command-Line Options Reference\n\n**Common options for both `find` and `search`:**\n\n| Option | Type | Description | Example |\n| --- | --- | --- | --- |\n| `--root` | PATH | Root directory to start search | `--root /home/user/projects` |\n| `--min-size` | SIZE | Minimum file size | `--min-size 10k` |\n| `--max-size` | SIZE | Maximum file size | `--max-size 5M` |\n| `--mtime-after` | TIME | Modified after this time | `--mtime-after -7d` |\n| `--mtime-before` | TIME | Modified before this time | `--mtime-before 2024-01-01` |\n| `--atime-after` | TIME | Accessed after this time | `--atime-after -1h` |\n| `--atime-before` | TIME | Accessed before this time | `--atime-before -30d` |\n| `--ctime-after` | TIME | Created after this time | `--ctime-after -1w` |\n| `--ctime-before` | TIME | Created before this time | `--ctime-before -1y` |\n| `--no-gitignore` | FLAG | Don't respect .gitignore | `--no-gitignore` |\n| `--hidden` | FLAG | Include hidden files | `--hidden` |\n| `--case-sensitive` | FLAG | Force case sensitivity | `--case-sensitive` |\n| `--type` | CHAR | File type (f/d/l) | `--type f` |\n| `--extension` | STR | File extension(s) | `--extension py` |\n| `--exclude` | PATTERN | Exclude patterns | `--exclude \"*test*\"` |\n| `--depth` | INT | Maximum directory depth | `--depth 3` |\n| `--follow-symlinks` | FLAG | Follow symbolic links | `--follow-symlinks` |\n\n**Additional options for `search`:**\n\n| Option | Type | Description | Example |\n| --- | --- | --- | --- |\n| `--no-color` | FLAG | Disable colored output | `--no-color` |\n\n**Size format examples:**\n- Bytes: `1024` or `\"1024\"`\n- Kilobytes: `10k`, `10K`, `10kb`, `10KB`\n- Megabytes: `5m`, `5M`, `5mb`, `5MB`\n- Gigabytes: `2g`, `2G`, `2gb`, `2GB`\n- With decimals: `1.5M`, `2.7G`, `0.5K`\n\n**Time format examples:**\n- Relative: `-30s`, `-5m`, `-2h`, `-7d`, `-2w`, `-1mo`, `-1y`\n- ISO date: `2024-01-01`, `2024-01-01T10:30:00`\n- 
Natural: `yesterday`, `today` (converted to ISO dates)\n\n#### Unix Pipeline Integration\n\n`vexy_glob` works seamlessly with Unix pipelines:\n\n```bash\n# Count Python files\nvexy_glob find \"**/*.py\" | wc -l\n\n# Find Python files containing \"async\" and edit them\nvexy_glob search \"**/*.py\" \"async\" --no-color | cut -d: -f1 | sort -u | xargs $EDITOR\n\n# Find large log files and show their sizes\nvexy_glob find \"*.log\" --min-size 100M | xargs ls -lh\n\n# Search for TODOs and format as tasks\nvexy_glob search \"**/*.py\" \"TODO\" --no-color | awk -F: '{print \"- [ ] \" $1 \":\" $2 \": \" $3}'\n\n# Find duplicate file names\nvexy_glob find \"**/*\" --type f | xargs -n1 basename | sort | uniq -d\n\n# Create archive of recent changes\nvexy_glob find \"**/*\" --mtime-after -7d --type f | tar -czf recent_changes.tar.gz -T -\n\n# Find and replace across files\nvexy_glob search \"**/*.py\" \"OldClassName\" --no-color | cut -d: -f1 | sort -u | xargs sed -i 's/OldClassName/NewClassName/g'\n\n# Generate ctags for Python files\nvexy_glob find \"**/*.py\" | ctags -L -\n\n# Find empty directories\nvexy_glob find \"**\" --type d | while read dir; do [ -z \"$(ls -A \"$dir\")\" ] && echo \"$dir\"; done\n\n# Calculate total size of Python files\nvexy_glob find \"**/*.py\" --type f | xargs stat -f%z | awk '{s+=$1} END {print s}' | numfmt --to=iec\n```\n\n#### Advanced CLI Patterns\n\n```bash\n# Monitor for file changes (poor man's watch)\nwhile true; do\n    clear\n    echo \"Files modified in last minute:\"\n    vexy_glob find \"**/*\" --mtime-after -1m --type f\n    sleep 10\ndone\n\n# Parallel processing with GNU parallel\nvexy_glob find \"**/*.jpg\" | parallel -j4 convert {} {.}_thumb.jpg\n\n# Create a file manifest with checksums\nvexy_glob find \"**/*\" --type f | while read -r file; do\n    echo \"$(sha256sum \"$file\" | cut -d' ' -f1) $file\"\ndone > manifest.txt\n\n# Find files by content and show context\nvexy_glob search \"**/*.py\" \"class.*Error\" --no-color | 
while IFS=: read -r file line rest; do\n    echo \"\\n=== $file:$line ===\"\n    sed -n \"$((line-2)),$((line+2))p\" \"$file\"\ndone\n```\n\n## Detailed Python API Reference\n\n### Core Functions\n\n#### Core Functions\n\n##### `vexy_glob.find()`\n\nThe main function for finding files and searching content.\n\n###### Basic Syntax\n\n```python\ndef find(\n    pattern: str = \"*\",\n    root: Union[str, Path] = \".\",\n    *,\n    content: Optional[str] = None,\n    file_type: Optional[str] = None,\n    extension: Optional[Union[str, List[str]]] = None,\n    max_depth: Optional[int] = None,\n    min_depth: int = 0,\n    min_size: Optional[int] = None,\n    max_size: Optional[int] = None,\n    mtime_after: Optional[Union[float, int, str, datetime]] = None,\n    mtime_before: Optional[Union[float, int, str, datetime]] = None,\n    atime_after: Optional[Union[float, int, str, datetime]] = None,\n    atime_before: Optional[Union[float, int, str, datetime]] = None,\n    ctime_after: Optional[Union[float, int, str, datetime]] = None,\n    ctime_before: Optional[Union[float, int, str, datetime]] = None,\n    hidden: bool = False,\n    ignore_git: bool = False,\n    case_sensitive: Optional[bool] = None,\n    follow_symlinks: bool = False,\n    threads: Optional[int] = None,\n    as_path: bool = False,\n    as_list: bool = False,\n    exclude: Optional[Union[str, List[str]]] = None,\n) -> Union[Iterator[Union[str, Path, SearchResult]], List[Union[str, Path, SearchResult]]]:\n    \"\"\"Find files matching pattern with optional content search.\n    \n    Args:\n        pattern: Glob pattern to match files (e.g., \"**/*.py\", \"src/*.js\")\n        root: Root directory to start search from\n        content: Regex pattern to search within files\n        file_type: Filter by type - 'f' (file), 'd' (directory), 'l' (symlink)\n        extension: File extension(s) to filter by (e.g., \"py\" or [\"py\", \"pyi\"])\n        max_depth: Maximum directory depth to search\n        
min_depth: Minimum directory depth to search\n        min_size: Minimum file size in bytes (or use parse_size())\n        max_size: Maximum file size in bytes\n        mtime_after: Files modified after this time\n        mtime_before: Files modified before this time\n        atime_after: Files accessed after this time\n        atime_before: Files accessed before this time\n        ctime_after: Files created after this time\n        ctime_before: Files created before this time\n        hidden: Include hidden files and directories\n        ignore_git: Don't respect .gitignore files\n        case_sensitive: Case sensitivity (None = smart case)\n        follow_symlinks: Follow symbolic links\n        threads: Number of threads (None = auto)\n        as_path: Return Path objects instead of strings\n        as_list: Return list instead of iterator\n        exclude: Patterns to exclude from results\n    \n    Returns:\n        Iterator or list of file paths (or SearchResult if content is specified)\n    \"\"\"\n```\n\n##### Basic Examples\n\n```python\nimport vexy_glob\n\n# Find all Python files\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n\n# Find all files in the 'src' directory\nfor path in vexy_glob.find(\"src/**/*\"):\n    print(path)\n\n# Get results as a list instead of iterator\npython_files = vexy_glob.find(\"**/*.py\", as_list=True)\nprint(f\"Found {len(python_files)} Python files\")\n\n# Get results as Path objects\nfrom pathlib import Path\nfor path in vexy_glob.find(\"**/*.md\", as_path=True):\n    print(path.stem)  # Path object methods available\n```\n\n### Content Searching\n\nTo search for content within files, use the `content` parameter. 
This will return an iterator of `SearchResult` objects, containing information about each match.\n\n```python\nimport vexy_glob\n\nfor match in vexy_glob.find(\"*.py\", content=\"import requests\"):\n    print(f\"Found a match in {match.path} on line {match.line_number}:\")\n    print(f\"  {match.line_text.strip()}\")\n```\n\n#### SearchResult Object\n\nThe `SearchResult` object has the following attributes:\n\n- `path`: The path to the file containing the match.\n- `line_number`: The line number of the match (1-indexed).\n- `line_text`: The text of the line containing the match.\n- `matches`: A list of matched strings on the line.\n\n#### Content Search Examples\n\n```python\n# Simple text search\nfor match in vexy_glob.find(\"**/*.py\", content=\"TODO\"):\n    print(f\"{match.path}:{match.line_number}: {match.line_text.strip()}\")\n\n# Regex pattern search\nfor match in vexy_glob.find(\"**/*.py\", content=r\"def\\s+\\w+\\(.*\\):\"):\n    print(f\"Function at {match.path}:{match.line_number}\")\n\n# Case-insensitive search\nfor match in vexy_glob.find(\"**/*.md\", content=\"python\", case_sensitive=False):\n    print(match.path)\n\n# Multiple pattern search with OR\nfor match in vexy_glob.find(\"**/*.py\", content=\"import (os|sys|pathlib)\"):\n    print(f\"{match.path}: imports {match.matches}\")\n```\n\n### Filtering Options\n\n#### Size Filtering\n\n`vexy_glob` supports human-readable size formats:\n\n```python\nimport vexy_glob\n\n# Using parse_size() for readable formats\nmin_size = vexy_glob.parse_size(\"10K\")   # 10 kilobytes\nmax_size = vexy_glob.parse_size(\"5.5M\")  # 5.5 megabytes\n\nfor path in vexy_glob.find(\"**/*\", min_size=min_size, max_size=max_size):\n    print(path)\n\n# Supported formats:\n# - Bytes: \"1024\" or 1024\n# - Kilobytes: \"10K\", \"10KB\", \"10k\", \"10kb\"\n# - Megabytes: \"5M\", \"5MB\", \"5m\", \"5mb\"\n# - Gigabytes: \"2G\", \"2GB\", \"2g\", \"2gb\"\n# - Decimal: \"1.5M\", \"2.7G\"\n```\n\n#### Time Filtering\n\n`vexy_glob` 
accepts multiple time formats:\n\n```python\nimport vexy_glob\nfrom datetime import datetime, timedelta\n\n# 1. Relative time formats\nfor path in vexy_glob.find(\"**/*.log\", mtime_after=\"-1d\"):     # Last 24 hours\n    print(path)\n\n# Supported relative formats:\n# - Seconds: \"-30s\" or \"-30\"\n# - Minutes: \"-5m\"\n# - Hours: \"-2h\"\n# - Days: \"-7d\"\n# - Weeks: \"-2w\"\n# - Months: \"-1mo\" (30 days)\n# - Years: \"-1y\" (365 days)\n\n# 2. ISO date formats\nfor path in vexy_glob.find(\"**/*\", mtime_after=\"2024-01-01\"):\n    print(path)\n\n# Supported ISO formats:\n# - Date: \"2024-01-01\"\n# - DateTime: \"2024-01-01T10:30:00\"\n# - With timezone: \"2024-01-01T10:30:00Z\"\n\n# 3. Python datetime objects\nweek_ago = datetime.now() - timedelta(weeks=1)\nfor path in vexy_glob.find(\"**/*\", mtime_after=week_ago):\n    print(path)\n\n# 4. Unix timestamps\nimport time\nhour_ago = time.time() - 3600\nfor path in vexy_glob.find(\"**/*\", mtime_after=hour_ago):\n    print(path)\n\n# Combining time filters\nfor path in vexy_glob.find(\n    \"**/*.py\",\n    mtime_after=\"-30d\",      # Modified within 30 days\n    mtime_before=\"-1d\"       # But not in the last 24 hours\n):\n    print(path)\n```\n\n#### Type and Extension Filtering\n\n```python\nimport vexy_glob\n\n# Filter by file type\nfor path in vexy_glob.find(\"**/*\", file_type=\"d\"):  # Directories only\n    print(f\"Directory: {path}\")\n\n# File types:\n# - \"f\": Regular files\n# - \"d\": Directories\n# - \"l\": Symbolic links\n\n# Filter by extension\nfor path in vexy_glob.find(\"**/*\", extension=\"py\"):\n    print(path)\n\n# Multiple extensions\nfor path in vexy_glob.find(\"**/*\", extension=[\"py\", \"pyi\", \"pyx\"]):\n    print(path)\n```\n\n#### Exclusion Patterns\n\n```python\nimport vexy_glob\n\n# Exclude single pattern\nfor path in vexy_glob.find(\"**/*.py\", exclude=\"*test*\"):\n    print(path)\n\n# Exclude multiple patterns\nexclusions = [\n    \"**/__pycache__/**\",\n    
\"**/node_modules/**\",\n    \"**/.git/**\",\n    \"**/build/**\",\n    \"**/dist/**\"\n]\nfor path in vexy_glob.find(\"**/*\", exclude=exclusions):\n    print(path)\n\n# Exclude specific files\nfor path in vexy_glob.find(\n    \"**/*.py\",\n    exclude=[\"setup.py\", \"**/conftest.py\", \"**/*_test.py\"]\n):\n    print(path)\n```\n\n### Pattern Matching Guide\n\n#### Glob Pattern Syntax\n\n| Pattern | Matches | Example |\n| --- | --- | --- |\n| `*` | Any characters (except `/`) | `*.py` matches `test.py` |\n| `**` | Any characters including `/` | `**/*.py` matches `src/lib/test.py` |\n| `?` | Single character | `test?.py` matches `test1.py` |\n| `[seq]` | Character in sequence | `test[123].py` matches `test2.py` |\n| `[!seq]` | Character not in sequence | `test[!0].py` matches `test1.py` |\n| `{a,b}` | Either pattern a or b | `*.{py,js}` matches `.py` and `.js` files |\n\n#### Smart Case Detection\n\nBy default, `vexy_glob` uses smart case detection:\n- If pattern contains uppercase \u2192 case-sensitive\n- If pattern is all lowercase \u2192 case-insensitive\n\n```python\n# Case-insensitive (finds README.md, readme.md, etc.)\nvexy_glob.find(\"readme.md\")\n\n# Case-sensitive (only finds README.md)\nvexy_glob.find(\"README.md\")\n\n# Force case sensitivity\nvexy_glob.find(\"readme.md\", case_sensitive=True)\n```\n\n### Drop-in Replacements\n\n`vexy_glob` provides drop-in replacements for standard library functions:\n\n```python\n# Replace glob.glob()\nimport vexy_glob\nfiles = vexy_glob.glob(\"**/*.py\", recursive=True)\n\n# Replace glob.iglob()\nfor path in vexy_glob.iglob(\"**/*.py\", recursive=True):\n    print(path)\n\n# Migration from standard library\n# OLD:\nimport glob\nfiles = glob.glob(\"**/*.py\", recursive=True)\n\n# NEW: Just change the import!\nimport vexy_glob as glob\nfiles = glob.glob(\"**/*.py\", recursive=True)  # 10-100x faster!\n```\n\n## Performance\n\n### Benchmark Results\n\nBenchmarks on a directory with 100,000 files:\n\n| Operation        
    | `glob.glob()` | `pathlib` | `vexy_glob` | Speedup  |\n| -------------------- | ------------- | --------- | ----------- | -------- |\n| Find all `.py` files | 15.2s         | 18.1s     | 0.2s        | 76x      |\n| Time to first result | 15.2s         | 18.1s     | 0.005s      | 3040x    |\n| Memory usage         | 1.2GB         | 1.5GB     | 45MB        | 27x less |\n| With .gitignore      | N/A           | N/A       | 0.15s       | N/A      |\n\n### Performance Characteristics\n\n- **Linear scaling:** Performance scales linearly with file count\n- **I/O bound:** SSD vs HDD makes a significant difference\n- **Cache friendly:** Repeated searches benefit from OS file cache\n- **Memory constant:** Uses ~45MB regardless of result count\n\n### Performance Tips\n\n1. **Use specific patterns:** `src/**/*.py` is faster than `**/*.py`\n2. **Limit depth:** Use `max_depth` when you know the structure\n3. **Exclude early:** Use `exclude` patterns to skip large directories\n4. **Leverage .gitignore:** Default behavior skips ignored files\n\n## Cookbook - Real-World Examples\n\n### Working with Git Repositories\n\n```python\nimport vexy_glob\n\n# Find all Python files, respecting .gitignore (default behavior)\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n\n# Include files that are gitignored\nfor path in vexy_glob.find(\"**/*.py\", ignore_git=True):\n    print(path)\n```\n\n### Finding Large Log Files\n\n```python\nimport vexy_glob\nimport os\n\n# Find log files larger than 100MB\nfor path in vexy_glob.find(\"**/*.log\", min_size=vexy_glob.parse_size(\"100M\")):\n    size_mb = os.path.getsize(path) / 1024 / 1024\n    print(f\"{path}: {size_mb:.1f}MB\")\n\n# Find log files between 10MB and 1GB\nfor path in vexy_glob.find(\n    \"**/*.log\",\n    min_size=vexy_glob.parse_size(\"10M\"),\n    max_size=vexy_glob.parse_size(\"1G\")\n):\n    print(path)\n```\n\n### Finding Recently Modified Files\n\n```python\nimport vexy_glob\nfrom datetime import datetime, timedelta\n\n# 
Files modified in the last 24 hours\nfor path in vexy_glob.find(\"**/*\", mtime_after=\"-1d\"):\n    print(path)\n\n# Files modified between 1 and 7 days ago\nfor path in vexy_glob.find(\n    \"**/*\",\n    mtime_after=\"-7d\",\n    mtime_before=\"-1d\"\n):\n    print(path)\n\n# Files modified after a specific date\nfor path in vexy_glob.find(\"**/*\", mtime_after=\"2024-01-01\"):\n    print(path)\n```\n\n### Code Search - Finding TODOs and FIXMEs\n\n```python\nimport vexy_glob\n\n# Find all TODO and FIXME comments in Python files\nfor match in vexy_glob.find(\"**/*.py\", content=r\"TODO|FIXME\"):\n    print(f\"{match.path}:{match.line_number}: {match.line_text.strip()}\")\n\n# Find specific function definitions\nfor match in vexy_glob.find(\"**/*.py\", content=r\"def\\s+process_data\"):\n    print(f\"Found function at {match.path}:{match.line_number}\")\n```\n\n### Finding Duplicate Files by Size\n\n```python\nimport vexy_glob\nimport os\nfrom collections import defaultdict\n\n# Group files by size to find potential duplicates\nsize_groups = defaultdict(list)\n\nfor path in vexy_glob.find(\"**/*\", file_type=\"f\"):\n    size = os.path.getsize(path)\n    if size > 0:  # Skip empty files\n        size_groups[size].append(path)\n\n# Print potential duplicates\nfor size, paths in size_groups.items():\n    if len(paths) > 1:\n        print(f\"\\nPotential duplicates ({size} bytes):\")\n        for path in paths:\n            print(f\"  {path}\")\n```\n\n### Cleaning Build Artifacts\n\n```python\nimport vexy_glob\nimport os\nimport shutil\n\n# Find and remove Python cache files\ncache_patterns = [\n    \"**/__pycache__/**\",\n    \"**/*.pyc\",\n    \"**/*.pyo\",\n    \"**/.pytest_cache/**\",\n    \"**/.mypy_cache/**\"\n]\n\nfor pattern in cache_patterns:\n    for path in vexy_glob.find(pattern, hidden=True):\n        if os.path.isfile(path):\n            os.remove(path)\n            print(f\"Removed: {path}\")\n        elif os.path.isdir(path):\n            shutil.rmtree(path)\n            
print(f\"Removed directory: {path}\")\n```\n\n### Project Statistics\n\n```python\nimport vexy_glob\nfrom collections import Counter\nimport os\n\n# Count files by extension\nextension_counts = Counter()\n\nfor path in vexy_glob.find(\"**/*\", file_type=\"f\"):\n    ext = os.path.splitext(path)[1].lower()\n    if ext:\n        extension_counts[ext] += 1\n\n# Print top 10 file types\nprint(\"Top 10 file types in project:\")\nfor ext, count in extension_counts.most_common(10):\n    print(f\"  {ext}: {count} files\")\n\n# Advanced statistics\ntotal_size = 0\nfile_count = 0\nlargest_file = None\nlargest_size = 0\n\nfor path in vexy_glob.find(\"**/*\", file_type=\"f\"):\n    size = os.path.getsize(path)\n    total_size += size\n    file_count += 1\n    if size > largest_size:\n        largest_size = size\n        largest_file = path\n\nprint(f\"\\nProject Statistics:\")\nprint(f\"Total files: {file_count:,}\")\nprint(f\"Total size: {total_size / 1024 / 1024:.1f} MB\")\nprint(f\"Average file size: {total_size / file_count / 1024:.1f} KB\")\nprint(f\"Largest file: {largest_file} ({largest_size / 1024 / 1024:.1f} MB)\")\n```\n\n### Integration with pandas\n\n```python\nimport vexy_glob\nimport pandas as pd\nimport os\n\n# Create a DataFrame of all Python files with metadata\nfile_data = []\n\nfor path in vexy_glob.find(\"**/*.py\"):\n    stat = os.stat(path)\n    file_data.append({\n        'path': path,\n        'size': stat.st_size,\n        'modified': pd.Timestamp(stat.st_mtime, unit='s'),\n        'lines': sum(1 for _ in open(path, 'r', errors='ignore'))\n    })\n\ndf = pd.DataFrame(file_data)\n\n# Analyze the data\nprint(f\"Total Python files: {len(df)}\")\nprint(f\"Total lines of code: {df['lines'].sum():,}\")\nprint(f\"Average file size: {df['size'].mean():.0f} bytes\")\nprint(f\"\\nLargest files:\")\nprint(df.nlargest(5, 'size')[['path', 'size', 'lines']])\n```\n\n### Parallel Processing Found Files\n\n```python\nimport vexy_glob\nfrom concurrent.futures import 
ProcessPoolExecutor\nimport os\n\ndef process_file(path):\n    \"\"\"Process a single file (e.g., count lines)\"\"\"\n    try:\n        with open(path, 'r', encoding='utf-8') as f:\n            return path, sum(1 for _ in f)\n    except (OSError, UnicodeDecodeError):\n        # Unreadable or non-UTF-8 file: count it as zero lines\n        return path, 0\n\n# Process all Python files in parallel\nwith ProcessPoolExecutor() as executor:\n    # Get all files as a list\n    files = vexy_glob.find(\"**/*.py\", as_list=True)\n    \n    # Process in parallel\n    results = executor.map(process_file, files)\n    \n    # Collect results\n    total_lines = 0\n    for path, lines in results:\n        total_lines += lines\n        if lines > 1000:\n            print(f\"Large file: {path} ({lines} lines)\")\n    \n    print(f\"\\nTotal lines of code: {total_lines:,}\")\n```\n\n## Migration Guide\n\n### Migrating from `glob`\n\n```python\n# OLD: Using glob\nimport glob\nimport os\n\n# Find all Python files\nfiles = glob.glob(\"**/*.py\", recursive=True)\n\n# Filter by size manually\nlarge_files = []\nfor f in files:\n    if os.path.getsize(f) > 1024 * 1024:  # 1MB\n        large_files.append(f)\n\n# NEW: Using vexy_glob\nimport vexy_glob\n\n# Find large Python files directly\nlarge_files = vexy_glob.find(\"**/*.py\", min_size=1024*1024, as_list=True)\n```\n\n### Migrating from `pathlib`\n\n```python\n# OLD: Using pathlib\nfrom pathlib import Path\n\n# Find all Python files\nfiles = list(Path(\".\").rglob(\"*.py\"))\n\n# Filter by modification time manually\nimport datetime\nrecent = []\nfor f in files:\n    if f.stat().st_mtime > (datetime.datetime.now() - datetime.timedelta(days=7)).timestamp():\n        recent.append(f)\n\n# NEW: Using vexy_glob\nimport vexy_glob\n\n# Find recent Python files directly\nrecent = vexy_glob.find(\"**/*.py\", mtime_after=\"-7d\", as_path=True, as_list=True)\n```\n\n### Migrating from `os.walk`\n\n```python\n# OLD: Using os.walk\nimport os\n\n# Find all .txt files\ntxt_files = []\nfor root, dirs, files in os.walk(\".\"):\n    for file in 
files:\n        if file.endswith(\".txt\"):\n            txt_files.append(os.path.join(root, file))\n\n# NEW: Using vexy_glob\nimport vexy_glob\n\n# Much simpler and faster!\ntxt_files = vexy_glob.find(\"**/*.txt\", as_list=True)\n```\n\n## Development\n\nThis project is built with `maturin` - a tool for building and publishing Rust-based Python extensions.\n\n### Prerequisites\n\n- Python 3.8 or later\n- Rust toolchain (install from [rustup.rs](https://rustup.rs/))\n- `uv` for fast Python package management (optional but recommended)\n\n### Setting Up Development Environment\n\n```bash\n# Clone the repository\ngit clone https://github.com/vexyart/vexy-glob.git\ncd vexy-glob\n\n# Set up a virtual environment (using uv for faster installation)\npip install uv\nuv venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n\n# Install development dependencies\nuv sync\n\n# Build the Rust extension in development mode\npython sync_version.py  # Sync version from git tags to Cargo.toml\nmaturin develop\n\n# Run tests\npytest tests/\n\n# Run benchmarks\npytest tests/test_benchmarks.py -v --benchmark-only\n```\n\n### Building Release Artifacts\n\nThe project uses a streamlined build system with automatic versioning from git tags.\n\n#### Quick Build\n\n```bash\n# Build both wheel and source distribution\n./build.sh\n```\n\nThis script will:\n1. Sync the version from git tags to `Cargo.toml`\n2. Build an optimized wheel for your platform\n3. Build a source distribution (sdist)\n4. 
Place all artifacts in the `dist/` directory\n\n#### Manual Build\n\n```bash\n# Ensure you have the latest tags\ngit fetch --tags\n\n# Sync version to Cargo.toml\npython sync_version.py\n\n# Build wheel (platform-specific)\npython -m maturin build --release -o dist/\n\n# Build source distribution\npython -m maturin sdist -o dist/\n```\n\n### Build System Details\n\nThe project uses:\n- **maturin** as the build backend for creating Python wheels from Rust code\n- **setuptools-scm** for automatic versioning based on git tags\n- **sync_version.py** to synchronize versions between git tags and `Cargo.toml`\n\nKey files:\n- `pyproject.toml` - Python project configuration with maturin as build backend\n- `Cargo.toml` - Rust project configuration\n- `sync_version.py` - Version synchronization script\n- `build.sh` - Convenience build script\n\n### Versioning\n\nVersions are managed through git tags:\n\n```bash\n# Create a new version tag\ngit tag v1.0.4\ngit push origin v1.0.4\n\n# Build with the new version\n./build.sh\n```\n\nThe version will be automatically detected and used for both the Python package and Rust crate.\n\n### Project Structure\n\n```\nvexy-glob/\n\u251c\u2500\u2500 src/                    # Rust source code\n\u2502   \u251c\u2500\u2500 lib.rs             # Main Rust library with PyO3 bindings\n\u2502   \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 vexy_glob/             # Python package\n\u2502   \u251c\u2500\u2500 __init__.py        # Python API wrapper\n\u2502   \u251c\u2500\u2500 __main__.py        # CLI implementation\n\u2502   \u2514\u2500\u2500 ...\n\u251c\u2500\u2500 tests/                 # Python tests\n\u2502   \u251c\u2500\u2500 test_*.py          # Unit and integration tests\n\u2502   \u2514\u2500\u2500 test_benchmarks.py # Performance benchmarks\n\u251c\u2500\u2500 Cargo.toml             # Rust project configuration\n\u251c\u2500\u2500 pyproject.toml         # Python project configuration\n\u251c\u2500\u2500 sync_version.py        # Version 
synchronization script\n\u2514\u2500\u2500 build.sh               # Build automation script\n```\n\n### CI/CD\n\nThe project uses GitHub Actions for continuous integration:\n- Testing on Linux, macOS, and Windows\n- Python versions 3.8 through 3.12\n- Automatic wheel building for releases\n- Cross-platform compatibility testing\n\n## Exceptions and Error Handling\n\n### Exception Hierarchy\n\n```python\nVexyGlobError(Exception)\n\u251c\u2500\u2500 PatternError(VexyGlobError, ValueError)\n\u2502   \u2514\u2500\u2500 Raised for invalid glob patterns\n\u251c\u2500\u2500 SearchError(VexyGlobError, IOError)  \n\u2502   \u2514\u2500\u2500 Raised for I/O or permission errors\n\u2514\u2500\u2500 TraversalNotSupportedError(VexyGlobError, NotImplementedError)\n    \u2514\u2500\u2500 Raised for unsupported operations\n```\n\n### Error Handling Examples\n\n```python\nimport vexy_glob\nfrom vexy_glob import VexyGlobError, PatternError, SearchError\n\ntry:\n    # Invalid pattern\n    for path in vexy_glob.find(\"[invalid\"):\n        print(path)\nexcept PatternError as e:\n    print(f\"Invalid pattern: {e}\")\n\ntry:\n    # Permission denied or I/O error\n    for path in vexy_glob.find(\"**/*\", root=\"/root\"):\n        print(path)\nexcept SearchError as e:\n    print(f\"Search failed: {e}\")\n\n# Handle any vexy_glob error\ntry:\n    results = vexy_glob.find(\"**/*.py\", content=\"[invalid regex\")\nexcept VexyGlobError as e:\n    print(f\"Operation failed: {e}\")\n```\n\n## Platform-Specific Considerations\n\n### Windows\n\n- Use forward slashes `/` in patterns (automatically converted)\n- Hidden files: Files with hidden attribute are included with `hidden=True`\n- Case sensitivity: Windows is case-insensitive by default\n\n```python\n# Windows-specific examples\nimport vexy_glob\n\n# These are equivalent on Windows\nvexy_glob.find(\"C:/Users/*/Documents/*.docx\")\nvexy_glob.find(\"C:\\\\Users\\\\*\\\\Documents\\\\*.docx\")  # Also works\n\n# Find hidden files on Windows\nfor 
path in vexy_glob.find(\"**/*\", hidden=True):\n    print(path)\n```\n\n### macOS\n\n- `.DS_Store` files are excluded by default (via .gitignore)\n- Case sensitivity depends on file system (usually case-insensitive)\n\n```python\n# macOS-specific examples\nimport vexy_glob\n\n# Exclude .DS_Store and other macOS metadata\nfor path in vexy_glob.find(\"**/*\", exclude=[\"**/.DS_Store\", \"**/.Spotlight-V100\", \"**/.Trashes\"]):\n    print(path)\n```\n\n### Linux\n\n- Always case-sensitive\n- Hidden files start with `.`\n- Respects standard Unix permissions\n\n```python\n# Linux-specific examples\nimport vexy_glob\n\n# Find files in home directory config\nfor path in vexy_glob.find(\"~/.config/**/*.conf\", hidden=True):\n    print(path)\n```\n\n## Troubleshooting\n\n### Common Issues\n\n#### 1. No results found\n\n```python\nimport vexy_glob\n\n# Check if you need hidden files\nresults = list(vexy_glob.find(\"*\"))\nif not results:\n    # Try with hidden files\n    results = list(vexy_glob.find(\"*\", hidden=True))\n\n# Check if .gitignore is excluding files\nresults = list(vexy_glob.find(\"**/*.py\", ignore_git=True))\n```\n\n#### 2. Pattern not matching expected files\n\n```python\n# Debug pattern matching\nimport vexy_glob\n\n# Too specific?\nprint(list(vexy_glob.find(\"src/lib/test.py\")))  # Only exact match\n\n# Use wildcards\nprint(list(vexy_glob.find(\"src/**/test.py\")))   # Any depth\nprint(list(vexy_glob.find(\"src/*/test.py\")))    # One level only\n```\n\n#### 3. Content search not finding matches\n\n```python\n# Check regex syntax\nimport vexy_glob\n\n# Wrong: glob-style brace alternation is not regex alternation\nresults = vexy_glob.find(\"**/*.py\", content=r\"import\\s+{re,os}\")\n\n# Correct: regex group with alternation\nresults = vexy_glob.find(\"**/*.py\", content=r\"import\\s+(re|os)\")\n\n# Case sensitivity\nresults = vexy_glob.find(\"**/*.py\", content=\"TODO\", case_sensitive=False)\n```\n\n#### 4. 
Performance issues\n\n```python\n# Optimize your search\nimport vexy_glob\n\n# Slow: Searching everything\nfor path in vexy_glob.find(\"**/*.py\", content=\"import\"):\n    print(path)\n\n# Fast: Limit scope\nfor path in vexy_glob.find(\"src/**/*.py\", content=\"import\", max_depth=3):\n    print(path)\n\n# Use exclusions\nfor path in vexy_glob.find(\n    \"**/*.py\",\n    exclude=[\"**/node_modules/**\", \"**/.venv/**\", \"**/build/**\"]\n):\n    print(path)\n```\n\n### Build Issues\n\nIf you encounter build issues:\n\n1. **Rust not found**: Install Rust from [rustup.rs](https://rustup.rs/)\n2. **maturin not found**: Run `pip install maturin`\n3. **Version mismatch**: Run `python sync_version.py` to sync versions\n4. **Import errors**: Ensure you've run `maturin develop` after changes\n5. **Build fails**: Check that you have the latest Rust stable toolchain\n\n### Debug Mode\n\n```python\nimport vexy_glob\nimport logging\n\n# Enable debug logging\nlogging.basicConfig(level=logging.DEBUG)\n\n# This will show internal operations\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n```\n\n## FAQ\n\n**Q: Why is vexy_glob so much faster than glob?**\n\nA: vexy_glob uses Rust's parallel directory traversal, releases Python's GIL, and streams results as they're found instead of collecting everything first.\n\n**Q: Does vexy_glob follow symbolic links?**\n\nA: By default, no. Use `follow_symlinks=True` to enable. Loop detection is built-in.\n\n**Q: Can I use vexy_glob with async/await?**\n\nA: Yes! 
Use it with asyncio.to_thread():\n```python\nimport asyncio\nimport vexy_glob\n\nasync def find_files():\n    return await asyncio.to_thread(\n        vexy_glob.find, \"**/*.py\", as_list=True\n    )\n```\n\n**Q: How do I search in multiple directories?**\n\nA: Call find() multiple times or use a common parent:\n```python\n# Option 1: Multiple calls\nresults = []\nfor root in [\"src\", \"tests\", \"docs\"]:\n    results.extend(vexy_glob.find(\"**/*.py\", root=root, as_list=True))\n\n# Option 2: Common parent with specific patterns\nresults = vexy_glob.find(\"{src,tests,docs}/**/*.py\", as_list=True)\n```\n\n**Q: Is the content search as powerful as ripgrep?**\n\nA: Yes! It uses the same grep-searcher crate that powers ripgrep, including SIMD optimizations.\n\n### Advanced Configuration\n\n#### Custom Ignore Files\n\n```python\nimport vexy_glob\n\n# By default, respects .gitignore\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)\n\n# Also respects .ignore and .fdignore files\n# Create .ignore in your project root:\n# echo \"test_*.py\" > .ignore\n\n# Now test files will be excluded\nfor path in vexy_glob.find(\"**/*.py\"):\n    print(path)  # test_*.py files excluded\n```\n\n#### Thread Configuration\n\n```python\nimport vexy_glob\nimport os\n\n# Auto-detect (default)\nfor path in vexy_glob.find(\"**/*.py\"):\n    pass\n\n# Limit threads for CPU-bound operations\nfor match in vexy_glob.find(\"**/*.py\", content=\"TODO\", threads=2):\n    pass\n\n# Max parallelism for I/O-bound operations\ncpu_count = os.cpu_count() or 4\nfor path in vexy_glob.find(\"**/*\", threads=cpu_count * 2):\n    pass\n```\n\n### Contributing\n\nWe welcome contributions! Here's how to get started:\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature-name`)\n3. Make your changes\n4. Run tests (`pytest tests/`)\n5. Format code (`cargo fmt` for Rust, `ruff format` for Python)\n6. Commit with descriptive messages\n7. 
Push and open a pull request\n\nBefore submitting:\n- Ensure all tests pass\n- Add tests for new functionality\n- Update documentation as needed\n- Follow existing code style\n\n#### Running the Full Test Suite\n\n```bash\n# Python tests\npytest tests/ -v\n\n# Python tests with coverage\npytest tests/ --cov=vexy_glob --cov-report=html\n\n# Rust tests\ncargo test\n\n# Benchmarks\npytest tests/test_benchmarks.py -v --benchmark-only\n\n# Linting\ncargo clippy -- -D warnings\nruff check .\n```\n\n## API Stability and Versioning\n\nvexy_glob follows [Semantic Versioning](https://semver.org/):\n\n- **Major version (1.x.x)**: Breaking API changes\n- **Minor version (x.1.x)**: New features, backwards compatible\n- **Patch version (x.x.1)**: Bug fixes only\n\n### Stable API Guarantees\n\nThe following are guaranteed stable in 1.x:\n\n- `find()` function signature and basic parameters\n- `glob()` and `iglob()` compatibility functions\n- `SearchResult` object attributes\n- Exception hierarchy\n- CLI command structure\n\n### Experimental Features\n\nFeatures marked experimental may change:\n\n- Thread count optimization algorithms\n- Internal buffer size tuning\n- Specific error message text\n\n## Performance Tuning Guide\n\n### For Maximum Speed\n\n```python\nimport vexy_glob\n\n# 1. Be specific with patterns\n# Slow:\nvexy_glob.find(\"**/*.py\")\n# Fast:\nvexy_glob.find(\"src/**/*.py\")\n\n# 2. Use depth limits when possible\nvexy_glob.find(\"**/*.py\", max_depth=3)\n\n# 3. Exclude unnecessary directories\nvexy_glob.find(\n    \"**/*.py\",\n    exclude=[\"**/venv/**\", \"**/node_modules/**\", \"**/.git/**\"]\n)\n\n# 4. 
Use file type filters\nvexy_glob.find(\"**/*.py\", file_type=\"f\")  # Skip directories\n```\n\n### For Memory Efficiency\n\n```python\n# Stream results instead of collecting\n# Memory efficient:\nfor path in vexy_glob.find(\"**/*\"):\n    process(path)  # Process one at a time\n\n# Memory intensive:\nall_files = vexy_glob.find(\"**/*\", as_list=True)  # Loads all results into memory\n```\n\n### For I/O Optimization\n\n```python\n# Optimize thread count based on storage type\nimport vexy_glob\n\n# SSD: More threads help\nfor path in vexy_glob.find(\"**/*\", threads=8):\n    pass\n\n# HDD: Fewer threads to avoid seek thrashing\nfor path in vexy_glob.find(\"**/*\", threads=2):\n    pass\n\n# Network storage: Single thread might be best\nfor path in vexy_glob.find(\"**/*\", threads=1):\n    pass\n```\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- Built on the excellent Rust crates:\n  - [`ignore`](https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore) - Fast directory traversal\n  - [`grep-searcher`](https://github.com/BurntSushi/ripgrep/tree/master/crates/grep-searcher) - High-performance text search\n  - [`globset`](https://github.com/BurntSushi/ripgrep/tree/master/crates/globset) - Efficient glob matching\n- Inspired by tools like [`fd`](https://github.com/sharkdp/fd) and [`ripgrep`](https://github.com/BurntSushi/ripgrep)\n- Thanks to the PyO3 team for excellent Python-Rust bindings\n\n## Related Projects\n\n- [`fd`](https://github.com/sharkdp/fd) - A simple, fast alternative to `find`\n- [`ripgrep`](https://github.com/BurntSushi/ripgrep) - Recursively search directories for a regex pattern\n- [`os.walk`](https://github.com/python/cpython/blob/main/Lib/os.py) - Python's built-in directory traversal\n- [`scandir`](https://github.com/benhoyt/scandir) - Better directory iteration for Python\n\n---\n\n**Happy fast file finding!** \ud83d\ude80\n\nIf you find `vexy_glob` useful, 
please consider giving it a star on [GitHub](https://github.com/vexyart/vexy-glob)!\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Vexy Glob fast file finding",
    "version": "1.0.9",
    "project_urls": {
        "Bug Tracker": "https://github.com/vexyart/vexy-glob/issues",
        "Homepage": "https://github.com/vexyart/vexy-glob",
        "Repository": "https://github.com/vexyart/vexy-glob"
    },
    "split_keywords": [
        "filesystem",
        " find",
        " glob",
        " parallel",
        " rust",
        " search"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "52a25d3c74a12fa93f6567e4c3d69c255b7cdcba58e6971f25c7b7672a288f53",
                "md5": "4141c35cae85e8712e18b666c684902f",
                "sha256": "fafc08862efcea87b309525ba0a47f1c827b969ef55ba7f5bdb9a91b59a9a324"
            },
            "downloads": -1,
            "filename": "vexy_glob-1.0.9-cp38-abi3-macosx_10_12_x86_64.whl",
            "has_sig": false,
            "md5_digest": "4141c35cae85e8712e18b666c684902f",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 1124707,
            "upload_time": "2025-08-04T23:40:02",
            "upload_time_iso_8601": "2025-08-04T23:40:02.164714Z",
            "url": "https://files.pythonhosted.org/packages/52/a2/5d3c74a12fa93f6567e4c3d69c255b7cdcba58e6971f25c7b7672a288f53/vexy_glob-1.0.9-cp38-abi3-macosx_10_12_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "25abbe754b19c7acea5ad55aa5311f4935ce96d38fb9b10b07ec799efefe6597",
                "md5": "53bb38887335ff4d866f5383e520e427",
                "sha256": "e334f8fb78d0e269768c4b8537f699821611c76b2ab62cbdb3c2298715071a08"
            },
            "downloads": -1,
            "filename": "vexy_glob-1.0.9.tar.gz",
            "has_sig": false,
            "md5_digest": "53bb38887335ff4d866f5383e520e427",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 268795,
            "upload_time": "2025-08-04T23:39:59",
            "upload_time_iso_8601": "2025-08-04T23:39:59.570281Z",
            "url": "https://files.pythonhosted.org/packages/25/ab/be754b19c7acea5ad55aa5311f4935ce96d38fb9b10b07ec799efefe6597/vexy_glob-1.0.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-04 23:39:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "vexyart",
    "github_project": "vexy-glob",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vexy-glob"
}
        