# Project Vectorizer
A powerful CLI tool that vectorizes codebases, stores them in a vector database, tracks changes, and serves them via MCP (Model Context Protocol) for AI agents like Claude, Codex, and others.
**Latest Version**: 0.1.2 | [Changelog](#changelog) | [GitHub](https://github.com/starkbaknet/project-vectorizer)
---
## 📋 Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Performance Optimization](#performance-optimization)
- [CLI Commands](#cli-commands)
- [Configuration](#configuration)
- [Search Features](#search-features)
- [MCP Server](#mcp-server)
- [Advanced Usage](#advanced-usage)
- [Troubleshooting](#troubleshooting)
- [Changelog](#changelog)
- [Contributing](#contributing)
---
## Features
### 🚀 Performance & Optimization
- **Auto-Optimized Config** - Auto-detect CPU cores and RAM for optimal settings (`--optimize`)
- **Max Resources Mode** - Use maximum system resources for fastest indexing (`--max-resources`)
- **Smart Incremental** - 60-70% faster indexing with intelligent change categorization
- **Git-Aware Indexing** - 80-90% faster by indexing only git-changed files
- **Parallel Processing** - Multi-threaded with auto-detected optimal worker count (up to 16 workers)
- **Memory Monitoring** - Real-time memory tracking with automatic garbage collection
- **Batch Optimization** - Memory-based batch size calculation for safe processing
### 🔍 Search & Indexing
- **Code Vectorization** - Parse and vectorize with sentence-transformers or OpenAI embeddings
- **Multi-Level Chunking** - Functions, classes, micro-chunks, and word-level chunks for precision
- **Enhanced Single-Word Search** - High-precision search for single keywords (0.8+ thresholds)
- **Semantic + Exact Search** - Combines semantic similarity with exact word matching
- **Adaptive Thresholds** - Automatically adjusts for optimal results
- **Multiple Languages** - 30+ languages (Python, JS, TS, Go, Rust, Java, C++, C, PHP, Ruby, Swift, Kotlin, and more)
### 🔄 Change Management
- **Git Integration** - Track changes via git commits with `index-git` command
- **Smart File Categorization** - Detects New, Modified, and Deleted files
- **Watch Mode** - Real-time monitoring with configurable debouncing (0.5-10s)
- **Incremental Updates** - Only re-index changed content
- **Hash-Based Detection** - SHA256 file hashing for accurate change detection
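Hash-based change detection can be sketched in a few lines: hash each file's bytes with SHA256 and compare against the digests stored from the previous run. This is an illustration of the technique, not the tool's internal API; the function names here are hypothetical.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Return the SHA256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(paths, previous_hashes):
    """Yield (path, digest) for files whose hash differs from the stored one."""
    for path in paths:
        digest = file_hash(path)
        if previous_hashes.get(str(path)) != digest:
            yield path, digest
```

Because the comparison is content-based, touching a file without changing it (e.g. `touch file.py`) does not trigger re-indexing.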
### 🌐 AI Integration
- **MCP Server** - Model Context Protocol for AI agents (Claude, Codex, etc.)
- **HTTP Fallback API** - RESTful endpoints when MCP is unavailable
- **Semantic Search** - Natural language queries for code discovery
- **File Operations** - Get content, list files, project statistics
### 🎨 User Experience
- **Clean Progress Output** - Single unified progress bar with timing information
- **Suppressed Library Logs** - No cluttered batch progress bars from dependencies
- **Timing Information** - Elapsed time for all operations (seconds or minutes+seconds)
- **Verbose Mode** - Optional detailed logging for debugging
- **Professional UI** - Rich terminal output with colors, panels, and formatting
- **Real-time Updates** - Live file names and status tags during indexing
### 💾 Database & Storage
- **ChromaDB Backend** - High-performance vector database
- **Fast HNSW Indexing** - Optimized similarity search algorithm
- **Scalable** - Handles 500K+ chunks efficiently
- **Single Database** - No external dependencies required
- **Custom Paths** - Configurable database location
---
## Installation
### From PyPI (Recommended)
```bash
# Install from PyPI
pip install project-vectorizer
# Verify installation
pv --version
```
### From Source
```bash
# Clone repository
git clone https://github.com/starkbaknet/project-vectorizer.git
cd project-vectorizer
# Install
pip install -e .
# Or with development dependencies
pip install -e ".[dev]"
```
---
## Quick Start
### 1. Initialize Your Project
```bash
# 🚀 Recommended: Auto-optimize based on your system (16 workers, 400 batch on 8-core/16GB RAM)
pv init /path/to/project --optimize
# Or with custom settings
pv init /path/to/project \
--name "My Project" \
--embedding-model "all-MiniLM-L6-v2" \
--chunk-size 256 \
--optimize
```
**Output:**
```
✓ Project initialized successfully!
Name: My Project
Path: /path/to/project
Model: all-MiniLM-L6-v2
Provider: sentence-transformers
Chunk Size: 256 tokens
Optimized Settings:
• Workers: 16
• Batch Size: 400
• Embedding Batch: 200
• Memory Monitoring: Enabled
• GC Interval: 100 files
```
### 2. Index Your Codebase
```bash
# 🚀 Recommended: First-time indexing with max resources (2-4x faster)
pv index /path/to/project --max-resources
# 🚀 Recommended: Smart incremental for updates (60-70% faster)
pv index /path/to/project --smart
# 🚀 Recommended: Git-aware for recent changes (80-90% faster)
pv index-git /path/to/project --since HEAD~5
# Standard full indexing
pv index /path/to/project
# Force re-index everything
pv index /path/to/project --force
# Combine for maximum performance
pv index /path/to/project --smart --max-resources
```
**Output:**
```
Using maximum system resources (optimized settings)...
• Workers: 16
• Batch Size: 400
• Embedding Batch: 200
Indexing examples/demo.py ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
╭────────────────── Indexing Complete ──────────────────╮
│ ✓ Indexing complete! │
│ │
│ Files indexed: 48/49 │
│ Total chunks: 9222 │
│ Model: all-MiniLM-L6-v2 │
│ Time taken: 2m 16s │
│ │
│ You can now search with: pv search . "your query" │
╰───────────────────────────────────────────────────────╯
```
### 3. Search Your Code
```bash
# Natural language search
pv search /path/to/project "authentication logic"
# Single-word searches work great (high precision)
pv search /path/to/project "async" --threshold 0.8
pv search /path/to/project "test" --threshold 0.9
# Multi-word queries (semantic search)
pv search /path/to/project "user login validation" --threshold 0.5
# Find specific constructs
pv search /path/to/project "class" --limit 10
```
**Output:**
```
Search Results for: authentication logic
Found 5 result(s) with threshold >= 0.5
╭─────────────────────── Result 1 ───────────────────────╮
│ src/auth/login.py │
│ Lines 45-67 | Similarity: 0.892 │
│ │
│ def authenticate_user(username: str, password: str): │
│ """ │
│ Authenticate user credentials against database │
│ Returns user object if valid, None otherwise │
│ """ │
│ ... │
╰────────────────────────────────────────────────────────╯
```
### 4. Start MCP Server
```bash
# Start server (default: localhost:8000)
pv serve /path/to/project
# Custom host/port
pv serve /path/to/project --host 0.0.0.0 --port 8080
```
### 5. Monitor Changes in Real-Time
```bash
# Watch for file changes (default 2s debounce)
pv sync /path/to/project --watch
# Fast feedback (0.5s)
pv sync /path/to/project --watch --debounce 0.5
# Slower systems (5s)
pv sync /path/to/project --watch --debounce 5.0
```
---
## Performance Optimization
### Understanding the Optimization Flags
#### `--optimize` (Permanent)
Use when **initializing** a new project. Detects your system and saves optimal settings.
```bash
pv init /path/to/project --optimize
```
**What it does:**
- Detects CPU cores → sets `max_workers` (e.g., 8 cores = 16 workers)
- Calculates RAM → sets safe `batch_size` (e.g., 16GB = 400 batch)
- Sets memory thresholds based on total RAM
- **Saves to config** - All future operations use these settings
**When to use:**
- ✅ New projects
- ✅ Want permanent optimization
- ✅ Same machine for all operations
- ✅ "Set and forget" approach
#### `--max-resources` (Temporary)
Use when **indexing** to temporarily boost performance without changing config.
```bash
pv index /path/to/project --max-resources
pv index-git /path/to/project --since HEAD~1 --max-resources
```
**What it does:**
- Detects system resources (same as --optimize)
- **Temporarily overrides** config for this operation only
- Original config unchanged
**When to use:**
- ✅ Existing project without optimization
- ✅ One-time heavy indexing
- ✅ CI/CD with dedicated resources
- ✅ Don't want to modify config
### Performance Benchmarks
**System**: 8-core CPU, 16GB RAM, SSD
| Mode | Files | Chunks | Time | Settings |
| ------------------ | --------- | ------ | ------ | --------------------- |
| Standard | 48 | 9222 | 4m 32s | 4 workers, 100 batch |
| --max-resources | 48 | 9222 | 2m 16s | 16 workers, 400 batch |
| Smart incremental | 5 changed | 412 | 24s | 16 workers, 400 batch |
| Git-aware (HEAD~1) | 3 changed | 287 | 15s | 16 workers, 400 batch |
**Key Findings:**
- `--max-resources`: **2x faster** for full indexing
- Smart incremental: **60-70% faster** than full reindex
- Git-aware: **80-90% faster** for recent changes
- Chunk size (128 vs 512): **No performance difference** (same ~2m 16s)
### System Resource Detection
**CPU Detection:**
```
Detected: 8 cores
Optimal workers: min(8 * 2, 16) = 16 workers
```
**Memory Detection:**
```
Total RAM: 16GB
Available RAM: 8GB
Safe batch size: 8GB * 0.5 * 100 = 400
Embedding batch: 400 * 0.5 = 200
GC interval: 100 files
```
**Memory Thresholds:**
```
32GB+ RAM → threshold: 50000
16-32GB → threshold: 20000
8-16GB → threshold: 10000
<8GB → threshold: 5000
```
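The detection formulas above can be expressed as a small pure function. This is a sketch mirroring the documented heuristics, not the tool's actual `Config.create_optimized()` implementation; in practice the inputs would come from `psutil.cpu_count()` and `psutil.virtual_memory()`.

```python
def optimized_settings(cores: int, total_gb: float, available_gb: float) -> dict:
    """Mirror the documented sizing heuristics for workers, batches, and thresholds."""
    workers = min(cores * 2, 16)                 # 8 cores -> 16 workers
    batch_size = int(available_gb * 0.5 * 100)   # 8 GB free -> 400
    embedding_batch = int(batch_size * 0.5)      # half the file batch

    if total_gb >= 32:
        threshold = 50_000
    elif total_gb >= 16:
        threshold = 20_000
    elif total_gb >= 8:
        threshold = 10_000
    else:
        threshold = 5_000

    return {
        "max_workers": workers,
        "batch_size": batch_size,
        "embedding_batch_size": embedding_batch,
        "memory_efficient_search_threshold": threshold,
    }

# With psutil installed, feed in live measurements:
#   import psutil
#   mem = psutil.virtual_memory()
#   settings = optimized_settings(psutil.cpu_count(logical=False) or 1,
#                                 mem.total / 1024**3, mem.available / 1024**3)
```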
### Best Practices
1. **Initialize with optimization**
```bash
pv init ~/my-project --optimize
```
2. **Use max resources for heavy operations**
```bash
pv index ~/my-project --force --max-resources
```
3. **Use smart mode for daily updates**
```bash
pv index ~/my-project --smart
```
4. **Use git-aware after pulling changes**
```bash
pv index-git ~/my-project --since HEAD~1
```
5. **Monitor memory with verbose mode**
```bash
pv index ~/my-project --max-resources --verbose
```
---
## CLI Commands
### Global Options
```bash
pv [OPTIONS] COMMAND [ARGS]
Options:
-v, --verbose Enable verbose output
--version Show version
--help Show help
```
### `pv init` - Initialize Project
Initialize a new project for vectorization.
```bash
pv init [OPTIONS] PROJECT_PATH
Options:
-n, --name TEXT Project name (default: directory name)
-m, --embedding-model TEXT Model name (default: all-MiniLM-L6-v2)
-p, --embedding-provider Provider: sentence-transformers | openai
-c, --chunk-size INT Chunk size in tokens (default: 256)
-o, --chunk-overlap INT Overlap in tokens (default: 32)
--optimize Auto-optimize based on system resources ⭐
```
**Examples:**
```bash
# Basic initialization
pv init /path/to/project
# With optimization (recommended)
pv init /path/to/project --optimize
# With OpenAI embeddings
export OPENAI_API_KEY="sk-..."
pv init /path/to/project \
--embedding-provider openai \
--embedding-model text-embedding-ada-002 \
--optimize
```
### `pv index` - Index Codebase
Index the codebase for searching.
```bash
pv index [OPTIONS] PROJECT_PATH
Options:
-i, --incremental Only index changed files
-s, --smart Smart incremental (categorized: new/modified/deleted) ⭐
-f, --force Force re-index all files
--max-resources Use maximum system resources ⭐
```
**Examples:**
```bash
# Full indexing with max resources
pv index /path/to/project --max-resources
# Smart incremental (fastest for updates)
pv index /path/to/project --smart
# Combine for maximum performance
pv index /path/to/project --smart --max-resources
# Force complete reindex
pv index /path/to/project --force
```
### `pv index-git` - Git-Aware Indexing
Index only files changed in git commits.
```bash
pv index-git [OPTIONS] PROJECT_PATH
Options:
-s, --since TEXT Git reference (default: HEAD~1)
--max-resources Use maximum system resources ⭐
```
**Examples:**
```bash
# Last commit
pv index-git /path/to/project --since HEAD~1
# Last 5 commits
pv index-git /path/to/project --since HEAD~5
# Since main branch
pv index-git /path/to/project --since main
# Since specific commit
pv index-git /path/to/project --since abc123def
# With max resources
pv index-git /path/to/project --since HEAD~10 --max-resources
```
**Use Cases:**
- After `git pull` - index only new changes
- Before code review - index PR changes
- CI/CD pipelines - index commit range
- After branch switch - index differences
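Under the hood, git-aware indexing boils down to asking git which files changed in the given range and feeding only those to the indexer. A minimal sketch of that step (the helper name is illustrative, not the tool's API):

```python
import subprocess
from pathlib import Path

def git_changed_files(repo: Path, since: str = "HEAD~1") -> list[str]:
    """List files changed between `since` and HEAD, as git reports them."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{since}..HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line.strip()]
```

The `--since` values shown above (`HEAD~5`, `main`, a commit SHA) all work because git resolves any valid reference on the left side of the range.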
### `pv search` - Search Code
Search through vectorized codebase.
```bash
pv search [OPTIONS] PROJECT_PATH QUERY
Options:
-l, --limit INT Number of results (default: 10)
-t, --threshold FLOAT Similarity threshold 0.0-1.0 (default: 0.3)
```
**Examples:**
```bash
# Natural language search
pv search /path/to/project "error handling in database connections"
# Single-word search (high threshold)
pv search /path/to/project "async" --threshold 0.9
# Find all tests
pv search /path/to/project "test" --limit 20 --threshold 0.8
# Broad semantic search (low threshold)
pv search /path/to/project "api authentication" --threshold 0.3
```
**Threshold Guide:**
- **0.8-0.95**: Single words, exact matches
- **0.5-0.7**: Multi-word phrases, semantic
- **0.3-0.5**: Complex queries, broad search
- **0.1-0.3**: Very broad, exploratory
### `pv sync` - Sync Changes / Watch Mode
Sync changes or watch for file modifications.
```bash
pv sync [OPTIONS] PROJECT_PATH
Options:
-w, --watch Watch for file changes
-d, --debounce FLOAT Debounce delay in seconds (default: 2.0)
```
**Examples:**
```bash
# One-time sync (smart incremental)
pv sync /path/to/project
# Watch mode with default debounce (2s)
pv sync /path/to/project --watch
# Fast feedback (0.5s)
pv sync /path/to/project --watch --debounce 0.5
# Slower systems (5s)
pv sync /path/to/project --watch --debounce 5.0
```
**Debounce Explained:**
- Waits X seconds after last file change before indexing
- Batches multiple rapid changes together
- Prevents redundant indexing when saving files repeatedly
- Reduces CPU usage during active development
**Recommended Values:**
- **0.5-1.0s**: Fast machines, need instant feedback
- **2.0s**: Balanced (default)
- **5.0-10.0s**: Slower machines, large codebases
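The debounce behavior described above is the classic trailing-edge pattern: each new file event resets a timer, and indexing only fires once the timer survives the full delay. A minimal sketch (not the tool's internals, which use a file watcher):

```python
import threading

class Debouncer:
    """Run `action` only after `delay` seconds of quiet; new triggers reset the timer."""

    def __init__(self, delay: float, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        """Call this on every file-change event."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()   # a rapid burst of saves keeps resetting the clock
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.start()
```

Saving a file five times in quick succession therefore produces a single indexing pass, which is exactly why watch mode stays cheap during active development.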
### `pv serve` - Start MCP Server
Start MCP server for AI agent integration.
```bash
pv serve [OPTIONS] PROJECT_PATH
Options:
-p, --port INT Port number (default: 8000)
-h, --host TEXT Host address (default: localhost)
```
**Examples:**
```bash
# Start server
pv serve /path/to/project
# Custom port
pv serve /path/to/project --port 8080
# Expose to network
pv serve /path/to/project --host 0.0.0.0 --port 8000
```
### `pv status` - Show Project Status
Show project status and statistics.
```bash
pv status PROJECT_PATH
```
**Output:**
```
╭────────────── Project Status ──────────────╮
│ Name my-project │
│ Path /path/to/project │
│ Embedding Model all-MiniLM-L6-v2 │
│ │
│ Total Files 49 │
│ Indexed Files 48 │
│ Total Chunks 9222 │
│ │
│ Git Branch main │
│ Last Updated 2025-10-13 12:15:42 │
│ Created 2025-10-10 09:30:15 │
╰────────────────────────────────────────────╯
```
---
## Configuration
### Config File Location
Configuration is stored at `<project>/.vectorizer/config.json`
### Full Configuration Reference
```json
{
"chromadb_path": null,
"embedding_model": "all-MiniLM-L6-v2",
"embedding_provider": "sentence-transformers",
"openai_api_key": null,
"chunk_size": 128,
"chunk_overlap": 32,
"max_file_size_mb": 10,
"included_extensions": [
".py",
".js",
".ts",
".jsx",
".tsx",
".go",
".rs",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".php",
".rb",
".swift",
".kt",
".scala",
".clj",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
".cmd",
".md",
".txt",
".rst",
".json",
".yaml",
".yml",
".toml",
".xml",
".html",
".css",
".scss",
".sql",
".graphql",
".proto"
],
"excluded_patterns": [
"node_modules/**",
".git/**",
"__pycache__/**",
"*.pyc",
".pytest_cache/**",
"venv/**",
"env/**",
".env/**",
"build/**",
"dist/**",
"*.egg-info/**",
".DS_Store",
"*.min.js",
"*.min.css"
],
"mcp_host": "localhost",
"mcp_port": 8000,
"log_level": "INFO",
"log_file": null,
"max_workers": 4,
"batch_size": 100,
"embedding_batch_size": 100,
"parallel_file_processing": true,
"memory_monitoring_enabled": true,
"memory_efficient_search_threshold": 10000,
"gc_interval": 100
}
```
### Key Settings Explained
**Embedding Settings:**
- `embedding_model`: Model for embeddings (all-MiniLM-L6-v2, text-embedding-ada-002, etc.)
- `embedding_provider`: "sentence-transformers" (local) or "openai" (API)
- `chunk_size`: Tokens per chunk (128 for precision, 512 for context)
- `chunk_overlap`: Overlap between chunks (16-32 recommended)
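To see how `chunk_size` and `chunk_overlap` interact, here is a simplified chunker over a generic token list. The real tool tokenizes with the embedding model's tokenizer and chunks at multiple levels; this sketch only shows the sliding-window arithmetic.

```python
def chunk_tokens(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split a token stream into windows of `size` tokens that share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # how far the window advances each time
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

The overlap means a function signature split across a chunk boundary still appears whole in at least one chunk, at the cost of slightly more chunks overall.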
**Performance Settings:**
- `max_workers`: Parallel workers (auto-detected with --optimize)
- `batch_size`: Files per batch (auto-calculated with --optimize)
- `embedding_batch_size`: Embeddings per batch
- `parallel_file_processing`: Enable parallel processing (recommended: true)
**Memory Settings:**
- `memory_monitoring_enabled`: Monitor RAM usage (recommended: true)
- `memory_efficient_search_threshold`: Switch to streaming for large results
- `gc_interval`: Garbage collection frequency (files between GC)
**File Filtering:**
- `included_extensions`: File types to index
- `excluded_patterns`: Glob patterns to ignore
- `max_file_size_mb`: Skip files larger than this
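The three filtering settings combine into a simple per-file decision, which can be sketched with `fnmatch` against project-relative paths (the function here is illustrative, not the tool's API):

```python
from fnmatch import fnmatch

def should_index(rel_path: str, size_bytes: int,
                 included_extensions, excluded_patterns,
                 max_file_size_mb: int = 10) -> bool:
    """Apply include/exclude/size rules to one project-relative path."""
    if not any(rel_path.endswith(ext) for ext in included_extensions):
        return False
    if any(fnmatch(rel_path, pat) for pat in excluded_patterns):
        return False
    return size_bytes <= max_file_size_mb * 1024**2
```

Note that patterns are matched against the path relative to the project root, so `node_modules/**` excludes the whole tree regardless of where the project itself lives on disk.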
**Server Settings:**
- `mcp_host`: MCP server host
- `mcp_port`: MCP server port
- `log_level`: INFO, DEBUG, WARNING, ERROR
- `chromadb_path`: Custom ChromaDB location (optional)
### Environment Variables
Create `.env` file or export:
```bash
# OpenAI API Key (required for OpenAI embeddings)
export OPENAI_API_KEY="sk-..."
# Override config values
export EMBEDDING_PROVIDER="sentence-transformers"
export EMBEDDING_MODEL="all-MiniLM-L6-v2"
export CHUNK_SIZE="256"
export DEFAULT_SEARCH_THRESHOLD="0.3"
# Database
export CHROMADB_PATH="/custom/path/to/chromadb"
# Logging
export LOG_LEVEL="INFO"
export LOG_FILE="/var/log/vectorizer.log"
```
For complete list, see [docs/ENVIRONMENT.md](docs/ENVIRONMENT.md)
### Editing Configuration
```bash
# View current config
cat /path/to/project/.vectorizer/config.json
# Edit manually
nano /path/to/project/.vectorizer/config.json
# Or regenerate with optimization
pv init /path/to/project --optimize
```
---
## Search Features
### Single-Word Search
Optimized for high-precision single-keyword searches.
```bash
# Programming keywords
pv search /path/to/project "async" --threshold 0.9
pv search /path/to/project "test" --threshold 0.8
pv search /path/to/project "class" --threshold 0.9
pv search /path/to/project "import" --threshold 0.85
# Works great for finding specific constructs
pv search /path/to/project "def" --threshold 0.9 # Python functions
pv search /path/to/project "function" --threshold 0.9 # JS functions
pv search /path/to/project "catch" --threshold 0.8 # Error handling
```
**Features:**
- **Exact Word Matching**: Prioritizes exact word boundaries
- **Keyword Detection**: Special handling for programming keywords
- **Relevance Boosting**: Huge boost for exact matches
- **High Thresholds**: Reliable results even at 0.8-0.9+
### Multi-Word Search
Semantic search for phrases and concepts.
```bash
# Natural language
pv search /path/to/project "user authentication logic" --threshold 0.5
# Code patterns
pv search /path/to/project "error handling in database" --threshold 0.4
# Features
pv search /path/to/project "rate limiting middleware" --threshold 0.6
```
### Search Result Ranking
Results ranked by:
1. **Exact word matches** (highest priority)
2. **Content type** (micro/word chunks get boost)
3. **Partial matches** within larger words
4. **Semantic similarity** from embeddings
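The four signals can be folded into one sortable score. The weights below are illustrative only (the tool's actual boost values are internal); the point is the structure: embedding similarity as the base, with additive boosts for exact word-boundary matches, partial matches, and small chunk types.

```python
import re

def rank_score(query: str, chunk_text: str, similarity: float, chunk_type: str) -> float:
    """Combine the documented ranking signals (weights are illustrative)."""
    score = similarity                        # base: semantic similarity
    words = query.lower().split()
    text = chunk_text.lower()
    if all(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
        score += 0.5                          # exact word-boundary match: big boost
    elif any(w in text for w in words):
        score += 0.1                          # partial match inside a larger word
    if chunk_type in {"micro", "word"}:
        score += 0.2                          # small chunks rank higher for keywords
    return score
```

This structure is why single-word searches stay reliable at thresholds of 0.8+: an exact keyword hit in a micro-chunk outranks a merely similar paragraph.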
### Recommended Thresholds by Query Type
| Query Type | Threshold | Example |
| -------------- | --------- | --------------------------------- |
| Single keyword | 0.7-0.95 | "async", "test", "class" |
| Two words | 0.5-0.8 | "error handling", "api routes" |
| Short phrase | 0.4-0.7 | "user login validation" |
| Complex query | 0.3-0.5 | "authentication with jwt tokens" |
| Exploratory | 0.1-0.3 | "machine learning model training" |
---
## MCP Server
### Starting the Server
```bash
# Default (localhost:8000)
pv serve /path/to/project
# Custom settings
pv serve /path/to/project --host 0.0.0.0 --port 8080
```
### Available MCP Tools
While the server is running, AI agents can use these tools:
1. **search_code** - Search vectorized codebase
```json
{
"query": "authentication logic",
"limit": 10,
"threshold": 0.5
}
```
2. **get_file_content** - Retrieve full file
```json
{
"file_path": "src/auth/login.py"
}
```
3. **list_files** - List all files
```json
{
"file_type": "py" // optional filter
}
```
4. **get_project_stats** - Get statistics
```json
{}
```
### HTTP Fallback API
If MCP is unavailable, HTTP endpoints are provided:
```bash
# Search
curl "http://localhost:8000/search?q=authentication&limit=5&threshold=0.5"
# Get file
curl "http://localhost:8000/file/src/auth/login.py"
# List files
curl "http://localhost:8000/files?type=py"
# Statistics
curl "http://localhost:8000/stats"
# Health check
curl "http://localhost:8000/health"
```
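The same endpoints can be called from Python with the standard library alone. This sketch assumes the server returns JSON (the exact response schema is not documented here, so treat the decoding step as an assumption):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def search_url(query: str, limit: int = 10, threshold: float = 0.5,
               base: str = "http://localhost:8000") -> str:
    """Build the fallback search endpoint URL."""
    return f"{base}/search?" + urlencode(
        {"q": query, "limit": limit, "threshold": threshold}
    )

def search(query: str, **kwargs) -> dict:
    """Query a running `pv serve` instance and decode the JSON response (assumed shape)."""
    with urlopen(search_url(query, **kwargs)) as resp:
        return json.load(resp)
```

Because `urlencode` handles quoting, multi-word queries like `"error handling"` are safe to pass straight through.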
### Use Cases
1. **AI Code Review**: Let Claude analyze your codebase semantically
2. **Intelligent Navigation**: Ask AI to find relevant code
3. **Documentation**: Generate docs from actual code
4. **Onboarding**: Help new devs understand codebase
5. **Refactoring**: Find similar patterns across project
---
## Advanced Usage
### Python API
#### Basic Usage
```python
import asyncio
from pathlib import Path
from project_vectorizer.core.config import Config
from project_vectorizer.core.project import ProjectManager
async def main():
# Initialize project
config = Config.create_optimized(
embedding_model="all-MiniLM-L6-v2",
chunk_size=256
)
project_path = Path("/path/to/project")
manager = ProjectManager(project_path, config)
# Initialize
await manager.initialize("My Project")
# Index
await manager.load()
await manager.index_all()
# Search
results = await manager.search("authentication", limit=10, threshold=0.5)
for result in results:
print(f"{result['file_path']}: {result['similarity']:.3f}")
asyncio.run(main())
```
#### Progress Tracking
```python
from rich.progress import Progress, BarColumn, TaskProgressColumn
async def index_with_progress(project_path):
config = Config.load_from_project(project_path)
manager = ProjectManager(project_path, config)
await manager.load()
with Progress() as progress:
task = progress.add_task("Indexing...", total=100)
def update_progress(current, total, description):
progress.update(task, completed=current, total=total, description=description)
manager.set_progress_callback(update_progress)
await manager.index_all()
```
#### Custom Resource Limits
```python
import psutil
async def adaptive_index(project_path):
"""Index with resources based on current load."""
cpu_percent = psutil.cpu_percent(interval=1)
if cpu_percent < 50: # System idle
config = Config.create_optimized()
else: # System busy
config = Config(max_workers=4, batch_size=100)
manager = ProjectManager(project_path, config)
await manager.load()
await manager.index_all()
```
### Chunk Size Optimization
The engine caps chunks at 128 tokens (see engine.py:35) to keep search precise; you can request larger sizes in the config, but the engine clamps them back to 128:
```bash
# Precision (default, forced max 128)
pv init /path/to/project --chunk-size 128
# More context (still capped at 128 by engine)
pv init /path/to/project --chunk-size 512
```
**Performance Note**: Chunk size has virtually no impact on indexing speed (~2m 16s for both the 128 and 512 settings). Choose based on search-quality needs:
- **128**: Better precision, exact matches
- **512**: More context, better understanding
### CI/CD Integration
```yaml
# .github/workflows/vectorize.yml
name: Vectorize Codebase
on:
push:
branches: [main]
jobs:
vectorize:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
- name: Install vectorizer
run: pip install project-vectorizer
- name: Initialize and index
run: |
pv init . --optimize --name "${{ github.repository }}"
pv index . --max-resources
- name: Test search
run: pv search . "test" --limit 5
```
### Custom File Filters
```json
{
"included_extensions": [".py", ".js", ".custom"],
"excluded_patterns": ["tests/**", "*.generated.js", "vendor/**", "*.min.*"]
}
```
### Watch Mode During Development
```bash
# Terminal 1: Watch mode
pv sync /path/to/project --watch --debounce 1.0
# Terminal 2: Make code changes
# Auto-indexes when you save
# Terminal 3: Search as you code
pv search /path/to/project "your new function" --threshold 0.5
```
---
## Troubleshooting
### Common Issues
#### 1. Slow Indexing
**Problem**: Indexing taking too long
**Solutions:**
```bash
# Use max resources
pv index /path/to/project --max-resources
# Use smart incremental for updates
pv index /path/to/project --smart
# Use git-aware for recent changes
pv index-git /path/to/project --since HEAD~1
# Check if optimization is working
pv index /path/to/project --max-resources --verbose
# Look for: "Workers: 16, Batch Size: 400"
```
#### 2. High Memory Usage
**Problem**: Process using too much RAM or getting killed
**Solutions:**
```bash
# Reduce batch size in config
{
"batch_size": 50,
"max_workers": 4
}
# Enable memory monitoring
{
"memory_monitoring_enabled": true,
"gc_interval": 50
}
# Use smaller chunks
pv init /path/to/project --chunk-size 128
```
#### 3. Poor Search Results
**Problem**: Search not finding relevant code
**Solutions:**
```bash
# Lower threshold for phrases
pv search /path/to/project "your query" --threshold 0.3
# Higher threshold for keywords
pv search /path/to/project "async" --threshold 0.9
# Use smaller chunk size for precision
# Edit config: "chunk_size": 128
# Ensure index is up to date
pv index /path/to/project --smart
```
#### 4. No Results for Single Words
**Problem**: Single-word searches return nothing
**Solutions:**
```bash
# Try lower threshold
pv search /path/to/project "yourword" --threshold 0.5
# Check if word exists
pv search /path/to/project "yourword" --threshold 0.1 --limit 1
# Reindex with smaller chunks
# Edit config: "chunk_size": 128
pv index /path/to/project --force
```
#### 5. Missing Recent Changes
**Problem**: Just-edited code not showing in search
**Solutions:**
```bash
# Run smart incremental
pv index /path/to/project --smart
# Or git-aware
pv index-git /path/to/project --since HEAD~1
# Check status
pv status /path/to/project
```
#### 6. psutil Not Found
**Problem**: Optimization not working
**Solution:**
```bash
# Install psutil
pip install psutil
# Verify
python -c "import psutil; print(f'CPUs: {psutil.cpu_count()}, RAM: {psutil.virtual_memory().available / 1024**3:.1f}GB')"
# Try again
pv init /path/to/project --optimize
```
### Debug Mode
```bash
# Enable verbose logging
pv --verbose index /path/to/project
# Check project status
pv status /path/to/project
# View config
cat /path/to/project/.vectorizer/config.json
# Check ChromaDB
ls -lh /path/to/project/.vectorizer/chromadb/
```
### Performance Debugging
```bash
# Time operations
time pv index /path/to/project
time pv index /path/to/project --max-resources
# Monitor resources during indexing
# Terminal 1:
pv index /path/to/project --max-resources
# Terminal 2:
htop # or top
# Should see high CPU across all cores
# Check memory warnings
pv index /path/to/project --max-resources --verbose
# Look for memory warnings
```
---
## Changelog
### [0.1.2] - 2025-10-13
#### Added
- **Optimized Config Generation** - `Config.create_optimized()` auto-detects CPU/RAM
- **Max Resources Flag** - `--max-resources` for temporary performance boost
- **psutil Integration** - Automatic system resource detection
- **Unified Progress Tracking** - Clean single-line progress bar
- **Library Progress Suppression** - No more cluttered batch progress bars
- **Timing Information** - All operations show elapsed time
- **Clean Terminal Output** - Professional UI with timing
#### Performance
- **2x faster** full indexing with --max-resources
- **60-70% faster** smart incremental updates
- **80-90% faster** git-aware indexing
#### Documentation
- Comprehensive documentation overhaul
- Consolidated all guides into main README
- Added CHANGELOG.md with version history
### [0.1.1] - 2025-10-12
- Enhanced single-word search with high precision
- Multi-level chunking (micro + word-level)
- Adaptive search thresholds
- Programming keyword detection
- Improved word matching and relevance boosting
### [0.1.0] - 2025-10-10
- Initial release
- Code vectorization
- Smart incremental indexing
- Git-aware indexing
- MCP server
- Watch mode
- ChromaDB backend
- 30+ language support
---
## Contributing
### Development Setup
```bash
# Clone repository
git clone https://github.com/starkbaknet/project-vectorizer.git
cd project-vectorizer
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black .
isort .
```
### Running Tests
```bash
# All tests
pytest
# With coverage
pytest --cov=project_vectorizer
# Specific test
pytest tests/test_config.py
# Verbose
pytest -v
```
See [docs/TESTING.md](docs/TESTING.md) for details.
### Publishing
See [docs/PUBLISHING.md](docs/PUBLISHING.md) for PyPI publishing guide.
### Contributing Guidelines
1. Fork repository
2. Create feature branch: `git checkout -b feature/amazing-feature`
3. Make changes and add tests
4. Ensure tests pass: `pytest`
5. Format code: `black . && isort .`
6. Commit: `git commit -m 'Add amazing feature'`
7. Push: `git push origin feature/amazing-feature`
8. Open Pull Request
---
## License
MIT License - see [LICENSE](LICENSE) file
---
## Additional Resources
- **GitHub**: https://github.com/starkbaknet/project-vectorizer
- **PyPI**: https://pypi.org/project/project-vectorizer/
- **Issues**: https://github.com/starkbaknet/project-vectorizer/issues
---
**Made with ❤️ by StarkBakNet**
_Vectorize your codebase. Empower your AI agents. Build better software._
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 \u2713 Indexing complete! \u2502\n\u2502 \u2502\n\u2502 Files indexed: 48/49 \u2502\n\u2502 Total chunks: 9222 \u2502\n\u2502 Model: all-MiniLM-L6-v2 \u2502\n\u2502 Time taken: 2m 16s \u2502\n\u2502 \u2502\n\u2502 You can now search with: pv search . \"your query\" \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\n### 3. Search Your Code\n\n```bash\n# Natural language search\npv search /path/to/project \"authentication logic\"\n\n# Single-word searches work great (high precision)\npv search /path/to/project \"async\" --threshold 0.8\npv search /path/to/project \"test\" --threshold 0.9\n\n# Multi-word queries (semantic search)\npv search /path/to/project \"user login validation\" --threshold 0.5\n\n# Find specific constructs\npv search /path/to/project \"class\" --limit 10\n```\n\n**Output:**\n\n```\nSearch Results for: authentication logic\n\nFound 5 result(s) with threshold >= 0.5\n\n\u256d\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 Result 1 \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 src/auth/login.py \u2502\n\u2502 Lines 45-67 | Similarity: 0.892 \u2502\n\u2502 \u2502\n\u2502 def authenticate_user(username: str, password: str): \u2502\n\u2502 \"\"\" \u2502\n\u2502 Authenticate user credentials against database \u2502\n\u2502 Returns user object if valid, None otherwise \u2502\n\u2502 \"\"\" \u2502\n\u2502 ... 
\u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\n### 4. Start MCP Server\n\n```bash\n# Start server (default: localhost:8000)\npv serve /path/to/project\n\n# Custom host/port\npv serve /path/to/project --host 0.0.0.0 --port 8080\n```\n\n### 5. Monitor Changes in Real-Time\n\n```bash\n# Watch for file changes (default 2s debounce)\npv sync /path/to/project --watch\n\n# Fast feedback (0.5s)\npv sync /path/to/project --watch --debounce 0.5\n\n# Slower systems (5s)\npv sync /path/to/project --watch --debounce 5.0\n```\n\n---\n\n## Performance Optimization\n\n### Understanding the Optimization Flags\n\n#### `--optimize` (Permanent)\n\nUse when **initializing** a new project. Detects your system and saves optimal settings.\n\n```bash\npv init /path/to/project --optimize\n```\n\n**What it does:**\n\n- Detects CPU cores \u2192 sets `max_workers` (e.g., 8 cores = 16 workers)\n- Calculates RAM \u2192 sets safe `batch_size` (e.g., 16GB = 400 batch)\n- Sets memory thresholds based on total RAM\n- **Saves to config** - All future operations use these settings\n\n**When to use:**\n\n- \u2705 New projects\n- \u2705 Want permanent optimization\n- \u2705 Same machine for all operations\n- \u2705 \"Set and forget\" approach\n\n#### `--max-resources` (Temporary)\n\nUse when **indexing** to temporarily boost performance without changing config.\n\n```bash\npv index /path/to/project --max-resources\npv index-git /path/to/project --since HEAD~1 --max-resources\n```\n\n**What it does:**\n\n- Detects system resources (same as --optimize)\n- **Temporarily overrides** config for this operation only\n- Original config unchanged\n\n**When to use:**\n\n- \u2705 Existing project 
without optimization\n- \u2705 One-time heavy indexing\n- \u2705 CI/CD with dedicated resources\n- \u2705 Don't want to modify config\n\n### Performance Benchmarks\n\n**System**: 8-core CPU, 16GB RAM, SSD\n\n| Mode | Files | Chunks | Time | Settings |\n| ------------------ | --------- | ------ | ------ | --------------------- |\n| Standard | 48 | 9222 | 4m 32s | 4 workers, 100 batch |\n| --max-resources | 48 | 9222 | 2m 16s | 16 workers, 400 batch |\n| Smart incremental | 5 changed | 412 | 24s | 16 workers, 400 batch |\n| Git-aware (HEAD~1) | 3 changed | 287 | 15s | 16 workers, 400 batch |\n\n**Key Findings:**\n\n- `--max-resources`: **2x faster** for full indexing\n- Smart incremental: **60-70% faster** than full reindex\n- Git-aware: **80-90% faster** for recent changes\n- Chunk size (128 vs 512): **No performance difference** (same ~2m 16s)\n\n### System Resource Detection\n\n**CPU Detection:**\n\n```\nDetected: 8 cores\nOptimal workers: min(8 * 2, 16) = 16 workers\n```\n\n**Memory Detection:**\n\n```\nTotal RAM: 16GB\nAvailable RAM: 8GB\nSafe batch size: 8GB * 0.5 * 100 = 400\nEmbedding batch: 400 * 0.5 = 200\nGC interval: 100 files\n```\n\n**Memory Thresholds:**\n\n```\n32GB+ RAM \u2192 threshold: 50000\n16-32GB \u2192 threshold: 20000\n8-16GB \u2192 threshold: 10000\n<8GB \u2192 threshold: 5000\n```\n\n### Best Practices\n\n1. **Initialize with optimization**\n\n ```bash\n pv init ~/my-project --optimize\n ```\n\n2. **Use max resources for heavy operations**\n\n ```bash\n pv index ~/my-project --force --max-resources\n ```\n\n3. **Use smart mode for daily updates**\n\n ```bash\n pv index ~/my-project --smart\n ```\n\n4. **Use git-aware after pulling changes**\n\n ```bash\n pv index-git ~/my-project --since HEAD~1\n ```\n\n5. 
**Monitor memory with verbose mode**\n ```bash\n pv index ~/my-project --max-resources --verbose\n ```\n\n---\n\n## CLI Commands\n\n### Global Options\n\n```bash\npv [OPTIONS] COMMAND [ARGS]\n\nOptions:\n -v, --verbose Enable verbose output\n --version Show version\n --help Show help\n```\n\n### `pv init` - Initialize Project\n\nInitialize a new project for vectorization.\n\n```bash\npv init [OPTIONS] PROJECT_PATH\n\nOptions:\n -n, --name TEXT Project name (default: directory name)\n -m, --embedding-model TEXT Model name (default: all-MiniLM-L6-v2)\n -p, --embedding-provider Provider: sentence-transformers | openai\n -c, --chunk-size INT Chunk size in tokens (default: 256)\n -o, --chunk-overlap INT Overlap in tokens (default: 32)\n --optimize Auto-optimize based on system resources \u2b50\n```\n\n**Examples:**\n\n```bash\n# Basic initialization\npv init /path/to/project\n\n# With optimization (recommended)\npv init /path/to/project --optimize\n\n# With OpenAI embeddings\nexport OPENAI_API_KEY=\"sk-...\"\npv init /path/to/project \\\n --embedding-provider openai \\\n --embedding-model text-embedding-ada-002 \\\n --optimize\n```\n\n### `pv index` - Index Codebase\n\nIndex the codebase for searching.\n\n```bash\npv index [OPTIONS] PROJECT_PATH\n\nOptions:\n -i, --incremental Only index changed files\n -s, --smart Smart incremental (categorized: new/modified/deleted) \u2b50\n -f, --force Force re-index all files\n --max-resources Use maximum system resources \u2b50\n```\n\n**Examples:**\n\n```bash\n# Full indexing with max resources\npv index /path/to/project --max-resources\n\n# Smart incremental (fastest for updates)\npv index /path/to/project --smart\n\n# Combine for maximum performance\npv index /path/to/project --smart --max-resources\n\n# Force complete reindex\npv index /path/to/project --force\n```\n\n### `pv index-git` - Git-Aware Indexing\n\nIndex only files changed in git commits.\n\n```bash\npv index-git [OPTIONS] PROJECT_PATH\n\nOptions:\n -s, --since TEXT 
Git reference (default: HEAD~1)\n --max-resources Use maximum system resources \u2b50\n```\n\n**Examples:**\n\n```bash\n# Last commit\npv index-git /path/to/project --since HEAD~1\n\n# Last 5 commits\npv index-git /path/to/project --since HEAD~5\n\n# Since main branch\npv index-git /path/to/project --since main\n\n# Since specific commit\npv index-git /path/to/project --since abc123def\n\n# With max resources\npv index-git /path/to/project --since HEAD~10 --max-resources\n```\n\n**Use Cases:**\n\n- After `git pull` - index only new changes\n- Before code review - index PR changes\n- CI/CD pipelines - index commit range\n- After branch switch - index differences\n\n### `pv search` - Search Code\n\nSearch through vectorized codebase.\n\n```bash\npv search [OPTIONS] PROJECT_PATH QUERY\n\nOptions:\n -l, --limit INT Number of results (default: 10)\n -t, --threshold FLOAT Similarity threshold 0.0-1.0 (default: 0.3)\n```\n\n**Examples:**\n\n```bash\n# Natural language search\npv search /path/to/project \"error handling in database connections\"\n\n# Single-word search (high threshold)\npv search /path/to/project \"async\" --threshold 0.9\n\n# Find all tests\npv search /path/to/project \"test\" --limit 20 --threshold 0.8\n\n# Broad semantic search (low threshold)\npv search /path/to/project \"api authentication\" --threshold 0.3\n```\n\n**Threshold Guide:**\n\n- **0.8-0.95**: Single words, exact matches\n- **0.5-0.7**: Multi-word phrases, semantic\n- **0.3-0.5**: Complex queries, broad search\n- **0.1-0.3**: Very broad, exploratory\n\n### `pv sync` - Sync Changes / Watch Mode\n\nSync changes or watch for file modifications.\n\n```bash\npv sync [OPTIONS] PROJECT_PATH\n\nOptions:\n -w, --watch Watch for file changes\n -d, --debounce FLOAT Debounce delay in seconds (default: 2.0)\n```\n\n**Examples:**\n\n```bash\n# One-time sync (smart incremental)\npv sync /path/to/project\n\n# Watch mode with default debounce (2s)\npv sync /path/to/project --watch\n\n# Fast feedback 
(0.5s)\npv sync /path/to/project --watch --debounce 0.5\n\n# Slower systems (5s)\npv sync /path/to/project --watch --debounce 5.0\n```\n\n**Debounce Explained:**\n\n- Waits X seconds after last file change before indexing\n- Batches multiple rapid changes together\n- Prevents redundant indexing when saving files repeatedly\n- Reduces CPU usage during active development\n\n**Recommended Values:**\n\n- **0.5-1.0s**: Fast machines, need instant feedback\n- **2.0s**: Balanced (default)\n- **5.0-10.0s**: Slower machines, large codebases\n\n### `pv serve` - Start MCP Server\n\nStart MCP server for AI agent integration.\n\n```bash\npv serve [OPTIONS] PROJECT_PATH\n\nOptions:\n -p, --port INT Port number (default: 8000)\n -h, --host TEXT Host address (default: localhost)\n```\n\n**Examples:**\n\n```bash\n# Start server\npv serve /path/to/project\n\n# Custom port\npv serve /path/to/project --port 8080\n\n# Expose to network\npv serve /path/to/project --host 0.0.0.0 --port 8000\n```\n\n### `pv status` - Show Project Status\n\nShow project status and statistics.\n\n```bash\npv status PROJECT_PATH\n```\n\n**Output:**\n\n```\n\u256d\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 Project Status \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 Name my-project \u2502\n\u2502 Path /path/to/project \u2502\n\u2502 Embedding Model all-MiniLM-L6-v2 \u2502\n\u2502 \u2502\n\u2502 Total Files 49 \u2502\n\u2502 Indexed Files 48 \u2502\n\u2502 Total Chunks 9222 \u2502\n\u2502 \u2502\n\u2502 Git Branch main \u2502\n\u2502 Last Updated 2025-10-13 12:15:42 \u2502\n\u2502 Created 2025-10-10 09:30:15 \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\n---\n\n## 
Configuration\n\n### Config File Location\n\nConfiguration is stored at `<project>/.vectorizer/config.json`\n\n### Full Configuration Reference\n\n```json\n{\n \"chromadb_path\": null,\n \"embedding_model\": \"all-MiniLM-L6-v2\",\n \"embedding_provider\": \"sentence-transformers\",\n \"openai_api_key\": null,\n \"chunk_size\": 128,\n \"chunk_overlap\": 32,\n \"max_file_size_mb\": 10,\n \"included_extensions\": [\n \".py\",\n \".js\",\n \".ts\",\n \".jsx\",\n \".tsx\",\n \".go\",\n \".rs\",\n \".java\",\n \".cpp\",\n \".c\",\n \".h\",\n \".hpp\",\n \".cs\",\n \".php\",\n \".rb\",\n \".swift\",\n \".kt\",\n \".scala\",\n \".clj\",\n \".sh\",\n \".bash\",\n \".zsh\",\n \".fish\",\n \".ps1\",\n \".bat\",\n \".cmd\",\n \".md\",\n \".txt\",\n \".rst\",\n \".json\",\n \".yaml\",\n \".yml\",\n \".toml\",\n \".xml\",\n \".html\",\n \".css\",\n \".scss\",\n \".sql\",\n \".graphql\",\n \".proto\"\n ],\n \"excluded_patterns\": [\n \"node_modules/**\",\n \".git/**\",\n \"__pycache__/**\",\n \"*.pyc\",\n \".pytest_cache/**\",\n \"venv/**\",\n \"env/**\",\n \".env/**\",\n \"build/**\",\n \"dist/**\",\n \"*.egg-info/**\",\n \".DS_Store\",\n \"*.min.js\",\n \"*.min.css\"\n ],\n \"mcp_host\": \"localhost\",\n \"mcp_port\": 8000,\n \"log_level\": \"INFO\",\n \"log_file\": null,\n \"max_workers\": 4,\n \"batch_size\": 100,\n \"embedding_batch_size\": 100,\n \"parallel_file_processing\": true,\n \"memory_monitoring_enabled\": true,\n \"memory_efficient_search_threshold\": 10000,\n \"gc_interval\": 100\n}\n```\n\n### Key Settings Explained\n\n**Embedding Settings:**\n\n- `embedding_model`: Model for embeddings (all-MiniLM-L6-v2, text-embedding-ada-002, etc.)\n- `embedding_provider`: \"sentence-transformers\" (local) or \"openai\" (API)\n- `chunk_size`: Tokens per chunk (128 for precision, 512 for context)\n- `chunk_overlap`: Overlap between chunks (16-32 recommended)\n\n**Performance Settings:**\n\n- `max_workers`: Parallel workers (auto-detected with --optimize)\n- `batch_size`: Files 
per batch (auto-calculated with --optimize)\n- `embedding_batch_size`: Embeddings per batch\n- `parallel_file_processing`: Enable parallel processing (recommended: true)\n\n**Memory Settings:**\n\n- `memory_monitoring_enabled`: Monitor RAM usage (recommended: true)\n- `memory_efficient_search_threshold`: Switch to streaming for large results\n- `gc_interval`: Garbage collection frequency (files between GC)\n\n**File Filtering:**\n\n- `included_extensions`: File types to index\n- `excluded_patterns`: Glob patterns to ignore\n- `max_file_size_mb`: Skip files larger than this\n\n**Server Settings:**\n\n- `mcp_host`: MCP server host\n- `mcp_port`: MCP server port\n- `log_level`: INFO, DEBUG, WARNING, ERROR\n- `chromadb_path`: Custom ChromaDB location (optional)\n\n### Environment Variables\n\nCreate `.env` file or export:\n\n```bash\n# OpenAI API Key (required for OpenAI embeddings)\nexport OPENAI_API_KEY=\"sk-...\"\n\n# Override config values\nexport EMBEDDING_PROVIDER=\"sentence-transformers\"\nexport EMBEDDING_MODEL=\"all-MiniLM-L6-v2\"\nexport CHUNK_SIZE=\"256\"\nexport DEFAULT_SEARCH_THRESHOLD=\"0.3\"\n\n# Database\nexport CHROMADB_PATH=\"/custom/path/to/chromadb\"\n\n# Logging\nexport LOG_LEVEL=\"INFO\"\nexport LOG_FILE=\"/var/log/vectorizer.log\"\n```\n\nFor complete list, see [docs/ENVIRONMENT.md](docs/ENVIRONMENT.md)\n\n### Editing Configuration\n\n```bash\n# View current config\ncat /path/to/project/.vectorizer/config.json\n\n# Edit manually\nnano /path/to/project/.vectorizer/config.json\n\n# Or regenerate with optimization\npv init /path/to/project --optimize\n```\n\n---\n\n## Search Features\n\n### Single-Word Search\n\nOptimized for high-precision single-keyword searches.\n\n```bash\n# Programming keywords\npv search /path/to/project \"async\" --threshold 0.9\npv search /path/to/project \"test\" --threshold 0.8\npv search /path/to/project \"class\" --threshold 0.9\npv search /path/to/project \"import\" --threshold 0.85\n\n# Works great for finding specific 
constructs\npv search /path/to/project \"def\" --threshold 0.9 # Python functions\npv search /path/to/project \"function\" --threshold 0.9 # JS functions\npv search /path/to/project \"catch\" --threshold 0.8 # Error handling\n```\n\n**Features:**\n\n- **Exact Word Matching**: Prioritizes exact word boundaries\n- **Keyword Detection**: Special handling for programming keywords\n- **Relevance Boosting**: Huge boost for exact matches\n- **High Thresholds**: Reliable results even at 0.8-0.9+\n\n### Multi-Word Search\n\nSemantic search for phrases and concepts.\n\n```bash\n# Natural language\npv search /path/to/project \"user authentication logic\" --threshold 0.5\n\n# Code patterns\npv search /path/to/project \"error handling in database\" --threshold 0.4\n\n# Features\npv search /path/to/project \"rate limiting middleware\" --threshold 0.6\n```\n\n### Search Result Ranking\n\nResults ranked by:\n\n1. **Exact word matches** (highest priority)\n2. **Content type** (micro/word chunks get boost)\n3. **Partial matches** within larger words\n4. **Semantic similarity** from embeddings\n\n### Recommended Thresholds by Query Type\n\n| Query Type | Threshold | Example |\n| -------------- | --------- | --------------------------------- |\n| Single keyword | 0.7-0.95 | \"async\", \"test\", \"class\" |\n| Two words | 0.5-0.8 | \"error handling\", \"api routes\" |\n| Short phrase | 0.4-0.7 | \"user login validation\" |\n| Complex query | 0.3-0.5 | \"authentication with jwt tokens\" |\n| Exploratory | 0.1-0.3 | \"machine learning model training\" |\n\n---\n\n## MCP Server\n\n### Starting the Server\n\n```bash\n# Default (localhost:8000)\npv serve /path/to/project\n\n# Custom settings\npv serve /path/to/project --host 0.0.0.0 --port 8080\n```\n\n### Available MCP Tools\n\nWhen running, AI agents can use these tools:\n\n1. **search_code** - Search vectorized codebase\n\n ```json\n {\n \"query\": \"authentication logic\",\n \"limit\": 10,\n \"threshold\": 0.5\n }\n ```\n\n2. 
**get_file_content** - Retrieve full file\n\n ```json\n {\n \"file_path\": \"src/auth/login.py\"\n }\n ```\n\n3. **list_files** - List all files\n\n ```json\n {\n \"file_type\": \"py\" // optional filter\n }\n ```\n\n4. **get_project_stats** - Get statistics\n ```json\n {}\n ```\n\n### HTTP Fallback API\n\nIf MCP unavailable, HTTP endpoints provided:\n\n```bash\n# Search\ncurl \"http://localhost:8000/search?q=authentication&limit=5&threshold=0.5\"\n\n# Get file\ncurl \"http://localhost:8000/file/src/auth/login.py\"\n\n# List files\ncurl \"http://localhost:8000/files?type=py\"\n\n# Statistics\ncurl \"http://localhost:8000/stats\"\n\n# Health check\ncurl \"http://localhost:8000/health\"\n```\n\n### Use Cases\n\n1. **AI Code Review**: Let Claude analyze your codebase semantically\n2. **Intelligent Navigation**: Ask AI to find relevant code\n3. **Documentation**: Generate docs from actual code\n4. **Onboarding**: Help new devs understand codebase\n5. **Refactoring**: Find similar patterns across project\n\n---\n\n## Advanced Usage\n\n### Python API\n\n#### Basic Usage\n\n```python\nimport asyncio\nfrom pathlib import Path\nfrom project_vectorizer.core.config import Config\nfrom project_vectorizer.core.project import ProjectManager\n\nasync def main():\n # Initialize project\n config = Config.create_optimized(\n embedding_model=\"all-MiniLM-L6-v2\",\n chunk_size=256\n )\n\n project_path = Path(\"/path/to/project\")\n manager = ProjectManager(project_path, config)\n\n # Initialize\n await manager.initialize(\"My Project\")\n\n # Index\n await manager.load()\n await manager.index_all()\n\n # Search\n results = await manager.search(\"authentication\", limit=10, threshold=0.5)\n for result in results:\n print(f\"{result['file_path']}: {result['similarity']:.3f}\")\n\nasyncio.run(main())\n```\n\n#### Progress Tracking\n\n```python\nfrom rich.progress import Progress, BarColumn, TaskProgressColumn\n\nasync def index_with_progress(project_path):\n config = 
Config.load_from_project(project_path)\n manager = ProjectManager(project_path, config)\n await manager.load()\n\n with Progress() as progress:\n task = progress.add_task(\"Indexing...\", total=100)\n\n def update_progress(current, total, description):\n progress.update(task, completed=current, total=total, description=description)\n\n manager.set_progress_callback(update_progress)\n await manager.index_all()\n```\n\n#### Custom Resource Limits\n\n```python\nimport psutil\n\nasync def adaptive_index(project_path):\n \"\"\"Index with resources based on current load.\"\"\"\n cpu_percent = psutil.cpu_percent(interval=1)\n\n if cpu_percent < 50: # System idle\n config = Config.create_optimized()\n else: # System busy\n config = Config(max_workers=4, batch_size=100)\n\n manager = ProjectManager(project_path, config)\n await manager.load()\n await manager.index_all()\n```\n\n### Chunk Size Optimization\n\nThe engine enforces a maximum of 128 tokens per chunk (see engine.py:35) for precision; larger configured sizes are accepted but still capped at 128:\n\n```bash\n# Precision (default, forced max 128)\npv init /path/to/project --chunk-size 128\n\n# More context (still capped at 128 by engine)\npv init /path/to/project --chunk-size 512\n```\n\n**Performance Note**: Chunk size has virtually NO impact on indexing speed (~2m 16s for both 128 and 512 tokens). Choose based on search quality needs:\n\n- **128**: Better precision, exact matches\n- **512**: More context, better understanding\n\n### CI/CD Integration\n\n```yaml\n# .github/workflows/vectorize.yml\nname: Vectorize Codebase\n\non:\n push:\n branches: [main]\n\njobs:\n vectorize:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n\n - name: Setup Python\n uses: actions/setup-python@v4\n with:\n python-version: \"3.9\"\n\n - name: Install vectorizer\n run: pip install project-vectorizer\n\n - name: Initialize and index\n run: |\n pv init . --optimize --name \"${{ github.repository }}\"\n pv index . 
--max-resources\n\n - name: Test search\n run: pv search . \"test\" --limit 5\n```\n\n### Custom File Filters\n\n```json\n{\n \"included_extensions\": [\".py\", \".js\", \".custom\"],\n \"excluded_patterns\": [\"tests/**\", \"*.generated.js\", \"vendor/**\", \"*.min.*\"]\n}\n```\n\n### Watch Mode During Development\n\n```bash\n# Terminal 1: Watch mode\npv sync /path/to/project --watch --debounce 1.0\n\n# Terminal 2: Make code changes\n# Auto-indexes when you save\n\n# Terminal 3: Search as you code\npv search /path/to/project \"your new function\" --threshold 0.5\n```\n\n---\n\n## Troubleshooting\n\n### Common Issues\n\n#### 1. Slow Indexing\n\n**Problem**: Indexing taking too long\n\n**Solutions:**\n\n```bash\n# Use max resources\npv index /path/to/project --max-resources\n\n# Use smart incremental for updates\npv index /path/to/project --smart\n\n# Use git-aware for recent changes\npv index-git /path/to/project --since HEAD~1\n\n# Check if optimization is working\npv index /path/to/project --max-resources --verbose\n# Look for: \"Workers: 16, Batch Size: 400\"\n```\n\n#### 2. High Memory Usage\n\n**Problem**: Process using too much RAM or getting killed\n\n**Solutions:**\n\n```bash\n# Reduce batch size in config\n{\n \"batch_size\": 50,\n \"max_workers\": 4\n}\n\n# Enable memory monitoring\n{\n \"memory_monitoring_enabled\": true,\n \"gc_interval\": 50\n}\n\n# Use smaller chunks\npv init /path/to/project --chunk-size 128\n```\n\n#### 3. Poor Search Results\n\n**Problem**: Search not finding relevant code\n\n**Solutions:**\n\n```bash\n# Lower threshold for phrases\npv search /path/to/project \"your query\" --threshold 0.3\n\n# Higher threshold for keywords\npv search /path/to/project \"async\" --threshold 0.9\n\n# Use smaller chunk size for precision\n# Edit config: \"chunk_size\": 128\n\n# Ensure index is up to date\npv index /path/to/project --smart\n```\n\n#### 4. 
No Results for Single Words\n\n**Problem**: Single-word searches return nothing\n\n**Solutions:**\n\n```bash\n# Try lower threshold\npv search /path/to/project \"yourword\" --threshold 0.5\n\n# Check if word exists\npv search /path/to/project \"yourword\" --threshold 0.1 --limit 1\n\n# Reindex with smaller chunks\n# Edit config: \"chunk_size\": 128\npv index /path/to/project --force\n```\n\n#### 5. Missing Recent Changes\n\n**Problem**: Just-edited code not showing in search\n\n**Solutions:**\n\n```bash\n# Run smart incremental\npv index /path/to/project --smart\n\n# Or git-aware\npv index-git /path/to/project --since HEAD~1\n\n# Check status\npv status /path/to/project\n```\n\n#### 6. psutil Not Found\n\n**Problem**: Optimization not working\n\n**Solution:**\n\n```bash\n# Install psutil\npip install psutil\n\n# Verify\npython -c \"import psutil; print(f'CPUs: {psutil.cpu_count()}, RAM: {psutil.virtual_memory().available / 1024**3:.1f}GB')\"\n\n# Try again\npv init /path/to/project --optimize\n```\n\n### Debug Mode\n\n```bash\n# Enable verbose logging\npv --verbose index /path/to/project\n\n# Check project status\npv status /path/to/project\n\n# View config\ncat /path/to/project/.vectorizer/config.json\n\n# Check ChromaDB\nls -lh /path/to/project/.vectorizer/chromadb/\n```\n\n### Performance Debugging\n\n```bash\n# Time operations\ntime pv index /path/to/project\ntime pv index /path/to/project --max-resources\n\n# Monitor resources during indexing\n# Terminal 1:\npv index /path/to/project --max-resources\n\n# Terminal 2:\nhtop # or top\n# Should see high CPU across all cores\n\n# Check memory warnings\npv index /path/to/project --max-resources --verbose\n# Look for memory warnings\n```\n\n---\n\n## Changelog\n\n### [0.1.2] - 2025-10-13\n\n#### Added\n\n- **Optimized Config Generation** - `Config.create_optimized()` auto-detects CPU/RAM\n- **Max Resources Flag** - `--max-resources` for temporary performance boost\n- **psutil Integration** - Automatic system resource 
detection\n- **Unified Progress Tracking** - Clean single-line progress bar\n- **Library Progress Suppression** - No more cluttered batch progress bars\n- **Timing Information** - All operations show elapsed time\n- **Clean Terminal Output** - Professional UI with timing\n\n#### Performance\n\n- **2x faster** full indexing with --max-resources\n- **60-70% faster** smart incremental updates\n- **80-90% faster** git-aware indexing\n\n#### Documentation\n\n- Comprehensive documentation overhaul\n- Consolidated all guides into main README\n- Added CHANGELOG.md with version history\n\n### [0.1.1] - 2025-10-12\n\n- Enhanced single-word search with high precision\n- Multi-level chunking (micro + word-level)\n- Adaptive search thresholds\n- Programming keyword detection\n- Improved word matching and relevance boosting\n\n### [0.1.0] - 2025-10-10\n\n- Initial release\n- Code vectorization\n- Smart incremental indexing\n- Git-aware indexing\n- MCP server\n- Watch mode\n- ChromaDB backend\n- 30+ language support\n\n---\n\n## Contributing\n\n### Development Setup\n\n```bash\n# Clone repository\ngit clone https://github.com/starkbaknet/project-vectorizer.git\ncd project-vectorizer\n\n# Create virtual environment\npython -m venv venv\nsource venv/bin/activate # Windows: venv\\Scripts\\activate\n\n# Install with dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Format code\nblack .\nisort .\n```\n\n### Running Tests\n\n```bash\n# All tests\npytest\n\n# With coverage\npytest --cov=project_vectorizer\n\n# Specific test\npytest tests/test_config.py\n\n# Verbose\npytest -v\n```\n\nSee [docs/TESTING.md](docs/TESTING.md) for details.\n\n### Publishing\n\nSee [docs/PUBLISHING.md](docs/PUBLISHING.md) for PyPI publishing guide.\n\n### Contributing Guidelines\n\n1. Fork repository\n2. Create feature branch: `git checkout -b feature/amazing-feature`\n3. Make changes and add tests\n4. Ensure tests pass: `pytest`\n5. Format code: `black . && isort .`\n6. 
Commit: `git commit -m 'Add amazing feature'`\n7. Push: `git push origin feature/amazing-feature`\n8. Open Pull Request\n\n---\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file\n\n---\n\n## Additional Resources\n\n- **GitHub**: https://github.com/starkbaknet/project-vectorizer\n- **PyPI**: https://pypi.org/project/project-vectorizer/\n- **Issues**: https://github.com/starkbaknet/project-vectorizer/issues\n\n---\n\n**Made with \u2764\ufe0f by StarkBakNet**\n\n_Vectorize your codebase. Empower your AI agents. Build better software._\n",