# Rust Crate Pipeline v4.0.0
A comprehensive, enterprise-grade system for gathering, enriching, and analyzing Rust crate metadata with AI-powered insights, advanced caching, machine-learning predictions, and a microservices-ready architecture. The pipeline provides deep per-crate analysis with support for multiple LLM providers, intelligent caching, ML quality predictions, and comprehensive Rust code quality assessment.
## 🚀 Quick Start
### Option 1: Install via pip (Recommended for users)
```bash
# Install the package (includes automatic setup)
pip install rust-crate-pipeline
# The package will automatically run setup for all components
# You can also run setup manually:
rust-crate-pipeline --setup
# Run with your preferred LLM provider
rust-crate-pipeline --llm-provider ollama --llm-model tinyllama --crates serde tokio
```
### Option 2: Clone and run from repository (Recommended for developers)
```bash
# Clone the repository
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Run setup for all components
python -m rust_crate_pipeline --setup
# Run the pipeline
python run_with_llm.py --llm-provider ollama --llm-model tinyllama --crates serde tokio
```
## ✨ Key Features
### 🤖 **AI & Machine Learning**
- **Multi-Provider LLM Support**: Azure OpenAI, OpenAI, Anthropic, Ollama, LM Studio, Lambda.AI, and 100+ LiteLLM providers
- **ML Quality Predictor**: Automated quality scoring, security risk assessment, maintenance predictions
- **Intelligent Analysis**: AI-powered insights and recommendations
- **Real-time Learning**: Adaptive model training and prediction refinement
### 🚀 **Performance & Caching**
- **Advanced Multi-Level Caching**: Memory, Disk, and Redis caching with intelligent warming
- **Cache Hit Optimization**: 10-100x faster response times for cached results
- **Tag-based Invalidation**: Intelligent cache management and cleanup
- **TTL Management**: Configurable cache expiration and size limits
### 🌐 **Web Scraping & Analysis**
- **Advanced Web Scraping**: Crawl4AI + Playwright for intelligent content extraction
- **Enhanced Rust Analysis**: cargo-geiger, cargo-outdated, cargo-license, cargo-tarpaulin, cargo-deny
- **Comprehensive Tooling**: Full Rust ecosystem analysis and quality assessment
### 🔒 **Security & Trust**
- **Sigil Protocol Support**: Sacred Chain analysis with IRL trust scoring
- **Security Analysis**: Privacy and security scanning with Presidio
- **Trust Verification**: Canon registry and reputation system
- **Audit Logging**: Comprehensive audit trails for compliance
### 🏗️ **Architecture & Scalability**
- **Microservices Ready**: API Gateway with service discovery and load balancing
- **Event-Driven**: Message queues and asynchronous processing
- **Horizontal Scaling**: Support for 1000+ concurrent users
### 📊 **Monitoring & Observability**
- **Real-time Progress Tracking**: Comprehensive monitoring and error recovery
- **Prometheus Metrics**: Detailed performance and health metrics
- **Health Checks**: Automated service health monitoring
- **Structured Logging**: JSON-formatted logs with correlation IDs
### 🐳 **Deployment & Operations**
- **Docker Support**: Containerized deployment with docker-compose
- **Auto-Resume Capability**: Automatically skips already processed crates
- **Batch Processing**: Configurable memory optimization and cost control
- **Production Ready**: Enterprise-grade reliability and performance
## 📋 Requirements
- **Python 3.12+** (required)
- **Git** (for repository operations)
- **Cargo** (for Rust crate analysis)
- **Playwright browsers** (auto-installed via setup)
- **Rust analysis tools** (auto-installed via setup)
### Optional Dependencies
- **Redis**: For distributed caching (recommended for production)
- **Prometheus**: For metrics collection
## 🔧 Installation & Setup
### For End Users (pip install)
The package includes automatic setup for all components:
```bash
# Install the package (includes all dependencies and automatic setup)
pip install rust-crate-pipeline
# Check setup status
rust-crate-pipeline --setup-check
# Run setup manually if needed
rust-crate-pipeline --setup --verbose-setup
```
### For Developers (repository clone)
```bash
# Clone the repository
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Run comprehensive setup
python -m rust_crate_pipeline --setup --verbose-setup
# Set up environment variables (optional but recommended)
export AZURE_OPENAI_ENDPOINT="your_endpoint"
export AZURE_OPENAI_API_KEY="your_api_key"
export GITHUB_TOKEN="your_github_token"
```
## 🎯 Usage Examples
### Basic Usage with Integrated Components
```python
from rust_crate_pipeline.config import PipelineConfig
from rust_crate_pipeline.unified_pipeline import UnifiedSigilPipeline
# Create configuration
config = PipelineConfig(
    model_path="~/models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
    max_tokens=512,
    batch_size=5,
    output_path="./output"
)
# Create pipeline with integrated components
pipeline = UnifiedSigilPipeline(config)
# Analyze crates with caching and ML predictions
async with pipeline:
    result = await pipeline.analyze_crate("serde")

    # ML predictions are automatically added
    ml_predictions = result.audit_info.get("ml_predictions", {})
    print(f"Quality Score: {ml_predictions.get('quality_score', 0)}")
```
### Advanced Caching Usage
```python
from rust_crate_pipeline.utils.advanced_cache import get_cache
# Get cache instance
cache = get_cache()
# Store data with TTL and tags
await cache.set(
    "crate:serde",
    crate_data,
    ttl=3600,  # 1 hour
    tags=["rust", "serialization"]
)
# Retrieve data
cached_data = await cache.get("crate:serde")
# Invalidate by tags
await cache.invalidate_by_tags(["rust"])
```
### ML Quality Predictions
```python
from rust_crate_pipeline.ml.quality_predictor import get_predictor
# Get predictor instance
predictor = get_predictor()
# Predict quality metrics
prediction = predictor.predict_quality(crate_data)
print(f"Quality Score: {prediction.quality_score}")
print(f"Security Risk: {prediction.security_risk}")
print(f"Maintenance Score: {prediction.maintenance_score}")
```
### API Gateway for Microservices
```python
import json

from rust_crate_pipeline.services.api_gateway import APIGateway

# Load configuration
with open("configs/gateway_config.json", "r") as f:
    config = json.load(f)
# Create gateway
gateway = APIGateway(config)
# Start gateway (in production)
# python rust_crate_pipeline/services/api_gateway.py --config configs/gateway_config.json
```
### Command Line Usage
```bash
# Basic analysis with caching and ML
rust-crate-pipeline --llm-provider ollama --llm-model tinyllama --crates serde tokio
# Advanced analysis with all features
rust-crate-pipeline --llm-provider azure --llm-model gpt-4o --crates actix-web --enable-ml --enable-caching
# Batch processing with auto-resume
rust-crate-pipeline --crates-file data/crate_list.txt --auto-resume --batch-size 5
# Force restart processing
rust-crate-pipeline --crates-file data/crate_list.txt --force-restart
```
## 🔍 Enhanced Rust Analysis
The pipeline includes comprehensive Rust analysis tools:
- **cargo-geiger**: Unsafe code detection and safety scoring
- **cargo-outdated**: Dependency update recommendations
- **cargo-license**: License analysis and compliance
- **cargo-tarpaulin**: Code coverage analysis
- **cargo-deny**: Comprehensive dependency checking
- **cargo-audit**: Security vulnerability scanning
- **cargo-tree**: Dependency visualization
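For scripting around these tools, each run can be captured in the same `returncode`/`stdout`/`stderr` shape the analysis reports use. A minimal sketch (an illustrative helper, not the pipeline's internal code; assumes the tool is on `PATH`):

```python
import subprocess

def run_tool(args, cwd=None, timeout=300):
    """Run a command (e.g. a cargo subcommand) and capture output
    in the report's returncode/stdout/stderr shape."""
    try:
        proc = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        # Normalize timeouts into the same shape instead of raising
        return {"returncode": -1, "stdout": "", "stderr": f"timeout after {timeout}s"}
```

For example, `run_tool(["cargo", "geiger"], cwd="path/to/crate")` yields a dict that slots directly into the `enhanced_analysis` structure shown below.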
### Analysis Output with ML Predictions
Each crate analysis includes:
```json
{
  "enhanced_analysis": {
    "build": { "returncode": 0, "stdout": "...", "stderr": "..." },
    "test": { "returncode": 0, "stdout": "...", "stderr": "..." },
    "clippy": { "returncode": 0, "stdout": "...", "stderr": "..." },
    "geiger": { "returncode": 0, "stdout": "...", "stderr": "..." },
    "ml_predictions": {
      "quality_score": 0.85,
      "security_risk": "low",
      "maintenance_score": 0.92,
      "popularity_trend": "growing",
      "dependency_health": 0.88,
      "confidence": 0.95,
      "model_version": "1.0.0"
    },
    "insights": {
      "overall_quality_score": 0.85,
      "security_risk_level": "low",
      "code_quality": "excellent",
      "recommendations": [
        "Consider updating dependencies",
        "Review 2 unsafe code items detected by cargo-geiger"
      ]
    }
  }
}
```
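A report in this shape can be post-processed with a few lines of Python. A hedged sketch (the `summarize_report` helper is illustrative, not part of the package; field names follow the example above):

```python
import json
from pathlib import Path

def summarize_report(path) -> dict:
    """Pull the headline numbers out of a saved analysis report."""
    report = json.loads(Path(path).read_text())
    analysis = report.get("enhanced_analysis", {})
    ml = analysis.get("ml_predictions", {})
    return {
        "quality_score": ml.get("quality_score"),
        "security_risk": ml.get("security_risk"),
        # A zero return code from `cargo build` means the crate compiled
        "build_ok": analysis.get("build", {}).get("returncode") == 0,
    }
```

Pointing this at a file such as `output/serde_analysis_report.json` returns a compact dict suitable for dashboards or batch filtering.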
## 🤖 LLM Provider Support
### Supported Providers
| Provider | Setup | Usage |
|----------|-------|-------|
| **Ollama** | `ollama serve` + `ollama pull model` | `--llm-provider ollama --llm-model tinyllama` |
| **Azure OpenAI** | Set env vars | `--llm-provider azure --llm-model gpt-4o` |
| **OpenAI** | Set `OPENAI_API_KEY` | `--llm-provider openai --llm-model gpt-4` |
| **Anthropic** | Set `ANTHROPIC_API_KEY` | `--llm-provider anthropic --llm-model claude-3` |
| **LM Studio** | Start LM Studio server | `--llm-provider lmstudio --llm-model local-model` |
| **llama-cpp** | Download .gguf file | `--llm-provider llama-cpp --llm-model path/to/model.gguf` |
| **Lambda.AI** | Set `LAMBDA_API_KEY` | `--llm-provider lambda --llm-model qwen25-coder-32b` |
### Provider Configuration
```bash
# Ollama (recommended for local development)
rust-crate-pipeline --llm-provider ollama --llm-model tinyllama
# Azure OpenAI (recommended for production)
rust-crate-pipeline --llm-provider azure --llm-model gpt-4o
# OpenAI
rust-crate-pipeline --llm-provider openai --llm-model gpt-4
# Local llama-cpp model
rust-crate-pipeline --llm-provider llama-cpp --llm-model ~/models/deepseek.gguf
```
## 📊 Output and Results
### Analysis Reports & Teaching Bundles
The pipeline generates comprehensive analysis reports and optional teaching bundles per crate:
- **Basic Metadata**: Crate information, dependencies, downloads
- **Web Scraping Results**: Documentation from crates.io, docs.rs, lib.rs
- **Enhanced Analysis**: Rust tool outputs and quality metrics
- **LLM Enrichment**: AI-generated insights and recommendations
- **ML Predictions**: Quality scores, security risks, maintenance metrics
- **Sacred Chain Analysis**: Trust scoring and security assessment
- **Cache Performance**: Hit rates and optimization metrics
### Output Structure
```
output/
├── serde_analysis_report.json         # Complete analysis with ML predictions
├── tokio_analysis_report.json         # Complete analysis with ML predictions
├── checkpoint_batch_1_20250821.jsonl  # Progress checkpoints
├── pipeline_status.json               # Overall status
├── cache_metrics.json                 # Cache performance metrics
└── ml_predictions_summary.json        # ML prediction summary
```
Teaching bundles structure:
```
teaching_bundles/
├── <crate_name>/
│   ├── Cargo.toml            # Uses real crate versions
│   ├── src/lib.rs            # Sanitized, formatted examples
│   ├── tests/basic.rs        # Auto-generated tests per topic
│   ├── README.md             # Includes license attribution
│   ├── quality_labels.json   # Includes build/test results
│   ├── validate.sh           # Validates compile/tests and license presence
│   └── LICENSE | COPYING     # Copied from upstream if available
└── ...
```
### Audit Logs
Comprehensive audit logs are stored in `audits/records/` for compliance and traceability.
## 🏗️ Architecture
### Modular Monolith, Microservices Ready
The system is designed as a modular monolith that can be easily decomposed into microservices:
```
┌─────────────────────────────────────────────────────────────┐
│                     Rust Crate Pipeline                     │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │    Core     │  │     LLM     │  │  Analysis   │          │
│  │  Pipeline   │  │ Processing  │  │   Engine    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │     Web     │  │    Cache    │  │     ML      │          │
│  │  Scraping   │  │   System    │  │  Predictor  │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │    Sigil    │  │    Audit    │  │    Utils    │          │
│  │  Protocol   │  │   System    │  │   & Tools   │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
```
### Microservices Architecture
When deployed as microservices:
```
┌─────────────────────────────────────────────────────────────┐
│                         API Gateway                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │    Auth     │  │    Rate     │  │    Load     │          │
│  │   Service   │  │  Limiting   │  │  Balancing  │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
┌───────▼───────┐      ┌───────▼───────┐      ┌───────▼───────┐
│   Pipeline    │      │   Analysis    │      │   Scraping    │
│    Service    │      │    Service    │      │    Service    │
└───────────────┘      └───────────────┘      └───────────────┘
        │                      │                      │
        └──────────────────────┼──────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                       Shared Services                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │    Cache    │  │  Database   │  │   Message   │          │
│  │   Service   │  │   Service   │  │    Queue    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘
```
## 🔧 Setup and Configuration
### Automatic Setup
The package includes automatic setup for all dependencies:
```bash
# Run setup (automatically runs on pip install)
rust-crate-pipeline --setup
# Check setup status
rust-crate-pipeline --setup-check
# Verbose setup with detailed output
rust-crate-pipeline --setup --verbose-setup
```
### Manual Setup
If automatic setup fails, you can run components manually:
```bash
# Install Playwright browsers
playwright install
# Install Rust analysis tools
cargo install cargo-geiger cargo-outdated cargo-license cargo-tarpaulin cargo-deny cargo-audit
# Configure environment variables
cp ~/.rust_crate_pipeline/.env.template .env
# Edit .env with your API keys
```
### Configuration Files
Setup creates configuration files in `~/.rust_crate_pipeline/`:
- `crawl4ai_config.json`: Crawl4AI settings
- `rust_tools_config.json`: Rust tool status
- `llm_providers_config.json`: LLM provider configurations
- `cache_config.json`: Cache settings and performance
- `ml_config.json`: ML model configurations
- `system_checks.json`: System compatibility results
- `.env.template`: Environment variable template
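These files are plain JSON, so they can be inspected programmatically. A small illustrative helper (the `load_config` function is a sketch, not part of the package API):

```python
import json
from pathlib import Path

def load_config(name: str, config_dir=None) -> dict:
    """Read one of the setup-generated JSON config files, or {} if absent."""
    base = Path(config_dir) if config_dir else Path.home() / ".rust_crate_pipeline"
    path = base / name
    return json.loads(path.read_text()) if path.exists() else {}
```

For example, `load_config("rust_tools_config.json")` shows which analysis tools setup detected without re-running `--setup-check`.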
## 🐳 Docker Support
### Quick Docker Setup
```bash
# Build and run with Docker Compose
docker-compose up -d
# Run pipeline in container
docker-compose exec rust-pipeline rust-crate-pipeline --crates serde tokio
```
### Custom Docker Configuration
```dockerfile
# Use the provided Dockerfile
FROM python:3.12-slim
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Install Rust and tools
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo install cargo-geiger cargo-outdated cargo-license cargo-tarpaulin cargo-deny cargo-audit
# Install Playwright browsers (with system dependencies)
RUN playwright install --with-deps
# Copy application
COPY . /app
WORKDIR /app
# Run setup
RUN python -m rust_crate_pipeline --setup
```
## 🚀 Performance and Optimization
### Caching Performance
- **Cache Hit**: 10-100x faster response times
- **Memory Cache**: Sub-millisecond access
- **Disk Cache**: Persistent storage with intelligent eviction
- **Redis Cache**: Distributed caching for multi-instance deployments
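The memory tier's TTL and tag semantics can be illustrated with a minimal, synchronous sketch (this mirrors the documented `set`/`get`/`invalidate_by_tags` behavior but is not the pipeline's actual cache implementation):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry TTL and tag-based invalidation."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at, tags)

    def set(self, key, value, ttl=3600, tags=()):
        self._store[key] = (value, time.monotonic() + ttl, frozenset(tags))

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at, _ = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value

    def invalidate_by_tags(self, tags):
        """Drop every entry sharing at least one tag; return how many were removed."""
        doomed = [k for k, (_, _, t) in self._store.items() if t & set(tags)]
        for k in doomed:
            del self._store[k]
        return len(doomed)
```

The real multi-level cache layers disk and Redis tiers behind the same interface, which is what makes the 10-100x cache-hit speedups possible across process restarts.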
### Batch Processing
```bash
# Optimize for memory usage
rust-crate-pipeline --batch-size 2 --max-workers 2
# Optimize for speed
rust-crate-pipeline --batch-size 10 --max-workers 8
```
### Cost Control
```bash
# Skip expensive operations
rust-crate-pipeline --skip-ai --skip-source-analysis
# Limit processing
rust-crate-pipeline --limit 50 --batch-size 5
```
## 🔍 Troubleshooting
### Common Issues
1. **Playwright browsers not installed**
```bash
playwright install
```
2. **Rust tools not available**
```bash
rust-crate-pipeline --setup
```
3. **LLM connection issues**
```bash
# Check Ollama
curl http://localhost:11434/api/tags
# Check Azure OpenAI
curl -H "api-key: $AZURE_OPENAI_API_KEY" "$AZURE_OPENAI_ENDPOINT/openai/deployments"
```
4. **Cache issues**
```bash
# Clear cache
rm -rf ~/.rust_crate_pipeline/cache/
# Check cache status
rust-crate-pipeline --cache-status
```
5. **ML model issues**
```bash
# Check ML model status
rust-crate-pipeline --ml-status
# Retrain models
rust-crate-pipeline --retrain-ml-models
```
### Logs and Debugging
```bash
# Enable debug logging
rust-crate-pipeline --log-level DEBUG --crates serde
# Check setup logs
cat ~/.rust_crate_pipeline/setup_results.json
# Check cache logs
cat ~/.rust_crate_pipeline/cache_metrics.json
```
## 📈 Monitoring and Metrics
### Prometheus Metrics
The system exposes comprehensive metrics:
- **Request counters**: Total requests, success/failure rates
- **Response times**: Latency histograms and percentiles
- **Cache metrics**: Hit rates, miss rates, eviction rates
- **ML metrics**: Prediction accuracy, model performance
- **System metrics**: CPU, memory, disk usage
### Health Checks
```bash
# Check overall health
curl http://localhost:8080/health
# Check specific services
curl http://localhost:8080/health/pipeline
curl http://localhost:8080/health/analysis
curl http://localhost:8080/health/scraping
```
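In scripts, one-shot health checks can be flaky while services start; a small poll helper smooths this out. An illustrative sketch (`wait_healthy` is not part of the package; plug in any check, e.g. an HTTP GET against the `/health` endpoints above):

```python
import time

def wait_healthy(check, attempts=5, delay=1.0):
    """Poll a health-check callable until it returns True or attempts run out."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:  # no need to sleep after the final attempt
            time.sleep(delay)
    return False
```

A typical check might be `lambda: urllib.request.urlopen("http://localhost:8080/health").status == 200`, wrapped in a try/except so connection errors count as "not yet healthy".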
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Run the test suite
6. Submit a pull request
### Development Setup
```bash
# Clone and setup development environment
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/
# Run integration tests
pytest tests/test_integration.py -v
# Run linting
black rust_crate_pipeline/
flake8 rust_crate_pipeline/
```
## 📚 Documentation
- **[Architecture Guide](docs/ARCHITECTURE.md)**: Detailed architecture documentation
- **[Implementation Plan](docs/IMPLEMENTATION_PLAN.md)**: Development roadmap
- **[Roadmap Status](docs/ROADMAP_STATUS.md)**: Current status and next steps
- **[LLM Providers Guide](docs/README_LLM_PROVIDERS.md)**: LLM provider configuration
- **[Integration Examples](examples/integration_example.py)**: Usage examples
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- **Crawl4AI** for advanced web scraping capabilities
- **Playwright** for browser automation
- **Rust community** for the excellent analysis tools
- **Ollama** for local LLM serving
- **All LLM providers** for their APIs and models
- **Redis** for distributed caching
- **Prometheus** for metrics collection
## 📞 Support
- **Issues**: [GitHub Issues](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/issues)
- **Documentation**: [Wiki](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/wiki)
- **Discussions**: [GitHub Discussions](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/discussions)
---
**Rust Crate Pipeline v4.0.0** - Enterprise-grade Rust crate analysis with AI-powered insights, advanced caching, ML predictions, and microservices architecture.
**🚀 Ready for production deployment and scaling!**
Raw data
{
"_id": null,
"home_page": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production",
"name": "rust-crate-pipeline",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "SuperUser666-Sigil <miragemodularframework@gmail.com>",
"keywords": "rust, crate, analysis, ai, llm, pipeline, caching, ml, microservices",
"author": "SuperUser666-Sigil",
"author_email": "SuperUser666-Sigil <miragemodularframework@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/0f/cf/e282f155a1073b8e34b50a9c803f9f5858c76792c300e5b1531471ddf352/rust_crate_pipeline-4.0.0.tar.gz",
"platform": null,
"description": "# Rust Crate Pipeline v4.0.0\r\n\r\nA comprehensive, enterprise-grade system for gathering, enriching, and analyzing metadata for Rust crates using AI-powered insights, advanced caching, machine learning predictions, and microservices architecture. This pipeline provides deep analysis of Rust crates with support for multiple LLM providers, intelligent caching, ML quality predictions, and comprehensive Rust code quality assessment.\r\n\r\n## \ud83d\ude80 Quick Start\r\n\r\n### Option 1: Install via pip (Recommended for users)\r\n\r\n```bash\r\n# Install the package (includes automatic setup)\r\npip install rust-crate-pipeline\r\n\r\n# The package will automatically run setup for all components\r\n# You can also run setup manually:\r\nrust-crate-pipeline --setup\r\n\r\n# Run with your preferred LLM provider\r\nrust-crate-pipeline --llm-provider ollama --llm-model tinyllama --crates serde tokio\r\n```\r\n\r\n### Option 2: Clone and run from repository (Recommended for developers)\r\n\r\n```bash\r\n# Clone the repository\r\ngit clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git\r\ncd SigilDERG-Data_Production\r\n\r\n# Install dependencies\r\npip install -r requirements.txt\r\npip install -r requirements-dev.txt\r\n\r\n# Run setup for all components\r\npython -m rust_crate_pipeline --setup\r\n\r\n# Run the pipeline\r\npython run_with_llm.py --llm-provider ollama --llm-model tinyllama --crates serde tokio\r\n```\r\n\r\n## \u2728 Key Features\r\n\r\n### \ud83e\udd16 **AI & Machine Learning**\r\n- **Multi-Provider LLM Support**: Azure OpenAI, OpenAI, Anthropic, Ollama, LM Studio, Lambda.AI, and 100+ LiteLLM providers\r\n- **ML Quality Predictor**: Automated quality scoring, security risk assessment, maintenance predictions\r\n- **Intelligent Analysis**: AI-powered insights and recommendations\r\n- **Real-time Learning**: Adaptive model training and prediction refinement\r\n\r\n### \ud83d\ude80 **Performance & Caching**\r\n- **Advanced 
Multi-Level Caching**: Memory, Disk, and Redis caching with intelligent warming\r\n- **Cache Hit Optimization**: 10-100x faster response times for cached results\r\n- **Tag-based Invalidation**: Intelligent cache management and cleanup\r\n- **TTL Management**: Configurable cache expiration and size limits\r\n\r\n### \ud83c\udf10 **Web Scraping & Analysis**\r\n- **Advanced Web Scraping**: Crawl4AI + Playwright for intelligent content extraction\r\n- **Enhanced Rust Analysis**: cargo-geiger, cargo-outdated, cargo-license, cargo-tarpaulin, cargo-deny\r\n- **Comprehensive Tooling**: Full Rust ecosystem analysis and quality assessment\r\n\r\n### \ud83d\udd12 **Security & Trust**\r\n- **Sigil Protocol Support**: Sacred Chain analysis with IRL trust scoring\r\n- **Security Analysis**: Privacy and security scanning with Presidio\r\n- **Trust Verification**: Canon registry and reputation system\r\n- **Audit Logging**: Comprehensive audit trails for compliance\r\n\r\n### \ud83c\udfd7\ufe0f **Architecture & Scalability**\r\n- **Microservices Ready**: API Gateway with service discovery and load balancing\r\n- **Event-Driven**: Message queues and asynchronous processing\r\n- **Horizontal Scaling**: Support for 1000+ concurrent users\r\n\r\n### \ud83d\udcca **Monitoring & Observability**\r\n- **Real-time Progress Tracking**: Comprehensive monitoring and error recovery\r\n- **Prometheus Metrics**: Detailed performance and health metrics\r\n- **Health Checks**: Automated service health monitoring\r\n- **Structured Logging**: JSON-formatted logs with correlation IDs\r\n\r\n### \ud83d\udc33 **Deployment & Operations**\r\n- **Docker Support**: Containerized deployment with docker-compose\r\n- **Auto-Resume Capability**: Automatically skips already processed crates\r\n- **Batch Processing**: Configurable memory optimization and cost control\r\n- **Production Ready**: Enterprise-grade reliability and performance\r\n\r\n## \ud83d\udccb Requirements\r\n\r\n- **Python 3.12+** 
(required)\r\n- **Git** (for repository operations)\r\n- **Cargo** (for Rust crate analysis)\r\n- **Playwright browsers** (auto-installed via setup)\r\n- **Rust analysis tools** (auto-installed via setup)\r\n\r\n### Optional Dependencies\r\n- **Redis**: For distributed caching (recommended for production)\r\n- **Prometheus**: For metrics collection\r\n\r\n## \ud83d\udd27 Installation & Setup\r\n\r\n### For End Users (pip install)\r\n\r\nThe package includes automatic setup for all components:\r\n\r\n```bash\r\n# Install the package (includes all dependencies and automatic setup)\r\npip install rust-crate-pipeline\r\n\r\n# Check setup status\r\nrust-crate-pipeline --setup-check\r\n\r\n# Run setup manually if needed\r\nrust-crate-pipeline --setup --verbose-setup\r\n```\r\n\r\n### For Developers (repository clone)\r\n\r\n```bash\r\n# Clone the repository\r\ngit clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git\r\ncd SigilDERG-Data_Production\r\n\r\n# Create virtual environment (recommended)\r\npython -m venv venv\r\nsource venv/bin/activate # On Windows: venv\\Scripts\\activate\r\n\r\n# Install dependencies\r\npip install -r requirements.txt\r\npip install -r requirements-dev.txt\r\n\r\n# Run comprehensive setup\r\npython -m rust_crate_pipeline --setup --verbose-setup\r\n\r\n# Set up environment variables (optional but recommended)\r\nexport AZURE_OPENAI_ENDPOINT=\"your_endpoint\"\r\nexport AZURE_OPENAI_API_KEY=\"your_api_key\"\r\nexport GITHUB_TOKEN=\"your_github_token\"\r\n```\r\n\r\n## \ud83c\udfaf Usage Examples\r\n\r\n### Basic Usage with Integrated Components\r\n\r\n```python\r\nfrom rust_crate_pipeline.config import PipelineConfig\r\nfrom rust_crate_pipeline.unified_pipeline import UnifiedSigilPipeline\r\n\r\n# Create configuration\r\nconfig = PipelineConfig(\r\n model_path=\"~/models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf\",\r\n max_tokens=512,\r\n batch_size=5,\r\n output_path=\"./output\"\r\n)\r\n\r\n# Create pipeline with 
integrated components\r\npipeline = UnifiedSigilPipeline(config)\r\n\r\n# Analyze crates with caching and ML predictions\r\nasync with pipeline:\r\n result = await pipeline.analyze_crate(\"serde\")\r\n \r\n # ML predictions are automatically added\r\n ml_predictions = result.audit_info.get(\"ml_predictions\", {})\r\n print(f\"Quality Score: {ml_predictions.get('quality_score', 0)}\")\r\n```\r\n\r\n### Advanced Caching Usage\r\n\r\n```python\r\nfrom rust_crate_pipeline.utils.advanced_cache import get_cache\r\n\r\n# Get cache instance\r\ncache = get_cache()\r\n\r\n# Store data with TTL and tags\r\nawait cache.set(\r\n \"crate:serde\", \r\n crate_data, \r\n ttl=3600, # 1 hour\r\n tags=[\"rust\", \"serialization\"]\r\n)\r\n\r\n# Retrieve data\r\ncached_data = await cache.get(\"crate:serde\")\r\n\r\n# Invalidate by tags\r\nawait cache.invalidate_by_tags([\"rust\"])\r\n```\r\n\r\n### ML Quality Predictions\r\n\r\n```python\r\nfrom rust_crate_pipeline.ml.quality_predictor import get_predictor\r\n\r\n# Get predictor instance\r\npredictor = get_predictor()\r\n\r\n# Predict quality metrics\r\nprediction = predictor.predict_quality(crate_data)\r\n\r\nprint(f\"Quality Score: {prediction.quality_score}\")\r\nprint(f\"Security Risk: {prediction.security_risk}\")\r\nprint(f\"Maintenance Score: {prediction.maintenance_score}\")\r\n```\r\n\r\n### API Gateway for Microservices\r\n\r\n```python\r\nfrom rust_crate_pipeline.services.api_gateway import APIGateway\r\n\r\n# Load configuration\r\nwith open(\"configs/gateway_config.json\", \"r\") as f:\r\n config = json.load(f)\r\n\r\n# Create gateway\r\ngateway = APIGateway(config)\r\n\r\n# Start gateway (in production)\r\n# python rust_crate_pipeline/services/api_gateway.py --config configs/gateway_config.json\r\n```\r\n\r\n### Command Line Usage\r\n\r\n```bash\r\n# Basic analysis with caching and ML\r\nrust-crate-pipeline --llm-provider ollama --llm-model tinyllama --crates serde tokio\r\n\r\n# Advanced analysis with all 
features\r\nrust-crate-pipeline --llm-provider azure --llm-model gpt-4o --crates actix-web --enable-ml --enable-caching\r\n\r\n# Batch processing with auto-resume\r\nrust-crate-pipeline --crates-file data/crate_list.txt --auto-resume --batch-size 5\r\n\r\n# Force restart processing\r\nrust-crate-pipeline --crates-file data/crate_list.txt --force-restart\r\n```\r\n\r\n## \ud83d\udd0d Enhanced Rust Analysis\r\n\r\nThe pipeline includes comprehensive Rust analysis tools:\r\n\r\n- **cargo-geiger**: Unsafe code detection and safety scoring\r\n- **cargo-outdated**: Dependency update recommendations\r\n- **cargo-license**: License analysis and compliance\r\n- **cargo-tarpaulin**: Code coverage analysis\r\n- **cargo-deny**: Comprehensive dependency checking\r\n- **cargo-audit**: Security vulnerability scanning\r\n- **cargo-tree**: Dependency visualization\r\n\r\n### Analysis Output with ML Predictions\r\n\r\nEach crate analysis includes:\r\n\r\n```json\r\n{\r\n \"enhanced_analysis\": {\r\n \"build\": { \"returncode\": 0, \"stdout\": \"...\", \"stderr\": \"...\" },\r\n \"test\": { \"returncode\": 0, \"stdout\": \"...\", \"stderr\": \"...\" },\r\n \"clippy\": { \"returncode\": 0, \"stdout\": \"...\", \"stderr\": \"...\" },\r\n \"geiger\": { \"returncode\": 0, \"stdout\": \"...\", \"stderr\": \"...\" },\r\n \"ml_predictions\": {\r\n \"quality_score\": 0.85,\r\n \"security_risk\": \"low\",\r\n \"maintenance_score\": 0.92,\r\n \"popularity_trend\": \"growing\",\r\n \"dependency_health\": 0.88,\r\n \"confidence\": 0.95,\r\n \"model_version\": \"1.0.0\"\r\n },\r\n \"insights\": {\r\n \"overall_quality_score\": 0.85,\r\n \"security_risk_level\": \"low\",\r\n \"code_quality\": \"excellent\",\r\n \"recommendations\": [\r\n \"Consider updating dependencies\",\r\n \"Review 2 unsafe code items detected by cargo-geiger\"\r\n ]\r\n }\r\n }\r\n}\r\n```\r\n\r\n## \ud83e\udd16 LLM Provider Support\r\n\r\n### Supported Providers\r\n\r\n| Provider | Setup | Usage 
|\r\n|----------|-------|-------|\r\n| **Ollama** | `ollama serve` + `ollama pull model` | `--llm-provider ollama --llm-model tinyllama` |\r\n| **Azure OpenAI** | Set env vars | `--llm-provider azure --llm-model gpt-4o` |\r\n| **OpenAI** | Set `OPENAI_API_KEY` | `--llm-provider openai --llm-model gpt-4` |\r\n| **Anthropic** | Set `ANTHROPIC_API_KEY` | `--llm-provider anthropic --llm-model claude-3` |\r\n| **LM Studio** | Start LM Studio server | `--llm-provider lmstudio --llm-model local-model` |\r\n| **llama-cpp** | Download .gguf file | `--llm-provider llama-cpp --llm-model path/to/model.gguf` |\r\n| **Lambda.AI** | Set `LAMBDA_API_KEY` | `--llm-provider lambda --llm-model qwen25-coder-32b` |\r\n\r\n### Provider Configuration\r\n\r\n```bash\r\n# Ollama (recommended for local development)\r\nrust-crate-pipeline --llm-provider ollama --llm-model tinyllama\r\n\r\n# Azure OpenAI (recommended for production)\r\nrust-crate-pipeline --llm-provider azure --llm-model gpt-4o\r\n\r\n# OpenAI\r\nrust-crate-pipeline --llm-provider openai --llm-model gpt-4\r\n\r\n# Local llama-cpp model\r\nrust-crate-pipeline --llm-provider llama-cpp --llm-model ~/models/deepseek.gguf\r\n```\r\n\r\n## \ud83d\udcca Output and Results\r\n\r\n### Analysis Reports & Teaching Bundles\r\n\r\nThe pipeline generates comprehensive analysis reports and optional teaching bundles per crate:\r\n\r\n- **Basic Metadata**: Crate information, dependencies, downloads\r\n- **Web Scraping Results**: Documentation from crates.io, docs.rs, lib.rs\r\n- **Enhanced Analysis**: Rust tool outputs and quality metrics\r\n- **LLM Enrichment**: AI-generated insights and recommendations\r\n- **ML Predictions**: Quality scores, security risks, maintenance metrics\r\n- **Sacred Chain Analysis**: Trust scoring and security assessment\r\n- **Cache Performance**: Hit rates and optimization metrics\r\n\r\n### Output Structure\r\n\r\n```\r\noutput/\r\n\u251c\u2500\u2500 serde_analysis_report.json # Complete analysis with ML 
predictions\r\n\u251c\u2500\u2500 tokio_analysis_report.json # Complete analysis with ML predictions\r\n\u251c\u2500\u2500 checkpoint_batch_1_20250821.jsonl # Progress checkpoints\r\n\u251c\u2500\u2500 pipeline_status.json # Overall status\r\n\u251c\u2500\u2500 cache_metrics.json # Cache performance metrics\r\n\u2514\u2500\u2500 ml_predictions_summary.json # ML prediction summary\r\n```\r\n\r\nTeaching bundles structure:\r\n\r\n```\r\nteaching_bundles/\r\n\u251c\u2500\u2500 <crate_name>/\r\n\u2502 \u251c\u2500\u2500 Cargo.toml # Uses real crate versions\r\n\u2502 \u251c\u2500\u2500 src/lib.rs # Sanitized, formatted examples\r\n\u2502 \u251c\u2500\u2500 tests/basic.rs # Auto-generated tests per topic\r\n\u2502 \u251c\u2500\u2500 README.md # Includes license attribution\r\n\u2502 \u251c\u2500\u2500 quality_labels.json # Includes build/test results\r\n\u2502 \u251c\u2500\u2500 validate.sh # Validates compile/tests and license presence\r\n\u2502 \u2514\u2500\u2500 LICENSE | COPYING # Copied from upstream if available\r\n\u2514\u2500\u2500 ...\r\n```\r\n\r\n### Audit Logs\r\n\r\nComprehensive audit logs are stored in `audits/records/` for compliance and traceability.\r\n\r\n## \ud83c\udfd7\ufe0f Architecture\r\n\r\n### Modular Monolith with Microservices Ready\r\n\r\nThe system is designed as a modular monolith that can be easily decomposed into microservices:\r\n\r\n```\r\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\r\n\u2502 Rust Crate Pipeline 
## 🏗️ Architecture

### Modular Monolith (Microservices-Ready)

The system is designed as a modular monolith that can be easily decomposed into microservices:

```
┌─────────────────────────────────────────────────────┐
│                 Rust Crate Pipeline                 │
├─────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Core     │  │     LLM     │  │  Analysis   │  │
│  │  Pipeline   │  │ Processing  │  │   Engine    │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │     Web     │  │    Cache    │  │     ML      │  │
│  │  Scraping   │  │   System    │  │  Predictor  │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Sigil    │  │    Audit    │  │    Utils    │  │
│  │  Protocol   │  │   System    │  │   & Tools   │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└─────────────────────────────────────────────────────┘
```

### Microservices Architecture

When deployed as microservices:

```
┌─────────────────────────────────────────────────────┐
│                     API Gateway                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Auth     │  │    Rate     │  │    Load     │  │
│  │   Service   │  │  Limiting   │  │  Balancing  │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└─────────────────────────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│   Pipeline    │ │   Analysis    │ │   Scraping    │
│   Service     │ │   Service     │ │   Service     │
└───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │
        └─────────────────┼─────────────────┘
                          │
┌─────────────────────────────────────────────────────┐
│                   Shared Services                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Cache    │  │  Database   │  │   Message   │  │
│  │   Service   │  │   Service   │  │    Queue    │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└─────────────────────────────────────────────────────┘
```
## 🔧 Setup and Configuration

### Automatic Setup

The package includes automatic setup for all dependencies:

```bash
# Run setup (runs automatically on pip install)
rust-crate-pipeline --setup

# Check setup status
rust-crate-pipeline --setup-check

# Verbose setup with detailed output
rust-crate-pipeline --setup --verbose-setup
```

### Manual Setup

If automatic setup fails, you can run the components manually:

```bash
# Install Playwright browsers
playwright install

# Install Rust analysis tools
cargo install cargo-geiger cargo-outdated cargo-license cargo-tarpaulin cargo-deny cargo-audit

# Configure environment variables
cp ~/.rust_crate_pipeline/.env.template .env
# Edit .env with your API keys
```

### Configuration Files

Setup creates configuration files in `~/.rust_crate_pipeline/`:

- `crawl4ai_config.json`: Crawl4AI settings
- `rust_tools_config.json`: Rust tool status
- `llm_providers_config.json`: LLM provider configurations
- `cache_config.json`: Cache settings and performance
- `ml_config.json`: ML model configurations
- `system_checks.json`: System compatibility results
- `.env.template`: Environment variable template
## 🐳 Docker Support

### Quick Docker Setup

```bash
# Build and run with Docker Compose
docker-compose up -d

# Run the pipeline in the container
docker-compose exec rust-pipeline rust-crate-pipeline --crates serde tokio
```

### Custom Docker Configuration

```dockerfile
# Based on the provided Dockerfile
FROM python:3.12-slim

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Install Rust non-interactively, put cargo on PATH, then install the tools
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo install cargo-geiger cargo-outdated cargo-license cargo-tarpaulin cargo-deny cargo-audit

# Install Playwright browsers plus their system dependencies
RUN playwright install --with-deps

# Copy the application
COPY . /app
WORKDIR /app

# Run setup
RUN python -m rust_crate_pipeline --setup
```

## 🚀 Performance and Optimization

### Caching Performance

- **Cache Hit**: 10-100x faster response times
- **Memory Cache**: Sub-millisecond access
- **Disk Cache**: Persistent storage with intelligent eviction
- **Redis Cache**: Distributed caching for multi-instance deployments

### Batch Processing

```bash
# Optimize for memory usage
rust-crate-pipeline --batch-size 2 --max-workers 2

# Optimize for speed
rust-crate-pipeline --batch-size 10 --max-workers 8
```

### Cost Control

```bash
# Skip expensive operations
rust-crate-pipeline --skip-ai --skip-source-analysis

# Limit processing
rust-crate-pipeline --limit 50 --batch-size 5
```
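The batching behind `--batch-size` and `--max-workers` can be sketched in a few lines (the `analyze` callable stands in for per-crate analysis and is hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_batches(crates, analyze, batch_size=5, max_workers=4):
    """Process crates batch by batch, mirroring --batch-size/--max-workers.

    Each batch is fanned out across a thread pool; results come back in
    input order within a batch, so the overall order is preserved.
    """
    results = []
    for start in range(0, len(crates), batch_size):
        batch = crates[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results.extend(pool.map(analyze, batch))
    return results
```

Smaller batches bound peak memory (results are flushed per batch); more workers trade memory for throughput, which is the tradeoff the two CLI presets above express.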
## 🔍 Troubleshooting

### Common Issues

1. **Playwright browsers not installed**

   ```bash
   playwright install
   ```

2. **Rust tools not available**

   ```bash
   rust-crate-pipeline --setup
   ```

3. **LLM connection issues**

   ```bash
   # Check Ollama
   curl http://localhost:11434/api/tags

   # Check Azure OpenAI
   curl -H "api-key: $AZURE_OPENAI_API_KEY" "$AZURE_OPENAI_ENDPOINT/openai/deployments"
   ```

4. **Cache issues**

   ```bash
   # Clear the cache
   rm -rf ~/.rust_crate_pipeline/cache/

   # Check cache status
   rust-crate-pipeline --cache-status
   ```

5. **ML model issues**

   ```bash
   # Check ML model status
   rust-crate-pipeline --ml-status

   # Retrain models
   rust-crate-pipeline --retrain-ml-models
   ```

### Logs and Debugging

```bash
# Enable debug logging
rust-crate-pipeline --log-level DEBUG --crates serde

# Check setup logs
cat ~/.rust_crate_pipeline/setup_results.json

# Check cache metrics
cat ~/.rust_crate_pipeline/cache_metrics.json
```

## 📈 Monitoring and Metrics

### Prometheus Metrics

The system exposes comprehensive metrics:

- **Request counters**: Total requests, success/failure rates
- **Response times**: Latency histograms and percentiles
- **Cache metrics**: Hit rates, miss rates, eviction rates
- **ML metrics**: Prediction accuracy, model performance
- **System metrics**: CPU, memory, disk usage

### Health Checks

```bash
# Check overall health
curl http://localhost:8080/health

# Check specific services
curl http://localhost:8080/health/pipeline
curl http://localhost:8080/health/analysis
curl http://localhost:8080/health/scraping
```
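The cache hit rate surfaced in `cache_metrics.json` reduces to two counters. A stdlib sketch of that bookkeeping (not the pipeline's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    """Track the two counters behind a hit-rate metric."""
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        # One call per cache lookup; the boolean says which counter to bump.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The same hits/misses pair is what a Prometheus counter per outcome would export, with the ratio computed at query time rather than stored.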
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Run the test suite
6. Submit a pull request

### Development Setup

```bash
# Clone and set up the development environment
git clone https://github.com/Superuser666-Sigil/SigilDERG-Data_Production.git
cd SigilDERG-Data_Production

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run integration tests
pytest tests/test_integration.py -v

# Run linting
black rust_crate_pipeline/
flake8 rust_crate_pipeline/
```

## 📚 Documentation

- **[Architecture Guide](docs/ARCHITECTURE.md)**: Detailed architecture documentation
- **[Implementation Plan](docs/IMPLEMENTATION_PLAN.md)**: Development roadmap
- **[Roadmap Status](docs/ROADMAP_STATUS.md)**: Current status and next steps
- **[LLM Providers Guide](docs/README_LLM_PROVIDERS.md)**: LLM provider configuration
- **[Integration Examples](examples/integration_example.py)**: Usage examples

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **Crawl4AI** for advanced web scraping capabilities
- **Playwright** for browser automation
- **Rust community** for the excellent analysis tools
- **Ollama** for local LLM serving
- **All LLM providers** for their APIs and models
- **Redis** for distributed caching
- **Prometheus** for metrics collection

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/issues)
- **Documentation**: [Wiki](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/wiki)
- **Discussions**: [GitHub Discussions](https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/discussions)
---

**Rust Crate Pipeline v4.0.0** - Enterprise-grade Rust crate analysis with AI-powered insights, advanced caching, ML predictions, and microservices architecture.

**🚀 Ready for production deployment and scaling!**
",
"bugtrack_url": null,
"license": null,
"summary": "A comprehensive system for gathering, enriching, and analyzing metadata for Rust crates using AI-powered insights, advanced caching, machine learning predictions, and microservices architecture.",
"version": "4.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/issues",
"Documentation": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/wiki",
"Homepage": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production",
"Release Notes": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/releases",
"Repository": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production"
},
"split_keywords": [
"rust",
" crate",
" analysis",
" ai",
" llm",
" pipeline",
" caching",
" ml",
" microservices"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "37bbf3512f7f98f53961cb65a3a8a8d0e62ed4e3cd703dbddf1885a2a51daf09",
"md5": "7ccfecd243775a750116d026a6bf46c6",
"sha256": "ce382a774877dc44ff8ce4b5cd5f268e237c04b560a8f64239d71d1db005d3bf"
},
"downloads": -1,
"filename": "rust_crate_pipeline-4.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7ccfecd243775a750116d026a6bf46c6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 156188,
"upload_time": "2025-08-26T10:34:05",
"upload_time_iso_8601": "2025-08-26T10:34:05.908764Z",
"url": "https://files.pythonhosted.org/packages/37/bb/f3512f7f98f53961cb65a3a8a8d0e62ed4e3cd703dbddf1885a2a51daf09/rust_crate_pipeline-4.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0fcfe282f155a1073b8e34b50a9c803f9f5858c76792c300e5b1531471ddf352",
"md5": "f8d2050287633ff34d7b5137c254bea4",
"sha256": "dcf8a897336112efd4517d4579a51bc89a3dc390a367ddbf5efff8b08e157277"
},
"downloads": -1,
"filename": "rust_crate_pipeline-4.0.0.tar.gz",
"has_sig": false,
"md5_digest": "f8d2050287633ff34d7b5137c254bea4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 184334,
"upload_time": "2025-08-26T10:34:08",
"upload_time_iso_8601": "2025-08-26T10:34:08.374985Z",
"url": "https://files.pythonhosted.org/packages/0f/cf/e282f155a1073b8e34b50a9c803f9f5858c76792c300e5b1531471ddf352/rust_crate_pipeline-4.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-26 10:34:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Superuser666-Sigil",
"github_project": "SigilDERG-Data_Production",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "requests",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"2.31.0"
]
]
},
{
"name": "requests-cache",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.1.1"
]
]
},
{
"name": "httpx",
"specs": [
[
">=",
"0.25.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "aiohttp",
"specs": [
[
"<",
"4.0.0"
],
[
">=",
"3.9.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
"<",
"5.0.0"
],
[
">=",
"4.12.0"
]
]
},
{
"name": "lxml",
"specs": [
[
"<",
"6.0.0"
],
[
">=",
"4.9.0"
]
]
},
{
"name": "playwright",
"specs": [
[
">=",
"1.40.0"
],
[
"<",
"2.0.0"
]
]
},
{
"name": "selenium",
"specs": [
[
"<",
"5.0.0"
],
[
">=",
"4.15.0"
]
]
},
{
"name": "crawl4ai",
"specs": [
[
">=",
"0.6.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "litellm",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.30.0"
]
]
},
{
"name": "llama-cpp-python",
"specs": [
[
">=",
"0.2.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "tiktoken",
"specs": [
[
">=",
"0.5.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "azure-core",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.29.0"
]
]
},
{
"name": "azure-identity",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.15.0"
]
]
},
{
"name": "azure-ai-inference",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.0.0b9"
]
]
},
{
"name": "pydantic",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"2.5.0"
]
]
},
{
"name": "dataclasses-json",
"specs": [
[
">=",
"0.6.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "toml",
"specs": [
[
"<",
"1.0.0"
],
[
">=",
"0.10.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
">=",
"2.8.0"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "radon",
"specs": [
[
">=",
"6.0.0"
],
[
"<",
"7.0.0"
]
]
},
{
"name": "rustworkx",
"specs": [
[
">=",
"0.13.0"
],
[
"<",
"1.0.0"
]
]
},
{
"name": "presidio-analyzer",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"2.2.0"
]
]
},
{
"name": "spacy",
"specs": [
[
"<",
"4.0.0"
],
[
">=",
"3.7.0"
]
]
},
{
"name": "psutil",
"specs": [
[
"<",
"7.0.0"
],
[
">=",
"6.1.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"<",
"5.0.0"
],
[
">=",
"4.66.0"
]
]
},
{
"name": "cachetools",
"specs": [
[
">=",
"5.3.0"
],
[
"<",
"6.0.0"
]
]
},
{
"name": "aiofiles",
"specs": [
[
">=",
"24.1.0"
],
[
"<",
"25.0.0"
]
]
},
{
"name": "redis",
"specs": [
[
">=",
"5.0.0"
],
[
"<",
"6.0.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.3.0"
]
]
},
{
"name": "numpy",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.24.0"
]
]
},
{
"name": "pandas",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"2.0.0"
]
]
},
{
"name": "PyJWT",
"specs": [
[
">=",
"2.8.0"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "prometheus-client",
"specs": [
[
"<",
"1.0.0"
],
[
">=",
"0.17.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.4.0"
],
[
"<",
"8.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.1.0"
],
[
"<",
"5.0.0"
]
]
},
{
"name": "pytest-asyncio",
"specs": [
[
"<",
"1.0.0"
],
[
">=",
"0.21.0"
]
]
},
{
"name": "black",
"specs": [
[
"<",
"24.0.0"
],
[
">=",
"23.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.12.0"
],
[
"<",
"6.0.0"
]
]
},
{
"name": "flake8",
"specs": [
[
"<",
"7.0.0"
],
[
">=",
"6.1.0"
]
]
},
{
"name": "mypy",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.7.0"
]
]
},
{
"name": "pyright",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.1.0"
]
]
},
{
"name": "bandit",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.7.0"
]
]
},
{
"name": "safety",
"specs": [
[
">=",
"3.0.0"
],
[
"<",
"4.0.0"
]
]
},
{
"name": "sphinx",
"specs": [
[
"<",
"8.0.0"
],
[
">=",
"7.2.0"
]
]
},
{
"name": "sphinx-rtd-theme",
"specs": [
[
"<",
"3.0.0"
],
[
">=",
"2.0.0"
]
]
},
{
"name": "build",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.0.0"
]
]
},
{
"name": "twine",
"specs": [
[
">=",
"4.0.0"
],
[
"<",
"5.0.0"
]
]
},
{
"name": "pre-commit",
"specs": [
[
">=",
"3.5.0"
],
[
"<",
"4.0.0"
]
]
},
{
"name": "coverage",
"specs": [
[
"<",
"8.0.0"
],
[
">=",
"7.3.0"
]
]
},
{
"name": "codecov",
"specs": [
[
">=",
"2.1.13"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "opentelemetry-api",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.22.0"
]
]
},
{
"name": "opentelemetry-sdk",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.22.0"
]
]
}
],
"lcname": "rust-crate-pipeline"
}