# Rust Crate Pipeline
A comprehensive system for gathering, enriching, and analyzing metadata for Rust crates using AI-powered insights, web scraping, and dependency analysis.
## Quickstart
```bash
pip install rust-crate-pipeline
python -m rust_crate_pipeline --config-file my_config.json
```
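- Requires **Python 3.12+** (the published 1.5.6 package declares `requires_python >=3.12`, so pip will not install it on 3.11)
- For GPU, DeepSeek, or llama-cpp-python setups, see "Advanced: Multi-Provider LLM & GPU" below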
## Features
- Enhanced web scraping (Crawl4AI + Playwright)
- AI enrichment (local or Azure OpenAI, DeepSeek, Lambda.AI, etc.)
- Multi-provider LLM support (see Advanced)
- Cargo build/test/audit
- Batch processing, JSON output, Docker support
## Requirements
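- **Python 3.12+** (matches the `requires_python >=3.12` declared by the published package)
- Git and Cargo
- Playwright (installed automatically as a dependency; its browsers still need `playwright install`, see Installation)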
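The requirements above can be checked before a first run. The following is a minimal sketch, not part of the package: `find_problems` is a hypothetical helper, the minimum version is taken from the package's `requires_python` metadata, and the check only confirms that the `git`, `cargo`, and `playwright` executables are on `PATH` (it does not verify that Playwright browsers are installed).

```python
import shutil
import sys

# Assumption: minimum version taken from the package's requires_python metadata.
MIN_PYTHON = (3, 12)
# Executables the pipeline is documented to need on PATH.
REQUIRED_TOOLS = ("git", "cargo", "playwright")


def find_problems(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `version` and `which` are injectable so the check can be tested
    without changing the real interpreter or PATH.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        found = ".".join(str(part) for part in version)
        wanted = ".".join(str(part) for part in MIN_PYTHON)
        problems.append(f"Python {wanted}+ is required, found {found}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems
```

Calling `find_problems()` with no arguments inspects the current interpreter and `PATH`; printing each returned string gives a quick checklist of what is missing.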
## Installation
```bash
pip install rust-crate-pipeline
# For development, run this from a clone of the repository instead: pip install -e .
# Install Playwright browsers (required for enhanced scraping)
playwright install
```
## Configuration
- All configuration is via a single JSON file, passed with `--config-file <path>`
- No config file is auto-loaded; you must specify the path
Example `my_config.json`:
```json
{
  "batch_size": 10,
  "n_workers": 4,
  "max_retries": 3,
  "checkpoint_interval": 10,
  "use_azure_openai": false,
  "enable_crawl4ai": true,
  "model_path": "/path/to/model.gguf"
}
```
Set any environment variables your setup requires, such as `GITHUB_TOKEN` and the Azure OpenAI or OpenAI API keys.
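Because the pipeline reads everything from one JSON file, a typo in that file is easy to miss. The sketch below is an illustration only: `check_config`, `load_config`, and `missing_env` are hypothetical helpers, and the expected types are inferred from the example config above rather than from the package's own schema.

```python
import json
import os
from pathlib import Path

# Keys and types mirror the example my_config.json above.
# Assumption: the pipeline may accept other keys; those are ignored here.
EXPECTED_TYPES = {
    "batch_size": int,
    "n_workers": int,
    "max_retries": int,
    "checkpoint_interval": int,
    "use_azure_openai": bool,
    "enable_crawl4ai": bool,
    "model_path": str,
}


def check_config(config):
    """Return a list of type errors for the known keys present in `config`."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it for integer keys.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def load_config(path):
    """Read a JSON config file and raise ValueError if a known key has the wrong type."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    errors = check_config(config)
    if errors:
        raise ValueError("; ".join(errors))
    return config


def missing_env(names=("GITHUB_TOKEN",), environ=os.environ):
    """Return the names from `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]
```

Running `load_config("my_config.json")` and `missing_env()` before starting the pipeline surfaces both kinds of mistakes early; extend `names` with whichever provider keys your setup uses.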
## Usage
### Basic Usage
```bash
python -m rust_crate_pipeline --config-file my_config.json
```
### Custom Options (combine as needed)
```bash
python -m rust_crate_pipeline \
  --config-file my_config.json \
  --batch-size 20 \
  --n-workers 8 \
  --max-tokens 2048 \
  --checkpoint-interval 5 \
  --log-level DEBUG \
  --output-path ./results
```
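When the pipeline is launched from another Python script, the same flags can be assembled programmatically. This is a minimal sketch under the assumption that every option takes exactly one value, as in the command above; `build_command` and `run_pipeline` are hypothetical wrappers, not functions exported by the package.

```python
import subprocess
import sys


def build_command(config_file, **options):
    """Translate keyword options into the CLI flags shown above.

    For example, batch_size=20 becomes "--batch-size 20".
    Options whose value is None are skipped.
    """
    command = [
        sys.executable, "-m", "rust_crate_pipeline",
        "--config-file", str(config_file),
    ]
    for name, value in options.items():
        if value is None:
            continue
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


def run_pipeline(config_file, **options):
    """Run the pipeline as a subprocess and raise CalledProcessError on failure."""
    return subprocess.run(build_command(config_file, **options), check=True)
```

For instance, `run_pipeline("my_config.json", batch_size=20, log_level="DEBUG")` runs the module with `--batch-size 20 --log-level DEBUG`. Using `sys.executable` keeps the subprocess on the same interpreter (and virtual environment) as the calling script.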
### Advanced: Multi-Provider LLM & GPU
- For local DeepSeek, GPU, or custom LLMs, set `model_path` and `n_gpu_layers` in your config file.
- For Azure/OpenAI/Lambda.AI, set `"use_azure_openai": true` in your config file and provide the required environment variables.
- For full LLM provider support, see [README_LLM_PROVIDERS.md](README_LLM_PROVIDERS.md)
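The two setups described above differ only in a few config keys, so one base config can be specialized for each. The sketch below is illustrative: the helper names are hypothetical, and treating `n_gpu_layers` as an integer layer count is an assumption borrowed from llama-cpp-python, not something this README confirms.

```python
import json


def local_model_config(base, model_path, n_gpu_layers):
    """Return a copy of `base` configured for a local GGUF model.

    Assumption: n_gpu_layers is an integer number of layers to offload,
    as in llama-cpp-python.
    """
    config = dict(base)
    config.update(
        use_azure_openai=False,
        model_path=str(model_path),
        n_gpu_layers=int(n_gpu_layers),
    )
    return config


def azure_config(base):
    """Return a copy of `base` with the hosted-provider flag switched on."""
    config = dict(base)
    config["use_azure_openai"] = True
    return config


def write_config(config, path):
    """Write a config dict as indented JSON for use with --config-file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

Both helpers copy the base dict instead of mutating it, so one set of shared settings (batch size, workers, retries) can produce a local and a hosted config side by side. The value 32 used for `n_gpu_layers` in any trial run is a placeholder; the right number depends on the model and the GPU.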
## Development
- Build: `python -m build`
- Test: `pytest --cov=rust_crate_pipeline tests/`
- Lint: `pyright rust_crate_pipeline/`
- Format: `black rust_crate_pipeline/`
- Publish: `twine upload dist/*`
## Changelog
See `CHANGELOGS/CHANGELOG_v1.5.0.md` and previous changelogs for release history.
## Package metadata
- Name: `rust-crate-pipeline`
- Version: 1.5.6
- Summary: A comprehensive pipeline for analyzing Rust crates with AI enrichment and enhanced scraping
- Author: SigilDERG Team
- License: MIT
- Requires Python: `>=3.12`
- Keywords: rust, crates, analysis, ai, pipeline, scraping
- Uploaded: 2025-08-01 23:19:44
- Home page: https://github.com/SigilDERG/rust-crate-pipeline
- Repository and documentation: https://github.com/Superuser666-Sigil/SigilDERG-Data_Production
- Bug tracker: https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/issues
- Wheel: `rust_crate_pipeline-1.5.6-py3-none-any.whl` (112644 bytes), sha256 `f274dadd3a465023858ed2f68040c5d9f49d66bf9b7f4ec3178f505b74720144`, https://files.pythonhosted.org/packages/50/26/620f1ea8056f621ea908f0868438251cf93e00cda29c45dd556f71123e13/rust_crate_pipeline-1.5.6-py3-none-any.whl
- Source distribution: `rust_crate_pipeline-1.5.6.tar.gz` (127943 bytes), sha256 `3d7871bb3a3f7e0565f318528f9c6f787e9bf473ee6ae707980d04644d4e48c7`, https://files.pythonhosted.org/packages/08/0d/3ab2b71c6b29e704d4fb97d6eead5c35322ac82cf95da1ccc0e4743e24b9/rust_crate_pipeline-1.5.6.tar.gz