rust-crate-pipeline


| Field | Value |
| --- | --- |
| Name | rust-crate-pipeline |
| Version | 1.5.6 |
| Home page | https://github.com/SigilDERG/rust-crate-pipeline |
| Summary | A comprehensive pipeline for analyzing Rust crates with AI enrichment and enhanced scraping |
| Upload time | 2025-08-01 23:19:44 |
| Maintainer | None |
| Docs URL | None |
| Author | SigilDERG Team |
| Requires Python | >=3.12 |
| License | MIT |
| Keywords | rust, crates, analysis, ai, pipeline, scraping |
| Requirements | No requirements were recorded. |

# Rust Crate Pipeline

A comprehensive system for gathering, enriching, and analyzing metadata for Rust crates using AI-powered insights, web scraping, and dependency analysis.

## Quickstart

```bash
pip install rust-crate-pipeline
python -m rust_crate_pipeline --config-file my_config.json
```

- Requires **Python 3.12+** (the package metadata declares `requires_python >=3.12`)
- For GPU/DeepSeek/llama-cpp-python, see the 'Advanced' section below

## Features

- Enhanced web scraping (Crawl4AI + Playwright)
- AI enrichment (local or Azure OpenAI, DeepSeek, Lambda.AI, etc.)
- Multi-provider LLM support (see Advanced)
- Cargo build/test/audit
- Batch processing, JSON output, Docker support

## Requirements

- **Python 3.12+**
- Git, Cargo, Playwright (auto-installed)

## Installation

```bash
pip install rust-crate-pipeline
# For dev: pip install -e .
# Install Playwright browsers (required for enhanced scraping)
playwright install
```

## Configuration

- All configuration is via a single JSON file, passed with `--config-file <path>`
- No config file is auto-loaded; you must specify the path

Example `my_config.json`:
```json
{
    "batch_size": 10,
    "n_workers": 4,
    "max_retries": 3,
    "checkpoint_interval": 10,
    "use_azure_openai": false,
    "enable_crawl4ai": true,
    "model_path": "/path/to/model.gguf"
}
```

Set the environment variables your setup requires (e.g., `GITHUB_TOKEN` and Azure/OpenAI keys).
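The configuration steps above can also be scripted. The following is a minimal sketch, not part of the package: it writes the example config to a JSON file and reports which environment variables are unset. The helper names `write_config` and `missing_env` are hypothetical, the keys are copied from the example above, and nothing is validated against the pipeline's actual schema.

```python
import json
import os
from pathlib import Path

# Values copied from the example my_config.json above.
EXAMPLE_CONFIG = {
    "batch_size": 10,
    "n_workers": 4,
    "max_retries": 3,
    "checkpoint_interval": 10,
    "use_azure_openai": False,
    "enable_crawl4ai": True,
    "model_path": "/path/to/model.gguf",
}


def write_config(path, config):
    """Serialize config as JSON to path and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return path


def missing_env(names, environ=None):
    """Return the names from `names` that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]
```

For instance, `write_config("my_config.json", EXAMPLE_CONFIG)` produces a file that can be passed to `--config-file`, and `missing_env(["GITHUB_TOKEN"])` returns an empty list once the token is exported.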

## Usage

### Basic Usage

```bash
python -m rust_crate_pipeline --config-file my_config.json
```

### Custom Options (combine as needed)

```bash
python -m rust_crate_pipeline \
  --config-file my_config.json \
  --batch-size 20 \
  --n-workers 8 \
  --max-tokens 2048 \
  --checkpoint-interval 5 \
  --log-level DEBUG \
  --output-path ./results
```
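When the pipeline is launched from another Python program, the same command line can be assembled programmatically. This is a sketch under assumptions: `build_command` and `run_pipeline` are hypothetical helpers, and only the flags shown above are known to exist; the mapping from keyword names to `--flag-name` is a convenience, not a guarantee that any given flag is accepted.

```python
import subprocess
import sys


def build_command(config_file, **options):
    """Build the argument list for `python -m rust_crate_pipeline`.

    Keyword names are turned into flags by replacing underscores with
    hyphens, e.g. batch_size=20 becomes `--batch-size 20`.
    """
    cmd = [sys.executable, "-m", "rust_crate_pipeline",
           "--config-file", str(config_file)]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


def run_pipeline(config_file, **options):
    """Run the pipeline as a subprocess; raises CalledProcessError on failure."""
    return subprocess.run(build_command(config_file, **options), check=True)
```

Calling `run_pipeline("my_config.json", batch_size=20, n_workers=8, log_level="DEBUG")` would then be equivalent to the shell invocation above with those three options.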

### Advanced: Multi-Provider LLM & GPU

- For local DeepSeek, GPU, or custom LLMs, set `model_path` and `n_gpu_layers` in your config file.
- For Azure/OpenAI/Lambda.AI, set `use_azure_openai: true` and provide the required environment variables.
- For details on all supported LLM providers, see [README_LLM_PROVIDERS.md](README_LLM_PROVIDERS.md)
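The two setups described above differ only in a few config keys. The sketch below builds both variants from a shared base; it is an illustration, not the package's API. The function names are hypothetical, the base values come from the example config, and treating `n_gpu_layers` as an integer layer count is an assumption (see README_LLM_PROVIDERS.md for the authoritative settings).

```python
import json

# Shared settings, copied from the example config in the Configuration section.
BASE_CONFIG = {
    "batch_size": 10,
    "n_workers": 4,
    "max_retries": 3,
    "checkpoint_interval": 10,
    "enable_crawl4ai": True,
}


def local_llm_config(model_path, n_gpu_layers):
    """Config for a local model file; n_gpu_layers as an int is an assumption."""
    return {
        **BASE_CONFIG,
        "use_azure_openai": False,
        "model_path": model_path,
        "n_gpu_layers": n_gpu_layers,
    }


def hosted_llm_config():
    """Config for the Azure/OpenAI path; credentials come from the environment."""
    return {**BASE_CONFIG, "use_azure_openai": True}


def to_json(config):
    """Render a config dict in the same layout as the example file."""
    return json.dumps(config, indent=4)
```

Printing `to_json(local_llm_config("/path/to/model.gguf", 20))` yields a file body suitable for `--config-file`; the value `20` is arbitrary and chosen only for illustration.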

## Development

- Build: `python -m build`
- Test: `pytest --cov=rust_crate_pipeline tests/`
- Lint: `pyright rust_crate_pipeline/`
- Format: `black rust_crate_pipeline/`
- Publish: `twine upload dist/*`

## Changelog

See `CHANGELOGS/CHANGELOG_v1.5.0.md` and previous changelogs for release history.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SigilDERG/rust-crate-pipeline",
    "name": "rust-crate-pipeline",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "rust, crates, analysis, ai, pipeline, scraping",
    "author": "SigilDERG Team",
    "author_email": "SigilDERG Team <sigilderg@example.com>",
    "download_url": "https://files.pythonhosted.org/packages/08/0d/3ab2b71c6b29e704d4fb97d6eead5c35322ac82cf95da1ccc0e4743e24b9/rust_crate_pipeline-1.5.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive pipeline for analyzing Rust crates with AI enrichment and enhanced scraping",
    "version": "1.5.6",
    "project_urls": {
        "Bug Tracker": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production/issues",
        "Documentation": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production#readme",
        "Homepage": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production",
        "Repository": "https://github.com/Superuser666-Sigil/SigilDERG-Data_Production"
    },
    "split_keywords": [
        "rust",
        " crates",
        " analysis",
        " ai",
        " pipeline",
        " scraping"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5026620f1ea8056f621ea908f0868438251cf93e00cda29c45dd556f71123e13",
                "md5": "ada1d0d3184cfaa2d6d3383fd531d418",
                "sha256": "f274dadd3a465023858ed2f68040c5d9f49d66bf9b7f4ec3178f505b74720144"
            },
            "downloads": -1,
            "filename": "rust_crate_pipeline-1.5.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ada1d0d3184cfaa2d6d3383fd531d418",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 112644,
            "upload_time": "2025-08-01T23:19:42",
            "upload_time_iso_8601": "2025-08-01T23:19:42.669051Z",
            "url": "https://files.pythonhosted.org/packages/50/26/620f1ea8056f621ea908f0868438251cf93e00cda29c45dd556f71123e13/rust_crate_pipeline-1.5.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "080d3ab2b71c6b29e704d4fb97d6eead5c35322ac82cf95da1ccc0e4743e24b9",
                "md5": "5dfe44d748ccf6588da12bbd3b8ad8f3",
                "sha256": "3d7871bb3a3f7e0565f318528f9c6f787e9bf473ee6ae707980d04644d4e48c7"
            },
            "downloads": -1,
            "filename": "rust_crate_pipeline-1.5.6.tar.gz",
            "has_sig": false,
            "md5_digest": "5dfe44d748ccf6588da12bbd3b8ad8f3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 127943,
            "upload_time": "2025-08-01T23:19:44",
            "upload_time_iso_8601": "2025-08-01T23:19:44.310482Z",
            "url": "https://files.pythonhosted.org/packages/08/0d/3ab2b71c6b29e704d4fb97d6eead5c35322ac82cf95da1ccc0e4743e24b9/rust_crate_pipeline-1.5.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-01 23:19:44",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SigilDERG",
    "github_project": "rust-crate-pipeline",
    "github_not_found": true,
    "lcname": "rust-crate-pipeline"
}
        