# vigil-ai
**AI-powered workflow generation with foundation models for reproducible science**
vigil-ai extends [Vigil](https://github.com/Science-Abundance/vigil) with AI capabilities, making scientific workflow creation accessible through natural language and specialized foundation models.
## Features
- **Natural language → Pipeline**: Generate Snakemake workflows from plain English descriptions
- **Foundation models**: 10+ specialized models for biology, chemistry, and materials science
- **Domain-specific AI**: Auto-select the best model for your scientific domain
- **AI debugging**: Get intelligent suggestions for fixing pipeline errors
- **Workflow optimization**: Analyze and optimize for speed, cost, or resource usage
- **Task-based interface**: Simple, high-level API for common workflows
- **MCP integration**: Works with Claude Desktop and AI assistants
## Installation
**Basic (Claude models only):**
```bash
pip install vigil-ai
```
**With science models (ESM-2, BioGPT, ChemBERTa, etc.):**
```bash
pip install 'vigil-ai[science]'
```
**Or install with Vigil:**
```bash
pip install 'vigil[ai]' # Basic
pip install 'vigil[ai,science]' # With science models
```
## Requirements
- Python 3.11+
- Vigil >= 0.2.1
- Anthropic API key (get one at https://console.anthropic.com/)
## Setup
Set your Anthropic API key:
```bash
export ANTHROPIC_API_KEY='your-api-key-here'
```
Or add to your `.env` file:
```
ANTHROPIC_API_KEY=your-api-key-here
```
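To verify the key is visible before running `vigil ai` commands, a quick environment check helps. This is a minimal sketch; note that a `.env` file is only honored if your shell or tooling actually loads it (e.g. via python-dotenv):
```python
# Minimal sanity check: confirm ANTHROPIC_API_KEY is present in this
# process's environment. Python does not read .env files on its own.
import os

if not os.environ.get("ANTHROPIC_API_KEY"):
    raise SystemExit("ANTHROPIC_API_KEY is not set; export it or load your .env file")
print("ANTHROPIC_API_KEY is set")
```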
## Usage
### Generate Pipeline from Description
```bash
vigil ai create "Filter variants by quality >30, annotate with Ensembl, calculate Ti/Tv ratio"
```
Output:
```
✓ Pipeline created: app/code/pipelines/Snakefile

Next steps:
  1. Review the generated pipeline
  2. Create necessary step scripts
  3. vigil run --cores 4
```
### Debug Pipeline Errors
```bash
vigil ai debug
# Or specify error log
vigil ai debug --error-log .snakemake/log/error.log
```
Output:
```
Analyzing error...

Root Cause:
The rule 'filter_variants' failed because the input file 'variants.csv' was not found.

Suggested Fix:
1. Check that your data exists: ls app/data/samples/
2. Verify file name matches exactly (case-sensitive)
3. If file is missing, download or create it
4. Run: vigil doctor to check project health

Prevention:
Add input validation before running pipeline.
```
### Optimize Workflow
```bash
vigil ai optimize --focus speed
# Or optimize for cost
vigil ai optimize --focus cost
```
Output:
```
Optimization Suggestions:

Rule: filter_variants
Issue: Sequential processing
Suggestion: Add threads: 4 and use parallel processing
Impact: 4x faster with multi-core

Rule: annotate
Issue: Repeated API calls
Suggestion: Implement caching for Ensembl queries
Impact: 10x faster on reruns
```
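To make the first suggestion concrete, `threads` is a standard Snakemake directive; below is a minimal sketch of what applying it to a rule might look like (rule name and paths are illustrative, not actual generated output):
```python
# Illustrative Snakefile fragment showing the "threads: 4" suggestion.
# Paths and the script location are hypothetical; the step script can read
# the allocated core count via snakemake.threads.
rule filter_variants:
    input: "app/data/samples/variants.csv"
    output: "artifacts/filtered.parquet"
    threads: 4
    script: "../lib/steps/filter.py"
```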
## Quick Start (Task-Based Interface)
The simplest way to use vigil-ai is through the task-based interface:
```python
from vigil_ai.tasks import PipelineGenerator, ErrorDebugger, ModelSelector

# 1. Generate a pipeline for biology
bio_gen = PipelineGenerator(domain="biology")
pipeline = bio_gen.create("Filter variants >30, annotate, calculate Ti/Tv")
bio_gen.create_and_save(pipeline, "workflow.smk")

# 2. Debug errors when they occur
debugger = ErrorDebugger()
fix = debugger.analyze("FileNotFoundError: variants.csv not found")
print(fix)

# 3. Get model recommendations
selector = ModelSelector()
model, reason = selector.recommend("I need to analyze protein sequences")
print(reason)  # "Recommended biology model (ESM-2) for protein analysis"
```
## Foundation Models
vigil-ai supports 10+ specialized foundation models across scientific domains:
### Biology Models
- **ESM-2** (650M, 3B, 15B) - Protein language models from Meta AI
- **BioGPT** - Biomedical text generation
- **ProtGPT2** - Protein sequence generation
### Chemistry Models
- **ChemBERTa** - Molecular property prediction
- **MolFormer** - Chemical structure analysis
### Materials Science Models
- **MatBERT** - Materials property prediction
### General Models
- **Claude 3.5 Sonnet** (default) - General-purpose, most capable
- **Claude 3 Opus** - Most powerful
- **Galactica** - Scientific knowledge and reasoning
### Using Domain-Specific Models
```python
from vigil_ai import get_model, ModelDomain

# Automatically select best model for domain
bio_model = get_model(domain=ModelDomain.BIOLOGY)    # Returns ESM-2
chem_model = get_model(domain=ModelDomain.CHEMISTRY) # Returns ChemBERTa
mat_model = get_model(domain=ModelDomain.MATERIALS)  # Returns MatBERT

# Use specific model by name
esm = get_model(name="esm-2-650m")
embedding = esm.embed("MKFLKFSLLTAVLLSVVFAFSSCGDDDDTGYLPPSQAIQDLL")

# Generate with domain-specific model
from vigil_ai import generate_pipeline

pipeline = generate_pipeline(
    "Analyze protein sequences and predict function",
    domain=ModelDomain.BIOLOGY,  # Uses ESM-2
)
```
## Python API (Low-Level)
For more control, use the low-level API:
```python
from vigil_ai import generate_pipeline, ai_debug, ai_optimize

# Generate pipeline
pipeline = generate_pipeline(
    "Filter variants by quality >30, calculate Ti/Tv ratio",
    template="genomics-starter",
    model="claude-3-5-sonnet-20241022",  # Or specify domain
)
print(pipeline)

# Debug error
fix = ai_debug("FileNotFoundError: variants.csv not found")
print(fix)

# Optimize workflow
suggestions = ai_optimize(focus="speed")
print(suggestions)
```
## Examples
### Create Imaging Analysis Pipeline
```bash
vigil ai create "Segment cells from microscopy images, count cells per field, measure intensity"
```
Generates:
```python
rule segment_cells:
    input: "data/images/{sample}.tif"
    output: "artifacts/masks/{sample}_mask.png"
    script: "../lib/steps/segment.py"

rule count_cells:
    input: "artifacts/masks/{sample}_mask.png"
    output: "artifacts/counts/{sample}_counts.json"
    script: "../lib/steps/count.py"

rule measure_intensity:
    input:
        image="data/images/{sample}.tif",
        mask="artifacts/masks/{sample}_mask.png"
    output: "artifacts/intensity/{sample}_intensity.csv"
    script: "../lib/steps/measure.py"
```
### Interactive Mode
```bash
vigil ai chat
```
Starts an interactive session:
```
> Create a pipeline to filter variants
✓ Pipeline generated

> Add a rule to calculate metrics
✓ Added metrics rule

> How can I make this faster?
Suggestions:
1. Add parallel processing
2. Cache intermediate results
...
```
## Configuration
Create `.vigil-ai.yaml` in your project:
```yaml
ai:
  model: claude-3-5-sonnet-20241022  # Claude model to use
  max_tokens: 4096                   # Max response length
  temperature: 0.7                   # Creativity (0-1)
  cache_responses: true              # Cache AI responses
```
## Advanced Usage
### Generate Step Script
```python
from vigil_ai.generator import generate_step_script

script = generate_step_script(
    rule_name="filter_variants",
    description="Filter variants by quality score >30",
    inputs=["variants.csv"],
    outputs=["filtered.parquet"],
    language="python",
)

with open("app/code/lib/steps/filter.py", "w") as f:
    f.write(script)
```
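For reference, step scripts invoked through a rule's `script:` directive run with the `snakemake` object that Snakemake injects; here is an illustrative (not generated) example of what the filtering step might contain, assuming pandas with a parquet engine and a CSV that has a `quality` column:
```python
# Illustrative step script (not actual generated output). Only runs inside
# Snakemake, which injects the `snakemake` object; paths come from the rule.
import pandas as pd

variants = pd.read_csv(snakemake.input[0])
filtered = variants[variants["quality"] > 30]
filtered.to_parquet(snakemake.output[0], index=False)
```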
### Custom Prompts
```python
from vigil_ai import generate_pipeline

pipeline = generate_pipeline(
    description="""
    Create a multi-sample variant calling pipeline:
    1. Align reads with BWA
    2. Mark duplicates with Picard
    3. Call variants with GATK
    4. Filter and annotate
    """,
    template="genomics-starter",
)
```
## Architecture
vigil-ai is part of a **three-layer architecture** for reproducible science:
```
┌─────────────────────────────────────────────────────┐
│ Agents Layer: AI Assistants                         │
│ (Claude Desktop, custom agents)                     │
└─────────────────────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────┐
│ Application Layer: vigil-ai (THIS PACKAGE)          │
│ - MCP Server Integration                            │
│ - Foundation Models (Claude, ESM, BioGPT, etc.)     │
│ - Task Interface (PipelineGenerator, etc.)          │
└─────────────────────────────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────┐
│ Foundation Layer: Vigil Core                        │
│ - Snakemake pipelines                               │
│ - Artifact management                               │
│ - Receipt tracking                                  │
└─────────────────────────────────────────────────────┘
```
### MCP Integration
vigil-ai extends the Vigil MCP server with 5 AI-powered verbs:
- `ai_generate_pipeline` - Generate Snakemake workflow from description
- `ai_debug_error` - Analyze and fix pipeline errors
- `ai_optimize_workflow` - Suggest performance optimizations
- `ai_list_models` - List available foundation models
- `ai_get_model_info` - Get model metadata and capabilities
**Use with Claude Desktop:**
```json
{
"mcpServers": {
"vigil": {
"command": "vigil",
"args": ["mcp"]
}
}
}
```
Then ask Claude: *"Generate a pipeline to filter variants and calculate metrics"*
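For programmatic use outside Claude Desktop, an MCP client can invoke these verbs directly. Below is a minimal sketch using the `mcp` Python SDK to launch `vigil mcp` over stdio and call `ai_generate_pipeline`; the argument key `description` is an assumption and may not match the server's actual tool schema.
```python
# Sketch of calling the vigil MCP server with the `mcp` Python SDK.
# Tool names come from the list above; the "description" argument is assumed.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(command="vigil", args=["mcp"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # should include the ai_* verbs
            result = await session.call_tool(
                "ai_generate_pipeline",
                arguments={"description": "Filter variants and calculate metrics"},
            )
            print(result)


asyncio.run(main())
```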
## All Supported Models
### General-Purpose (API-based)
- `claude-3-5-sonnet-20241022` (default, recommended)
- `claude-3-opus-20240229` (most powerful)
- `claude-3-sonnet-20240229` (balanced)
- `claude-3-haiku-20240307` (fastest, cheapest)
### Biology (requires `[science]` install)
- `esm-2-650m` - Meta AI protein model, 650M params
- `esm-2-3b` - Meta AI protein model, 3B params (GPU recommended)
- `esm-2-15b` - Meta AI protein model, 15B params (GPU required)
- `biogpt` - Microsoft biomedical text model
- `protgpt2` - Protein sequence generation
### Chemistry (requires `[science]` install)
- `chemberta-v2` - DeepChem molecular property model
- `molformer` - Molecular structure analysis
### Materials Science (requires `[science]` install)
- `matbert` - Materials property prediction
## Cost Estimates
**Claude models (API-based):**
- Pipeline generation: ~$0.02-0.05 per request
- Debugging: ~$0.01-0.03 per request
- Optimization: ~$0.03-0.07 per request
**Science models (local inference):**
- Free to use (runs on your hardware)
- Requires GPU for optimal performance (ESM-2, BioGPT)
- CPU inference possible but slower
**Cost optimization tips:**
- Enable response caching: `cache_responses: true` in `.vigil-ai.yaml`
- Use smaller models for simpler tasks (`claude-3-haiku` vs `claude-3-opus`); see the sketch after this list
- Use local science models when applicable (no API costs)
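A sketch of the smaller-model tip in practice, using the `model` parameter of `generate_pipeline` shown earlier; whether Haiku is adequate for a given prompt is an assumption to verify on your own workloads:
```python
# Sketch: send a simple request to a cheaper Claude model via the documented
# `model` parameter. Suitability of Haiku for a given prompt is an assumption.
from vigil_ai import generate_pipeline

pipeline = generate_pipeline(
    "Filter variants by quality >30",
    model="claude-3-haiku-20240307",  # fastest/cheapest of the listed models
)
print(pipeline)
```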
## Example Gallery
See the [`examples/`](examples/) directory for complete examples:
- **[task_based_workflow.py](examples/task_based_workflow.py)** - Complete workflow using task interface
- **[domain_specific_models.py](examples/domain_specific_models.py)** - Using biology/chemistry/materials models
- **[basic_pipeline_generation.py](examples/basic_pipeline_generation.py)** - Low-level API examples
- **[with_caching_and_config.py](examples/with_caching_and_config.py)** - Configuration and caching
Run any example:
```bash
python examples/task_based_workflow.py
```
## Limitations
**General:**
- Claude models require internet connection and API key
- Generated pipelines need review before use in production
- AI suggestions should be validated by domain experts
- Not a replacement for scientific expertise
**Science models:**
- Require `pip install vigil-ai[science]` and additional dependencies
- Large models (ESM-2 15B) require significant GPU memory (40GB+)
- Local inference slower than API-based models
- May require domain-specific preprocessing
## Development
```bash
# Clone repo
git clone https://github.com/Science-Abundance/vigil
cd vigil/packages/vigil-core-ai
# Install in dev mode with all dependencies
pip install -e '.[dev,science]'
# Run tests
pytest
# Run tests with science models (requires GPU)
pytest -m science
# Lint
ruff check .
# Type check
mypy src/
# Run examples
python examples/task_based_workflow.py
python examples/domain_specific_models.py
```
## Contributing
Contributions welcome! See [CONTRIBUTING.md](../../CONTRIBUTING.md)
## License
Apache-2.0
## Support
- GitHub Issues: https://github.com/Science-Abundance/vigil/issues
- Documentation: https://github.com/Science-Abundance/vigil
- Discord: [coming soon]
## Acknowledgments
Built with:
- [Anthropic Claude](https://www.anthropic.com/) - General-purpose AI capabilities
- [Vigil](https://github.com/Science-Abundance/vigil) - Reproducible science platform
- [HuggingFace Transformers](https://huggingface.co/transformers/) - Foundation model infrastructure
Foundation models:
- **ESM-2** - Meta AI ([paper](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1))
- **BioGPT** - Microsoft Research ([paper](https://arxiv.org/abs/2210.10341))
- **ChemBERTa** - DeepChem ([paper](https://arxiv.org/abs/2010.09885))
- **MatBERT** - Materials Project ([paper](https://arxiv.org/abs/2109.15290))
- **Galactica** - Meta AI ([paper](https://arxiv.org/abs/2211.09085))