ml-agents-reasoning

Name: ml-agents-reasoning
Version: 0.0.18a0
Summary: ML Agents Reasoning Research Platform
Upload time: 2025-09-02 20:18:59
Requires Python: >=3.9
Keywords: reasoning, llm, benchmark, research, ai
Requirements: numpy, openai, cohere, anthropic, ipywidgets, tqdm, huggingface_hub, pandas, datasets, python-dotenv, jupyter, ipykernel, typer, rich, pydantic, instructor, openpyxl
# Cohere Labs Open Science Research into ML Agents and Reasoning

## Community Resources

**👉 New to ML Agents? Check out the [BEGINNER_GUIDE.md](https://github.com/thompsonson/c4ai-ml-agents/blob/main/BEGINNER_GUIDE.md) for a step-by-step walkthrough!**

- **[ML Agents Community Program](https://sites.google.com/cohere.com/coherelabs-community/community-programs/ml-agents)** - Main hub for Cohere Labs' community-driven initiative on open-source agent research, focusing on agentic frameworks, applications, evaluations, and benchmarks

- **[Project Documentation](https://docs.google.com/document/d/1fLnwUzTvO3XuvViBwLz-QuSe_y87a1p4j8Uw2R4eBiI/edit?pli=1&tab=t.0#heading=h.d0279byf6lhr)** - Detailed specifications and roadmap for the ZeroHPO (Zero-shot Hyperparameter Optimization) project for agentic tasks

- **[Project Tracker](https://docs.google.com/spreadsheets/d/1-TBlPSIiBymQfCdF_LCYJznLwaxcKtYTRJME0NT17kU/edit?usp=sharing)** - Community project tracking, task assignments, and progress monitoring

- **[Discord Community](https://discord.gg/ckaQnUakYx)** - Join the #ml-agents channel for discussions, meetings, and collaboration with the community

## Overview

This project investigates how different reasoning approaches impact AI model performance across various tasks. It provides a comprehensive framework for comparing multiple reasoning techniques with various language models.

**🎉 Phase 11 Complete**: The platform now includes a stable CLI interface with production-ready commands for environment setup, database management, and dataset preprocessing, plus experimental evaluation features!

## Research Questions

1. **Universal Benefit**: Do all tasks benefit from reasoning?
2. **Model Variability**: Do different models show varying benefits from reasoning?
3. **Approach Comparison**: How do different reasoning approaches (CoT, PoT, etc.) compare?
4. **Task-Approach Fit**: Do certain tasks benefit more from specific reasoning methods?
5. **Cost-Benefit Analysis**: What is the tradeoff for each approach and task?
6. **Predictive Reasoning**: Can we predict the need for reasoning based on the input prompt alone?

## Reasoning Approaches Available

The platform currently supports **8 production-ready reasoning approaches**:

1. **None** - Baseline direct prompting without reasoning
2. **Chain-of-Thought (CoT)** - Step-by-step reasoning process
3. **Program-of-Thought (PoT)** - Code-based problem solving
4. **Reasoning-as-Planning** - Strategic planning with goal decomposition
5. **Reflection** - Self-evaluation and iterative improvement
6. **Chain-of-Verification** - Systematic verification with follow-up questions
7. **Skeleton-of-Thought** - Hierarchical outline-first reasoning
8. **Tree-of-Thought** - Multiple reasoning path exploration and synthesis

**Additional approaches planned**: Graph-of-Thought, ReWOO, Buffer-of-Thoughts (Phase 6)

## Quick Start

### Prerequisites

- Python 3.9+
- uv (for virtual environment management)
- API keys for at least one provider (Anthropic, Cohere, or OpenRouter)

### Installation

#### Option 1: pip Install (Recommended)

Install the latest stable version from PyPI:

```bash
# Install globally
pip install ml-agents-reasoning

# Or install with development dependencies
pip install ml-agents-reasoning[dev]

# Verify installation
ml-agents --version
ml-agents --help
```

#### Option 2: Modern Python (uv/uvx)

With [uv](https://github.com/astral-sh/uv) (fastest):

```bash
# Install with uv
uv tool install ml-agents-reasoning

# Run without installing (recommended for trying out)
uvx ml-agents-reasoning run --approach CoT --samples 10

# Add to project dependencies
uv add ml-agents-reasoning
```

#### Option 3: Development Installation

For contributors or advanced users:

```bash
# Clone and install in development mode
git clone https://github.com/thompsonson/c4ai-ml-agents
cd c4ai-ml-agents
pip install -e .[dev]

# Or with uv (recommended)
uv sync --all-extras
```

### Configure API Keys

After installation, configure your API keys:

```bash
# Create configuration file
cp .env.example .env
# Edit .env with your actual API keys

# Or set environment variables directly
export ANTHROPIC_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"
```

### ⚠️ Important: CLI Command Classification

The ML Agents CLI includes two types of commands:

- **Stable Commands** (✅ Production Ready): `setup`, `db`, `preprocess` - Well-tested, stable API, suitable for production use
- **Pre-Alpha Commands** (⚠️ Experimental): `eval`, `results` - Experimental features that may be unstable or have breaking changes

**Whether you are getting started or running in production, we recommend beginning with the stable commands.**

### CLI Quick Start

Once installed, you can use the ML Agents CLI:

```bash
# Validate your environment
ml-agents setup validate-env

# List available reasoning approaches
ml-agents setup list-approaches

# Run a simple experiment (⚠️ PRE-ALPHA)
ml-agents eval run --approach ChainOfThought --samples 10

# Compare multiple approaches (⚠️ PRE-ALPHA)
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought" --samples 50 --parallel
```

### Jupyter Notebook (Original Interface)

To use the original Jupyter notebook interface:

```bash
jupyter notebook Reasoning_LLM.ipynb
```

## Configuration

### Supported Providers and Models

- **Anthropic**: Claude Opus 4, Claude Sonnet 4, Claude 3.5 Haiku
- **Cohere**: Command R+, Command R, Command Light
- **OpenRouter**: GPT-5, GPT-5 Mini, GPT OSS-120B, Gemini 2.5 Flash Lite

### Hyperparameters

- **Temperature**: 0.0 - 2.0 (controls randomness)
- **Max Tokens**: 64 - 4096 (output length limit)
- **Top P**: 0.0 - 1.0 (nucleus sampling parameter)
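
These settings map directly onto the `model` block of the YAML configuration format shown under Configuration Files below. A minimal sketch, reusing the values from that example (the nucleus-sampling parameter is presumably set the same way, but its key is not shown in the example config):

```yaml
model:
  provider: "openrouter"
  name: "openai/gpt-oss-120b"
  temperature: 0.3   # 0.0 - 2.0: lower values give more deterministic output
  max_tokens: 512    # 64 - 4096: hard cap on generated output length
```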

## MCP Integration (Phase 7)

The platform includes **SQLite database persistence** for all experiment results and supports **Claude Code MCP server integration** for direct database access during conversations.

### Database Features

- **Real-time persistence**: All experiment results are automatically saved to `ml_agents_results.db`
- **Read-only MCP access**: Query the database directly from Claude Code conversations
- **Rich export formats**: CSV, JSON, and Excel with advanced formatting
- **Advanced analytics**: Approach comparisons, failure analysis, and cost tracking
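
Because the results database is a plain SQLite file, it can also be inspected outside the CLI. A minimal sketch using the stock `sqlite3` shell (table names depend on the installed schema, so list them first):

```bash
# Open the results database read-only and inspect its tables and schema
sqlite3 -readonly ml_agents_results.db ".tables"
sqlite3 -readonly ml_agents_results.db ".schema"
```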

### Database CLI Commands

```bash
# Database management (Stable Commands)
ml-agents db init --db-path ./results.db          # Initialize database
ml-agents db backup --source ./results.db         # Create backup
ml-agents db stats --db-path ./results.db         # Show statistics
ml-agents db migrate --db-path ./results.db       # Migrate database schema

# Export and analysis (⚠️ PRE-ALPHA)
ml-agents results export EXPERIMENT_ID --format excel     # Export to Excel
ml-agents results compare "exp1,exp2,exp3"               # Compare experiments
ml-agents results analyze EXPERIMENT_ID --type accuracy   # Generate reports
ml-agents results list --status completed                # List experiments
```

## CLI Usage Guide

### Basic Commands

#### Single Experiment (⚠️ PRE-ALPHA)

Run one reasoning approach on a dataset:

```bash
# Basic usage
ml-agents eval run --approach ChainOfThought --samples 50

# With specific model
ml-agents eval run --approach TreeOfThought --samples 100 --provider openrouter --model "openai/gpt-oss-120b"

# With advanced settings
ml-agents eval run --approach ChainOfVerification --multi-step-verification --max-reasoning-calls 5

# With preprocessing integration
ml-agents eval run --dataset MilaWang/SpatialEval --approach ChainOfThought --samples 50

# Use specific preprocessing
ml-agents eval run --preprocessing-id prep_20240824_143256 --approach TreeOfThought

# Use custom preprocessed data
ml-agents eval run --preprocessing-path ./custom/processed.json --approach Reflection
```

#### Comparison Experiments (⚠️ PRE-ALPHA)

Compare multiple approaches side-by-side:

```bash
# Basic comparison
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought" --samples 100

# Parallel execution for faster results
ml-agents eval compare --approaches "None,ChainOfThought,Reflection" --samples 200 --parallel --max-workers 4

# Advanced reasoning comparison
ml-agents eval compare --approaches "ChainOfVerification,Reflection,SkeletonOfThought" --multi-step-verification --parallel
```

### Configuration Files

For complex experiments, use YAML configuration files:

```bash
# Run with configuration file (⚠️ PRE-ALPHA)
ml-agents eval run --config examples/configs/single_experiment.yaml

# Override specific parameters (⚠️ PRE-ALPHA)
ml-agents eval run --config examples/configs/comparison_study.yaml --samples 200 --parallel
```

**Example configuration** (`config.yaml`):

```yaml
experiment:
  name: "reasoning_comparison_study"
  sample_count: 100
  output_dir: "./results"

model:
  provider: "openrouter"
  name: "openai/gpt-oss-120b"
  temperature: 0.3
  max_tokens: 512

reasoning:
  approaches:
    - ChainOfThought
    - AsPlanning
    - TreeOfThought
  multi_step_verification: true
  max_reasoning_calls: 5

execution:
  parallel: true
  max_workers: 4
  save_checkpoints: true
```
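
Saved as `config.yaml`, this file can be passed straight to the evaluation command (⚠️ PRE-ALPHA):

```bash
ml-agents eval run --config config.yaml
```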

### Checkpoint Management (⚠️ PRE-ALPHA)

Resume interrupted experiments:

```bash
# List available checkpoints
ml-agents eval checkpoints

# Resume from specific checkpoint
ml-agents eval resume checkpoint_exp_20250818_123456.json
```

### Advanced Features (⚠️ PRE-ALPHA)

#### Cost Control

```bash
# Set reasoning limits to control costs
ml-agents eval run --approach ChainOfVerification --max-reasoning-calls 3 --samples 50

# Monitor costs with verbose output
ml-agents eval compare --approaches "ChainOfThought,TreeOfThought" --samples 100 --verbose
```

#### Multi-step Reasoning

```bash
# Enable multi-step reflection
ml-agents eval run --approach Reflection --multi-step-reflection --max-reflection-iterations 3

# Enable multi-step verification
ml-agents eval run --approach ChainOfVerification --multi-step-verification --max-reasoning-calls 5
```

#### Parallel Processing

```bash
# Parallel execution with custom worker count
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought,Reflection" --parallel --max-workers 2

# Balance speed vs rate limits
ml-agents eval compare --approaches "None,ChainOfThought" --samples 500 --parallel --max-workers 8
```

### Output and Results

Results are organized by dataset with full preprocessing-evaluation traceability:

```
./outputs/
├── {dataset_name}/
│   ├── preprocessing/
│   │   ├── {timestamp}/
│   │   │   ├── analysis.json           # Dataset schema analysis
│   │   │   ├── rules.json              # Transformation rules
│   │   │   ├── processed.json          # Standardized dataset
│   │   │   └── metadata.json           # Preprocessing metadata
│   │   └── latest → {most_recent}/     # Symlink to latest preprocessing
│   └── eval/
│       ├── {exp_timestamp}/
│       │   ├── experiment_config.json  # Experiment configuration
│       │   ├── experiment_results.csv  # Detailed results per approach
│       │   ├── experiment_summary.json # Performance summary
│       │   └── experiment_errors.json  # Any processing errors
│       └── latest → {most_recent}/     # Symlink to latest experiment
```

Each result file contains:

- Input prompts and model responses
- Complete reasoning traces
- Performance metrics (accuracy, time, cost)
- Configuration details
- Error information
- **Preprocessing lineage**: Complete traceability to preprocessing rules used
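
The `latest` symlinks make it easy to check the most recent run without knowing its timestamp. For example, substituting your dataset's folder name for `{dataset_name}`:

```bash
# List the newest evaluation run and pretty-print its summary
ls outputs/{dataset_name}/eval/latest/
python -m json.tool outputs/{dataset_name}/eval/latest/experiment_summary.json
```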

### Example Workflows

#### 1. Environment Setup (Stable)

```bash
# Validate your setup first
ml-agents setup validate-env
ml-agents setup list-approaches
ml-agents setup version
```

#### 2. Database Management (Stable)

```bash
# Initialize and manage experiment database
ml-agents db init
ml-agents db stats
ml-agents db backup --source ./ml_agents_results.db
```

#### 3. Dataset Preprocessing (Stable)

```bash
# Preprocess datasets for evaluation
ml-agents preprocess list
ml-agents preprocess inspect MilaWang/SpatialEval --samples 100
ml-agents preprocess batch --max 5
```

#### 4. Complete Dataset → Evaluation Pipeline

```bash
# 1. Preprocess a custom dataset (creates organized folder structure)
ml-agents preprocess inspect MilaWang/SpatialEval --config tqa
ml-agents preprocess generate-rules MilaWang/SpatialEval --config tqa
ml-agents preprocess transform MilaWang/SpatialEval rules.json --config tqa

# 2. Run evaluation with preprocessed data (auto-detects latest preprocessing) (⚠️ PRE-ALPHA)
ml-agents eval run --dataset MilaWang/SpatialEval --approach ChainOfThought --samples 50

# 3. Compare approaches on same preprocessed dataset (⚠️ PRE-ALPHA)
ml-agents eval run --dataset MilaWang/SpatialEval --approach TreeOfThought --samples 50
ml-agents eval run --dataset MilaWang/SpatialEval --approach Reflection --samples 50

# 4. View organized results
ml-agents results list
```

#### 5. Quick Testing (⚠️ PRE-ALPHA)

```bash
# Test with small sample size
ml-agents eval run --approach ChainOfThought --samples 5 --verbose
```

#### 6. Research Study (⚠️ PRE-ALPHA)

```bash
# Comprehensive comparison study
ml-agents eval compare \
  --approaches "None,ChainOfThought,AsPlanning,TreeOfThought,Reflection" \
  --samples 200 \
  --parallel \
  --max-workers 4 \
  --multi-step-verification \
  --output "./studies/comprehensive_study"
```

### Command Reference

#### Stable Commands (Production Ready)
| Command | Description |
|---------|-------------|
| `ml-agents setup validate-env` | Check environment setup |
| `ml-agents setup list-approaches` | Show available reasoning methods |
| `ml-agents setup version` | Show version information |
| `ml-agents db init` | Initialize experiment database |
| `ml-agents db backup` | Create database backup |
| `ml-agents db stats` | Show database statistics |
| `ml-agents db migrate` | Migrate database schema |
| `ml-agents preprocess list` | List unprocessed datasets |
| `ml-agents preprocess inspect` | Inspect dataset schema |
| `ml-agents preprocess batch` | Batch process datasets |

#### Pre-Alpha Commands (⚠️ Experimental)
| Command | Description |
|---------|-------------|
| `ml-agents eval run` | Single reasoning experiment |
| `ml-agents eval compare` | Multi-approach comparison |
| `ml-agents eval resume` | Resume from checkpoint |
| `ml-agents eval checkpoints` | Show available checkpoints |
| `ml-agents results export` | Export experiment results |
| `ml-agents results compare` | Compare experiments |
| `ml-agents results analyze` | Analyze experiment patterns |

For detailed help on any command:

```bash
ml-agents setup --help
ml-agents eval run --help
ml-agents db --help
```

## Jupyter Notebook Usage (Original Interface)

For users who prefer the notebook interface:

1. **Setup**: Ensure dependencies are installed via `./setup.sh`
2. **Configuration**: Use interactive widgets to select models and approaches
3. **Data**: Defaults to the "bbeh-eval" dataset; other datasets can be used
4. **Execute**: Run experiment cells to process your dataset
5. **Results**: Tables and CSV files with format `{model}_{approach}_{timestamp}.csv`

## Dataset Requirements

Your dataset should include:

- **input** column: The question/problem to solve
- **answer** column (optional): Expected output for evaluation
- **task** column (optional): Task category for analysis
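
For illustration, a tiny CSV in this shape might look like the following (the rows are invented placeholders):

```csv
input,answer,task
"What is 17 + 25?","42","arithmetic"
"Name the capital of France.","Paris","geography"
```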

## Output Files

The notebook generates CSV files containing:

- Input prompts
- Model outputs
- Full reasoning traces
- Execution time
- Cost estimates
- Configuration details

## Project Structure

```
ml-agents/
├── src/                           # Main source code
│   ├── cli/                      # CLI interface (Phase 5)
│   │   ├── main.py              # CLI entry point
│   │   ├── commands.py          # Run/compare commands
│   │   ├── config_loader.py     # Configuration management
│   │   ├── display.py           # Rich output formatting
│   │   └── validators.py        # Input validation
│   ├── core/                    # Core experiment logic
│   │   ├── experiment_runner.py # Experiment orchestration
│   │   ├── dataset_loader.py    # Dataset loading
│   │   └── reasoning_inference.py # Inference engine
│   ├── reasoning/               # Reasoning approaches
│   │   ├── base.py             # Base reasoning class
│   │   ├── chain_of_thought.py # CoT implementation
│   │   ├── tree_of_thought.py  # ToT implementation
│   │   └── ...                 # Other approaches
│   └── utils/                   # Utilities
│       ├── api_clients.py      # API wrappers
│       ├── rate_limiter.py     # Rate limiting
│       └── logging_config.py   # Logging setup
├── examples/                    # Usage examples
│   ├── configs/                # Configuration templates
│   ├── scripts/                # Batch processing scripts
│   └── README.md               # Examples documentation
├── tests/                      # Test suite
├── outputs/                    # Organized experiment and preprocessing results
│   └── {dataset_name}/        # Dataset-centric organization
│       ├── preprocessing/     # Preprocessing runs with timestamps
│       └── eval/              # Evaluation runs with timestamps
├── Reasoning_LLM.ipynb        # Original Jupyter notebook
├── config.py                  # Environment configuration
├── requirements.txt           # Python dependencies
├── setup.sh                   # Automated setup script
├── Makefile                   # Development commands
└── README.md                  # This file
```

## Best Practices

### For Researchers

1. **Start Small**: Begin with `--samples 10` to test approaches quickly
2. **Use Baselines**: Always include `None` approach for comparison
3. **Cost Control**: Monitor costs with `--verbose` and set `--max-reasoning-calls`
4. **Parallel Processing**: Use `--parallel` for faster comparison studies
5. **Reproducibility**: Save configuration files and use checkpoints

### For Cost Management

1. **Temperature Settings**: Lower values (0.1-0.3) for consistent, cost-effective results
2. **Token Limits**: Set appropriate `--max-tokens` based on your task complexity
3. **Sample Sizing**: Use smaller samples for initial exploration
4. **Provider Selection**: Compare costs across different providers
5. **Multi-step Limits**: Control `--max-reasoning-calls` for approaches like Chain-of-Verification
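
Putting several of these levers together, a cost-conscious exploratory run might look like this (⚠️ PRE-ALPHA; flag names as used elsewhere in this README):

```bash
# Small sample, capped output length, and a limit on extra reasoning calls
ml-agents eval run \
  --approach ChainOfVerification \
  --samples 10 \
  --max-tokens 512 \
  --max-reasoning-calls 3 \
  --verbose
```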

### For Performance

1. **Parallel Execution**: Use `--parallel --max-workers N` for comparison studies
2. **Checkpoint Usage**: Enable checkpoints for long-running experiments
3. **Rate Limiting**: Adjust `--max-workers` based on provider rate limits
4. **Batch Processing**: Use configuration files and scripts for multiple experiments
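
For the batch-processing point above, one simple sketch is a shell loop over the configuration templates, assuming the YAML files under `examples/configs/` are valid experiment configs (⚠️ PRE-ALPHA eval command):

```bash
# Run every configuration template in sequence
for cfg in examples/configs/*.yaml; do
  ml-agents eval run --config "$cfg"
done
```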

## Troubleshooting

### Common Issues

#### Environment Setup

```bash
# Check environment
ml-agents setup validate-env

# Fix dependency issues
make clean && make install-dev

# Verify imports
make debug-imports
```

#### API Key Problems

```bash
# Check .env file exists and has keys
cat .env

# Validate specific provider
ml-agents setup validate-env
```

Error messages will guide you to set missing keys:

```bash
export OPENROUTER_API_KEY="your_key_here"
export ANTHROPIC_API_KEY="your_key_here"
```

#### Rate Limiting

If you encounter rate limits:

```bash
# Reduce parallel workers
ml-agents eval compare --approaches "ChainOfThought,AsPlanning" --max-workers 1

# Run sequentially to space requests out over time
ml-agents eval run --approach ChainOfThought --samples 50 --parallel false
```

#### Memory Issues

For large experiments:

```bash
# Reduce sample size
ml-agents eval compare --approaches "ChainOfThought,TreeOfThought" --samples 50

# Disable parallel processing
ml-agents eval compare --approaches "..." --parallel false
```

#### NumPy Compatibility Warning

The warning about NumPy 1.x vs 2.x is cosmetic and doesn't affect functionality:

```
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.3.2...
```

This is a known PyTorch compatibility issue and can be ignored.

### CLI Issues

#### Command Not Found

```bash
# Reinstall the package
make install-dev

# Check entry point
which ml-agents
```

#### Import Errors

```bash
# Activate virtual environment
source .venv/bin/activate

# Test imports
make debug-imports
```

#### Configuration Validation

For configuration errors, check:

1. YAML/JSON syntax is valid
2. All required fields are present
3. Approach names match available options (`ml-agents setup list-approaches`)
4. Provider/model combinations are supported

### Getting Help

1. **Command Help**: Use `--help` with any command

   ```bash
   ml-agents --help
   ml-agents eval run --help
   ml-agents eval compare --help
   ```

2. **Verbose Output**: Add `--verbose` to see detailed execution logs

   ```bash
   ml-agents eval run --approach ChainOfThought --samples 5 --verbose
   ```

3. **Check Status**: Validate your setup

   ```bash
   ml-agents setup validate-env
   ml-agents setup list-approaches
   make validate-env
   ```

4. **Community Support**: Join the Discord #ml-agents channel for help

### Development Issues

For developers working on the codebase:

```bash
# Run test suite
make test

# Check code quality
make lint

# Type checking
make type-check

# Full development check
make check
```

## Development Tools

### Claude Code MCP Server Setup

For developers using Claude Code, enable direct database queries in conversations:

```bash
# Configure MCP server (one-time setup)
make configure-mcp

# Or run the script directly
./scripts/install-sqlite-mcp-server.sh
```

**Available MCP Tools**:

- `read_query`: Execute validated SELECT queries
- `list_tables`: Show all database tables
- `describe_table`: Show table schemas

**⚠️ Note**: Project-scoped MCP servers don't appear in `claude mcp list` due to a [known bug](https://github.com/anthropics/claude-code/issues/5963). Use `claude mcp get sqlite-read-only` to verify installation.

## Contributing

Feel free to extend the platform with:

- Additional reasoning approaches
- New evaluation metrics
- Support for more models/providers
- Performance optimizations

## License

### Recommended: CC BY 4.0 (Creative Commons Attribution 4.0 International)

This project is licensed under the Creative Commons Attribution 4.0 International License. This means:

- ✅ **Share** - Copy and redistribute the material in any medium or format
- ✅ **Adapt** - Remix, transform, and build upon the material for any purpose, even commercially
- ✅ **Attribution** - You must give appropriate credit, provide a link to the license, and indicate if changes were made

This license is chosen because:

1. **Open Science**: Aligns with Cohere Labs' open science mission
2. **Maximum Impact**: Allows both academic and commercial use, accelerating AI research
3. **Community Growth**: Enables derivatives while ensuring original work is credited
4. **Simplicity**: Easy to understand and implement

**Note**: For the code components specifically, you may want to consider dual-licensing with MIT or Apache 2.0 for better software compatibility.

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a>

### Alternative Options Considered

- **CC BY-SA 4.0**: Adds "ShareAlike" requirement - derivatives must use same license (more restrictive but ensures openness)
- **CC BY-NC 4.0**: Adds "NonCommercial" restriction - prevents commercial use (limits industry collaboration)
- **CC0**: Public domain dedication - no attribution required (maximum freedom but no credit requirement)

            
