# Cohere Labs Open Science Research into ML Agents and Reasoning
## Community Resources
**👉 New to ML Agents? Check out the [BEGINNER_GUIDE.md](https://github.com/thompsonson/c4ai-ml-agents/blob/main/BEGINNER_GUIDE.md) for a step-by-step walkthrough!**
- **[ML Agents Community Program](https://sites.google.com/cohere.com/coherelabs-community/community-programs/ml-agents)** - Main hub for Cohere Labs' community-driven initiative on open-source agent research, focusing on agentic frameworks, applications, evaluations, and benchmarks
- **[Project Documentation](https://docs.google.com/document/d/1fLnwUzTvO3XuvViBwLz-QuSe_y87a1p4j8Uw2R4eBiI/edit?pli=1&tab=t.0#heading=h.d0279byf6lhr)** - Detailed specifications and roadmap for the ZeroHPO (Zero-shot Hyperparameter Optimization) project for agentic tasks
- **[Project Tracker](https://docs.google.com/spreadsheets/d/1-TBlPSIiBymQfCdF_LCYJznLwaxcKtYTRJME0NT17kU/edit?usp=sharing)** - Community project tracking, task assignments, and progress monitoring
- **[Discord Community](https://discord.gg/ckaQnUakYx)** - Join the #ml-agents channel for discussions, meetings, and collaboration with the community
## Overview
This project investigates how different reasoning approaches affect language model performance across a range of tasks. It provides a framework for systematically comparing reasoning techniques across models and providers.
**🎉 Phase 11 Complete**: The platform now includes a stable CLI interface with production-ready commands for environment setup, database management, and dataset preprocessing, plus experimental evaluation features!
## Research Questions
1. **Universal Benefit**: Do all tasks benefit from reasoning?
2. **Model Variability**: Do different models show varying benefits from reasoning?
3. **Approach Comparison**: How do different reasoning approaches (CoT, PoT, etc.) compare?
4. **Task-Approach Fit**: Do certain tasks benefit more from specific reasoning methods?
5. **Cost-Benefit Analysis**: What is the cost-performance tradeoff of each approach on each task?
6. **Predictive Reasoning**: Can we predict the need for reasoning based on the input prompt alone?
## Reasoning Approaches Available
The platform currently supports **8 production-ready reasoning approaches**:
1. **None** - Baseline direct prompting without reasoning
2. **Chain-of-Thought (CoT)** - Step-by-step reasoning process
3. **Program-of-Thought (PoT)** - Code-based problem solving
4. **Reasoning-as-Planning** - Strategic planning with goal decomposition
5. **Reflection** - Self-evaluation and iterative improvement
6. **Chain-of-Verification** - Systematic verification with follow-up questions
7. **Skeleton-of-Thought** - Hierarchical outline-first reasoning
8. **Tree-of-Thought** - Multiple reasoning path exploration and synthesis
**Additional approaches planned**: Graph-of-Thought, ReWOO, Buffer-of-Thoughts (Phase 6)
## Quick Start
### Prerequisites
- Python 3.9+
- uv (for virtual environment management)
- API keys for at least one provider (Anthropic, Cohere, or OpenRouter)
### Installation
#### Option 1: pip Install (Recommended)
Install the latest stable version from PyPI:
```bash
# Install globally
pip install ml-agents-reasoning
# Or install with development dependencies
pip install ml-agents-reasoning[dev]
# Verify installation
ml-agents --version
ml-agents --help
```
#### Option 2: Modern Python (uv/uvx)
With [uv](https://github.com/astral-sh/uv) (fastest):
```bash
# Install with uv
uv tool install ml-agents-reasoning
# Run without installing (recommended for trying out)
uvx --from ml-agents-reasoning ml-agents eval run --approach ChainOfThought --samples 10
# Add to project dependencies
uv add ml-agents-reasoning
```
#### Option 3: Development Installation
For contributors or advanced users:
```bash
# Clone and install in development mode
git clone https://github.com/thompsonson/c4ai-ml-agents
cd c4ai-ml-agents
pip install -e .[dev]
# Or with uv (recommended)
uv sync --all-extras
```
### Configure API Keys
After installation, configure your API keys:
```bash
# Create configuration file
cp .env.example .env
# Edit .env with your actual API keys
# Or set environment variables directly
export ANTHROPIC_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"
```
### ⚠️ Important: CLI Command Classification
The ML Agents CLI includes two types of commands:
- **Stable Commands** (✅ Production Ready): `setup`, `db`, `preprocess` - Well-tested, stable API, suitable for production use
- **Pre-Alpha Commands** (⚠️ Experimental): `eval`, `results` - Experimental features that may be unstable or have breaking changes
**For production use or getting started, we recommend using only the stable commands first.**
### CLI Quick Start
Once installed, you can use the ML Agents CLI:
```bash
# Validate your environment
ml-agents setup validate-env
# List available reasoning approaches
ml-agents setup list-approaches
# Run a simple experiment (⚠️ PRE-ALPHA)
ml-agents eval run --approach ChainOfThought --samples 10
# Compare multiple approaches (⚠️ PRE-ALPHA)
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought" --samples 50 --parallel
```
### Jupyter Notebook (Original Interface)
To use the original Jupyter notebook interface:
```bash
jupyter notebook Reasoning_LLM.ipynb
```
## Configuration
### Supported Providers and Models
- **Anthropic**: Claude Opus 4, Claude Sonnet 4, Claude 3.5 Haiku
- **Cohere**: Command R+, Command R, Command Light
- **OpenRouter**: GPT-5, GPT-5 Mini, GPT OSS-120B, Gemini 2.5 Flash Lite
### Hyperparameters
- **Temperature**: 0.0 - 2.0 (controls randomness)
- **Max Tokens**: 64 - 4096 (output length limit)
- **Top P**: 0.0 - 1.0 (nucleus sampling parameter)
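The hyperparameters above map onto the `model` section of a YAML experiment configuration (see the full example under *Configuration Files*). A minimal sketch; `temperature` and `max_tokens` match the documented example, while the `top_p` key name is an assumption that follows the same pattern:

```yaml
# Model block for a config file (top_p key name is assumed, not documented)
model:
  provider: "openrouter"
  name: "openai/gpt-oss-120b"
  temperature: 0.3   # 0.0 - 2.0
  max_tokens: 512    # 64 - 4096
  top_p: 0.9         # 0.0 - 1.0 (assumed key name)
```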
## MCP Integration (Phase 7)
The platform includes **SQLite database persistence** for all experiment results and supports **Claude Code MCP server integration** for direct database access during conversations.
### Database Features
- **Real-time persistence**: All experiment results are automatically saved to `ml_agents_results.db`
- **Read-only MCP access**: Query the database directly from Claude Code conversations
- **Rich export formats**: CSV, JSON, and Excel with advanced formatting
- **Advanced analytics**: Approach comparisons, failure analysis, and cost tracking
### Database CLI Commands
```bash
# Database management (Stable Commands)
ml-agents db init --db-path ./results.db                  # Initialize database
ml-agents db backup --source ./results.db                 # Create backup
ml-agents db stats --db-path ./results.db                 # Show statistics
ml-agents db migrate --db-path ./results.db               # Migrate database schema

# Export and analysis (⚠️ PRE-ALPHA)
ml-agents results export EXPERIMENT_ID --format excel     # Export to Excel
ml-agents results compare "exp1,exp2,exp3"                # Compare experiments
ml-agents results analyze EXPERIMENT_ID --type accuracy   # Generate reports
ml-agents results list --status completed                 # List experiments
```
## CLI Usage Guide
### Basic Commands
#### Single Experiment (⚠️ PRE-ALPHA)
Run one reasoning approach on a dataset:
```bash
# Basic usage
ml-agents eval run --approach ChainOfThought --samples 50
# With specific model
ml-agents eval run --approach TreeOfThought --samples 100 --provider openrouter --model "openai/gpt-oss-120b"
# With advanced settings
ml-agents eval run --approach ChainOfVerification --multi-step-verification --max-reasoning-calls 5
# With preprocessing integration
ml-agents eval run --dataset MilaWang/SpatialEval --approach ChainOfThought --samples 50
# Use specific preprocessing
ml-agents eval run --preprocessing-id prep_20240824_143256 --approach TreeOfThought
# Use custom preprocessed data
ml-agents eval run --preprocessing-path ./custom/processed.json --approach Reflection
```
#### Comparison Experiments (⚠️ PRE-ALPHA)
Compare multiple approaches side-by-side:
```bash
# Basic comparison
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought" --samples 100
# Parallel execution for faster results
ml-agents eval compare --approaches "None,ChainOfThought,Reflection" --samples 200 --parallel --max-workers 4
# Advanced reasoning comparison
ml-agents eval compare --approaches "ChainOfVerification,Reflection,SkeletonOfThought" --multi-step-verification --parallel
```
### Configuration Files
For complex experiments, use YAML configuration files:
```bash
# Run with configuration file (⚠️ PRE-ALPHA)
ml-agents eval run --config examples/configs/single_experiment.yaml
# Override specific parameters (⚠️ PRE-ALPHA)
ml-agents eval run --config examples/configs/comparison_study.yaml --samples 200 --parallel
```
**Example configuration** (`config.yaml`):
```yaml
experiment:
  name: "reasoning_comparison_study"
  sample_count: 100
  output_dir: "./results"

model:
  provider: "openrouter"
  name: "openai/gpt-oss-120b"
  temperature: 0.3
  max_tokens: 512

reasoning:
  approaches:
    - ChainOfThought
    - AsPlanning
    - TreeOfThought
  multi_step_verification: true
  max_reasoning_calls: 5

execution:
  parallel: true
  max_workers: 4
  save_checkpoints: true
```
### Checkpoint Management (⚠️ PRE-ALPHA)
Resume interrupted experiments:
```bash
# List available checkpoints
ml-agents eval checkpoints
# Resume from specific checkpoint
ml-agents eval resume checkpoint_exp_20250818_123456.json
```
### Advanced Features (⚠️ PRE-ALPHA)
#### Cost Control
```bash
# Set reasoning limits to control costs
ml-agents eval run --approach ChainOfVerification --max-reasoning-calls 3 --samples 50
# Monitor costs with verbose output
ml-agents eval compare --approaches "ChainOfThought,TreeOfThought" --samples 100 --verbose
```
#### Multi-step Reasoning
```bash
# Enable multi-step reflection
ml-agents eval run --approach Reflection --multi-step-reflection --max-reflection-iterations 3
# Enable multi-step verification
ml-agents eval run --approach ChainOfVerification --multi-step-verification --max-reasoning-calls 5
```
#### Parallel Processing
```bash
# Parallel execution with custom worker count
ml-agents eval compare --approaches "ChainOfThought,AsPlanning,TreeOfThought,Reflection" --parallel --max-workers 2
# Balance speed vs rate limits
ml-agents eval compare --approaches "None,ChainOfThought" --samples 500 --parallel --max-workers 8
```
### Output and Results
Results are organized by dataset with full preprocessing-evaluation traceability:
```
./outputs/
├── {dataset_name}/
│   ├── preprocessing/
│   │   ├── {timestamp}/
│   │   │   ├── analysis.json            # Dataset schema analysis
│   │   │   ├── rules.json               # Transformation rules
│   │   │   ├── processed.json           # Standardized dataset
│   │   │   └── metadata.json            # Preprocessing metadata
│   │   └── latest → {most_recent}/      # Symlink to latest preprocessing
│   └── eval/
│       ├── {exp_timestamp}/
│       │   ├── experiment_config.json   # Experiment configuration
│       │   ├── experiment_results.csv   # Detailed results per approach
│       │   ├── experiment_summary.json  # Performance summary
│       │   └── experiment_errors.json   # Any processing errors
│       └── latest → {most_recent}/      # Symlink to latest experiment
```
Each result file contains:
- Input prompts and model responses
- Complete reasoning traces
- Performance metrics (accuracy, time, cost)
- Configuration details
- Error information
- **Preprocessing lineage**: Complete traceability to preprocessing rules used
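Because each dataset keeps a `latest` symlink, the most recent artifacts can be inspected without knowing the run timestamp. A minimal sketch; the on-disk dataset folder name is an assumption (it may encode the `/` in Hub dataset IDs differently):

```bash
# Inspect the most recent evaluation summary and preprocessing rules for a dataset
cat ./outputs/MilaWang_SpatialEval/eval/latest/experiment_summary.json
cat ./outputs/MilaWang_SpatialEval/preprocessing/latest/rules.json
```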
### Example Workflows
#### 1. Environment Setup (Stable)
```bash
# Validate your setup first
ml-agents setup validate-env
ml-agents setup list-approaches
ml-agents setup version
```
#### 2. Database Management (Stable)
```bash
# Initialize and manage experiment database
ml-agents db init
ml-agents db stats
ml-agents db backup --source ./ml_agents_results.db
```
#### 3. Dataset Preprocessing (Stable)
```bash
# Preprocess datasets for evaluation
ml-agents preprocess list
ml-agents preprocess inspect MilaWang/SpatialEval --samples 100
ml-agents preprocess batch --max 5
```
#### 4. Complete Dataset → Evaluation Pipeline
```bash
# 1. Preprocess a custom dataset (creates organized folder structure)
ml-agents preprocess inspect MilaWang/SpatialEval --config tqa
ml-agents preprocess generate-rules MilaWang/SpatialEval --config tqa
ml-agents preprocess transform MilaWang/SpatialEval rules.json --config tqa
# 2. Run evaluation with preprocessed data (auto-detects latest preprocessing) (⚠️ PRE-ALPHA)
ml-agents eval run --dataset MilaWang/SpatialEval --approach ChainOfThought --samples 50
# 3. Compare approaches on same preprocessed dataset (⚠️ PRE-ALPHA)
ml-agents eval run --dataset MilaWang/SpatialEval --approach TreeOfThought --samples 50
ml-agents eval run --dataset MilaWang/SpatialEval --approach Reflection --samples 50
# 4. View organized results
ml-agents results list
```
#### 5. Quick Testing (⚠️ PRE-ALPHA)
```bash
# Test with small sample size
ml-agents eval run --approach ChainOfThought --samples 5 --verbose
```
#### 6. Research Study (⚠️ PRE-ALPHA)
```bash
# Comprehensive comparison study
ml-agents eval compare \
  --approaches "None,ChainOfThought,AsPlanning,TreeOfThought,Reflection" \
  --samples 200 \
  --parallel \
  --max-workers 4 \
  --multi-step-verification \
  --output "./studies/comprehensive_study"
```
### Command Reference
#### Stable Commands (Production Ready)
| Command | Description |
|---------|-------------|
| `ml-agents setup validate-env` | Check environment setup |
| `ml-agents setup list-approaches` | Show available reasoning methods |
| `ml-agents setup version` | Show version information |
| `ml-agents db init` | Initialize experiment database |
| `ml-agents db backup` | Create database backup |
| `ml-agents db stats` | Show database statistics |
| `ml-agents db migrate` | Migrate database schema |
| `ml-agents preprocess list` | List unprocessed datasets |
| `ml-agents preprocess inspect` | Inspect dataset schema |
| `ml-agents preprocess batch` | Batch process datasets |
#### Pre-Alpha Commands (⚠️ Experimental)
| Command | Description |
|---------|-------------|
| `ml-agents eval run` | Single reasoning experiment |
| `ml-agents eval compare` | Multi-approach comparison |
| `ml-agents eval resume` | Resume from checkpoint |
| `ml-agents eval checkpoints` | Show available checkpoints |
| `ml-agents results export` | Export experiment results |
| `ml-agents results compare` | Compare experiments |
| `ml-agents results analyze` | Analyze experiment patterns |
For detailed help on any command:
```bash
ml-agents setup --help
ml-agents eval run --help
ml-agents db --help
```
## Jupyter Notebook Usage (Original Interface)
For users who prefer the notebook interface:
1. **Setup**: Ensure dependencies are installed via `./setup.sh`
2. **Configuration**: Use interactive widgets to select models and approaches
3. **Data**: Uses the "bbeh-eval" dataset by default; any compatible dataset can be substituted
4. **Execute**: Run the experiment cells to process your dataset
5. **Results**: View tables in the notebook and CSV files named `{model}_{approach}_{timestamp}.csv`
## Dataset Requirements
Your dataset should include:
- **input** column: The question/problem to solve
- **answer** column (optional): Expected output for evaluation
- **task** column (optional): Task category for analysis
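For example, a single standardized record might look like the following (field values are illustrative only):

```json
{
  "input": "If a train leaves at 3 pm travelling 60 km/h, how far has it gone by 5 pm?",
  "answer": "120 km",
  "task": "arithmetic_reasoning"
}
```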
## Output Files
The notebook generates CSV files containing:
- Input prompts
- Model outputs
- Full reasoning traces
- Execution time
- Cost estimates
- Configuration details
## Project Structure
```
ml-agents/
├── src/                              # Main source code
│   ├── cli/                          # CLI interface (Phase 5)
│   │   ├── main.py                   # CLI entry point
│   │   ├── commands.py               # Run/compare commands
│   │   ├── config_loader.py          # Configuration management
│   │   ├── display.py                # Rich output formatting
│   │   └── validators.py             # Input validation
│   ├── core/                         # Core experiment logic
│   │   ├── experiment_runner.py      # Experiment orchestration
│   │   ├── dataset_loader.py         # Dataset loading
│   │   └── reasoning_inference.py    # Inference engine
│   ├── reasoning/                    # Reasoning approaches
│   │   ├── base.py                   # Base reasoning class
│   │   ├── chain_of_thought.py       # CoT implementation
│   │   ├── tree_of_thought.py        # ToT implementation
│   │   └── ...                       # Other approaches
│   └── utils/                        # Utilities
│       ├── api_clients.py            # API wrappers
│       ├── rate_limiter.py           # Rate limiting
│       └── logging_config.py         # Logging setup
├── examples/                         # Usage examples
│   ├── configs/                      # Configuration templates
│   ├── scripts/                      # Batch processing scripts
│   └── README.md                     # Examples documentation
├── tests/                            # Test suite
├── outputs/                          # Organized experiment and preprocessing results
│   └── {dataset_name}/               # Dataset-centric organization
│       ├── preprocessing/            # Preprocessing runs with timestamps
│       └── eval/                     # Evaluation runs with timestamps
├── Reasoning_LLM.ipynb               # Original Jupyter notebook
├── config.py                         # Environment configuration
├── requirements.txt                  # Python dependencies
├── setup.sh                          # Automated setup script
├── Makefile                          # Development commands
└── README.md                         # This file
```
## Best Practices
### For Researchers
1. **Start Small**: Begin with `--samples 10` to test approaches quickly
2. **Use Baselines**: Always include `None` approach for comparison
3. **Cost Control**: Monitor costs with `--verbose` and set `--max-reasoning-calls`
4. **Parallel Processing**: Use `--parallel` for faster comparison studies
5. **Reproducibility**: Save configuration files and use checkpoints
### For Cost Management
1. **Temperature Settings**: Lower values (0.1-0.3) for consistent, cost-effective results
2. **Token Limits**: Set appropriate `--max-tokens` based on your task complexity
3. **Sample Sizing**: Use smaller samples for initial exploration
4. **Provider Selection**: Compare costs across different providers
5. **Multi-step Limits**: Control `--max-reasoning-calls` for approaches like Chain-of-Verification
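As a concrete starting point, these controls can be combined in a single bounded run. A sketch using only flags referenced elsewhere in this README (⚠️ PRE-ALPHA `eval` command):

```bash
# Small, bounded run: few samples, capped output length, limited reasoning calls
ml-agents eval run --approach ChainOfVerification --samples 20 --max-tokens 512 --max-reasoning-calls 3 --verbose
```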
### For Performance
1. **Parallel Execution**: Use `--parallel --max-workers N` for comparison studies
2. **Checkpoint Usage**: Enable checkpoints for long-running experiments
3. **Rate Limiting**: Adjust `--max-workers` based on provider rate limits
4. **Batch Processing**: Use configuration files and scripts for multiple experiments
## Troubleshooting
### Common Issues
#### Environment Setup
```bash
# Check environment
ml-agents setup validate-env
# Fix dependency issues
make clean && make install-dev
# Verify imports
make debug-imports
```
#### API Key Problems
```bash
# Check .env file exists and has keys
cat .env
# Validate specific provider
ml-agents setup validate-env
```
Error messages will guide you to set missing keys:
```bash
export OPENROUTER_API_KEY="your_key_here"
export ANTHROPIC_API_KEY="your_key_here"
```
#### Rate Limiting
If you encounter rate limits:
```bash
# Reduce parallel workers
ml-agents eval compare --approaches "ChainOfThought,AsPlanning" --max-workers 1
# Run requests sequentially instead of in parallel
ml-agents eval run --approach ChainOfThought --samples 50 --parallel false
```
#### Memory Issues
For large experiments:
```bash
# Reduce sample size
ml-agents eval compare --approaches "ChainOfThought,TreeOfThought" --samples 50
# Disable parallel processing
ml-agents eval compare --approaches "..." --parallel false
```
#### NumPy Compatibility Warning
The warning about NumPy 1.x vs 2.x is cosmetic and doesn't affect functionality:
```
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.3.2...
```
This is a known PyTorch compatibility issue and can be ignored.
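If you prefer to silence the warning entirely, pinning NumPy below 2.0 in your environment should also work, assuming your other installed packages allow it:

```bash
# Optional: pin NumPy to the 1.x series to avoid the compatibility warning
pip install "numpy<2.0"
```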
### CLI Issues
#### Command Not Found
```bash
# Reinstall the package
make install-dev
# Check entry point
which ml-agents
```
#### Import Errors
```bash
# Activate virtual environment
source .venv/bin/activate
# Test imports
make debug-imports
```
#### Configuration Validation
For configuration errors, check:
1. YAML/JSON syntax is valid
2. All required fields are present
3. Approach names match available options (`ml-agents setup list-approaches`)
4. Provider/model combinations are supported
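A quick way to catch YAML syntax errors before launching a run is to parse the file locally. A minimal sketch, assuming PyYAML is available in your environment (commonly pulled in by the project's dependencies, but that is an assumption):

```bash
# Parse the config to surface syntax errors (prints nothing on success)
python -c "import yaml; yaml.safe_load(open('examples/configs/single_experiment.yaml'))"
```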
### Getting Help
1. **Command Help**: Use `--help` with any command

   ```bash
   ml-agents --help
   ml-agents eval run --help
   ml-agents eval compare --help
   ```

2. **Verbose Output**: Add `--verbose` to see detailed execution logs

   ```bash
   ml-agents eval run --approach ChainOfThought --samples 5 --verbose
   ```

3. **Check Status**: Validate your setup

   ```bash
   ml-agents setup validate-env
   ml-agents setup list-approaches
   make validate-env
   ```

4. **Community Support**: Join the Discord #ml-agents channel for help
### Development Issues
For developers working on the codebase:
```bash
# Run test suite
make test
# Check code quality
make lint
# Type checking
make type-check
# Full development check
make check
```
## Development Tools
### Claude Code MCP Server Setup
For developers using Claude Code, enable direct database queries in conversations:
```bash
# Configure MCP server (one-time setup)
make configure-mcp
# Or run the script directly
./scripts/install-sqlite-mcp-server.sh
```
**Available MCP Tools**:
- `read_query`: Execute validated SELECT queries
- `list_tables`: Show all database tables
- `describe_table`: Show table schemas
**⚠️ Note**: Project-scoped MCP servers don't appear in `claude mcp list` due to a [known bug](https://github.com/anthropics/claude-code/issues/5963). Use `claude mcp get sqlite-read-only` to verify installation.
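Once configured, you can ask Claude Code to call `list_tables` and then `read_query` with a SELECT statement against the results database. To confirm the server is registered from the shell:

```bash
# Project-scoped servers may not show up in `claude mcp list` (see the bug linked above)
claude mcp get sqlite-read-only
```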
## Contributing
Contributions are welcome. Feel free to extend the platform with:
- Additional reasoning approaches
- New evaluation metrics
- Support for more models/providers
- Performance optimizations
## License
### Recommended: CC BY 4.0 (Creative Commons Attribution 4.0 International)
This project is licensed under the Creative Commons Attribution 4.0 International License. This means:
- ✅ **Share** - Copy and redistribute the material in any medium or format
- ✅ **Adapt** - Remix, transform, and build upon the material for any purpose, even commercially
- ✅ **Attribution** - You must give appropriate credit, provide a link to the license, and indicate if changes were made
This license is chosen because:
1. **Open Science**: Aligns with Cohere Labs' open science mission
2. **Maximum Impact**: Allows both academic and commercial use, accelerating AI research
3. **Community Growth**: Enables derivatives while ensuring original work is credited
4. **Simplicity**: Easy to understand and implement
**Note**: For the code components specifically, you may want to consider dual-licensing with MIT or Apache 2.0 for better software compatibility.
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a>
### Alternative Options Considered
- **CC BY-SA 4.0**: Adds "ShareAlike" requirement - derivatives must use same license (more restrictive but ensures openness)
- **CC BY-NC 4.0**: Adds "NonCommercial" restriction - prevents commercial use (limits industry collaboration)
- **CC0**: Public domain dedication - no attribution required (maximum freedom but no credit requirement)