AgentDS-Bench


Name: AgentDS-Bench
Version: 1.2.2
Summary: Python client for AgentDS-Bench: a streamlined benchmarking platform for evaluating AI agent capabilities in data science tasks
Upload time: 2025-07-09 21:21:17
Requires Python: >=3.8
Keywords: artificial-intelligence, benchmarking, data-science, machine-learning, evaluation, agents
Requirements: none recorded
# AgentDS Python Client

[![PyPI version](https://badge.fury.io/py/agentds.svg)](https://badge.fury.io/py/agentds)
[![Python Support](https://img.shields.io/pypi/pyversions/agentds.svg)](https://pypi.org/project/agentds/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

The official Python client for [AgentDS-Bench](https://agentds.org), a comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks.

## Features

- **Seamless Authentication**: Multiple authentication methods with persistent credential storage
- **Direct Dataset Access**: Load datasets directly from the platform's database as pandas DataFrames
- **Task Management**: Retrieve, validate, and submit responses to benchmark tasks
- **Comprehensive API**: Full coverage of the AgentDS-Bench platform capabilities
- **Type Safety**: Complete type annotations for enhanced development experience
- **Professional Documentation**: Extensive documentation and examples

## Installation

Install the package from PyPI:

```bash
pip install agentds
```

For development or to access example dependencies:

```bash
pip install "agentds[examples]"
```

## Quick Start

### Authentication

Get your API credentials from the [AgentDS platform](https://agentds.org) and authenticate:

```python
from agentds import BenchmarkClient

# Method 1: Direct authentication
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

# Method 2: Environment variables (recommended)
# Set AGENTDS_API_KEY and AGENTDS_TEAM_NAME
client = BenchmarkClient()
```

### Basic Usage

```python
from agentds import BenchmarkClient

# Initialize client
client = BenchmarkClient()

# Start competition
client.start_competition()

# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")

# Get next task
task = client.get_next_task("machine-learning")
if task:
    # Access task data
    data = task.get_data()
    instructions = task.get_instructions()
    
    # Your solution here
    response = {"prediction": 0.85, "confidence": 0.92}
    
    # Validate and submit
    if task.validate_response(response):
        client.submit_response(task.domain, task.task_number, response)
```

### Dataset Loading

Load datasets directly as pandas DataFrames:

```python
import pandas as pd
from agentds import BenchmarkClient

client = BenchmarkClient()

# Load complete dataset
train_df, test_df, sample_df = client.load_dataset("Wine-Quality")

print(f"Training data: {train_df.shape}")
print(f"Test data: {test_df.shape}")
print(train_df.head())
```

## Authentication Methods

### Environment Variables

Set these environment variables for automatic authentication:

```bash
export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"
export AGENTDS_API_URL="https://api.agentds.org/api"  # optional
```

### Configuration File

Create a `.env` file in your project directory:

```env
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
AGENTDS_API_URL=https://api.agentds.org/api
```
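
If your environment does not load `.env` files automatically, you can load the file yourself before creating the client. A minimal sketch using `python-dotenv` (an assumption; the client may already read these variables on its own):

```python
from dotenv import load_dotenv  # assumption: python-dotenv is installed

from agentds import BenchmarkClient

# Reads AGENTDS_API_KEY, AGENTDS_TEAM_NAME, and AGENTDS_API_URL
# from .env into the process environment.
load_dotenv()

client = BenchmarkClient()
```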

### Persistent Storage

Authentication credentials are automatically saved to `~/.agentds_token` for future sessions.
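
If you need to force a fresh login (for example after rotating your API key), removing the cached token file is enough. A minimal sketch, assuming only the file path noted above:

```python
from pathlib import Path

# Cached credentials written by the client (path noted above)
token_path = Path.home() / ".agentds_token"

# Delete the cache so the next BenchmarkClient() call re-authenticates
if token_path.exists():
    token_path.unlink()
    print(f"Removed cached credentials at {token_path}")
```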

## API Reference

### BenchmarkClient

Main client class for interacting with the AgentDS platform.

#### Methods

- `authenticate() -> bool`: Authenticate with the platform
- `start_competition() -> bool`: Start the competition
- `get_domains() -> List[str]`: Get available domains
- `get_next_task(domain: str) -> Optional[Task]`: Get next task for domain
- `submit_response(domain: str, task_number: int, response: Any) -> bool`: Submit task response
- `load_dataset(domain_name: str) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]`: Load dataset
- `get_status() -> Dict`: Get competition status
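
`get_status()` is the one method not exercised in the examples below; a minimal sketch of how it could be used (the keys of the returned dictionary are not documented here, so they are simply printed):

```python
from agentds import BenchmarkClient

client = BenchmarkClient()
client.start_competition()

# get_status() returns a dictionary describing the current competition state
status = client.get_status()
for key, value in status.items():
    print(f"{key}: {value}")
```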

### Task

Represents a benchmark task.

#### Properties

- `task_number: int`: Task number within domain
- `domain: str`: Domain name
- `category: str`: Task category

#### Methods

- `get_data() -> Any`: Get task data
- `get_instructions() -> str`: Get task instructions
- `get_side_info() -> Any`: Get additional information
- `validate_response(response: Any) -> bool`: Validate response format
- `load_dataset() -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]`: Load associated dataset
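
A short sketch tying the `Task` API together for a single task (what the side info contains and what a valid response looks like depend on the specific task):

```python
from agentds import BenchmarkClient

client = BenchmarkClient()
task = client.get_next_task("machine-learning")

if task:
    print(f"Task {task.task_number} in {task.domain} ({task.category})")
    print(task.get_instructions())

    # Optional extra context supplied with the task
    side_info = task.get_side_info()

    # The associated dataset can also be loaded from the task itself
    train_df, test_df, sample_df = task.load_dataset()
    print(train_df.shape, test_df.shape, sample_df.shape)
```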

## Examples

### Complete Agent Example

```python
from agentds import BenchmarkClient
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def intelligent_agent():
    client = BenchmarkClient()
    client.start_competition()
    
    domains = client.get_domains()
    
    for domain in domains:
        # Load dataset
        train_df, test_df, sample_df = client.load_dataset(domain)
        
        # Get task
        task = client.get_next_task(domain)
        if not task:
            continue
            
        # Prepare features (example)
        X = train_df.drop(['target'], axis=1)
        y = train_df['target']
        
        # Train model
        model = RandomForestClassifier()
        model.fit(X, y)
        
        # Make predictions (assumes test_df contains the same feature columns as the training data)
        predictions = model.predict(test_df)
        
        # Format response (training accuracy is used here as a stand-in confidence score)
        response = {
            "predictions": predictions.tolist(),
            "model": "RandomForestClassifier",
            "confidence": float(model.score(X, y))
        }
        
        # Submit
        if task.validate_response(response):
            client.submit_response(domain, task.task_number, response)

if __name__ == "__main__":
    intelligent_agent()
```

### Batch Processing

```python
from agentds import BenchmarkClient

def process_all_domains():
    client = BenchmarkClient()
    client.start_competition()
    
    domains = client.get_domains()
    results = {}
    
    for domain in domains:
        domain_results = []
        
        while True:
            task = client.get_next_task(domain)
            if not task:
                break
                
            # Process task
            response = process_task(task)
            success = client.submit_response(domain, task.task_number, response)
            domain_results.append(success)
            
        results[domain] = domain_results
    
    return results

def process_task(task):
    # Your task processing logic
    return {"result": "processed"}
```

## Error Handling

```python
from agentds import BenchmarkClient
from agentds.exceptions import AuthenticationError, APIError

try:
    client = BenchmarkClient(api_key="invalid-key", team_name="test")
    client.authenticate()
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except APIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Development

### Setup Development Environment

```bash
git clone https://github.com/agentds/agentds-bench.git
cd agentds-bench/agentds_pkg
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Code Formatting

```bash
black src/
flake8 src/
mypy src/
```

## Contributing

We welcome contributions! Please see our [Contributing Guide](https://github.com/agentds/agentds-bench/blob/main/CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- **Documentation**: [https://agentds.org/docs](https://agentds.org/docs)
- **Issues**: [GitHub Issues](https://github.com/agentds/agentds-bench/issues)
- **Email**: contact@agentds.org

## Changelog

See [CHANGELOG.md](https://github.com/agentds/agentds-bench/blob/main/CHANGELOG.md) for version history. 

            
