# ML Research Benchmark Tasks
This repository contains the tasks for the ML Research Benchmark (MLRB), a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.
## Introduction
The MLRB aims to measure how effectively AI agents can accelerate ML research and development. It focuses on competition-level tasks that reflect the current frontiers of ML research, providing a more nuanced and challenging evaluation environment than existing benchmarks.
[![arXiv](https://img.shields.io/badge/arXiv-2410.22553-b31b1b.svg)](https://arxiv.org/abs/2410.22553)
- [:paperclip: ML Research Benchmark Paper](https://arxiv.org/abs/2410.22553)
- [:robot: ML Research Agent](https://github.com/AlgorithmicResearchGroup/ML-Research-Agent)
- [:white_check_mark: ML Research Tasks](https://github.com/AlgorithmicResearchGroup/ML-Research-Agent-Tasks)
- [:chart_with_upwards_trend: ML Research Evaluation](https://github.com/AlgorithmicResearchGroup/ML-Research-Agent-Evals)
## Installation
```bash
pip install mlrb-agent-tasks
```
## Usage

The library exposes a single function, `get_task`, which takes three arguments:

- `path`: the directory to copy the task into
- `benchmark`: the name of the benchmark
- `task`: the name of the task

The function copies the task files to the specified path and returns a dictionary containing the task name and prompt:
```python
{
    "name": str,    # name of the task
    "prompt": str,  # prompt for the task
}
```
## Example Usage
```python
from mlrb_agent_tasks import get_task
# Example usage
result = get_task("./", "full_benchmark", "llm_efficiency")
print(result['prompt'])
```
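A slightly fuller sketch of the same call is shown below. It assumes the package is installed and reuses the `full_benchmark` / `llm_efficiency` names from the example above; it previews both returned fields and lists the files copied into the target directory.

```python
import os

from mlrb_agent_tasks import get_task

# Copy the task files into the current directory and fetch its metadata.
result = get_task("./", "full_benchmark", "llm_efficiency")

# The returned dictionary exposes the task name and its prompt.
print(result["name"])
print(result["prompt"][:200])  # preview the first 200 characters of the prompt

# get_task also copies the task files to the given path; list what landed there.
print(os.listdir("./"))
```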
## Contributing
We welcome contributions to the ML Research Benchmark! Please read our [CONTRIBUTING.md](CONTRIBUTING.md) file for guidelines on how to submit issues, feature requests, and pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contact
For questions or feedback, please open an issue in this repository or contact [matt@algorithmicresearchgroup.com](mailto:matt@algorithmicresearchgroup.com).