<div align="center">
# 🌐 mcts-llm
<p>
<a href="https://www.python.org/downloads/release/python-3127/"><img src="https://img.shields.io/badge/-Python_3.12+-blue?logo=python&logoColor=white" alt="Python"></a>
<a href="https://pypi.org/project/mcts-llm/"><img src="https://img.shields.io/pypi/v/mcts-llm.svg" alt="PyPI"></a>
<a href="https://github.com/stanfordnlp/dspy/releases/tag/2.4.17"><img src="https://img.shields.io/badge/dspy-2.4.17-blue" alt="DSPy"></a>
<a href="https://codecov.io/github/NumberChiffre/mcts-llm"><img src="https://codecov.io/github/NumberChiffre/mcts-llm/graph/badge.svg?token=zOL5kP7Xf9"/></a>
<a href="https://black.readthedocs.io/en/stable/"><img src="https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray" alt="Black"></a>
<a href="https://github.com/numberchiffre/mcts-llm/stargazers"><img src="https://img.shields.io/github/stars/numberchiffre/mcts-llm?style=social" alt="GitHub stars"></a>
</p>
**MCTS + LLM + Prompt Engineering => Enhanced LLM Response Quality** 🌲📝✨
</div>
<br>
## 🌟 Overview
**mcts-llm** is a lightweight repo that integrates Monte Carlo Tree Search (MCTS) with prompt engineering techniques to enhance the performance of Large Language Models (LLMs). The idea is that scaling up inference-time compute to improve LLM response quality can be more valuable than spending additional compute on training. The approach extends beyond math problems to tasks such as reasoning and knowledge extraction. This repo can fine-tune prompt instructions and benchmark the performance of various MCTS adaptations for prompt engineering.
<br>
## 🛠️ Installation
### PyPI
```shell
pip install mcts-llm
```
### Docker
Create a `.env` file with the following variables:
```
OPENAI_API_KEY=<your-openai-api-key>
DEEPSEEK_API_KEY=<your-deepseek-api-key>
DEEPSEEK_BASE_URL=<your-deepseek-base-url>
OLLAMA_BASE_URL=http://host.docker.internal:11434
```
Build the docker container:
```shell
cd mcts-llm
make debug
```
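For reference, here is a minimal sketch of how the DeepSeek variables above could be wired into DSPy (2.4.x) through its OpenAI-compatible client. This is illustrative only and assumes `dspy.OpenAI` with `api_base` pointed at the DeepSeek endpoint; the repo's Docker entrypoint may consume these variables differently.
```python
# Illustrative sketch, not the repo's documented configuration path.
import os

import dspy

deepseek = dspy.OpenAI(
    model="deepseek-chat",
    api_key=os.environ["DEEPSEEK_API_KEY"],    # from .env
    api_base=os.environ["DEEPSEEK_BASE_URL"],  # OpenAI-compatible endpoint
    model_type="chat",
)
dspy.settings.configure(lm=deepseek)
```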
<br>
## 🚀 Run
### Quickstart
```python
import dspy
from mcts_llm.mctsr import MCTSr
ollama = dspy.OllamaLocal(
    model="qwen2.5:7b-instruct",
    model_type="chat",
    temperature=1.0,
    max_tokens=1024,
    num_ctx=1024,
    timeout_s=600,
)
dspy.settings.configure(lm=ollama, experimental=True)

# Example GSM8K-style problem; replace with your own.
problem = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
mctsr = MCTSr()
mctsr_answer = mctsr(problem).answer
print(f"MCTSr answer: {mctsr_answer}")
```
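The preliminary results below vary `max_rollouts`, `policy`, and `samples_per_node`. Here is a minimal sketch of overriding them at construction time, assuming `MCTSr` accepts these as keyword arguments (the results table lists them as parameters, but the exact constructor signature is not shown here):
```python
# Hedged sketch: keyword arguments assumed from the Preliminary Results table.
mctsr = MCTSr(
    max_rollouts=4,                # MCTS rollouts per problem
    policy="importance_sampling",  # node-selection policy ("greedy" or "importance_sampling")
    samples_per_node=3,            # candidate answers sampled per node
)
print(mctsr(problem).answer)       # `problem` as defined in the Quickstart above
```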
<br>
### Demo
```shell
make debug
python examples/demo.py
```
<br>
## 📊 Preliminary Results
Initial experiments were conducted using `qwen2.5:7B-Instruct` with the following settings:
- **Temperature**: 1.0
- **Model Type**: Chat
- **Max Tokens**: 1024
- **Context Length**: 1024
- **Dataset**: Shuffled GSM8K (20 examples)
- **Prompts**: Standard, non-optimized instructions
- **Hardware**: M3 Mac Pro (12 threads)
Default hyperparameters (a sketch of how they enter UCT-style node selection follows this list):
- **c**: sqrt(2)
- **initialization**: "I don't know."
- **eps**: 1e-8
- **reward_ub**: 95
- **reward_penalty**: 50
- **default_uct_score**: 1000
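For intuition, here is a minimal sketch of how `c`, `eps`, and `default_uct_score` typically enter UCT-style node selection, following the MCT Self-refine paper's formulation; the repo's actual implementation may differ, and `reward_ub`/`reward_penalty` (the paper's excessive-reward suppression) are not shown.
```python
import math

C = math.sqrt(2)     # exploration constant `c`
EPS = 1e-8           # `eps`: keeps the denominator non-zero for rarely visited nodes
DEFAULT_UCT = 1000   # `default_uct_score`: forces unvisited children to be tried first


def uct_score(q_child: float, n_child: int, n_parent: int) -> float:
    """UCT as in the MCTSr paper: Q(a) + c * sqrt(ln(N_parent + 1) / (n(a) + eps))."""
    if n_child == 0:
        return DEFAULT_UCT
    return q_child + C * math.sqrt(math.log(n_parent + 1) / (n_child + EPS))
```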
| Method | Accuracy | Total Time | Avg Time per Example | Additional Parameters |
|----------------------|---------------|---------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| Zero-shot CoT | 13 / 20 (65%) | 2m 01s | 6.09s | N/A |
| One-Turn Self-Refine | 15 / 20 (75%) | 7m 45s | 23.75s | N/A |
| **MCTSr** | 16 / 20 (80%) | 43m 03s | 129.18s | • max_rollouts = 4<br>• policy = "greedy"<br>• samples_per_node = 3 |
| **MCTSr** | 17 / 20 (85%) | 44m 09s | 132.50s | • max_rollouts = 4<br>• policy = "importance_sampling"<br>• samples_per_node = 3 |
| **MCTSr** | 16 / 20 (80%) | 51m 10s | 153.51s | • max_rollouts = 4<br>• policy = "importance_sampling"<br>• samples_per_node = 4 |
| **MCTSr** | 18 / 20 (90%) | 51m 42s | 153.13s | • max_rollouts = 4<br>• policy = "greedy"<br>• samples_per_node = 4 |
| **MCTSr** | 15 / 20 (75%) | 1h 38m 53s | 296.68s | • max_rollouts = 8<br>• policy = "greedy"<br>• samples_per_node = 4 |
| **MCTSr** | 14 / 20 (70%) | 1h 39m 03s | 298.40s | • max_rollouts = 8<br>• policy = "importance_sampling"<br>• samples_per_node = 4 |
*Note: These results are preliminary and obtained under specific conditions. Further experimentation is needed to generalize the findings.*
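For context, here is a hypothetical sketch of how such a GSM8K run could be scripted with DSPy's built-in dataset and metric. The adapter class and 20-example shuffled subset are illustrative only (they assume the Quickstart LM configuration and that `MCTSr` returns a prediction with an `.answer` field); the repo's actual benchmark harness may differ.
```python
# Hypothetical evaluation harness, not the repo's own script.
import random

import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.evaluate import Evaluate

from mcts_llm.mctsr import MCTSr


class MCTSrOnGSM8K(dspy.Module):
    """Adapter so Evaluate can pass GSM8K's `question` field to MCTSr."""

    def __init__(self):
        super().__init__()
        self.solver = MCTSr()

    def forward(self, question):
        return self.solver(question)


random.seed(0)
devset = random.sample(GSM8K().dev, 20)  # shuffled 20-example subset, as above
evaluator = Evaluate(devset=devset, metric=gsm8k_metric, display_progress=True)
evaluator(MCTSrOnGSM8K())
```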
<br>
## Paper Implementations
- [Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B](https://arxiv.org/abs/2406.07394)
- More to come!
<br>
## 🚀 TODOs
- Upgrade DSPy to >= 2.5.0.
- Add evaluation datasets such as MATH, AIME, and Math Odyssey.
- Tune MCTSr hyperparameters.
- Fine-tune with Llama3.1-8B.
- Fine-tune with Qwen2.5-7B.
- Fine-tune with DeepSeek-Chat as the prompting model and smaller LLMs served by Ollama as the task model.
<br>
## ⚠️ Disclaimer
Please be aware of potential costs when using OpenAI/Anthropic LLMs, especially with larger rollouts. Familiarize yourself with DSPy and its optimizers before extensive use.
<br>
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.