# OpenRubricRL

**An open-source pipeline that converts human-written rubrics into LLM-based reward functions for RL and RLHF training.**

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## 🎯 Problem It Solves

Current RLHF pipelines require expensive human labelers to score outputs. Labs want LLM-based reward models to scale scoring, but they need high-quality rubrics to make that work. **No open standard exists for turning a rubric into a reusable, consistent reward function.**

OpenRubricRL fills this gap by providing:
- 📋 A standard JSON/YAML schema for defining evaluation rubrics
- 🤖 Automatic conversion of rubrics into LLM scoring prompts
- 🔌 Ready-to-use API and CLI tools for scoring model outputs
- 🧪 Integration with popular RL libraries (RLlib, TRL, CleanRL)

## 🚀 Quick Start

### Installation

```bash
pip install openrubricrl
```

For development with all features:
```bash
pip install "openrubricrl[all]"  # quotes keep shells like zsh from globbing the brackets
```

### Basic Usage

#### 1. Create a Rubric

```bash
openrubricrl create-template my_rubric --domain code
```

This creates `my_rubric.json` with a basic template. Edit it to define your criteria:

```json
{
  "name": "code_quality_basic",
  "version": "1.0.0",
  "description": "Basic code quality evaluation",
  "domain": "code",
  "scale": {"min": 0.0, "max": 10.0},
  "criteria": [
    {
      "name": "correctness",
      "description": "Does the code solve the problem correctly?",
      "weight": 0.4,
      "examples": {
        "excellent": [
          {
            "input": "Write a function to reverse a string",
            "output": "def reverse_string(s): return s[::-1]",
            "score": 9.0,
            "explanation": "Correct and efficient implementation"
          }
        ]
      }
    },
    {
      "name": "readability", 
      "description": "Is the code clean and readable?",
      "weight": 0.6
    }
  ]
}
```
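
The two weights above sum to 1.0, so if the overall score is the weighted mean of per-criterion scores (the usual convention, though this README doesn't state it explicitly), an output rated 9.0 on correctness and 6.0 on readability would score 0.4 × 9.0 + 0.6 × 6.0 = 7.2 overall. A quick sanity check for a hand-edited rubric, as a sketch (the package's own validation via `Rubric.from_file` is authoritative):

```python
import json

# Lightweight sanity check for a hand-edited rubric file.
# Assumes criterion weights follow the sum-to-1.0 convention used above.
with open("my_rubric.json") as f:
    rubric = json.load(f)

total_weight = sum(c["weight"] for c in rubric["criteria"])
assert abs(total_weight - 1.0) < 1e-9, f"weights sum to {total_weight}, expected 1.0"

scale = rubric["scale"]
assert scale["min"] < scale["max"], "scale min must be below scale max"

print(f"{rubric['name']} v{rubric['version']}: {len(rubric['criteria'])} criteria look OK")
```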

#### 2. Score Model Outputs

**Command Line:**
```bash
export OPENAI_API_KEY="your-key-here"

openrubricrl score my_rubric.json \
  "Write a function to add two numbers" \
  "def add(a, b): return a + b"
```

**Python API:**
```python
import asyncio

from openrubricrl import Rubric, create_openai_scorer

# Load rubric
rubric = Rubric.from_file("my_rubric.json")

# Create scorer
scorer = create_openai_scorer(rubric, api_key="your-key")

async def main():
    # scorer.score is a coroutine, so it must be awaited inside an async function
    result = await scorer.score(
        task_input="Write a function to add two numbers",
        model_output="def add(a, b): return a + b"
    )
    print(f"Score: {result.overall_score}/10")
    print(f"Explanation: {result.overall_explanation}")

asyncio.run(main())
```
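
Because `scorer.score` is a coroutine, batches of outputs can be scored concurrently with `asyncio.gather`. A minimal sketch (the `samples` pairs are illustrative; see [`batch_scoring.py`](examples/batch_scoring.py) for the maintained version, which may differ):

```python
import asyncio

from openrubricrl import Rubric, create_openai_scorer

rubric = Rubric.from_file("my_rubric.json")
scorer = create_openai_scorer(rubric, api_key="your-key")

# Illustrative (task_input, model_output) pairs
samples = [
    ("Write a function to add two numbers", "def add(a, b): return a + b"),
    ("Write a function to reverse a string", "def reverse_string(s): return s[::-1]"),
]

async def score_batch():
    # Launch all scoring calls concurrently instead of awaiting them one by one
    return await asyncio.gather(
        *(scorer.score(task_input=i, model_output=o) for i, o in samples)
    )

for result in asyncio.run(score_batch()):
    print(f"{result.overall_score}/10")
```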

**REST API:**
```bash
# Start server
openrubricrl serve --rubrics-dir ./rubrics

# Score via HTTP
curl -X POST "http://localhost:8000/score/my_rubric" \
  -H "Content-Type: application/json" \
  -d '{
    "task_input": "Write a function to add two numbers",
    "model_output": "def add(a, b): return a + b"
  }'
```
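
The same request from Python with the `requests` library (a sketch; the response schema isn't documented here, so the JSON body is printed as-is):

```python
import requests

payload = {
    "task_input": "Write a function to add two numbers",
    "model_output": "def add(a, b): return a + b",
}

# POST to the endpoint exposed by `openrubricrl serve` (see above)
resp = requests.post("http://localhost:8000/score/my_rubric", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```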

## ๐Ÿ—๏ธ Architecture

### Core Components

1. **Rubric Schema** (`rubric_schema.json`): JSON schema defining the standard format
2. **Prompt Builder** (`prompt_builder.py`): Converts rubrics into LLM prompts
3. **Scorer** (`scorer.py`): Handles LLM API calls and response parsing
4. **API Server** (`server.py`): FastAPI-based REST API
5. **CLI** (`cli.py`): Command-line interface

### Supported LLM Providers

- ✅ OpenAI (GPT-4, GPT-3.5)
- ✅ Anthropic (Claude)
- 🔄 Local models via vLLM (coming soon)

## 📖 Examples

See the [`examples/`](examples/) directory for complete examples:

- [`code_evaluation.py`](examples/code_evaluation.py) - Scoring code generation
- [`dialogue_quality.py`](examples/dialogue_quality.py) - Evaluating chatbot responses  
- [`creative_writing.py`](examples/creative_writing.py) - Scoring creative content
- [`batch_scoring.py`](examples/batch_scoring.py) - Processing multiple outputs

## 🔗 Integrations

### Reinforcement Learning Libraries

```python
# RLlib integration example
from openrubricrl.integrations.rllib import RubricRewardFunction

reward_fn = RubricRewardFunction(
    rubric_path="my_rubric.json",
    provider="openai"
)

# Use in your RL training loop
reward = reward_fn(state, action, context)
```
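
Each call to `reward_fn` above is an LLM API call, so latency and cost add up quickly inside a training loop. A minimal memoization sketch, assuming `RubricRewardFunction` is callable as shown; the cache key here is a hypothetical placeholder to adapt to your state/action types:

```python
from openrubricrl.integrations.rllib import RubricRewardFunction

reward_fn = RubricRewardFunction(
    rubric_path="my_rubric.json",
    provider="openai"
)

_reward_cache = {}

def cached_reward(state, action, context):
    # Hypothetical cache key: repr() is fine for simple types; adapt it
    # for arrays/tensors, and bound the cache size for long runs.
    key = (repr(state), repr(action))
    if key not in _reward_cache:
        _reward_cache[key] = reward_fn(state, action, context)
    return _reward_cache[key]
```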

### Hugging Face Transformers

```python
from transformers import Trainer

from openrubricrl.integrations.transformers import RubricCallback

trainer = Trainer(
    model=model,
    callbacks=[RubricCallback(rubric_path="my_rubric.json")],
    # ... other args
)
```

## 🧪 Development

### Setup

```bash
git clone https://github.com/openrubricrl/openrubricrl.git
cd openrubricrl
pip install -e ".[dev]"
```

### Run Tests

```bash
pytest tests/ -v
```

### Code Quality

```bash
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/
```

## 📚 Documentation

- [Rubric Schema Reference](docs/schema.md)
- [API Documentation](docs/api.md)
- [Integration Guide](docs/integrations.md)
- [Contributing Guidelines](CONTRIBUTING.md)

## 🗓️ Roadmap

### Phase 1 - Foundation ✅
- [x] JSON/YAML schema for rubrics
- [x] Rubric → prompt converter
- [x] Minimal scoring API with OpenAI/Anthropic
- [x] CLI tool for local scoring

### Phase 2 - Community & Repository (Q2 2024)
- [ ] Open Rubric Hub (Git repo with curated rubrics)
- [ ] Templates for common domains (code, dialogue, writing)
- [ ] Contribution guidelines and review process

### Phase 3 - Integrations & Scaling (Q3 2024)  
- [ ] RLlib / TRL integration examples
- [ ] Hybrid reward module (LLM + automated metrics)
- [ ] Bias/drift detection module
- [ ] Local model support via vLLM

### Phase 4 - Sustainability (Q4 2024)
- [ ] Hosted API service (optional paid tier)
- [ ] Enterprise features and support
- [ ] Dataset hosting for scoring logs

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Early Adopters & Target Users

- 🔬 Small RL research teams without budget for large-scale human feedback
- 🏆 AI hackathon participants who want reward shaping quickly
- 🚀 Startups doing RLHF in niche domains (customer service bots, educational tutors)
- 🎓 Academics studying automated evaluation methods

## 📜 License

MIT License - see [LICENSE](LICENSE) for details.

## 🙏 Acknowledgments

- Inspired by the need for standardized evaluation in RLHF
- Built on top of excellent libraries: FastAPI, Pydantic, Click
- Thanks to the open-source RL and NLP communities

---

**🔗 Links:**
- [GitHub Repository](https://github.com/openrubricrl/openrubricrl)
- [Documentation](https://openrubricrl.readthedocs.io)
- [PyPI Package](https://pypi.org/project/openrubricrl/)
- [Discord Community](https://discord.gg/openrubricrl)

            
