# FAI-RL: Foundation AI - Reinforcement Learning
**FAI-RL** is a modular, production-ready library for training, inference, and evaluation of large language models using state-of-the-art reinforcement learning methods.
## Overview
FAI-RL provides a unified framework for fine-tuning language models with multiple RL algorithms, featuring:
- 🎯 **Multiple RL Algorithms**: SFT, DPO, PPO, GRPO, GSPO
- 🚀 **Production Ready**: Battle-tested on large-scale deployments
- 📦 **Easy to Use**: Simple YAML configuration and CLI interface
- ⚡ **Memory Efficient**: LoRA, QLoRA, and DeepSpeed ZeRO-3 support
- 🔧 **Modular Design**: Extensible architecture for custom implementations
## Table of Contents
- [Installation](#-installation)
- [Quick Start](#-quick-start)
  - [Training](#training)
  - [Inference](#inference)
  - [Evaluation](#evaluation)
- [Supported Methods](#supported-methods)
- [Key Features](#key-features)
- [Project Structure](#-project-structure)
- [Memory Optimization](#memory-optimization)
- [System Requirements](#-system-requirements)
## 📦 Installation
Install FAI-RL from PyPI:
```bash
pip install --extra-index-url https://download.pytorch.org/whl/cu118 FAI-RL
```
For development installation:
```bash
git clone https://github.com/Roblox/FAI-RL.git
cd FAI-RL
pip install -e .
```
**PyPI Package**: [https://pypi.org/project/FAI-RL/](https://pypi.org/project/FAI-RL/)
For detailed installation instructions, see [INSTALL.md](INSTALL.md).
## 🚀 Quick Start
### Training
Train a model using SFT, DPO, PPO, GRPO, or GSPO:
```bash
# Single GPU training
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 1
```
📖 **[See detailed Training Guide →](./trainers/README.md)**
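The same entry point scales out via the `--num-gpus` flag; multi-GPU runs go through DeepSpeed, and the Training Guide covers how the ZeRO configurations under `configs/deepspeed/` come into play:

```bash
# Multi-GPU training (same recipe as above, reused for illustration)
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 8
```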
### Inference
Generate responses from your trained models:
```bash
# Run inference with debug mode
fai-rl-inference --recipe recipes/inference/llama3_3B.yaml --debug
```
📖 **[See detailed Inference Guide →](./inference/README.md)**
### Evaluation
Evaluate model performance on benchmarks:
```bash
# Evaluate with debug output
fai-rl-eval --recipe recipes/evaluation/mmlu/llama3_3B.yaml --debug
```
📖 **[See detailed Evaluation Guide →](./evaluations/README.md)**
## Supported Methods
FAI-RL implements state-of-the-art reinforcement learning algorithms for language model fine-tuning:
| Method | Description | Use Case |
|--------|-------------|----------|
| **SFT** | Supervised Fine-Tuning | Initial instruction tuning on high-quality datasets |
| **DPO** | Direct Preference Optimization | Align models with human preferences without reward models |
| **PPO** | Proximal Policy Optimization | Classic RL approach with value functions and rewards |
| **GRPO** | Group Relative Preference Optimization | Efficient preference learning with group-based sampling |
| **GSPO** | Group Sequence Policy Optimization | Advanced sequence-level optimization |
Each method supports:
- ✅ Full fine-tuning
- ✅ LoRA (Low-Rank Adaptation)
- ✅ QLoRA (4-bit Quantized LoRA)
- ✅ Multi-GPU training with DeepSpeed
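Switching methods is a matter of pointing the trainer at a different recipe. The DPO path below is illustrative only; browse `recipes/training/` for the recipes that actually ship:

```bash
# Preference tuning with DPO (recipe path is illustrative; see recipes/training/)
fai-rl-train --recipe recipes/training/dpo/llama3_3B_lora.yaml --num-gpus 1
```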
## Key Features
### 🎯 **Flexible Configuration System**
- YAML-based configuration for all training parameters
- Pre-configured recipes for popular models (Llama, Qwen, etc.)
- Easy hyperparameter tuning and experimentation
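As a rough sketch of what a training recipe looks like, the fragment below is illustrative only: the key names are hypothetical, and the authoritative schema is whatever ships under `recipes/training/`:

```yaml
# Hypothetical SFT recipe fragment -- key names are illustrative,
# not the library's actual schema; see recipes/training/ for real examples.
model_name: meta-llama/Llama-3.2-3B-Instruct   # assumed HF model id
method: sft
use_lora: true
lora_rank: 16
learning_rate: 2.0e-5
num_train_epochs: 3
dataset_name: your-dataset                     # placeholder
output_dir: outputs/llama3_3B_sft_lora
```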
### 🔧 **Modular Architecture**
- Extensible trainer base classes
- Custom reward functions
- Pluggable dataset templates
- Easy integration with HuggingFace ecosystem
## 📁 Project Structure
```
FAI-RL/
├── core/            # Core framework components
├── trainers/        # Training method implementations
├── inference/       # Inference components
├── evaluations/     # Evaluation system
├── recipes/         # Recipe configuration files
│   ├── training/      # Training recipes
│   ├── inference/     # Inference recipes
│   └── evaluation/    # Evaluation recipes
├── configs/         # Core configuration files
│   └── deepspeed/     # DeepSpeed ZeRO configurations
├── utils/           # Utility modules
├── logs/            # Training logs (auto-generated)
└── outputs/         # Inference output (auto-generated)
```
## Memory Optimization
FAI-RL supports various techniques to train large models efficiently:
| Technique | Memory Usage | Speed | Best For |
|-----------|-------------|-------|----------|
| **Full Fine-tuning** | High (100%) | Fastest | Small models, ample GPU memory |
| **LoRA** | Low (~10%) | Fast | Most use cases, balanced efficiency |
| **QLoRA** | Very Low (~25% of LoRA) | Medium | Large models (7B+) on consumer GPUs |
| **DeepSpeed ZeRO-3** | Distributed | Variable | Models exceeding single GPU capacity |
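In recipe terms, the gap between these modes usually comes down to a few adapter and quantization switches. The fragment below is a hypothetical sketch (key names and the DeepSpeed config filename are assumptions, not the actual schema):

```yaml
# Hypothetical QLoRA + ZeRO-3 switches -- illustrative key names only;
# the shipped recipes under recipes/training/ define the real schema.
use_lora: true
lora_rank: 16
load_in_4bit: true                               # 4-bit quantization is what makes LoRA into QLoRA
deepspeed_config: configs/deepspeed/zero3.json   # filename assumed; see configs/deepspeed/
```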
### Example Memory Requirements
- **Llama-3 8B Full**: ~32GB VRAM
- **Llama-3 8B LoRA**: ~12GB VRAM
- **Llama-3 8B QLoRA**: ~6GB VRAM
## 🧪 System Requirements
### Validated on Hardware
This framework has been validated on:
* **Instance:** AWS EC2 p4d.24xlarge
* **GPUs:** 8 x NVIDIA A100-SXM4-80GB (80GB VRAM each)
* **CPU:** 96 vCPUs
* **Memory:** 1152 GiB
* **Storage:** 8TB NVMe SSD
* **Network:** 400 Gbps
## ⭐ For Maintainers
<details>
### Publishing a New Release
1. Update version in `pyproject.toml`:
```toml
[project]
name = "FAI-RL"
version = "X.Y.Z" # Update version here
```
2. Build and publish:
```bash
# Install build tools
pip install --upgrade pip build twine
# Clean previous builds
rm -rf dist/ build/ *.egg-info
# Build the package
python -m build
# Upload to PyPI (requires credentials)
python -m twine upload dist/*
```
</details>