FAI-RL

- **Name:** FAI-RL
- **Version:** 0.1.7
- **Summary:** Foundation of AI - Reinforcement Learning Library
- **Author:** Roblox
- **Upload time:** 2025-10-23 22:22:43
- **Requires Python:** >=3.8
- **Keywords:** reinforcement learning, language models, transformers, rlhf, dpo, ppo, sft
- **Repository:** https://github.com/Roblox/FAI-RL

# FAI-RL: Foundation AI - Reinforcement Learning

**FAI-RL** is a modular, production-ready library for training, inference, and evaluation of large language models using state-of-the-art reinforcement learning methods.

## Overview

FAI-RL provides a unified framework for fine-tuning language models with multiple RL algorithms, featuring:

- 🎯 **Multiple RL Algorithms**: SFT, DPO, PPO, GRPO, GSPO
- 🚀 **Production Ready**: Validated on large-scale, multi-GPU deployments (see [System Requirements](#-system-requirements))
- 📦 **Easy to Use**: Simple YAML configuration and CLI interface
- ⚡ **Memory Efficient**: LoRA, QLoRA, and DeepSpeed ZeRO-3 support
- 🔧 **Modular Design**: Extensible architecture for custom implementations

## Table of Contents

- [Installation](#-installation)
- [Quick Start](#-quick-start)
  - [Training](#training)
  - [Inference](#inference)
  - [Evaluation](#evaluation)
- [Supported Methods](#supported-methods)
- [Key Features](#key-features)
- [Project Structure](#-project-structure)
- [Memory Optimization](#memory-optimization)
- [System Requirements](#-system-requirements)

## 📦 Installation

Install FAI-RL from PyPI:

```bash
pip install --extra-index-url https://download.pytorch.org/whl/cu118 FAI-RL
```

For development installation:

```bash
git clone https://github.com/Roblox/FAI-RL.git
cd FAI-RL
pip install -e .
```

**PyPI Package**: [https://pypi.org/project/FAI-RL/](https://pypi.org/project/FAI-RL/)

For detailed installation instructions, see [INSTALL.md](INSTALL.md).
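
After installation, a quick sanity check confirms the package and its CLI entry points are available (the `--help` flag is an assumption based on standard CLI conventions; the entry points are the ones documented in the Quick Start below):

```bash
# Verify the installed package metadata
pip show FAI-RL

# Confirm the console entry point used throughout this README is on PATH
fai-rl-train --help
```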

## 🚀 Quick Start

### Training

Train a model using SFT, DPO, PPO, GRPO, or GSPO:

```bash
# Single GPU training
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 1
```

📖 **[See detailed Training Guide →](./trainers/README.md)**
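
To make the recipe-driven workflow concrete, here is a minimal sketch of writing and running a custom SFT recipe. Every YAML key below is an illustrative assumption, not FAI-RL's actual schema; consult the Training Guide and the shipped recipes for the real field names:

```bash
# Hypothetical recipe sketch -- all keys are assumptions for illustration,
# not FAI-RL's real schema; see recipes/training/sft/ for actual examples.
cat > recipes/training/sft/my_llama3_3B_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Llama-3.2-3B
method: sft
peft:
  type: lora        # swap to qlora for 4-bit quantized adapters
  r: 16
  alpha: 32
dataset: tatsu-lab/alpaca
learning_rate: 2.0e-5
num_train_epochs: 3
output_dir: outputs/my_llama3_3B_sft_lora
EOF

# Launch training exactly as in the Quick Start command above
fai-rl-train --recipe recipes/training/sft/my_llama3_3B_lora.yaml --num-gpus 1
```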

### Inference

Generate responses from your trained models:

```bash
# Run inference with debug mode
fai-rl-inference --recipe recipes/inference/llama3_3B.yaml --debug
```

📖 **[See detailed Inference Guide →](./inference/README.md)**
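
Per the project structure below, inference results land in the auto-generated `outputs/` directory; a minimal run-and-inspect loop might look like this (the exact output layout is an assumption):

```bash
# Run inference, then inspect the auto-generated outputs/ directory
fai-rl-inference --recipe recipes/inference/llama3_3B.yaml
ls outputs/
```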

### Evaluation

Evaluate model performance on benchmarks:

```bash
# Evaluate with debug output
fai-rl-eval --recipe recipes/evaluation/mmlu/llama3_3B.yaml --debug
```

📖 **[See detailed Evaluation Guide →](./evaluations/README.md)**
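
Evaluation recipes are organized per benchmark (`recipes/evaluation/<benchmark>/`), with MMLU as the documented example. A sketch, assuming the same layout holds for other benchmarks and that evaluation runs also log under the auto-generated `logs/` directory:

```bash
# Evaluate on MMLU, then review the run's log output
fai-rl-eval --recipe recipes/evaluation/mmlu/llama3_3B.yaml
tail -n 50 logs/*.log
```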

## Supported Methods

FAI-RL implements state-of-the-art reinforcement learning algorithms for language model fine-tuning:

| Method | Description | Use Case |
|--------|-------------|----------|
| **SFT** | Supervised Fine-Tuning | Initial instruction tuning on high-quality datasets |
| **DPO** | Direct Preference Optimization | Align models with human preferences without reward models |
| **PPO** | Proximal Policy Optimization | Classic RL approach with value functions and rewards |
| **GRPO** | Group Relative Policy Optimization | Efficient, critic-free policy optimization with group-based sampling |
| **GSPO** | Group Sequence Policy Optimization | Advanced sequence-level optimization |

Each method supports:
- ✅ Full fine-tuning
- ✅ LoRA (Low-Rank Adaptation)
- ✅ QLoRA (4-bit Quantized LoRA)
- ✅ Multi-GPU training with DeepSpeed
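
For orientation, the DPO objective referenced in the table above (Rafailov et al., 2023) optimizes the policy directly against a frozen reference on preference pairs, which is why no separate reward model is needed:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Here $y_w$ and $y_l$ are the preferred and rejected completions for prompt $x$, and $\beta$ controls how far the tuned policy $\pi_\theta$ may drift from the reference $\pi_{\mathrm{ref}}$.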

## Key Features

### 🎯 **Flexible Configuration System**
- YAML-based configuration for all training parameters
- Pre-configured recipes for popular models (Llama, Qwen, etc.)
- Easy hyperparameter tuning and experimentation

### 🔧 **Modular Architecture**
- Extensible trainer base classes
- Custom reward functions
- Pluggable dataset templates
- Easy integration with HuggingFace ecosystem


## 📁 Project Structure

```
FAI-RL/
├── core/                      # Core framework components
├── trainers/                  # Training method implementations
├── inference/                 # Inference components
├── evaluations/               # Evaluation system
├── recipes/                   # Recipe configuration files
│   ├── training/              # Training recipes
│   ├── inference/             # Inference recipes
│   └── evaluation/            # Evaluation recipes
├── configs/                   # Core configuration files
│   └── deepspeed/             # DeepSpeed ZeRO configurations
├── utils/                     # Utility modules
├── logs/                      # Training logs (auto-generated)
└── outputs/                   # Inference output (auto-generated)
```

## Memory Optimization

FAI-RL supports various techniques to train large models efficiently:

| Technique | Memory Usage | Speed | Best For |
|-----------|--------------|-------|----------|
| **Full Fine-tuning** | High (100% baseline) | Fastest | Small models, ample GPU memory |
| **LoRA** | Low (~10% of full fine-tuning) | Fast | Most use cases, balanced efficiency |
| **QLoRA** | Very Low (~25% of LoRA's footprint) | Medium | Large models (7B+) on consumer GPUs |
| **DeepSpeed ZeRO-3** | Sharded across GPUs | Variable | Models exceeding single-GPU capacity |

### Example Memory Requirements

- **Llama-3 8B Full**: ~32GB VRAM
- **Llama-3 8B LoRA**: ~12GB VRAM
- **Llama-3 8B QLoRA**: ~6GB VRAM
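
To connect these numbers to the CLI: the single-GPU LoRA command below is the documented Quick Start invocation, while the ZeRO-3 recipe filename is an assumption (this README only establishes that ZeRO-3 configurations live under `configs/deepspeed/`):

```bash
# ~12GB-class budget: LoRA recipe from the Quick Start on one GPU
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 1

# Beyond single-GPU capacity: shard with DeepSpeed ZeRO-3 across 8 GPUs
# (recipe filename is hypothetical; ZeRO configs ship under configs/deepspeed/)
fai-rl-train --recipe recipes/training/sft/llama3_8B_zero3.yaml --num-gpus 8
```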

## 🧪 System Requirements

### Validated on Hardware

This framework has been validated on:

* **Instance:** AWS EC2 p4d.24xlarge
* **GPUs:** 8 x NVIDIA A100-SXM4-80GB (80GB VRAM each)
* **CPU:** 96 vCPUs
* **Memory:** 1152 GiB
* **Storage:** 8TB NVMe SSD
* **Network:** 400 Gbps

## ⭐ For Maintainers

<details>

### Publishing a New Release

1. Update version in `pyproject.toml`:
```toml
[project]
name = "FAI-RL"
version = "X.Y.Z"  # Update version here
```

2. Build and publish:
```bash
# Install build tools
pip install --upgrade pip build twine

# Clean previous builds
rm -rf dist/ build/ *.egg-info

# Build the package
python -m build

# Upload to PyPI (requires credentials)
python -m twine upload dist/*
```
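
Optionally, the built artifacts can be smoke-tested against TestPyPI before the real upload (standard `twine`/`pip` usage, not an FAI-RL-specific step):

```bash
# Upload to TestPyPI first, then install from it to verify the packaging
python -m twine upload --repository testpypi dist/*
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ FAI-RL
```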

</details>

            
