nemo-aligner

Name	nemo-aligner
Version	0.6.0
home_page	https://github.com/NVIDIA/NeMo-Aligner
Summary	NeMo-Aligner - a toolkit for model alignment
upload_time	2025-01-07 23:05:48
maintainer	NVIDIA
docs_url	None
author	NVIDIA
requires_python	None
license	Apache2
keywords	deep learning machine learning gpu nlp nemo nvidia pytorch torch language reinforcement learning rlhf preference modeling steerlm dpo
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# NVIDIA NeMo-Aligner

# Latest News
- We released Nemotron-4-340B [Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base), [Instruct](https://huggingface.co/nvidia/Nemotron-4-340B-Instruct), and [Reward](https://huggingface.co/nvidia/Nemotron-4-340B-Reward). The Instruct and Reward variants were trained with NeMo-Aligner. Please see the [HelpSteer2](https://arxiv.org/abs/2406.08673) paper for more details on the reward model training.
- We are excited to announce the release of accelerated generation support in our RLHF pipeline using [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). For more information, please refer to our [RLHF documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/rlhf.html).
- The [NeMo-Aligner Paper](https://arxiv.org/abs/2405.01481) is now out on arXiv!

## Introduction

NeMo-Aligner is a scalable toolkit for efficient model alignment. It supports state-of-the-art model alignment algorithms such as SteerLM, DPO, and Reinforcement Learning from Human Feedback (RLHF). These algorithms enable users to align language models to be safer, more harmless, and more helpful. Users can perform end-to-end model alignment on a wide range of model sizes and take advantage of the parallelism techniques provided by the NeMo Framework so that alignment runs in a performant and resource-efficient manner. For more technical details, please refer to our [paper](https://arxiv.org/abs/2405.01481).

The NeMo-Aligner toolkit is built using the NeMo Framework, which enables scalable training across thousands of GPUs using tensor, data, and pipeline parallelism for all alignment components. Additionally, our checkpoints are cross-compatible with the NeMo ecosystem, facilitating inference deployment and further customization (https://github.com/NVIDIA/NeMo-Aligner).

The toolkit is currently in its early stages. We are committed to improving it so that developers can more easily pick and choose among alignment algorithms to build safe, helpful, and reliable models.
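
As a rough illustration of how tensor, data, and pipeline parallelism combine, the sketch below computes the data-parallel degree from a GPU count. It assumes the common Megatron-style layout in which the total number of GPUs is the product of the three degrees; the function and argument names are hypothetical and are not NeMo or NeMo-Aligner configuration keys, and any additional parallelism dimensions are ignored.

```python
def data_parallel_size(world_size, tensor_parallel, pipeline_parallel):
    """Return the data-parallel degree for a given GPU count.

    Assumes world_size == tensor_parallel * pipeline_parallel * data_parallel
    (an assumption of this sketch, not a statement about NeMo's internals).
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")
    # Number of GPUs needed to hold one full model replica.
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    # Each group of `model_parallel` GPUs is one replica; replicas split the data.
    return world_size // model_parallel
```

Under these assumptions, 64 GPUs with a tensor-parallel degree of 4 and a pipeline-parallel degree of 2 leave 8 model replicas for data parallelism.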

## Key Features

* **SteerLM: Attribute-Conditioned SFT as a (user-steerable) alternative to RLHF.**
    * [Llama3-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama3-70B-SteerLM-Chat) aligned with NeMo-Aligner.
    * Corresponding reward model [Llama3-70B-SteerLM-RM](https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM).
    * Learn more at our [SteerLM](https://arxiv.org/abs/2310.05344) and [HelpSteer2](https://arxiv.org/abs/2406.08673) papers.
* **Supervised Fine Tuning**
* **Reward Model Training**
* **Reinforcement Learning from Human Feedback using the [PPO](https://arxiv.org/pdf/1707.06347.pdf) Algorithm**
    * [Llama3-70B-PPO-Chat](https://huggingface.co/nvidia/Llama3-70B-PPO-Chat) aligned with NeMo-Aligner using TRT-LLM.
* **Reinforcement Learning from Human Feedback using the REINFORCE Algorithm**
    * [Llama-3.1-Nemotron-70B-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) aligned with NeMo-Aligner using TRT-LLM.
* **Direct Preference Optimization** as described in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290)
    * [Llama3-70B-DPO-Chat](https://huggingface.co/nvidia/Llama3-70B-DPO-Chat) aligned with NeMo-Aligner.
* **Self-Play Fine-Tuning (SPIN)** as described in [Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models](https://arxiv.org/pdf/2401.01335)
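
To make the Direct Preference Optimization objective listed above concrete, here is a minimal sketch of the per-example loss from the DPO paper in plain Python. It is not NeMo-Aligner's implementation: the function name, argument names, and the default `beta` are hypothetical, and the inputs are assumed to be summed log-probabilities of a full response under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss is `log(2)`; the loss drops as the policy raises the chosen response relative to the rejected one.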

## Learn More
* [Documentation](https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/README.md)
* [Examples](https://github.com/NVIDIA/NeMo-Aligner/tree/main/examples/nlp/gpt)
* [Tutorials](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/index.html)
* [Paper](https://arxiv.org/abs/2405.01481)

## Latest Release

For the latest stable release, please see the [releases page](https://github.com/NVIDIA/NeMo-Aligner/releases). All releases come with a pre-built container. Changes in each release are documented in the [CHANGELOG](https://github.com/NVIDIA/NeMo-Aligner/blob/main/CHANGELOG.md).

## Install Your Own Environment

### Requirements
NeMo-Aligner has the same requirements as the [NeMo Toolkit](https://github.com/NVIDIA/NeMo#requirements), with the addition of [PyTriton](https://github.com/triton-inference-server/pytriton).

### Quick start inside NeMo container
NeMo-Aligner is included in the NeMo containers. On a machine with NVIDIA GPUs and drivers installed, run the NeMo container:
```bash
docker run --gpus all -it --rm --shm-size=8g --ulimit memlock=-1 --ulimit stack=67108864  nvcr.io/nvidia/nemo:24.07
```
Once you are inside the container, NeMo-Aligner is already installed; it can be found, together with NeMo and other tools, under the `/opt/` folder.

### Install NeMo-Aligner
Please follow the steps outlined in the [NeMo Toolkit Installation Guide](https://github.com/NVIDIA/NeMo#installation). After installing NeMo, run the following additional command:
```bash
pip install nemo-aligner
```
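Alternatively, to install the latest commit, run the following from the root of a local clone of the [NeMo-Aligner repository](https://github.com/NVIDIA/NeMo-Aligner):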
```bash
pip install .
```
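
After either install path, one way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The sketch below uses only the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical, and the check only confirms that the distribution is registered, not that its GPU dependencies work.

```python
from importlib import metadata


def installed_version(dist_name="nemo-aligner"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "nemo-aligner is not installed")
```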

### Docker Containers

We provide an official NeMo-Aligner Dockerfile which is based on stable, tested versions of NeMo, Megatron-LM, and TransformerEngine. The primary objective of this Dockerfile is to ensure stability, although it might not always reflect the very latest versions of those three packages. You can access our Dockerfile [here](https://github.com/NVIDIA/NeMo-Aligner/blob/main/Dockerfile).

Alternatively, you can build the [NeMo Dockerfile](https://github.com/NVIDIA/NeMo/blob/main/Dockerfile) and add `RUN pip install nemo-aligner` at the end.

## Future Work
- Continue improving the stability of the PPO learning phase.
- Improve the performance of RLHF.
- Add TRT-LLM inference support for Rejection Sampling.

## Contribute to NeMo-Aligner
We welcome community contributions! Please refer to [CONTRIBUTING.md](https://github.com/NVIDIA/NeMo-Aligner/blob/main/CONTRIBUTING.md) for guidelines.

## Cite NeMo-Aligner in Your Work
```
@misc{shen2024nemoaligner,
      title={NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment},
      author={Gerald Shen and Zhilin Wang and Olivier Delalleau and Jiaqi Zeng and Yi Dong and Daniel Egert and Shengyang Sun and Jimmy Zhang and Sahil Jain and Ali Taghibakhshi and Markel Sanz Ausin and Ashwath Aithal and Oleksii Kuchaiev},
      year={2024},
      eprint={2405.01481},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License
This toolkit is licensed under the [Apache License, Version 2.0](https://github.com/NVIDIA/NeMo-Aligner/blob/main/LICENSE).

Raw data

{
    "_id": null,
    "home_page": "https://github.com/NVIDIA/NeMo-Aligner",
    "name": "nemo-aligner",
    "maintainer": "NVIDIA",
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": "nemo-toolkit@nvidia.com",
    "keywords": "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, language, reinforcement learning, RLHF, preference modeling, SteerLM, DPO",
    "author": "NVIDIA",
    "author_email": "nemo-toolkit@nvidia.com",
    "download_url": "https://files.pythonhosted.org/packages/7f/67/bb501a8cdd1d5ecd172f2b7f20ca53c56f81514e1c7d62b5364025495170/nemo_aligner-0.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache2",
    "summary": "NeMo-Aligner - a toolkit for model alignment",
    "version": "0.6.0",
    "project_urls": {
        "Download": "https://github.com/NVIDIA/NeMo-Aligner/releases",
        "Homepage": "https://github.com/NVIDIA/NeMo-Aligner"
    },
    "split_keywords": [
        "deep learning",
        " machine learning",
        " gpu",
        " nlp",
        " nemo",
        " nvidia",
        " pytorch",
        " torch",
        " language",
        " reinforcement learning",
        " rlhf",
        " preference modeling",
        " steerlm",
        " dpo"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bccb8bc6ee2afc2186fff5265a1005951991b5c3b2a3bf90b39b87bd33d6a643",
                "md5": "03de873443c9318d414938e599163f25",
                "sha256": "e8d87324b5223817c9b93de3b6c584548fb53baafd2554b8b0a3b792fe173b91"
            },
            "downloads": -1,
            "filename": "nemo_aligner-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "03de873443c9318d414938e599163f25",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 196862,
            "upload_time": "2025-01-07T23:05:44",
            "upload_time_iso_8601": "2025-01-07T23:05:44.776268Z",
            "url": "https://files.pythonhosted.org/packages/bc/cb/8bc6ee2afc2186fff5265a1005951991b5c3b2a3bf90b39b87bd33d6a643/nemo_aligner-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7f67bb501a8cdd1d5ecd172f2b7f20ca53c56f81514e1c7d62b5364025495170",
                "md5": "aba1481c036355715ac3b8535368fc02",
                "sha256": "ca2f7734240335af083c12caf62f5a219afa7b3f7d94b30081d7c40f0c618a76"
            },
            "downloads": -1,
            "filename": "nemo_aligner-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "aba1481c036355715ac3b8535368fc02",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 133903,
            "upload_time": "2025-01-07T23:05:48",
            "upload_time_iso_8601": "2025-01-07T23:05:48.361695Z",
            "url": "https://files.pythonhosted.org/packages/7f/67/bb501a8cdd1d5ecd172f2b7f20ca53c56f81514e1c7d62b5364025495170/nemo_aligner-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-07 23:05:48",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NVIDIA",
    "github_project": "NeMo-Aligner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "nemo-aligner"
}
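
The `digests` entries above can be used to check a downloaded artifact before installing it. The following is a minimal sketch using the standard-library `hashlib` module; the helper names are hypothetical, and the expected value is the `sha256` recorded above for `nemo_aligner-0.6.0-py3-none-any.whl`.

```python
import hashlib

# SHA-256 of nemo_aligner-0.6.0-py3-none-any.whl, copied from the metadata above.
EXPECTED_WHEEL_SHA256 = (
    "e8d87324b5223817c9b93de3b6c584548fb53baafd2554b8b0a3b792fe173b91"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA-256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, after downloading the wheel into the current directory, `matches_expected("nemo_aligner-0.6.0-py3-none-any.whl", EXPECTED_WHEEL_SHA256)` should return `True` if the file is intact.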
