rebel-forge

- Name: rebel-forge
- Version: 0.10.13
- Summary: Config-driven QLoRA/LoRA fine-tuning toolkit for Rebel Forge
- Author: Rebel AI
- Requires Python: >=3.9
- Keywords: qlora, lora, fine-tuning, transformers, rebel
- Uploaded: 2025-10-11 03:46:52
# rebel-forge

`rebel-forge` is a config-driven QLoRA/LoRA fine-tuning toolkit that runs smoothly on the Nebius GPU stack. It wraps the Hugging Face Transformers + PEFT workflow so teams can fine-tune hosted or user-provided models with a single command.

## Installation

`rebel-forge` targets Python 3.9 and newer. The base install ships just the configuration and dataset tooling so you can bring the exact PyTorch build you need.

### Minimal install

```bash
pip install rebel-forge
```

This installs the config/CLI plumbing plus `transformers`, `peft`, and `datasets`. Choose a runtime extra (or your own PyTorch wheel) when you know whether you need CPU-only or CUDA acceleration.

### Optional extras

```bash
# CPU-only wheels from PyPI
pip install rebel-forge[cpu]

# CUDA wheels (use the official PyTorch index if desired)
pip install rebel-forge[cuda] --extra-index-url https://download.pytorch.org/whl/cu121
```

### From source

```bash
git clone <repo-url>
cd rebel-forge
pip install -e .
```

### Export installed sources

`pip install rebel-forge` automatically drops a read-only copy to `~/rebel-forge`. Use the helper below to duplicate it elsewhere or refresh the snapshot.

```bash
rebel-forge source --dest ./rebel-forge-src
```

This copies the installed Python package into `./rebel-forge-src` so you can inspect or version-control the exact training scripts. Pass `--force` to overwrite an existing export.


## First run onboarding

Running `rebel-forge` launches a guided onboarding banner, exports the workspace into `~/rebel-forge`, and opens the Clerk portal at `http://localhost:3000/cli?token=…` (configurable via `.env.local`). Zero-argument runs render a compact “Welcome to Rebel” card with a single `Sign in with Rebel` button; press Enter and the CLI opens the portal with a fresh token, keeping the terminal watcher running until Clerk confirms the link. The CLI auto-starts `npm run dev` when it cannot detect the frontend, unlocks automatically after Clerk sign-in, and writes `~/.rebel-forge/onboarding.done` so future runs skip the blocking wizard.

Two environment variables help with automation (see the example below):

- `REBEL_FORGE_SKIP_ONBOARDING=1` bypasses onboarding entirely.
- `REBEL_FORGE_AUTO_UNLOCK=1` (optionally with `REBEL_FORGE_HANDSHAKE_USER`) creates the handshake file non-interactively.
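
For scripted or CI runs, a hedged sketch using those variables (the config path and the `ci-bot` value are placeholders):

```bash
# Bypass the onboarding wizard entirely
REBEL_FORGE_SKIP_ONBOARDING=1 rebel-forge --config run.conf

# Keep onboarding but create the handshake file non-interactively
REBEL_FORGE_AUTO_UNLOCK=1 REBEL_FORGE_HANDSHAKE_USER=ci-bot rebel-forge --config run.conf
```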

## Usage

Prepare an INI/`.conf` file that names your base model, datasets, and training preferences. Then launch training with:

```bash
rebel-forge --config path/to/run.conf
```

The CLI infers sensible defaults (epochs, LoRA hyperparameters, dataset splits, etc.) and stores summaries plus adapter checkpoints inside the configured `output_dir`.

## Example configuration

```ini
[model]
base_model = meta-llama/Llama-3.1-8B
output_dir = /mnt/checkpoints/llama-3.1-chat
quant_type = nf4

[data]
format = plain
train_data = /mnt/datasets/fta/train.jsonl
eval_data = /mnt/datasets/fta/val.jsonl
text_column = text

[training]
batch_size = 2
epochs = 3
learning_rate = 2e-4
warmup_ratio = 0.05
save_steps = 250

[lora]
lora_r = 64
lora_alpha = 16
lora_dropout = 0.05
```
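
Once a run finishes, the emitted adapter checkpoints can be loaded with standard PEFT tooling. A minimal sketch, assuming the adapter weights are written directly under the configured `output_dir` (the exact checkpoint layout is not documented here):

```python
# Hypothetical post-training load; paths reuse the example config above.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Attach the LoRA adapter produced by the run configured above.
model = PeftModel.from_pretrained(base, "/mnt/checkpoints/llama-3.1-chat")
```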

## Key features

- Optional 4-bit QLoRA via bitsandbytes (install `rebel-forge[cuda]` or add `bitsandbytes` manually)
- Dataset auto-loading for JSON/JSONL/CSV/TSV/local directories and Hugging Face Hub references
- Configurable LoRA target modules, quantization type, and training hyperparameters
- One-line Nebius provisioning (`forge.device(...)`) that spins up a fresh GPU VM on demand
- Summary JSON + adapter checkpoints emitted for downstream pipelines (Convex sync, artifact uploads, etc.)
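
As a sketch of the Hub auto-loading, a `[data]` section might reference a dataset by its Hub ID rather than a local path; whether `train_data` accepts Hub IDs directly is an assumption based on the feature list above, and the dataset ID is a placeholder:

```ini
[data]
format = plain
train_data = tatsu-lab/alpaca
text_column = text
```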

## Development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
```

## Nebius Remote Execution

Run `python -m rebel_forge.sample` after installation to push a Torch demo onto Nebius GPUs.


### Quick GPU smoke test

After `pip install rebel-forge`, run the packaged sampler:

```bash
python -m rebel_forge.sample
```

The helper syncs your project (using `forge.ensure_remote()`), relaunches on Nebius, and trains a tiny Torch model on CUDA.

`rebel-forge` ships a remote orchestrator so any Python project can offload execution to the Nebius GPU VM with a single helper call.

```python
import rebel_forge as forge

forge.ensure_remote()  # syncs and re-runs the script remotely on Nebius

# your existing training code stays untouched below this line
```

Configuration relies on the `FORGE_REMOTE_*` variables (falling back to the existing `NEBIUS_*` keys):

- `FORGE_REMOTE_HOST` / `NEBIUS_HOST`
- `FORGE_REMOTE_USER` / `NEBIUS_USERNAME`
- `FORGE_REMOTE_PORT` / `NEBIUS_PORT`
- `FORGE_REMOTE_KEY_PATH` or a `.nebius_key` file for the SSH identity
- `FORGE_REMOTE_VENV` (defaults to `~/venvs/rebel-forge`)
- `FORGE_REMOTE_ROOT` (defaults to `~/forge_runs`)
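
A hedged example of wiring these up for an existing VM (host, user, and key path are placeholders):

```bash
export FORGE_REMOTE_HOST=203.0.113.10          # or NEBIUS_HOST
export FORGE_REMOTE_USER=ubuntu                # or NEBIUS_USERNAME
export FORGE_REMOTE_PORT=22                    # or NEBIUS_PORT
export FORGE_REMOTE_KEY_PATH=~/.ssh/nebius_ed25519
export FORGE_REMOTE_VENV=~/venvs/rebel-forge   # default shown
export FORGE_REMOTE_ROOT=~/forge_runs          # default shown
```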

`forge.ensure_remote()` rsyncs the project tree (excluding caches, build artefacts, and virtualenvs), copies optional `.env` secrets, and relaunches the entrypoint on Nebius while streaming logs back to stdout. Once on the VM the helper is a no-op, because the orchestrator auto-sets `FORGE_REMOTE_ACTIVE=1`.
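
A minimal sketch of how that flag can be observed from user code (illustrative only, not the library's internals):

```python
import os

# The orchestrator exports FORGE_REMOTE_ACTIVE=1 on the VM, which is how
# ensure_remote() knows to return immediately instead of re-syncing.
if os.environ.get("FORGE_REMOTE_ACTIVE") == "1":
    print("already running on the Nebius VM")
```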

Need bespoke orchestration? Build a config and invoke commands directly:

```python
import rebel_forge as forge

cfg = forge.RemoteConfig.from_env()
forge.run_remote_command(cfg, ["python", "-m", "torch.utils.collect_env"])
```

## On-demand Nebius provisioning

Swap your manual `torch.device` selection for a call into Rebel Forge and the
library will stand up a Nebius VM, inject your SSH credentials, and re-run the
script remotely:

```python
import rebel_forge as forge

device = forge.device("h200", storage_gib=512, count=1)  # count defaults per platform

# from here on you can use ``device`` exactly like ``torch.device("cuda")``
model.to(device)
```

Behind the scenes the helper performs the following steps when invoked from
your local environment:

1. Configures the Nebius CLI using the service-account credentials provided
   via environment variables.
2. Creates a boot disk from your preferred image (defaults to
   `ubuntu24.04-cuda12.0.2`) sized according to `storage_gib`.
3. Launches a VM on the requested GPU platform/count inside your Nebius
   project (auto-selecting the correct Nebius preset) and waits for SSH to
   become available.
4. Updates the `NEBIUS_*` environment variables and calls
   `forge.ensure_remote()` so the remainder of the script executes on the new
   instance.

When the code re-executes on the VM, `forge.device(...)` simply returns
`torch.device("cuda")` so the rest of your training script behaves exactly as
before.
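
Because of that dual behavior, device-dependent code needs no branching. A sketch assuming ordinary PyTorch usage (tensor shapes are arbitrary):

```python
import torch
import rebel_forge as forge

device = forge.device("h200", storage_gib=512)

# Identical code path whether provisioning just happened (local run)
# or we are already on the VM, where device == torch.device("cuda").
x = torch.randn(4, 16, device=device)
layer = torch.nn.Linear(16, 8).to(device)
print(layer(x).shape)
```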

### Authentication

After the CLI has been linked to your Rebel account, `forge.device()` requests
an ephemeral provisioning bundle directly from the Rebel portal – you do not
need to copy Nebius credentials into your code. The helper now requires an
active CLI session; run `rebel-forge` and complete the sign-in flow before
invoking it. For development or offline work you can still provide the legacy
overrides:

- `project_id`, `service_account_id`, `Authorized_key`, `AUTHORIZED_KEY_PRIVATE`
  – explicit Nebius service-account details.
- `ssh_key_public` / `ssh_key_private` – custom SSH key pair to install on the
  VM (automatically supplied by the portal otherwise).
- `NEBIUS_SUBNET_ID`, `NEBIUS_IMAGE_ID`, `NEBIUS_ENDPOINT` – optional
  provisioning overrides.
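
For the documented `NEBIUS_*` overrides, a hedged shell sketch (values are placeholders; the service-account fields above are listed only by name, so their exact format is not assumed here):

```bash
# Optional provisioning overrides named in the list above
export NEBIUS_SUBNET_ID=<subnet-id>
export NEBIUS_IMAGE_ID=<image-id>
export NEBIUS_ENDPOINT=<endpoint-url>
```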

Ensure the Nebius CLI (`nebius`) is on your `PATH`. The first call will install
an ephemeral profile under `~/.nebius/` using the retrieved credentials.

#### Credential caching

`rebel-forge` stores the provisioning bundle inside the system keyring whenever
possible so the Nebius keys never touch disk. When no keyring backend is
available the bundle falls back to a 0600-scoped cache under
`~/.rebel-forge/bundle.json`. Run `rebel-forge logout` (optionally with
`--quiet`) to wipe cached credentials and require a fresh portal handshake.

### Cleaning up

Provisioning currently leaves the instance running after your training script
completes. You can tear it down with the Nebius CLI:

```bash
# delete the VM
nebius compute instance delete "$FORGE_ACTIVE_INSTANCE_ID" --async=false

# delete the boot disk if you no longer need it
nebius compute disk delete <boot-disk-id> --async=false
```

Future releases will add a convenience helper for reclaiming the VM
automatically.
