forgellm

Name: forgellm
Version: 0.3.7
Summary: A comprehensive toolkit for end-to-end continued pre-training, fine-tuning, monitoring, testing and publishing of language models with MLX-LM
Upload time: 2025-07-09 00:17:29
Requires Python: >=3.8
License: MIT
Keywords: llm, language-models, machine-learning, mlx, mlx-lm, fine-tuning, pre-training
# ForgeLLM

ForgeLLM is a comprehensive platform for continued pre-training and instruction fine-tuning of large language models using MLX on Apple Silicon.

## What ForgeLLM Does

- **🚀 Train**: Continued pre-training (CPT) via web interface *(IFT coming soon - see [Development Perspectives](docs/perspectives.md))*
- **📊 Monitor**: Real-time training dashboards and checkpoint management
- **🆚 Compare**: Put multiple training sessions side by side, comparing validation loss, perplexity, stability, and generalization gap
- **🔗 Fuse**: Merge LoRA/DoRA adapters with base models for deployment
- **⚡ Quantize**: Convert models to 8-bit or 4-bit precision for efficient deployment
- **💬 Chat & Test**: Interactive chat with models and adapters via CLI or web
- **📦 Publish**: Convert and publish trained models with comprehensive documentation

### Screenshots
Training:
![Training](docs/assets/training-tab.png)

Monitoring:
![Monitoring](docs/assets/monitoring-tab.png)

Compare:
![Compare](docs/assets/compare-tab.png)

Testing:
![Testing](docs/assets/testing-tab.png)

## Quick Start

### 1. Installation

#### Option A: Install from PyPI (Recommended)

```bash
# Install latest version
pip install forgellm

# Install specific version
pip install forgellm==0.3.7

# Upgrade existing installation
pip install --upgrade forgellm
```

#### Option B: Install from Source (Development)

```bash
git clone https://github.com/lpalbou/forgellm.git
cd forgellm
pip install -e .
```

> **Requirements**: Python 3.9+ and Apple Silicon Mac (M1/M2/M3/M4). All dependencies including MLX are installed automatically.

### 2. Download Models

```bash
# Install HuggingFace CLI
pip install huggingface_hub

# Download a model (examples)
huggingface-cli download mlx-community/gemma-3-1b-it-bf16     # Small model
huggingface-cli download mlx-community/Qwen3-4B-bf16         # Medium model
```

### 3. Start ForgeLLM

```bash
# Start both servers (recommended)
forgellm start

# Opens web interface at http://localhost:5002
# Model server runs at http://localhost:5001
```

That's it! 🎉

## Usage

### Web Interface (Recommended)

The web interface provides everything you need:

```bash
forgellm start                    # Start both servers
# or
forgellm web --port 5002         # Web interface only
forgellm server --port 5001      # Model server only (separate terminal)
```

**Web Interface Features:**
- **Training Tab**: Configure and start CPT training *(IFT support coming soon)*
- **Monitoring Tab**: View training progress and dashboards  
- **Testing Tab**: Chat with models and test different prompts

### Command Line Interface

The CLI is perfect for quick model testing and interactive chat:

```bash
# Interactive chat with a model (REPL mode)
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16

# Single prompt test
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16 --prompt "Hello, how are you?"

# Get model architecture info
forgellm cli info --model mlx-community/gemma-3-1b-it-bf16

# Test with an adapter (your trained model)
forgellm cli generate --model mlx-community/Qwen3-4B-bf16 --adapter-path models/cpt/my_trained_model
```

**REPL Mode Commands:**
- Type normally to chat
- `/help` - Show available commands
- `/q` or `/exit` - Quit
- `/stats` - Show session statistics
- `/system [prompt]` - Set/show system prompt
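
For example, a session might go like this (the model is one of the small models recommended below; the slash commands listed above are typed at the REPL prompt):

```bash
# Start a REPL session, then drive it with slash commands
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16
# at the prompt:
#   /system You are a concise assistant.   # set the system prompt
#   /stats                                 # show session statistics
#   /q                                     # quit
```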

## Model Downloads

ForgeLLM works with MLX-compatible models from HuggingFace. All models are cached locally in `~/.cache/huggingface/hub/`.
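
Since models land in the standard Hugging Face cache, you can inspect what is already downloaded with plain shell commands (the `models--` prefix is how the hub cache names repositories):

```bash
# List locally cached models
ls ~/.cache/huggingface/hub/
# Entries look like: models--mlx-community--gemma-3-1b-it-bf16

# Check how much disk space the cache uses
du -sh ~/.cache/huggingface/hub/
```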

### Recommended Models

**Small Models (1-2B) - Good for testing:**
```bash
huggingface-cli download mlx-community/gemma-3-1b-it-bf16
huggingface-cli download mlx-community/gemma-3-1b-pt-bf16
```

**Medium Models (3-4B) - Good balance:**
```bash
huggingface-cli download mlx-community/Qwen3-4B-bf16
huggingface-cli download mlx-community/gemma-3-4b-it-bf16
```

**Large Models (7-8B) - Best quality:**
```bash
huggingface-cli download mlx-community/Qwen3-8B-bf16
huggingface-cli download mlx-community/Meta-Llama-3.1-8B-Instruct-bf16
```

### Model Types

- **Base Models** (`-bf16`, `-pt-`): Ideal for continued pre-training, clean slate for domain adaptation
- **Instruct Models** (`-it-`, `-Instruct-`): Can also be used for continued pre-training with careful data mixing
- **Quantized Models** (`-4bit`, `-8bit`): Smaller memory usage, slightly lower quality
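
The suffixes map directly onto download commands; for example (the 4-bit repo name below is illustrative, check the mlx-community listing for exact names):

```bash
huggingface-cli download mlx-community/gemma-3-1b-pt-bf16   # base model ("-pt-", "-bf16")
huggingface-cli download mlx-community/gemma-3-1b-it-bf16   # instruct model ("-it-")
huggingface-cli download mlx-community/gemma-3-1b-it-4bit   # quantized model ("-4bit", name illustrative)
```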

### Continued Pre-training: Base vs Instruct Models

**Base Models (Recommended for CPT):**
- ✅ No instruction-following capabilities to preserve
- ✅ Clean foundation for domain-specific knowledge
- ✅ Higher learning rates and longer training possible

**Instruct Models (Advanced CPT):**
- ✅ Better at learning from complex documents (recent research)
- ⚠️ Requires careful data mixing (1-5% original pretraining data; see the sketch below)
- ⚠️ Lower learning rates to prevent catastrophic forgetting
- ⚠️ Shorter training to avoid losing instruction-following abilities

Choose base models for straightforward domain adaptation, and instruct models when you need better knowledge absorption from complex documents.
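
As a rough illustration of the 1-5% mixing rule, the following shell sketch samples about 5% as many lines of general pretraining text as the domain corpus contains and drops both into `dataset/`. The file names are hypothetical, mixing by lines is only a crude proxy for mixing by tokens, and `shuf` needs GNU coreutils (e.g. via Homebrew on macOS):

```bash
# Hypothetical mixing sketch: ~5% general text alongside the domain corpus
DOMAIN_LINES=$(wc -l < domain_corpus.txt)
MIX_LINES=$((DOMAIN_LINES * 5 / 100))

mkdir -p dataset
cp domain_corpus.txt dataset/
shuf -n "$MIX_LINES" general_pretraining.txt > dataset/general_sample.txt
```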

> **📖 For detailed CPT best practices and latest research findings, see [docs/cpt.md](docs/cpt.md)**

## Training Your Own Models

### Continued Pre-Training (CPT) - Available Now

1. **Prepare Data**: Place text files in the `dataset/` directory (see the shell sketch after this list)
2. **Start Web Interface**: `forgellm start`
3. **Training Tab**: Configure model, data, and parameters
4. **Monitor**: Watch progress in real-time
5. **Publish**: Convert best checkpoints to full models

Training is currently only available through the web interface.
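
In shell terms, steps 1-2 amount to the following (the corpus path is hypothetical):

```bash
# Steps 1-2 of the CPT workflow
mkdir -p dataset
cp ~/corpus/*.txt dataset/
forgellm start   # then configure and launch the run from the Training tab
```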

### Instruction Fine-Tuning (IFT) - Coming Soon

IFT capabilities are currently in development. For technical details and implementation roadmap, see **[Development Perspectives](docs/perspectives.md)**.

## Directory Structure

```
forgellm/
├── dataset/          # Your training data (text files)
├── models/           # Trained model outputs
│   ├── cpt/         # Continued pre-training models
│   └── ift/         # Instruction fine-tuning models (coming soon)
└── data/            # Processed training data
```
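
If you prefer to lay out the workspace by hand (ForgeLLM may create these directories on first run; this is just a convenience), the whole tree is one command:

```bash
# Create the workspace layout manually
mkdir -p dataset data models/cpt models/ift
```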

## Commands Reference

### Main Commands

```bash
forgellm start                    # Start both servers (recommended)
forgellm web [--port 5002]       # Web interface only
forgellm server [--port 5001]    # Model server only
forgellm cli <command>            # Command-line operations
```

### CLI Commands

```bash
# Interactive chat (REPL mode)
forgellm cli generate --model <model>

# Single prompt
forgellm cli generate --model <model> --prompt "Your question"

# Model information
forgellm cli info --model <model>

# Test with adapter
forgellm cli generate --model <model> --adapter-path <path>
```

## Requirements

- **Hardware**: Apple Silicon Mac (M1/M2/M3/M4)
- **Memory**: 16GB+ RAM recommended
- **Storage**: 5-20GB per model
- **Python**: 3.9+
- **MLX**: Automatically installed
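
You can check these requirements from a terminal; on Apple Silicon `uname -m` reports `arm64`, and `sysctl` exposes the installed RAM:

```bash
# Quick environment check (macOS)
uname -m                                      # should print: arm64
sysctl -n hw.memsize                          # installed RAM in bytes
python3 -c 'import sys; print(sys.version)'   # should be 3.9+
```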

## Architecture

ForgeLLM uses a clean separation:

- **Model Server** (`forgellm server`): Handles model loading and inference
- **Web Server** (`forgellm web`): Provides UI and training coordination
- **CLI** (`forgellm cli`): Direct model interaction and testing

This allows you to use just the CLI for testing, or the full web interface for training.
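
In practice that means a quick smoke test needs nothing beyond the CLI, while training brings up both servers:

```bash
# Testing with just the CLI
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16 --prompt "Quick smoke test"

# Full stack for training via the web interface
forgellm start
```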

## Documentation

### 📚 Comprehensive Guides

- **[Getting Started](docs/getting_started.md)**: Complete setup and first training session
- **[Architecture](docs/architecture.md)**: System design and component overview
- **[Data Flow](docs/data_flow.md)**: How data moves through the system
- **[API Reference](docs/api_reference.md)**: Complete REST API and CLI documentation
- **[CPT Best Practices](docs/cpt.md)**: Advanced continued pre-training techniques
- **[Development Perspectives](docs/perspectives.md)**: Current capabilities and IFT roadmap

### 🔧 Technical Documentation

- **Architecture**: Multi-process design with model server separation
- **Training Pipeline**: Real-time monitoring with automatic checkpoint management
- **Model Publishing**: LoRA to full model conversion with comprehensive documentation
- **Error Recovery**: Robust error handling and automatic recovery mechanisms

## Contributing

Contributions welcome! Please submit pull requests.

## License

MIT License - see LICENSE file.

## Acknowledgments

- **ForgeLLM Team**: Continued pre-training platform
- **[MLX-LM](https://github.com/ml-explore/mlx-lm)**: Apple's LLM toolkit built on MLX
- **[MLX](https://github.com/ml-explore/mlx)**: Apple's machine learning framework 

            
