# ForgeLLM
ForgeLLM is a comprehensive platform for continued pre-training and instruction fine-tuning of large language models using MLX on Apple Silicon.
## What ForgeLLM Does
- **🚀 Train**: Continued pre-training (CPT) via web interface *(IFT coming soon - see [Development Perspectives](docs/perspectives.md))*
- **📊 Monitor**: Real-time training dashboards and checkpoint management
- **🆚 Compare**: Compare multiple training sessions side by side on validation loss, perplexity, stability, and generalization gap
- **🔗 Fuse**: Merge LoRA/DoRA adapters with base models for deployment
- **⚡ Quantize**: Convert models to 8-bit or 4-bit precision for efficient deployment
- **💬 Chat & Test**: Interactive chat with models and adapters via CLI or web
- **📦 Publish**: Convert and publish trained models with comprehensive documentation
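In the comparison dashboards, perplexity is simply the exponential of the cross-entropy loss, and the generalization gap is validation loss minus training loss; a widening gap is an early warning sign of overfitting.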
### Screenshots
*(Screenshots: Training, Monitoring, Compare, and Testing tabs.)*
## Quick Start
### 1. Installation
#### Option A: Install from PyPI (Recommended)
```bash
# Install latest version
pip install forgellm
# Install specific version
pip install forgellm==0.3.7
# Upgrade existing installation
pip install --upgrade forgellm
```
#### Option B: Install from Source (Development)
```bash
git clone https://github.com/lpalbou/forgellm.git
cd forgellm
pip install -e .
```
> **Requirements**: Python 3.9+ and Apple Silicon Mac (M1/M2/M3/M4). All dependencies including MLX are installed automatically.
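To verify the install, you can check the package metadata and the CLI entry point (a quick sanity check; exact `--help` output varies by version):
```bash
# Confirm the package is installed and the forgellm command resolves
pip show forgellm
forgellm --help
```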
### 2. Download Models
```bash
# Install HuggingFace CLI
pip install huggingface_hub
# Download a model (examples)
huggingface-cli download mlx-community/gemma-3-1b-it-bf16 # Small model
huggingface-cli download mlx-community/Qwen3-4B-bf16 # Medium model
```
### 3. Start ForgeLLM
```bash
# Start both servers (recommended)
forgellm start
# Opens web interface at http://localhost:5002
# Model server runs at http://localhost:5001
```
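For scripted setups, you can wait for the web server before opening a browser. A minimal sketch, assuming the default port and that the root page answers with HTTP 200:
```bash
# Poll the web interface (default port 5002) until it responds
until curl -s -o /dev/null -w "%{http_code}" http://localhost:5002 | grep -q 200; do
  sleep 1
done
echo "ForgeLLM web interface is up"
```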
That's it! 🎉
## Usage
### Web Interface (Recommended)
The web interface provides everything you need:
```bash
forgellm start # Start both servers
# or
forgellm web --port 5002 # Web interface only
forgellm server --port 5001 # Model server only (separate terminal)
```
**Web Interface Features:**
- **Training Tab**: Configure and start CPT training *(IFT support coming soon)*
- **Monitoring Tab**: View training progress and dashboards
- **Testing Tab**: Chat with models and test different prompts
### Command Line Interface
The CLI is perfect for quick model testing and interactive chat:
```bash
# Interactive chat with a model (REPL mode)
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16
# Single prompt test
forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16 --prompt "Hello, how are you?"
# Get model architecture info
forgellm cli info --model mlx-community/gemma-3-1b-it-bf16
# Test with an adapter (your trained model)
forgellm cli generate --model mlx-community/Qwen3-4B-bf16 --adapter-path models/cpt/my_trained_model
```
**REPL Mode Commands:**
- Type normally to chat
- `/help` - Show available commands
- `/q` or `/exit` - Quit
- `/stats` - Show session statistics
- `/system [prompt]` - Set/show system prompt
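The single-prompt form also scripts well. A minimal sketch that batch-tests a file of prompts, one per line (the `prompts.txt` file name is illustrative):
```bash
# Run every line of prompts.txt through the model as a separate prompt
while IFS= read -r prompt; do
  forgellm cli generate --model mlx-community/gemma-3-1b-it-bf16 --prompt "$prompt"
done < prompts.txt
```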
## Model Downloads
ForgeLLM works with MLX-compatible models from HuggingFace. All models are cached locally in `~/.cache/huggingface/hub/`.
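To see which models are already cached locally, the HuggingFace CLI includes a cache scanner:
```bash
# List cached repos, revisions, and their disk usage
huggingface-cli scan-cache
```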
### Recommended Models
**Small Models (1-2B) - Good for testing:**
```bash
huggingface-cli download mlx-community/gemma-3-1b-it-bf16
huggingface-cli download mlx-community/gemma-3-1b-pt-bf16
```
**Medium Models (3-4B) - Good balance:**
```bash
huggingface-cli download mlx-community/Qwen3-4B-bf16
huggingface-cli download mlx-community/gemma-3-4b-it-bf16
```
**Large Models (7-8B) - Best quality:**
```bash
huggingface-cli download mlx-community/Qwen3-8B-bf16
huggingface-cli download mlx-community/Meta-Llama-3.1-8B-Instruct-bf16
```
### Model Types
- **Base Models** (`-bf16`, `-pt-`): Ideal for continued pre-training, clean slate for domain adaptation
- **Instruct Models** (`-it-`, `-Instruct-`): Can also be used for continued pre-training with careful data mixing
- **Quantized Models** (`-4bit`, `-8bit`): Smaller memory usage, slightly lower quality
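For example, when memory is tight you can download a quantized variant instead of the bf16 build (the repo name below is illustrative; browse the `mlx-community` organization on HuggingFace for what is actually published):
```bash
huggingface-cli download mlx-community/Qwen3-4B-4bit
```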
### Continued Pre-training: Base vs Instruct Models
**Base Models (Recommended for CPT):**
- ✅ No instruction-following capabilities to preserve
- ✅ Clean foundation for domain-specific knowledge
- ✅ Higher learning rates and longer training possible
**Instruct Models (Advanced CPT):**
- ✅ Better at learning from complex documents (recent research)
- ⚠️ Requires careful data mixing (1-5% original pretraining data)
- ⚠️ Lower learning rates to prevent catastrophic forgetting
- ⚠️ Shorter training to avoid losing instruction-following abilities
Choose base models for straightforward domain adaptation, and instruct models when you need better knowledge absorption from complex documents.
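If you do CPT on an instruct model, the 1-5% replay mix can be assembled with ordinary shell tools. A minimal sketch, assuming your domain corpus and a sample of general pretraining-style text are plain-text files (file names are illustrative; `shuf` is GNU coreutils, installed as `gshuf` via `brew install coreutils` on macOS):
```bash
# Sample roughly 3% as many general-domain lines as the domain corpus has,
# then shuffle both together into a single training file
DOMAIN_LINES=$(wc -l < domain_corpus.txt)
shuf -n $((DOMAIN_LINES * 3 / 100)) general_corpus.txt > replay_sample.txt
cat domain_corpus.txt replay_sample.txt | shuf > dataset/mixed_train.txt
```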
> **📖 For detailed CPT best practices and latest research findings, see [docs/cpt.md](docs/cpt.md)**
## Training Your Own Models
### Continued Pre-Training (CPT) - Available Now
1. **Prepare Data**: Place text files in the `dataset/` directory (see the sketch below)
2. **Start Web Interface**: `forgellm start`
3. **Training Tab**: Configure model, data, and parameters
4. **Monitor**: Watch progress in real-time
5. **Publish**: Convert best checkpoints to full models
Training is currently only available through the web interface.
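For step 1, data preparation amounts to collecting plain-text files into `dataset/`. A minimal sketch (source paths are illustrative):
```bash
# Gather your domain documents into the dataset/ directory
mkdir -p dataset
cp ~/corpus/*.txt dataset/
```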
### Instruction Fine-Tuning (IFT) - Coming Soon
IFT capabilities are currently in development. For technical details and implementation roadmap, see **[Development Perspectives](docs/perspectives.md)**.
## Directory Structure
```
forgellm/
├── dataset/   # Your training data (text files)
├── models/    # Trained model outputs
│   ├── cpt/   # Continued pre-training models
│   └── ift/   # Instruction fine-tuning models (coming soon)
└── data/      # Processed training data
```
## Commands Reference
### Main Commands
```bash
forgellm start # Start both servers (recommended)
forgellm web [--port 5002] # Web interface only
forgellm server [--port 5001] # Model server only
forgellm cli <command> # Command-line operations
```
### CLI Commands
```bash
# Interactive chat (REPL mode)
forgellm cli generate --model <model>
# Single prompt
forgellm cli generate --model <model> --prompt "Your question"
# Model information
forgellm cli info --model <model>
# Test with adapter
forgellm cli generate --model <model> --adapter-path <path>
```
## Requirements
- **Hardware**: Apple Silicon Mac (M1/M2/M3/M4)
- **Memory**: 16GB+ RAM recommended
- **Storage**: 5-20GB per model
- **Python**: 3.9+
- **MLX**: Automatically installed
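As a rule of thumb, a bf16 checkpoint takes about 2 bytes per parameter, so a 4B model needs roughly 8 GB and an 8B model roughly 16 GB on disk before counting training checkpoints; 4-bit quantized variants need about a quarter of that.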
## Architecture
ForgeLLM uses a clean separation:
- **Model Server** (`forgellm server`): Handles model loading and inference
- **Web Server** (`forgellm web`): Provides UI and training coordination
- **CLI** (`forgellm cli`): Direct model interaction and testing
This allows you to use just the CLI for testing, or the full web interface for training.
## Documentation
### 📚 Comprehensive Guides
- **[Getting Started](docs/getting_started.md)**: Complete setup and first training session
- **[Architecture](docs/architecture.md)**: System design and component overview
- **[Data Flow](docs/data_flow.md)**: How data moves through the system
- **[API Reference](docs/api_reference.md)**: Complete REST API and CLI documentation
- **[CPT Best Practices](docs/cpt.md)**: Advanced continued pre-training techniques
- **[Development Perspectives](docs/perspectives.md)**: Current capabilities and IFT roadmap
### 🔧 Technical Documentation
- **Architecture**: Multi-process design with model server separation
- **Training Pipeline**: Real-time monitoring with automatic checkpoint management
- **Model Publishing**: LoRA to full model conversion with comprehensive documentation
- **Error Recovery**: Robust error handling and automatic recovery mechanisms
## Contributing
Contributions welcome! Please submit pull requests.
## License
MIT License - see LICENSE file.
## Acknowledgments
- **ForgeLLM Team**: Continued pre-training platform
- **[MLX-LM](https://github.com/ml-explore/mlx-lm)**: Apple's MLX framework for LLMs
- **[MLX](https://github.com/ml-explore/mlx)**: Apple's machine learning framework