# Nthuku-Fast
Efficient Multimodal Vision-Language Model with Mixture of Experts (MoE) Architecture
## Features
✨ **High Performance**
- Flash Attention for 2-4x speedup
- Extended 8K context window (32x larger)
- Optimized MoE routing (20-30% faster)
💰 **Cost Effective**
- Prompt caching (10x cost reduction)
- Sparse MoE activation: only the top-k experts run per token
- 90%+ cache hit rates
🧠 **Advanced Capabilities**
- Vision understanding
- Text generation
- Speculative decoding (2-3x faster)
- Thinking traces / chain-of-thought
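The top-k MoE routing listed above can be illustrated with a toy router. This is a hedged sketch of the general technique (pick the k highest-scoring experts per token and softmax-normalize their scores into mixing weights), not Nthuku-Fast's actual routing code.

```python
import math

def top_k_route(router_logits, k=2):
    """Toy top-k MoE router: select the k best-scoring experts for one
    token and softmax-normalize their scores into mixing weights."""
    ranked = sorted(enumerate(router_logits), key=lambda p: p[1], reverse=True)[:k]
    indices = [i for i, _ in ranked]
    exps = [math.exp(score) for _, score in ranked]
    total = sum(exps)
    weights = [e / total for e in exps]
    return indices, weights

# One token's router scores over 8 experts; only 2 experts are activated,
# which is why compute scales with active (not total) parameters.
indices, weights = top_k_route([0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.9], k=2)
```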
## Installation
### From PyPI
```bash
pip install nthuku-fast
```
### From source
```bash
git clone https://github.com/elijahnzeli1/Nthuku-fast_v2.git
cd Nthuku-fast_v2/nthuku-fast-package
pip install -e .
```
### Local installation (development)
```bash
cd nthuku-fast-package
pip install -e .
```
## Quick Start
```python
from nthuku_fast import create_nthuku_fast_model
import torch
# Create model (all optimizations enabled by default)
model = create_nthuku_fast_model(
    hidden_dim=512,
    num_experts=8,
    top_k_experts=2
)
# Or use presets for different sizes
model = create_nthuku_fast_model(preset="150M") # 150M parameters
# Generate text from image
pixel_values = torch.randn(1, 3, 224, 224)
text = model.generate_text(
    pixel_values,
    max_length=100,
    use_cache=True,       # Enable prompt caching
    show_thinking=False   # Hide reasoning traces (set True to show them)
)
```
## Model Presets
```python
# 50M parameters (default)
model = create_nthuku_fast_model(preset="50M")
# 150M parameters (recommended)
model = create_nthuku_fast_model(preset="150M")
# 500M parameters (high capacity)
model = create_nthuku_fast_model(preset="500M")
# 1B parameters (maximum)
model = create_nthuku_fast_model(preset="1B")
```
## Advanced Features
### Prompt Caching
```python
# Get cache statistics
stats = model.get_cache_stats()
print(f"Cache hit rate: {stats['hit_rate']:.2%}")
```
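The idea behind the hit-rate statistic can be sketched with a toy prefix cache: results for previously seen prompts are served from memory instead of being recomputed. This is a hypothetical illustration of prompt caching in general, not the library's internal cache.

```python
import hashlib

class PromptCache:
    """Toy prompt cache: store results keyed by a hash of the prompt and
    track hits vs. misses, mirroring the hit_rate statistic above."""
    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.store:          # cache hit: skip recomputation
            self.hits += 1
            return self.store[key]
        self.misses += 1               # cache miss: compute and remember
        result = compute(prompt)
        self.store[key] = result
        return result

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = PromptCache()
cache.get_or_compute("describe the image", str.upper)  # miss
result = cache.get_or_compute("describe the image", str.upper)  # hit
```

Repeated prompts are where the cost savings come from: every hit avoids a full forward pass over the cached prefix.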
### Speculative Decoding
```python
from nthuku_fast import SpeculativeDecoder
spec_decoder = SpeculativeDecoder(model, num_speculative_tokens=4)
generated, stats = spec_decoder.generate(
    input_ids, vision_features,
    max_new_tokens=100,
    show_stats=True
)
```
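The mechanism behind speculative decoding can be shown with a toy greedy version: a cheap draft model proposes several tokens, and the expensive target model verifies them in order, keeping tokens until the first disagreement. This is a hedged sketch of the general algorithm, not the `SpeculativeDecoder` implementation; `draft_next` and `target_next` stand in for the two models.

```python
def speculative_step(draft_next, target_next, context, num_speculative=4):
    """One toy greedy speculative-decoding step. draft_next / target_next
    are callables mapping a token list to the next token id."""
    # Draft phase: the cheap model proposes a run of tokens.
    proposed, ctx = [], list(context)
    for _ in range(num_speculative):
        token = draft_next(ctx)
        proposed.append(token)
        ctx.append(token)

    # Verify phase: the target accepts matching tokens and substitutes
    # its own token at the first disagreement.
    accepted, ctx = [], list(context)
    for token in proposed:
        expected = target_next(ctx)
        if expected == token:
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(expected)
            break
    return accepted

# Draft agrees with the target on the first token, then diverges.
draft = lambda ctx: len(ctx)
target = lambda ctx: len(ctx) if len(ctx) < 2 else -1
print(speculative_step(draft, target, context=[0]))  # [1, -1]
```

When the draft model agrees often, several tokens are accepted per expensive target call, which is where the 2-3x generation speedup comes from.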
### Thinking Traces
```python
# Enable visible reasoning
text = model.generate_text(
    pixel_values,
    show_thinking=True  # Shows step-by-step reasoning
)
```
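Models that emit thinking traces typically wrap the reasoning in delimiter tokens so it can be shown or hidden at the API level. A minimal sketch of that post-processing, assuming hypothetical `<think>...</think>` delimiters (the actual markers Nthuku-Fast uses are not specified here):

```python
def split_thinking(output, open_tag="<think>", close_tag="</think>"):
    """Separate a delimited reasoning trace from the final answer.
    Tag names are illustrative assumptions, not the model's real markers."""
    if open_tag in output and close_tag in output:
        start = output.index(open_tag) + len(open_tag)
        end = output.index(close_tag)
        return output[start:end].strip(), output[end + len(close_tag):].strip()
    return "", output.strip()

trace, answer = split_thinking(
    "<think>The image shows a cat on furniture.</think> A cat on a sofa."
)
```

With `show_thinking=False`, only the answer half would be returned.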
## Training
```python
from nthuku_fast import train_nthuku_fast, MultiDatasetManager
# Load datasets
dataset_manager = MultiDatasetManager()
data_sources = dataset_manager.load_all_datasets()
# Train
results = train_nthuku_fast(
    model=model,
    data_sources=data_sources,
    batch_size=8,
    num_epochs=10,
    learning_rate=2e-4
)
```
## Performance
| Feature | Improvement |
|---------|-------------|
| Flash Attention | 2-4x faster |
| Extended Context | 32x longer (8K tokens) |
| Optimized MoE | 20-30% faster |
| Prompt Caching | 10x cost reduction |
| Speculative Decoding | 2-3x faster generation |
**Combined: roughly 5-7x faster inference at about 81% lower cost.**
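Per-feature speedups do not simply multiply; they combine according to what fraction of total runtime each feature accelerates. One way to sanity-check a combined figure is Amdahl's law. The runtime fractions below are purely hypothetical placeholders, not measured Nthuku-Fast numbers:

```python
def combined_speedup(stages):
    """Amdahl's law: stages is a list of (runtime_fraction, speedup) pairs;
    any remaining fraction of runtime is assumed unaccelerated."""
    remaining = 1.0 - sum(frac for frac, _ in stages)
    new_time = remaining + sum(frac / s for frac, s in stages)
    return 1.0 / new_time

# Hypothetical split: 50% attention (3x), 30% decoding (2.5x), 20% MoE (1.25x)
print(round(combined_speedup([(0.5, 3.0), (0.3, 2.5), (0.2, 1.25)]), 2))  # 2.24
```

Actual end-to-end gains depend heavily on workload mix (prompt length, cache hit rate, draft-model agreement), so measured numbers can land well above or below such an estimate.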
## Requirements
- Python ≥ 3.8
- PyTorch ≥ 2.0.0 (for Flash Attention)
- transformers ≥ 4.30.0
- Other dependencies (auto-installed)
## License
MIT License
## Citation
```bibtex
@software{nthuku_fast,
  title={Nthuku-Fast: Efficient Multimodal Vision-Language Model},
  author={Nthuku Team},
  year={2025},
  url={https://github.com/elijahnzeli1/Nthuku-fast_v2}
}
```
## Links
- GitHub: https://github.com/elijahnzeli1/Nthuku-fast_v2
- Documentation: [Coming soon]
- HuggingFace: https://huggingface.co/Qybera/nthuku-fast-1.5