# chat-with-mlx

- **Name**: chat-with-mlx
- **Version**: 0.1.11
- **Summary**: A Retrieval-augmented Generation (RAG) chat interface with support for multiple open-source models, designed to run natively on macOS and Apple Silicon with MLX.
- **Upload time**: 2024-03-01 02:16:52
- **Requires Python**: >=3.8
- **License**: MIT License
- **Keywords**: mlx, chat, chatbot, chat_with_mlx

            <div align="center">

# Native RAG on Apple Silicon Mac with MLX 🧑‍💻

[![version](https://badge.fury.io/py/chat-with-mlx.svg)](https://badge.fury.io/py/chat-with-mlx)
[![downloads](https://img.shields.io/pypi/dm/chat-with-mlx)](https://pypistats.org/packages/chat-with-mlx)
[![license](https://img.shields.io/pypi/l/chat-with-mlx)](https://github.com/qnguyen3/chat-with-mlx/blob/main/LICENSE.md)
[![python-version](https://img.shields.io/pypi/pyversions/chat-with-mlx)](https://badge.fury.io/py/chat-with-mlx)
</div>

This repository showcases a Retrieval-augmented Generation (RAG) chat interface with support for multiple open-source models.

![chat_with_mlx](assets/chat-w-mlx.gif)

## Features

- **Chat with your Data**: `doc(x)`, `pdf`, `txt` files and YouTube videos via URL.
- **Multilingual**: Chinese 🇨🇳, English 🏴, French 🇫🇷, German 🇩🇪, Hindi 🇮🇳, Italian 🇮🇹, Japanese 🇯🇵, Korean 🇰🇷, Spanish 🇪🇸, Turkish 🇹🇷 and Vietnamese 🇻🇳.
- **Easy Integration**: Easily integrate any HuggingFace- and MLX-compatible open-source model.

## Installation and Usage

### Easy Setup

- Install pip (if you don't have it already).
- Install the package: `pip install chat-with-mlx`
- Note: adding your own model is harder with this setup (UI-based model addition is planned for a later release), but it is a fast way to test the app.

### Manual Pip Installation

```bash
git clone https://github.com/qnguyen3/chat-with-mlx.git
cd chat-with-mlx
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

#### Manual Conda Installation

```bash
git clone https://github.com/qnguyen3/chat-with-mlx.git
cd chat-with-mlx
conda create -n mlx-chat python=3.11
conda activate mlx-chat
pip install -e .
```

#### Usage

- Start the app: `chat-with-mlx`

## Supported Models

- Google Gemma-7b-it, Gemma-2b-it
- Mistral-7B-Instruct, OpenHermes-2.5-Mistral-7B, NousHermes-2-Mistral-7B-DPO
- Mixtral-8x7B-Instruct-v0.1, Nous-Hermes-2-Mixtral-8x7B-DPO
- Quyen-SE (0.5B), Quyen (4B)
- StableLM 2 Zephyr (1.6B)
- Vistral-7B-Chat, VBD-Llama2-7b-chat, vinallama-7b-chat

## Add Your Own Models

### Solution 1

This solution only requires you to add a simple `.yaml` config file for your model in `chat_with_mlx/models/configs`.

`example.yaml`:

```yaml
original_repo: google/gemma-2b-it # The original HuggingFace Repo, this helps with displaying
mlx-repo: mlx-community/quantized-gemma-2b-it # The MLX models Repo, most are available through `mlx-community`
quantize: 4bit # Optional: [4bit, 8bit]
default_language: multi # Optional: [en, es, zh, vi, multi]
```

After adding the `.yaml` config, you can load the model inside the app (for now, you need to track the download progress through your Terminal/CLI).

### Solution 2

Do the same as Solution 1. Sometimes the `download_snapshot` method used to download the models is slow, and you may prefer to download them yourself.

After adding the `.yaml` config, you can download the repo yourself and add it to `chat_with_mlx/models/download`. The folder name MUST be the same as the original repo name without the username (so `google/gemma-2b-it` -> `gemma-2b-it`).
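
If you'd rather script this manual download, one option is a short Python snippet (a minimal sketch, not part of the app, assuming `huggingface_hub` is installed); the repo ids are taken from the `example.yaml` above and the folder name follows the rule just described:

```python
# Sketch: pre-download the MLX model files into the folder the app expects.
from pathlib import Path
from huggingface_hub import snapshot_download

original_repo = "google/gemma-2b-it"              # original_repo from the .yaml
mlx_repo = "mlx-community/quantized-gemma-2b-it"  # mlx-repo from the .yaml

# Folder name = original repo name without the username ("gemma-2b-it").
target = Path("chat_with_mlx/models/download") / original_repo.split("/")[-1]
snapshot_download(repo_id=mlx_repo, local_dir=target)
print(f"Downloaded {mlx_repo} into {target}")
```

Once the files are in place, loading the model from its config inside the app should skip the download step.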

A complete model should have the following files:

- `model.safetensors`
- `config.json`
- `merges.txt`
- `model.safetensors.index.json`
- `special_tokens_map.json` - optional, depending on the model
- `tokenizer_config.json`
- `tokenizer.json`
- `vocab.json`

## Known Issues

- You HAVE TO unload a model before loading a new one. Otherwise, you will need to restart the app to switch models, as it will stay stuck on the old one.
- When a model is downloading via Solution 1, the only way to stop it is to hit `Control + C` in your Terminal.
- If you want to switch files, you have to manually hit STOP INDEXING. Otherwise, the vector database will add the second document on top of the current one.
- You have to choose a dataset mode (Document or YouTube) for indexing to work.

## WHY MLX?

MLX is an array framework for machine learning research on Apple silicon,
brought to you by Apple machine learning research.

Some key features of MLX include:

- **Familiar APIs**: MLX has a Python API that closely follows NumPy.  MLX
   also has fully featured C++, [C](https://github.com/ml-explore/mlx-c), and
   [Swift](https://github.com/ml-explore/mlx-swift/) APIs, which closely mirror
   the Python API.  MLX has higher-level packages like `mlx.nn` and
   `mlx.optimizers` with APIs that closely follow PyTorch to simplify building
   more complex models.

- **Composable function transformations**: MLX supports composable function
   transformations for automatic differentiation, automatic vectorization,
   and computation graph optimization.

- **Lazy computation**: Computations in MLX are lazy. Arrays are only
   materialized when needed.

- **Dynamic graph construction**: Computation graphs in MLX are constructed
   dynamically. Changing the shapes of function arguments does not trigger
   slow compilations, and debugging is simple and intuitive.

- **Multi-device**: Operations can run on any of the supported devices
   (currently the CPU and the GPU).

- **Unified memory**: A notable difference between MLX and other frameworks
   is the *unified memory model*. Arrays in MLX live in shared memory.
   Operations on MLX arrays can be performed on any of the supported
   device types without transferring data.
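
To make the lazy-computation and unified-memory points above concrete, here is a minimal sketch using MLX's Python API (assuming `mlx` is installed, e.g. `pip install mlx`):

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

# Lazy computation: building the expression does not run anything yet.
c = a @ b + 1.0
mx.eval(c)  # the matmul and add are only materialized here

# Unified memory: the same arrays can be used by operations scheduled on
# the CPU or the GPU, with no explicit copies between devices.
d = mx.add(a, b, stream=mx.cpu)
e = mx.add(a, b, stream=mx.gpu)
mx.eval(d, e)
```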

## Acknowledgement

I would like to extend my many thanks to:

- The Apple Machine Learning Research team for the amazing MLX library.
- LangChain and ChromaDB for making the RAG implementation so easy.
- The people from the Nous, VinBigData and Qwen teams who helped me during the implementation.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=qnguyen3/chat-with-mlx&type=Date)](https://star-history.com/#qnguyen3/chat-with-mlx&Date)

            
