llama-jarvis


- Name: llama-jarvis
- Version: 0.1.0
- Summary: Train a speech-to-speech model using your own language model
- Upload time: 2024-10-06 17:51:56
- Requires Python: >=3.8
- Keywords: llama, llm, speech-to-speech, transformers

# 🦙🎤 Llama-Jarvis
![Lint Status](https://github.com/johnsutor/llama-jarvis/workflows/Lint/badge.svg)
![Tests Status](https://github.com/johnsutor/llama-jarvis/workflows/Test/badge.svg)
![contributions welcome](https://img.shields.io/badge/contributions-welcome-blue.svg?style=flat)

![alt text](./assets/llama.webp)
Train a speech-to-speech model using your own language model. The library is currently based on the [Seamless Model](https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724), with plans to support more models in the future.

This model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni). However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.

This code is very much a work in progress. Any and all contributions are welcome!  

## Why this Library? 
This library aims to make speech-to-speech models more compatible with the HuggingFace ecosystem, rather than requiring you to modify your models and datasets to work with a new library. This allows us to take advantage of things like the [HuggingFace Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer).

## Getting Started
**NOTE:** For some of the examples below, you may first have to [log in to HuggingFace](https://huggingface.co/docs/huggingface_hub/main/package_reference/authentication) to gain access to the gated models (especially Llama models).  
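
If you prefer to authenticate from Python rather than the command line, the following is a minimal sketch. It assumes your access token is stored in the `HF_TOKEN` environment variable and that `huggingface_hub` is installed. The helper names `get_hf_token` and `login_if_token_available` are hypothetical and not part of this library.

```python
import os


def get_hf_token(env=None):
    """Return the HuggingFace token from HF_TOKEN, or None if unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_if_token_available(env=None):
    """Log in to the HuggingFace Hub when a token is available.

    Returns True if a login was attempted, False otherwise.
    """
    token = get_hf_token(env)
    if token is None:
        return False
    # Imported lazily so get_hf_token works even without huggingface_hub.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_if_token_available()` before loading a gated model keeps the token out of your source code.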

### Running Locally 
A 0.1.0 release is published on PyPI, but the code has not been thoroughly tested yet. To try the latest code locally, run:
```shell 
git clone https://github.com/johnsutor/llama-jarvis
cd llama-jarvis 
pip install -e . 
```
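
To confirm that the editable install was picked up, a small check such as the one below may help. This is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("llama-jarvis")
if version is None:
    print("llama-jarvis is not installed in this environment")
else:
    print(f"llama-jarvis {version} is installed")
```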

### Phase One Loss
The example below computes the phase one loss (i.e., the loss used when training the first phase of Llama-Omni).
```py 
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(
    BASE_LLM,
    SEAMLESS_MODEL
)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(
    BASE_LLM,
    SEAMLESS_MODEL
)

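# Prepare model inputs from the instruction, the input text, and the target label
inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True
)

# Forward pass without an explicit train_phase argument
outputs = jarvis_model.forward(
    **inputs,
    tgt_lang=LANGUAGE
)

print(outputs.loss)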
```
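
The processor call above takes parallel lists for `instruction`, `text`, and `label`. When your data is stored as one record per example, a small collation helper can build those lists. The sketch below is illustrative only: the record layout, the second sample, and the `collate_examples` name are assumptions, not part of this library.

```python
def collate_examples(examples, src_lang="eng"):
    """Turn a list of example dicts into keyword arguments for the processor.

    Each example is expected to have "instruction", "text", and "label" keys
    (a hypothetical record layout chosen for this sketch).
    """
    required = ("instruction", "text", "label")
    for i, example in enumerate(examples):
        missing = [key for key in required if key not in example]
        if missing:
            raise KeyError(f"example {i} is missing keys: {missing}")

    # One list per field, in the same order as the input examples.
    batch = {key: [example[key] for example in examples] for key in required}
    batch.update(src_lang=src_lang, return_tensors="pt", padding=True)
    return batch


examples = [
    {
        "instruction": "You are a language model who should respond to my speech",
        "text": "What is two plus two?",
        "label": "Two plus two is four",
    },
    {
        # Made-up second sample, for illustration only.
        "instruction": "You are a language model who should respond to my speech",
        "text": "What color is the sky?",
        "label": "The sky is blue",
    },
]
processor_kwargs = collate_examples(examples)
# These could then be passed on as jarvis_processor(**processor_kwargs).
```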

### Phase Two Loss
The example below computes the phase two loss (i.e., the loss used when training the second phase of Llama-Omni).
```py 
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(
    BASE_LLM,
    SEAMLESS_MODEL
)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(
    BASE_LLM,
    SEAMLESS_MODEL
)

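# Prepare model inputs from the instruction, the input text, and the target label
inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True
)

# Forward pass with train_phase=2 to select the phase two loss
outputs = jarvis_model.forward(
    **inputs,
    tgt_lang=LANGUAGE,
    train_phase=2
)

print(outputs.loss)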
```
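
The two examples differ only in the `train_phase=2` argument. If you switch between phases in one script, a tiny helper can keep that choice in one place. This is a sketch under the assumption that omitting `train_phase` selects phase one, as the phase one example suggests; `forward_kwargs` is a hypothetical name.

```python
def forward_kwargs(train_phase, tgt_lang="eng"):
    """Build the extra keyword arguments for the forward pass of a phase."""
    if train_phase not in (1, 2):
        raise ValueError(f"train_phase must be 1 or 2, got {train_phase!r}")
    kwargs = {"tgt_lang": tgt_lang}
    if train_phase == 2:
        # Only the phase two example passes train_phase explicitly.
        kwargs["train_phase"] = 2
    return kwargs


phase_one = forward_kwargs(1)
phase_two = forward_kwargs(2)
# Possible usage: outputs = jarvis_model.forward(**inputs, **phase_two)
```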

## Roadmap
- [x] Release the code on PyPI
- [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium
- [ ] Provide training example code 
- [ ] Fully document the code 
- [ ] Create an inference script for the model
- [ ] Write thorough tests for the code, and test with a multitude of open-source models 

## Other Cool Libraries 
We take a lot of inspiration from some other nice open-source libraries out there. Shoutout to 
- [SLAM-LLM](https://github.com/X-LANCE/SLAM-LLM?tab=readme-ov-file)
- [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
- [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni?tab=readme-ov-file)
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-jarvis",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "llama, llm, speech-to-speech, transformers",
    "author": null,
    "author_email": "John Sutor <johnsutor3@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/eb/85/ca0bc953b342855eb6f2dfb2cf2b5e7b65b898242df617fd90a0710a0669/llama_jarvis-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Train a speech-to-speech model using your own language model",
    "version": "0.1.0",
    "project_urls": {
        "Documentation": "https://github.com/johnsutor/llama-jarvis#readme",
        "Issues": "https://github.com/johnsutor/llama-jarvis/issues",
        "Source": "https://github.com/johnsutor/llama-jarvis"
    },
    "split_keywords": [
        "llama",
        " llm",
        " speech-to-speech",
        " transformers"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "75656ff8b96c166864380239df2d3a6c913d4706e8f10e7704d039ae6889c3f6",
                "md5": "99783667c1febf17daff75f07eb7265a",
                "sha256": "1d060dc0b6a41d68eac42fea2c9fa3b1174b90ae40df91b39c2dd7ddc221b2e6"
            },
            "downloads": -1,
            "filename": "llama_jarvis-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "99783667c1febf17daff75f07eb7265a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 13676,
            "upload_time": "2024-10-06T17:51:55",
            "upload_time_iso_8601": "2024-10-06T17:51:55.870441Z",
            "url": "https://files.pythonhosted.org/packages/75/65/6ff8b96c166864380239df2d3a6c913d4706e8f10e7704d039ae6889c3f6/llama_jarvis-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "eb85ca0bc953b342855eb6f2dfb2cf2b5e7b65b898242df617fd90a0710a0669",
                "md5": "17567f5bd70e8110dc951826e6256eba",
                "sha256": "ba157a5dfed9dec02c6f73e14c70dfe99187d76a7e9187b68e1d0b303b72c110"
            },
            "downloads": -1,
            "filename": "llama_jarvis-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "17567f5bd70e8110dc951826e6256eba",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 110631,
            "upload_time": "2024-10-06T17:51:56",
            "upload_time_iso_8601": "2024-10-06T17:51:56.957201Z",
            "url": "https://files.pythonhosted.org/packages/eb/85/ca0bc953b342855eb6f2dfb2cf2b5e7b65b898242df617fd90a0710a0669/llama_jarvis-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-06 17:51:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "johnsutor",
    "github_project": "llama-jarvis#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "llama-jarvis"
}
        