# Llama Deck

**Llama Deck** is a command-line tool for quickly managing and experimenting with multiple versions of llama inference implementations. It helps you quickly filter and download [different llama implementations](#available-repositories) and [llama2-like transformer-based LLM models](#available-models). We also provide [some images](#available-images) based on some of these implementations, which can be easily deployed and run through our tool. Inspired by the [llama2.c project](https://github.com/karpathy/llama2.c).

## Shortcuts
[Install The Tool](#install): `pip install llama-deck`

[Manage Repositories](#explore--download-llama-repositories): `list_repo` `install_repo` `-l <language>`

[Manage Models](#explore--download-models): `list_model` `install_model` `-m <model_name>`

[Manage and Run Images](#install--run-images): `install_img` `run_img`

## Install 
To install the tool, simply run:
```bash
pip install llama-deck
```

## Explore & Download Llama Repositories

### List Repositories
To list all llama implementations, run:
```bash
llama-deck list_repo
```
You can also set `-l` to filter by the repository's implementation language, like:
![list repositories](https://github.com/user-attachments/assets/1d0d347f-c79f-4f63-8e09-ae07a8662ebd)
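For example, to show only the Go implementations (language names presumably match the Language column in the [table below](#available-repositories)):

```bash
llama-deck list_repo -l Go
```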

### Download Repositories

You can also download those [implementation repositories](#available-repositories) through our tool:
```bash
llama-deck install_repo
```
![install repositories](https://github.com/user-attachments/assets/dc12703a-a960-4044-8eef-6619fa553569)
You can also set `-l` to specify a language.
Once running, the tool lets you download multiple repositories at once by entering their row numbers from the listed table. If you don't like the default download path, you can also specify your own.

Downloaded repositories are organized by language and author name; you can find them in `<specified download path>/llamaRepos`.
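As a rough illustration, the resulting layout might look like this (the exact nesting is an assumption based on the description above, using two repositories from the table):

```
<download path>/llamaRepos/
├── Go/
│   └── tmc/
│       └── go-llama2/
└── Rust/
    └── gaxler/
        └── llama2.rs/
```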

[Back to Shortcuts](#shortcuts)



### Available Repositories

This list originates from the [llama2.c project](https://github.com/karpathy/llama2.c) by Andrej Karpathy.

-------------------------------------------------------------------------------------------------------------------------

| #   | Language    | Name                     | Github                                               | Author          |
|:----|:------------|:-------------------------|:-----------------------------------------------------|:----------------|
| 1.  | Rust        | llama2.rs                | https://github.com/gaxler/llama2.rs                  | @gaxler         |
| 2.  | Rust        | llama2.rs                | https://github.com/leo-du/llama2.rs                  | @leo-du         |
| 3.  | Rust        | llama2-rs                | https://github.com/danielgrittner/llama2-rs          | @danielgrittner |
| 4.  | Rust        | llama2.rs                | https://github.com/lintian06/llama2.rs               | @lintian06      |
| 5.  | Rust        | pecca.rs                 | https://github.com/rahoua/pecca-rs                   | @rahoua         |
| 6.  | Rust        | llama2.rs                | https://github.com/flaneur2020/llama2.rs             | @flaneur2020    |
|     |             |                          |                                                      |                 |
| 7.  | Go          | go-llama2                | https://github.com/tmc/go-llama2                     | @tmc            |
| 8.  | Go          | llama2.go                | https://github.com/nikolaydubina/llama2.go           | @nikolaydubina  |
| 9.  | Go          | llama2.go                | https://github.com/haormj/llama2.go                  | @haormj         |
| 10. | Go          | llama2.go                | https://github.com/saracen/llama2.go                 | @saracen        |
|     |             |                          |                                                      |                 |
| 11. | Android     | llama2.c-android         | https://github.com/Manuel030/llama2.c-android        | @Manuel030      |
| 12. | Android     | llama2.c-android-wrapper | https://github.com/celikin/llama2.c-android-wrapper  | @celikin        |
|     |             |                          |                                                      |                 |
| 13. | C++         | llama2.cpp               | https://github.com/leloykun/llama2.cpp               | @leloykun       |
| 14. | C++         | llama2.cpp               | https://github.com/coldlarry/llama2.cpp              | @coldlarry      |
|     |             |                          |                                                      |                 |
| 15. | CUDA        | llama_cu_awq             | https://github.com/ankan-ban/llama_cu_awq            | @ankan-ban      |
|     |             |                          |                                                      |                 |
| 16. | JavaScript  | llama2.js                | https://github.com/epicure/llama2.js                 | @epicure        |
| 17. | JavaScript  | llamajs                  | https://github.com/agershun/llamajs                  | @agershun       |
| 18. | JavaScript  | llama2.ts                | https://github.com/wizzard0/llama2.ts                | @oleksandr_now  |
| 19. | JavaScript  | llama2.c-emscripten      | https://github.com/gohai/llama2.c-emscripten         | @gohai          |
|     |             |                          |                                                      |                 |
| 20. | Zig         | llama2.zig               | https://github.com/cgbur/llama2.zig                  | @cgbur          |
| 21. | Zig         | llama2.zig               | https://github.com/vodkaslime/llama2.zig             | @vodkaslime     |
| 22. | Zig         | llama2.zig               | https://github.com/clebert/llama2.zig                | @clebert        |
|     |             |                          |                                                      |                 |
| 23. | Julia       | llama2.jl                | https://github.com/juvi21/llama2.jl                  | @juvi21         |
|     |             |                          |                                                      |                 |
| 24. | Scala       | llama2.scala             | https://github.com/jrudolph/llama2.scala             | @jrudolph       |
|     |             |                          |                                                      |                 |
| 25. | Java        | llama2.java              | https://github.com/mukel/llama2.java                 | @mukel          |
| 26. | Java        | llama2.tornadovm.java    | https://github.com/mikepapadim/llama2.tornadovm.java | @mikepapadim    |
| 27. | Java        | Jlama                    | https://github.com/tjake/Jlama                       | @tjake          |
| 28. | Java        | llama2j                  | https://github.com/LastBotInc/llama2j                | @lasttero       |
|     |             |                          |                                                      |                 |
| 29. | Kotlin      | llama2.kt                | https://github.com/madroidmaq/llama2.kt              | @madroidmaq     |
|     |             |                          |                                                      |                 |
| 30. | Python      | llama2.py                | https://github.com/tairov/llama2.py                  | @tairov         |
|     |             |                          |                                                      |                 |
| 31. | C#          | llama2.cs                | https://github.com/trrahul/llama2.cs                 | @trrahul        |
|     |             |                          |                                                      |                 |
| 32. | Dart        | llama2.dart              | https://github.com/yiminghan/llama2.dart             | @yiminghan      |
|     |             |                          |                                                      |                 |
| 33. | Web         | llama2c-web              | https://github.com/dmarcos/llama2.c-web              | @dmarcos        |
|     |             |                          |                                                      |                 |
| 34. | WebAssembly | icpp-llm                 | https://github.com/icppWorld/icpp-llm                | N/A             |
|     |             |                          |                                                      |                 |
| 35. | Fortran     | llama2.f90               | https://github.com/rbitr/llama2.f90                  | N/A             |
|     |             |                          |                                                      |                 |
| 36. | Mojo        | llama2.🔥                | https://github.com/tairov/llama2.mojo                | @tairov         |
|     |             |                          |                                                      |                 |
| 37. | OCaml       | llama2.ml                | https://github.com/jackpeck/llama2.ml                | @jackpeck       |
|     |             |                          |                                                      |                 |
| 38. | Everywhere  | llama2.c                 | https://github.com/trholding/llama2.c                | @trholding      |
|     |             |                          |                                                      |                 |
| 39. | Bilingual   | llama2.c-zh              | https://github.com/chenyangMl/llama2.c-zh            | @chenyangMl     |

-------------------------------------------------------------------------------------------------------------------------
[Back to Shortcuts](#shortcuts)

## Explore & Download Models
Currently the tool only includes the TinyLlamas models provided by the [llama2.c project](https://github.com/karpathy/llama2.c), and Meta-Llama. More model options will be added for download.

The operations for listing and downloading models are similar to those for [repositories](#explore--download-llama-repositories). To list available models, run:
```bash
llama-deck list_model
```
And to download a model:
```bash
llama-deck install_model
```

Similarly, the optional `-m` flag can be set to specify the model name you want to show and download.
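For example (model names follow the [table below](#available-models)):

```bash
# List only the stories15M entry, then download it:
llama-deck list_model -m stories15M
llama-deck install_model -m stories15M
```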


The tool can also help you download the default tokenizer provided in [llama2.c](https://github.com/karpathy/llama2.c).

[Back to Shortcuts](#shortcuts)

### Available Models
More model options will be added for download.

-------------------------------------------------------------------------------------------------------------------------

|    | Model       | URL                                         |
|---:|:------------|:--------------------------------------------|
|  1 | stories15M  | https://huggingface.co/karpathy/tinyllamas  |
|  2 | stories42M  | https://huggingface.co/karpathy/tinyllamas  |
|  3 | stories110M | https://huggingface.co/karpathy/tinyllamas  |
|  4 | Meta-Llama  | https://llama.meta.com/llama-downloads/     |

-------------------------------------------------------------------------------------------------------------------------


**IMPORTANT!** Meta-Llama models are license-protected, which means you still need to [apply to Meta for download permission](https://llama.meta.com/llama-downloads). But once you receive the download URL from Meta's confirmation email, this tool will automatically grab and run the [download.sh](https://github.com/meta-llama/llama?tab=readme-ov-file#download) script provided by Meta to help you download Meta-Llama models.
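For example, a hedged sketch of that flow, assuming Meta-Llama is selected via the documented `-m` flag (the prompt for the signed download URL comes from Meta's `download.sh`, not from `llama-deck`):

```bash
# Select the Meta-Llama entry; the tool then fetches and runs Meta's
# download.sh, which asks for the URL from the confirmation email.
llama-deck install_model -m Meta-Llama
```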

[Back to Shortcuts](#shortcuts)

## Install & Run Images
To make it quick to deploy and experiment with multiple versions of llama inference implementations, we built an image repository consisting of dockerized versions of some popular implementations. [See our image repository](https://hub.docker.com/r/bufan0222/ll_implements/tags).

`llama-deck` can access, pull, and run these dockerized implementations. When you need to run multiple implementations, or compare performance differences between them, this greatly reduces the effort of deploying each implementation, configuring its runtime environment, and learning how to run inference with it.

Before trying these functions, make sure **Docker** is already installed and running on your device.

### Install Images
To list images from our image repository, use:
```bash
llama-deck list_img
```
And to install an image:
```bash
llama-deck install_img
```

For both the `list_img` and `install_img` actions, an optional `-i <image tag>` flag can be set to check whether a specific tag is included. All image tags are named with the format `<repository name>_<author>` (e.g. for Karpathy's [llama2.c](https://github.com/karpathy/llama2.c), the image tag is `llama2.c_karpathy`).
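For example, to check whether Karpathy's llama2.c image is available and then pull it, presumably:

```bash
llama-deck list_img -i llama2.c_karpathy
llama-deck install_img -i llama2.c_karpathy
```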

The process of installing images is mostly the same as installing repositories and models.

[Back to Shortcuts](#shortcuts)

### Run Images
There are two ways to run images.

#### 1. Follow the Instructions
Run:
```bash
llama-deck run_img
```

Simply call the `run_img` action and let the tool find resources and help you set all the configs for model inference. After running, it automatically checks and lists the installed images that this tool can run.

You will be asked to:

**Step 1**. Select one or more images you want to run.

**Step 2**. Select one model, or specify the model path (absolute path required).

**Step 3**. Set inference arguments: `-i`, `-t`, `-p`... (optional)

*e.g. in **Step 3**, input `-n 256 -i "Once upon a time"`, and all selected images will run inference with `step=256, prompt="Once upon a time"`.*

The tool will then run all the images you selected, with the args you set. You will see stdout from all the running containers (images), with the arg status and inference results printed, like this:
![Screencast from 2024-08-07 03:17:20](https://github.com/user-attachments/assets/a148dd0c-e4e0-4911-8d7a-d051cbba3bda)


[Back to Shortcuts](#shortcuts)

#### 2. Run an Image in a Single Command
A faster way to run a specific image is to call the `run_img` action with a specified `image_tag` and `model_path`, followed by inference args if needed.
```bash
llama-deck run_img <image_tag> <model_path> <other args (optional)>
```
For example, suppose I want to:
1. run inference on the model `/home/bufan/LlamaDeckResources/llamaModels/stories15M.bin`
2. using [llama2.java](https://github.com/mukel/llama2.java) inside the image with tag `llama2.java_mukel`
3. with 128 steps and the prompt "Once upon a time"

Then the command is:
```bash
llama-deck run_img llama2.java_mukel \
/home/bufan/LlamaDeckResources/llamaModels/stories15M.bin \
-n 128 -i "Once opon a time"
```
Result:
```bash
$ llama-deck run_img llama2.java_mukel /home/bufan/LlamaDeckResources/llamaModels/stories15M.bin  -n 128 -i "Once opon a time"

==> Selected run arguments:
image_tag: llama2.java_mukel
model_path: /home/bufan/LlamaDeckResources/llamaModels/stories15M.bin
steps: 128
input_prompt: Once opon a time


Running llama2.java_mukel...

################## stdout from llama2.java_mukel ####################

==>Supported args:
tokenizer
prompt
chat_init_prompt
mode
temperature
step
top-p
seed

==> Set args:
prompt = 'Once opon a time'
step = 128

==> RUN COMMAND: java --enable-preview --add-modules=jdk.incubator.vector Llama2 /models/model.bin  -i 'Once opon a time'    -n 128  
WARNING: Using incubator modules: jdk.incubator.vector
Config{dim=288, hidden_dim=768, n_layers=6, n_heads=6, n_kv_heads=6, vocab_size=32000, seq_len=256, shared_weights=true, head_size=48}
Once opon a time, there was a boy. He liked to throw a ball. Every day, he would go outside and throw the ball with his friends.
One day, the boy saw something funny. He saw a penny, made of copper and was very happy. He liked the penny so much that he wanted to throw it again.
He threw the penny and tried to make it go even higher. But, the penny was too lazy to go higher. So, the boy went back to the penny and tried again. He threw it as far as he could.
But this time

achieved tok/s: 405.750799
#####################################################################

All images finished.
```

**IMPORTANT!** Always give an absolute path for `<model_path>`: since llama model files are typically large, instead of copying the model into each container (which would increase IO and memory cost), `llama-deck` mounts the model into each running container (image), and an absolute path is required for the mount when starting an image.
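Conceptually, the mount resembles the following plain `docker run` sketch (the exact flags and entrypoint are internal to the tool and may differ; `/models/model.bin` is the in-container path visible in the run command above). Docker's `-v` bind mount is also why the host path must be absolute:

```bash
# Sketch only: bind-mount the model read-only at /models/model.bin,
# the path the implementation inside the container reads from.
docker run --rm \
  -v /home/bufan/LlamaDeckResources/llamaModels/stories15M.bin:/models/model.bin:ro \
  bufan0222/ll_implements:llama2.java_mukel
```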

[Back to Shortcuts](#shortcuts)

#### More about passing inference arguments ####

Inference args supported by `llama-deck` are the same as in [llama2.c](https://github.com/karpathy/llama2.c). They are:

`-t <float>`  temperature in [0,inf], default 1.0

`-p <float>`  p value in top-p (nucleus) sampling in [0,1] default 0.9

`-s <int>`    random seed, default time(NULL)

`-n <int>`    number of steps to run for, default 256. 0 = max_seq_len

`-i <string>` input prompt

`-z <string>` optional path to custom tokenizer (not implemented yet)

`-m <string>` mode: generate|chat, default: generate

`-y <string>` (optional) system prompt in chat mode
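For example, a single-command run combining several of these args might look like this (reusing the model path from the earlier example):

```bash
llama-deck run_img llama2.c_karpathy \
  /home/bufan/LlamaDeckResources/llamaModels/stories15M.bin \
  -t 0.8 -p 0.9 -s 42 -n 256 -i "Once upon a time"
```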

Note that not all implementations support all of these args from [llama2.c](https://github.com/karpathy/llama2.c). And due to the nature of the different implementations, different formats and conventions are used to pass these args.

So for each selected image, `llama-deck` automatically detects which args it supports and drops the unsupported ones. It then converts the args you set into the correct format, puts them in the correct position in the command that runs the implementation, and finally passes them to the implementation inside the image. This is done inside each running container.
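Since the same logical args work across images (unsupported ones are simply dropped per image), a quick cross-implementation comparison can be scripted as a plain shell loop; a sketch using tags from the [table below](#available-images):

```bash
# Run the same model and prompt across several images and compare
# the reported tok/s from each container's stdout.
for tag in llama2.c_karpathy llama2.java_mukel llama2.cpp_leloykun; do
  llama-deck run_img "$tag" \
    /home/bufan/LlamaDeckResources/llamaModels/stories15M.bin \
    -n 256 -i "Once upon a time"
done
```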

[Back to Shortcuts](#shortcuts)

### Available Images

-------------------------------------------------------------------------------------------------------------------------

|    | Tag                 | Size     | Author    | Repository                             |
|---:|:--------------------|:---------|:----------|:---------------------------------------|
|  1 | llama2.zig_cgbur    | 259.0 MB | @cgbur    | https://github.com/cgbur/llama2.zig    |
|  2 | llama2.cs_trrahul   | 374.0 MB | @trrahul  | https://github.com/trrahul/llama2.cs   |
|  3 | llama2.py_tairov    | 57.0 MB  | @tairov   | https://github.com/tairov/llama2.py    |
|  4 | llama2.rs_gaxler    | 331.0 MB | @gaxler   | https://github.com/gaxler/llama2.rs    |
|  5 | llama2.c_karpathy   | 139.0 MB | @karpathy | https://github.com/karpathy/llama2.c   |
|  6 | llama2.java_mukel   | 178.0 MB | @mukel    | https://github.com/mukel/llama2.java   |
|  7 | go-llama2_tmc       | 133.0 MB | @tmc      | https://github.com/tmc/go-llama2       |
|  8 | llama2.cpp_leloykun | 169.0 MB | @leloykun | https://github.com/leloykun/llama2.cpp |

-------------------------------------------------------------------------------------------------------------------------

More dockerized implementations will be added.

[See our image repository](https://hub.docker.com/r/bufan0222/ll_implements/tags).


# TODO List
1. Implement custom tokenizer support when running images.
2. Add more models and build more images.
3. Try multi-threading when running images?

# License
This project is licensed under the MIT License - see the LICENSE file for details.



            
