# LlaMaKey: one master key for accessing all cloud LLM/GenAI APIs

LlaMa(ster)Key is a simple and secure way to manage API keys and control access to various cloud LLM/GenAI APIs for multiple users.

LlaMaKey enables a user to access multiple cloud AI APIs (OpenAI, Cohere, AnyScale, HuggingFace, Perplexity) through **one master key** with **no code change**, while still using each provider's **official Python SDK**. The master key is unique to each user, so **revoking one user won't affect others**. The actual API keys are hidden from the users to minimize the risk of key leakage.

```mermaid
graph TD
   subgraph Your team
     A[User 1] -- Master Key 1 --> L["LlaMasterKey server<br> (rate throttling, API/endpoint whitelisting, <br> logging, budgeting, etc.)"]
     B[User 2] -- Master Key 2 --> L
     C[User 100] -- Master Key 100 --> L
   end
    L -- Actual OPENAI_API_KEY --> O[OpenAI endpoints]
    L -- Actual CO_API_KEY --> P[Cohere endpoints]
    L -- Actual HF_API_KEY --> Q[HuggingFace endpoints]
```

## Features and Benefits

* **Ease for the user**: The user only needs to know one key, instead of multiple keys as before.
* **Ease for the administrator**: Reduce the number of keys to manage and isolate each user's access from others. Revoking or changing one user's access will not affect others.
* **Safety for the keys**: The actual API keys to authenticate with cloud APIs are hidden from the user. **No more key leakage.**
* **Granular control (coming)**: Per-user rate throttling, API/endpoint whitelisting, budget capping, etc. via policies. **No more surprising bills.**

Supported APIs:
* [x] OpenAI (all endpoints)
* [x] Cohere (all endpoints)
* [x] AnyScale (AnyScale API is OpenAI-client compatible)
* [x] Perplexity AI (Perplexity API is OpenAI-client compatible)
* [x] HuggingFace Inference API (free tier)
* [ ] HuggingFace EndPoint API
* [ ] Anthropic
* [ ] Google Vertex AI
* [x] [Vectara AI](https://vectara.com/)


Currently, authentication with the LlaMaKey server is not enabled; all users share the master key `LlaMaKey`. If you want per-user authentication, please [upvote here](https://github.com/TexteaInc/LlaMasterKey/issues/6).



## Installation

* Stable version: 
  ```bash
  pip install LLaMasterKey
  ```
* Nightly version: download from [here](https://github.com/TexteaInc/LlaMasterKey/releases/tag/nightly)
* For building from source, see [Build from source](#build-from-source).

## Usage

### The server end
Set the actual API keys as environment variables, named as their respective APIs expect, and then start the server. For example:

```bash
# Step 1: Set the actual API keys as environment variables
export OPENAI_API_KEY=sk-xxx # openai
export CO_API_KEY=co-xxx # cohere
export HF_TOKEN=hf-xxx # huggingface
export ANYSCALE_API_KEY=credential-xxx # anyscale
export PERPLEXITY_API_KEY=pplx-xxx # perplexity

lmk # Step 2: start the server
```

By default, the server is started at `http://localhost:8000` (8000 is the default port of FastAPI).

Shell commands that set the proper environment variables on your client end will be printed, like this:
```bash
export OPENAI_BASE_URL="http://127.0.0.1:8000/openai" # direct OpenAI calls to the LlaMaKey server
export CO_API_URL="http://127.0.0.1:8000/cohere"
export ANYSCALE_BASE_URL="http://127.0.0.1:8000/anyscale"
export HF_INFERENCE_ENDPOINT="http://127.0.0.1:8000/huggingface"

export OPENAI_API_KEY="LlaMaKey" # One master key for all APIs
export CO_API_KEY="LlaMaKey"
export ANYSCALE_API_KEY="LlaMaKey"
export HF_TOKEN="LlaMaKey"
```
Such environment variables will direct the API calls to the LlaMaKey server. For your convenience, the commands are also dumped to the file `./llamakey_local.env`.
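Once a client has sourced that file, a quick round trip through the proxy can serve as a smoke test. A minimal sketch using the OpenAI SDK, assuming `OPENAI_BASE_URL` and `OPENAI_API_KEY` are set as above:

```python
# Smoke test (a sketch): with OPENAI_BASE_URL pointing at the LlaMaKey
# server and OPENAI_API_KEY set to the master key, listing models
# round-trips through the proxy to OpenAI.
from openai import OpenAI

print(OpenAI().models.list())
```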

### The client end
Just set the environment variables generated above, then run your code as usual!
You may copy and paste the commands, or simply source the generated `llamakey_local.env` file, for example:

```bash
# Step 1: activate the environment variables that tell the official SDKs to send requests to the LlaMaKey server
source llamakey_local.env

# Step 2: call the official Python SDKs as usual, for example, for OpenAI:
python3 -c '
from openai import OpenAI

client = OpenAI()
print(
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is FastAPI?"}],
    )
)'
```
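The same pattern applies to the other supported SDKs. A sketch for Cohere, assuming the v4-era `cohere` SDK, which reads `CO_API_KEY` and `CO_API_URL` from the environment (see the links in [How LlaMaKey works](#how-llamakey-works)):

```python
# A sketch for Cohere: no key or URL appears in the code; both
# CO_API_KEY and CO_API_URL come from llamakey_local.env.
import cohere

co = cohere.Client()
response = co.generate(prompt="What is FastAPI?")
print(response.generations[0].text)
```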

## Build from source

Requirements: Git and the [Rust toolchain](https://www.rust-lang.org/tools/install).

```bash
git clone git@github.com:TexteaInc/LlaMasterKey.git
# you can switch to a different branch:
# git switch dev
cargo build --release
# binary at ./target/release/lmk

# run it without installation
cargo run
# you can also install it system-wide
cargo install --path .

# run it
lmk
```

## How LlaMaKey works

As a proxy, LlaMaKey takes advantage of a feature shared by the Python SDKs of most cloud LLM/GenAI APIs: they allow overriding the base URL that requests are sent to and the API key/token used to authenticate them ([OpenAI's](https://github.com/openai/openai-python/blob/d231d1fa783967c1d3a1db3ba1b52647fff148ac/src/openai/_client.py#L95-L108), [Cohere's](https://github.com/cohere-ai/cohere-python/blob/6e035811ecbf33744a5618946371e0e548eb2e73/cohere/client.py#L86-L87)). Both can be set via environment variables, so a client only needs to set those variables (or configure them in code) and then call the APIs as usual -- see [how simple and easy](#the-client-end) that is. LlaMaKey receives each request, authenticates the user (if authentication is enabled), and forwards the request to the corresponding cloud API with an actual API key (set by the administrator when starting the LlaMaKey server). Once the cloud API responds, the LlaMaKey server passes the response back to the client.
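For example, this is all the no-code-change trick amounts to if you configure the OpenAI client manually instead of through environment variables (a minimal sketch; the values mirror the ones printed by `lmk` above):

```python
from openai import OpenAI

# Equivalent to exporting OPENAI_BASE_URL and OPENAI_API_KEY:
# requests go to the LlaMaKey server, which swaps in the actual
# key before forwarding them to OpenAI.
client = OpenAI(
    base_url="http://127.0.0.1:8000/openai",
    api_key="LlaMaKey",
)
print(client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is FastAPI?"}],
))
```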

## License

Ah, this is important. Let's say MIT for now?

## Contact

For usage questions, bug reports, or feature requests, please open an issue on GitHub. For private inquiries, please email `hello@LlaMaKey.ai`.


            
