Name | datasette-llm-embed |
Version | 0.2 |
home_page | |
Summary | llm_embed(model_id, text) SQL function for Datasette |
upload_time | 2023-10-08 17:44:01 |
maintainer | |
docs_url | None |
author | Simon Willison |
requires_python | |
license | Apache-2.0 |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# datasette-llm-embed
[![PyPI](https://img.shields.io/pypi/v/datasette-llm-embed.svg)](https://pypi.org/project/datasette-llm-embed/)
[![Changelog](https://img.shields.io/github/v/release/simonw/datasette-llm-embed?include_prereleases&label=changelog)](https://github.com/simonw/datasette-llm-embed/releases)
[![Tests](https://github.com/simonw/datasette-llm-embed/workflows/Test/badge.svg)](https://github.com/simonw/datasette-llm-embed/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-llm-embed/blob/main/LICENSE)
Datasette plugin adding a `llm_embed(model_id, text)` SQL function.
## Installation
```bash
datasette install datasette-llm-embed
```
## Usage
Adds a SQL function that can be called like this:
```sql
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
```
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as [datasette-faiss](https://datasette.io/plugins/datasette-faiss).
The models need to be installed using [LLM](https://llm.datasette.io/) plugins such as [llm-sentence-transformers](https://github.com/simonw/llm-sentence-transformers).
Use `llm_embed_cosine(a, b)` to calculate cosine similarity between two vector blobs:
```sql
select llm_embed_cosine(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
```
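For intuition, here is a minimal Python sketch of what a cosine similarity over two vector blobs computes. It assumes the blobs are packed as little-endian 32-bit floats, which is the storage format LLM describes for its embeddings. The names `unpack_floats` and `cosine` are hypothetical and used only for illustration; this is not the plugin's own implementation.

```python
import math
import struct


def unpack_floats(blob: bytes) -> tuple:
    # Assumption: 4 bytes per value, little-endian 32-bit floats.
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


def cosine(a: bytes, b: bytes) -> float:
    va, vb = unpack_floats(a), unpack_floats(b)
    if len(va) != len(vb):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    # Note: a zero-length vector would divide by zero here.
    return dot / (norm_a * norm_b)


# Toy three-dimensional vectors standing in for real embeddings.
blob_a = struct.pack("<3f", 1.0, 0.0, 0.0)
blob_b = struct.pack("<3f", 1.0, 1.0, 0.0)
print(round(cosine(blob_a, blob_b), 4))  # 0.7071
```

Values close to 1 indicate vectors pointing in nearly the same direction. Both blobs should come from the same model, since vectors of different lengths cannot be compared.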
The `llm_embed_decode()` function can be used to decode a binary BLOB into a JSON array of floats:
```sql
select llm_embed_decode(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
```
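The decoding step can be sketched in plain Python using only the standard library. This sketch assumes the BLOB holds little-endian 32-bit floats (the format LLM describes for stored embeddings). `decode_blob_to_json` and the SQL name `demo_decode` are hypothetical stand-ins registered on an in-memory SQLite connection; they are not the plugin's `llm_embed_decode()`.

```python
import json
import sqlite3
import struct


def decode_blob_to_json(blob: bytes) -> str:
    # Assumption: each value occupies 4 bytes, little-endian float32.
    count = len(blob) // 4
    values = struct.unpack("<" + "f" * count, blob)
    return json.dumps(list(values))


# Register the helper as a one-argument SQL function for demonstration.
conn = sqlite3.connect(":memory:")
conn.create_function("demo_decode", 1, decode_blob_to_json)

# A toy vector; these values are exactly representable as float32.
blob = struct.pack("<3f", 0.5, -1.0, 2.0)
result = conn.execute("select demo_decode(?)", (blob,)).fetchone()[0]
print(result)  # [0.5, -1.0, 2.0]
```

Returning JSON text makes the vector readable in query results, at the cost of being larger than the binary form.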
## Models that require API keys
If your embedding model needs an API key - for example the `ada-002` model from OpenAI - you can configure that key in `metadata.yml` (or JSON) like this:
```yaml
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
```
The name listed under `keys` must be the model's full model ID, not an alias.
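Because the same configuration can be written as JSON, the short Python sketch below prints the equivalent structure. It only illustrates the nesting: the plugin name, then `keys`, then the model ID, then an `$env` object naming the environment variable to read. Saving the output as `metadata.json` is an assumption about how you name your metadata file.

```python
import json

# Same nesting as the YAML example: plugins -> plugin name -> keys -> model ID.
metadata = {
    "plugins": {
        "datasette-llm-embed": {
            "keys": {
                # "$env" refers to an environment variable by name,
                # so the secret itself never appears in the file.
                "ada-002": {"$env": "OPENAI_API_KEY"}
            }
        }
    }
}

print(json.dumps(metadata, indent=2))
```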
You can then set the `OPENAI_API_KEY` environment variable to the key you want to use before starting Datasette:
```bash
export OPENAI_API_KEY=sk-1234567890
```
Once configured, calls like this will use the API key that has been provided:
```sql
select llm_embed('ada-002', 'This is some text')
```
## Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```
Raw data
{
"_id": null,
"home_page": "",
"name": "datasette-llm-embed",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Simon Willison",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/40/8b/26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990/datasette-llm-embed-0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "llm_embed(model_id, text) SQL function for Datasette",
"version": "0.2",
"project_urls": {
"CI": "https://github.com/simonw/datasette-llm-embed/actions",
"Changelog": "https://github.com/simonw/datasette-llm-embed/releases",
"Homepage": "https://github.com/simonw/datasette-llm-embed",
"Issues": "https://github.com/simonw/datasette-llm-embed/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "95eaa90fcaaa81310bf05b0d3eace0a752b6b15aa7238273a72b7c373899fea5",
"md5": "878efbfc2ebd653efd488a7aa28b7472",
"sha256": "c3474758a5d54af523c344dcf99a331ba33930e7de73d0815feee5cc352c47ff"
},
"downloads": -1,
"filename": "datasette_llm_embed-0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "878efbfc2ebd653efd488a7aa28b7472",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7500,
"upload_time": "2023-10-08T17:43:59",
"upload_time_iso_8601": "2023-10-08T17:43:59.488079Z",
"url": "https://files.pythonhosted.org/packages/95/ea/a90fcaaa81310bf05b0d3eace0a752b6b15aa7238273a72b7c373899fea5/datasette_llm_embed-0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "408b26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990",
"md5": "debd714ffd6711b3470aecabb01013e0",
"sha256": "6793b0403546188db13ebcd1d2009078bc88695418aeae4f734e45c3114e510c"
},
"downloads": -1,
"filename": "datasette-llm-embed-0.2.tar.gz",
"has_sig": false,
"md5_digest": "debd714ffd6711b3470aecabb01013e0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7461,
"upload_time": "2023-10-08T17:44:01",
"upload_time_iso_8601": "2023-10-08T17:44:01.016490Z",
"url": "https://files.pythonhosted.org/packages/40/8b/26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990/datasette-llm-embed-0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-08 17:44:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "datasette-llm-embed",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "datasette-llm-embed"
}