datasette-llm-embed


Name: datasette-llm-embed
Version: 0.2
Summary: llm_embed(model_id, text) SQL function for Datasette
Upload time: 2023-10-08 17:44:01
Author: Simon Willison
License: Apache-2.0

# datasette-llm-embed

[![PyPI](https://img.shields.io/pypi/v/datasette-llm-embed.svg)](https://pypi.org/project/datasette-llm-embed/)
[![Changelog](https://img.shields.io/github/v/release/simonw/datasette-llm-embed?include_prereleases&label=changelog)](https://github.com/simonw/datasette-llm-embed/releases)
[![Tests](https://github.com/simonw/datasette-llm-embed/workflows/Test/badge.svg)](https://github.com/simonw/datasette-llm-embed/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette-llm-embed/blob/main/LICENSE)

Datasette plugin adding a `llm_embed(model_id, text)` SQL function.

## Installation

```bash
datasette install datasette-llm-embed
```

## Usage

The plugin adds a SQL function that can be called like this:
```sql
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
```
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as [datasette-faiss](https://datasette.io/plugins/datasette-faiss).

The models need to be installed using [LLM](https://llm.datasette.io/) plugins such as [llm-sentence-transformers](https://github.com/simonw/llm-sentence-transformers).
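
To make the "SQL function returning a binary blob" idea concrete, here is a minimal Python sketch that registers a stand-in `llm_embed` function on a plain SQLite connection. It is not the plugin's implementation: `toy_embed` and the `toy-model` ID are hypothetical, the "embedding" is three made-up numbers, and the blob layout (little-endian 32-bit floats) is an assumption chosen for illustration.

```python
import sqlite3
import struct


def toy_embed(model_id, text):
    """Hypothetical stand-in for a real embedding model.

    Returns a tiny vector packed as little-endian 32-bit floats
    (an assumed blob layout, used here only for illustration).
    """
    # Three made-up features; a real model returns hundreds of dimensions.
    vector = [float(len(text)), float(text.count(" ")), float(len(model_id))]
    return struct.pack("<" + "f" * len(vector), *vector)


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name llm_embed, taking 2 arguments.
conn.create_function("llm_embed", 2, toy_embed)

blob = conn.execute(
    "select llm_embed('toy-model', 'This is some text')"
).fetchone()[0]

# SQLite hands the packed floats back to Python as a bytes object.
print(type(blob).__name__, len(blob))
```

Because the result is an ordinary BLOB, it can be stored in a table column or passed straight into another SQL function.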

Use `llm_embed_cosine(a, b)` to calculate cosine similarity between two vector blobs:

```sql
select llm_embed_cosine(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
```
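
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The following Python sketch shows that calculation for two blobs, assuming each blob is a sequence of little-endian 32-bit floats. The helper names `decode` and `cosine_similarity` are illustrative and are not taken from the plugin.

```python
import math
import struct


def decode(blob):
    """Unpack a blob of little-endian 32-bit floats (assumed layout)."""
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


def cosine_similarity(blob_a, blob_b):
    """Return dot(a, b) / (|a| * |b|) for two encoded vectors."""
    a, b = decode(blob_a), decode(blob_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two hand-made 2D vectors: one along the x axis, one along the y axis.
along_x = struct.pack("<2f", 1.0, 0.0)
along_y = struct.pack("<2f", 0.0, 1.0)

print(cosine_similarity(along_x, along_x))  # identical direction
print(cosine_similarity(along_x, along_y))  # perpendicular vectors
```

Scores close to 1 indicate vectors pointing in nearly the same direction, and scores near 0 indicate unrelated directions. This sketch does not guard against zero-length vectors.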

The `llm_embed_decode()` function can be used to decode a binary BLOB into a JSON array of floats:

```sql
select llm_embed_decode(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
```
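
As a rough illustration of what decoding involves, this Python sketch turns a blob into a JSON array. It assumes four bytes per value, stored as little-endian 32-bit floats. The function name `embed_decode` is hypothetical.

```python
import json
import struct


def embed_decode(blob):
    """Decode a blob of packed floats into a JSON array string."""
    count = len(blob) // 4  # four bytes per 32-bit float (assumed layout)
    values = struct.unpack("<" + "f" * count, blob)
    return json.dumps(list(values))


# Build a small example blob by hand, then decode it.
example_blob = struct.pack("<3f", 0.5, -1.0, 2.0)
print(embed_decode(example_blob))
```

The JSON form is convenient for inspecting a vector by eye, while the blob form is more compact for storage.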

## Models that require API keys

If your embedding model needs an API key, as the `ada-002` model from OpenAI does, you can configure that key in `metadata.yml` (or JSON) like this:

```yaml
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
```
The key here should be the full model ID, not an alias.

You can then set the `OPENAI_API_KEY` environment variable to the key you want to use before starting Datasette:
```bash
export OPENAI_API_KEY=sk-1234567890
```
Once configured, calls like this will use the API key that has been provided:
```sql
select llm_embed('ada-002', 'This is some text')
```
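
The `$env` entry in the configuration above tells Datasette to read the value from an environment variable rather than storing the secret in the file. The Python sketch below mimics that lookup to show how a model ID maps to a key. It is an illustration only: `resolve_secret` and `key_for_model` are hypothetical helpers, not functions from Datasette or this plugin.

```python
import os


def resolve_secret(value, environ=os.environ):
    """Resolve a config value that may be a {"$env": "NAME"} reference."""
    if isinstance(value, dict) and set(value) == {"$env"}:
        # Look the secret up by environment variable name.
        return environ.get(value["$env"])
    return value


def key_for_model(model_id, config, environ=os.environ):
    """Return the API key configured for a full model ID, or None."""
    raw = config.get("keys", {}).get(model_id)
    return resolve_secret(raw, environ)


# Mirrors the YAML configuration shown above.
plugin_config = {"keys": {"ada-002": {"$env": "OPENAI_API_KEY"}}}
# A fake environment, so the sketch does not depend on real secrets.
fake_environ = {"OPENAI_API_KEY": "sk-1234567890"}

print(key_for_model("ada-002", plugin_config, fake_environ))
```

Note that the lookup is by exact model ID, which is why an alias would not match an entry under `keys`.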

## Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "datasette-llm-embed",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Simon Willison",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/40/8b/26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990/datasette-llm-embed-0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "llm_embed(model_id, text) SQL function for Datasette",
    "version": "0.2",
    "project_urls": {
        "CI": "https://github.com/simonw/datasette-llm-embed/actions",
        "Changelog": "https://github.com/simonw/datasette-llm-embed/releases",
        "Homepage": "https://github.com/simonw/datasette-llm-embed",
        "Issues": "https://github.com/simonw/datasette-llm-embed/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "95eaa90fcaaa81310bf05b0d3eace0a752b6b15aa7238273a72b7c373899fea5",
                "md5": "878efbfc2ebd653efd488a7aa28b7472",
                "sha256": "c3474758a5d54af523c344dcf99a331ba33930e7de73d0815feee5cc352c47ff"
            },
            "downloads": -1,
            "filename": "datasette_llm_embed-0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "878efbfc2ebd653efd488a7aa28b7472",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7500,
            "upload_time": "2023-10-08T17:43:59",
            "upload_time_iso_8601": "2023-10-08T17:43:59.488079Z",
            "url": "https://files.pythonhosted.org/packages/95/ea/a90fcaaa81310bf05b0d3eace0a752b6b15aa7238273a72b7c373899fea5/datasette_llm_embed-0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "408b26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990",
                "md5": "debd714ffd6711b3470aecabb01013e0",
                "sha256": "6793b0403546188db13ebcd1d2009078bc88695418aeae4f734e45c3114e510c"
            },
            "downloads": -1,
            "filename": "datasette-llm-embed-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "debd714ffd6711b3470aecabb01013e0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7461,
            "upload_time": "2023-10-08T17:44:01",
            "upload_time_iso_8601": "2023-10-08T17:44:01.016490Z",
            "url": "https://files.pythonhosted.org/packages/40/8b/26e2d501ddd920eb6a0bfa0ac76499178f1eeaea7bad608fe835eaca2990/datasette-llm-embed-0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-08 17:44:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "simonw",
    "github_project": "datasette-llm-embed",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "datasette-llm-embed"
}
        