llm-sentence-transformers

Name	llm-sentence-transformers
Version	0.2
Summary	Use sentence-transformers for embeddings with LLM
upload_time	2024-02-04 18:58:06
author	Simon Willison
license	Apache-2.0

# llm-sentence-transformers

[![PyPI](https://img.shields.io/pypi/v/llm-sentence-transformers.svg)](https://pypi.org/project/llm-sentence-transformers/)
[![Changelog](https://img.shields.io/github/v/release/simonw/llm-sentence-transformers?include_prereleases&label=changelog)](https://github.com/simonw/llm-sentence-transformers/releases)
[![Tests](https://github.com/simonw/llm-sentence-transformers/workflows/Test/badge.svg)](https://github.com/simonw/llm-sentence-transformers/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/llm-sentence-transformers/blob/main/LICENSE)

[LLM](https://llm.datasette.io/) plugin for embedding models using [sentence-transformers](https://www.sbert.net/)

Further reading:
- [LLM now provides tools for working with embeddings](https://simonwillison.net/2023/Sep/4/llm-embeddings/)
- [Embedding paragraphs from my blog with E5-large-v2](https://til.simonwillison.net/llms/embed-paragraphs)

## Installation

Install this plugin in the same environment as LLM.
```bash
llm install llm-sentence-transformers
```
## Configuration

After installing the plugin, you need to register the models you want to use. One model, [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), is registered by default and will be downloaded the first time you use it.

You can try that model out like this:

```bash
llm embed -m mini-l6 -c 'hello'
```
This will return a JSON array of floating point numbers.
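
To sanity-check that output, you can pipe it into a small helper that counts the numbers in the array. The `embedding_dims` name is hypothetical, and the helper assumes `python3` is on your `PATH` (it is used only to parse the JSON):

```bash
# Hypothetical helper: read a JSON array (such as the output of `llm embed`)
# from stdin and print how many numbers it contains.
# Assumes python3 is available; it is used only for JSON parsing.
embedding_dims() {
  python3 -c 'import json, sys; print(len(json.load(sys.stdin)))'
}
```

For example, `llm embed -m mini-l6 -c 'hello' | embedding_dims` should print 384, the vector size listed on the `all-MiniLM-L6-v2` model card.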

You can add more models using the `llm sentence-transformers register` command. Here is a [list of available models](https://www.sbert.net/docs/pretrained_models.html).

Two good models to start experimenting with are `all-MiniLM-L12-v2` (a 120MB download) and `all-mpnet-base-v2` (a 420MB download).

To register the [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) model, run:

```bash
llm sentence-transformers register \
  all-mpnet-base-v2 \
  --alias mpnet
```
The `--alias` option is not required; use it to configure one or more shorter aliases for the model.

Run `llm aliases` to confirm which aliases you have configured, and use [llm aliases set](https://llm.datasette.io/en/stable/aliases.html) to add more.

## Usage

Once you have registered an embedding model, you can use it like this:

```bash
llm embed -m sentence-transformers/all-mpnet-base-v2 \
  -c "Hello world"
```
Or use its alias:
```bash
llm embed -m mpnet -c "Hello world"
```
Embeddings are most useful when you store them in a database. See [the LLM documentation](https://llm.datasette.io/en/stable/embeddings/cli.html#storing-embeddings-in-sqlite) for instructions on doing that.
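
As a rough sketch of that workflow, the functions below wrap LLM's `llm embed` command (given a collection name and an item ID, plus `--store` to keep the original text) and its `llm similar` command. The function names are hypothetical, and the `mpnet` alias assumes you registered `all-mpnet-base-v2` with `--alias mpnet` as described above; check the LLM documentation for the full set of options:

```bash
# Hypothetical helpers around the LLM embeddings CLI.
# Assumes all-mpnet-base-v2 was registered with `--alias mpnet`.

# Embed one piece of text and save it under an ID in a named collection.
# --store keeps the original text alongside the embedding.
store_text() {
  local collection="$1" id="$2" text="$3"
  llm embed "$collection" "$id" -m mpnet -c "$text" --store
}

# List the stored items in a collection that are most similar to the text.
# The collection remembers which model it was built with.
find_similar() {
  local collection="$1" text="$2"
  llm similar "$collection" -c "$text"
}
```

For example, run `store_text phrases hello "Hello world"` once, then `find_similar phrases "Hi there"` to search the `phrases` collection.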

Be sure to review the documentation for the model you are using: many models silently truncate content beyond a certain number of tokens. For example, the `all-mpnet-base-v2` documentation states that "input text longer than 384 word pieces is truncated".
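
A cheap way to catch the most obvious cases before embedding is to count words. Each whitespace-separated word becomes at least one word piece, so text with more words than the limit will certainly be truncated; text under the limit may still be truncated, so treat this as a lower-bound check rather than a real tokenizer. The function name is hypothetical, and the default limit of 384 is taken from the `all-mpnet-base-v2` note above:

```bash
# Hypothetical pre-check for truncation.
# Succeeds (exit status 0) when the text has more whitespace-separated words
# than the limit, which guarantees it exceeds the model's word-piece limit.
# Failure does NOT prove the text fits: one word can split into several pieces.
exceeds_word_limit() {
  local text="$1" limit="${2:-384}" words
  # tr strips the padding that some wc implementations add
  words=$(printf '%s' "$text" | wc -w | tr -d ' ')
  [ "$words" -gt "$limit" ]
}
```

For example, `exceeds_word_limit "$(cat notes.txt)" && echo "will be truncated"` flags an over-long file (`notes.txt` is a placeholder name).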

## Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-sentence-transformers
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "llm-sentence-transformers",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Simon Willison",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/99/da/c1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1/llm-sentence-transformers-0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Use sentence-transformers for embeddings with LLM",
    "version": "0.2",
    "project_urls": {
        "CI": "https://github.com/simonw/llm-sentence-transformers/actions",
        "Changelog": "https://github.com/simonw/llm-sentence-transformers/releases",
        "Homepage": "https://github.com/simonw/llm-sentence-transformers",
        "Issues": "https://github.com/simonw/llm-sentence-transformers/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bc4205fd68dc43031ad9c40f8bf12b54558b1d0cb990d2c1274570f63596a90f",
                "md5": "edf92876a67342114e96ffb42d8d1d1f",
                "sha256": "48e10fe9312454355aa8865b946b67b6854763da9b79d85a7d385561f1ff7c51"
            },
            "downloads": -1,
            "filename": "llm_sentence_transformers-0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "edf92876a67342114e96ffb42d8d1d1f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8427,
            "upload_time": "2024-02-04T18:58:04",
            "upload_time_iso_8601": "2024-02-04T18:58:04.817835Z",
            "url": "https://files.pythonhosted.org/packages/bc/42/05fd68dc43031ad9c40f8bf12b54558b1d0cb990d2c1274570f63596a90f/llm_sentence_transformers-0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "99dac1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1",
                "md5": "01d7b01a94f4f31d4719163123ba675a",
                "sha256": "67f6b5cb8cd57276d90ed976d729fbe6357aae6c8daa7c98435c086687c3da3b"
            },
            "downloads": -1,
            "filename": "llm-sentence-transformers-0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "01d7b01a94f4f31d4719163123ba675a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8296,
            "upload_time": "2024-02-04T18:58:06",
            "upload_time_iso_8601": "2024-02-04T18:58:06.237432Z",
            "url": "https://files.pythonhosted.org/packages/99/da/c1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1/llm-sentence-transformers-0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-04 18:58:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "simonw",
    "github_project": "llm-sentence-transformers",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "llm-sentence-transformers"
}
