| Field | Value |
| --- | --- |
| Name | llm-sentence-transformers |
| Version | 0.2 |
| home_page | |
| Summary | Use sentence-transformers for embeddings with LLM |
| upload_time | 2024-02-04 18:58:06 |
| maintainer | |
| docs_url | None |
| author | Simon Willison |
| requires_python | |
| license | Apache-2.0 |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# llm-sentence-transformers
[![PyPI](https://img.shields.io/pypi/v/llm-sentence-transformers.svg)](https://pypi.org/project/llm-sentence-transformers/)
[![Changelog](https://img.shields.io/github/v/release/simonw/llm-sentence-transformers?include_prereleases&label=changelog)](https://github.com/simonw/llm-sentence-transformers/releases)
[![Tests](https://github.com/simonw/llm-sentence-transformers/workflows/Test/badge.svg)](https://github.com/simonw/llm-sentence-transformers/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/llm-sentence-transformers/blob/main/LICENSE)
[LLM](https://llm.datasette.io/) plugin for embedding models using [sentence-transformers](https://www.sbert.net/)
Further reading:
- [LLM now provides tools for working with embeddings](https://simonwillison.net/2023/Sep/4/llm-embeddings/)
- [Embedding paragraphs from my blog with E5-large-v2](https://til.simonwillison.net/llms/embed-paragraphs)
## Installation
Install this plugin in the same environment as LLM.
```bash
llm install llm-sentence-transformers
```
## Configuration
After installing the plugin, you can only use models that have been registered. The [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model is registered by default and will be downloaded the first time you use it.
You can try that model out like this:
```bash
llm embed -m mini-l6 -c 'hello'
```
This will return a JSON array of floating point numbers.
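Because the output is plain JSON, any JSON parser can load it. The following is a minimal Python sketch, not part of the plugin: `parse_embedding` is a hypothetical helper name, and the sample string is a shortened, made-up stand-in for real command output.

```python
import json


def parse_embedding(output: str) -> list[float]:
    """Parse the JSON array printed by `llm embed` into a list of floats."""
    values = json.loads(output)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array")
    # Convert each element explicitly so integers such as 1 become floats.
    return [float(v) for v in values]


# Shortened, made-up sample; real output has one number per embedding dimension.
sample_output = "[0.25, -0.5, 1]"
vector = parse_embedding(sample_output)
print(len(vector), vector)
```

In practice you would pass the captured standard output of the `llm embed` command to the helper instead of the sample string.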
You can add more models using the `llm sentence-transformers register` command. Here is a [list of available models](https://www.sbert.net/docs/pretrained_models.html).
Two good models to start experimenting with are `all-MiniLM-L12-v2` - a 120MB download - and `all-mpnet-base-v2`, which is 420MB.
To install the [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) model, run:
```bash
llm sentence-transformers register \
all-mpnet-base-v2 \
--alias mpnet
```
`--alias` is optional; use it to configure one or more shorter aliases for the model.
You can run `llm aliases` to confirm which aliases you have configured, and [llm aliases set](https://llm.datasette.io/en/stable/aliases.html) to configure further aliases.
## Usage
Once you have installed an embedding model you can use it like this:
```bash
llm embed -m sentence-transformers/all-mpnet-base-v2 \
-c "Hello world"
```
Or use its alias:
```bash
llm embed -m mpnet -c "Hello world"
```
Embeddings are more useful if you store them in a database - see [the LLM documentation](https://llm.datasette.io/en/stable/embeddings/cli.html#storing-embeddings-in-sqlite) for instructions on doing that.
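One reason to keep embeddings around is to compare them later. A common measure for that is cosine similarity, sketched below in plain Python. This is an illustration under assumptions rather than something this plugin provides: `cosine_similarity` is a hypothetical helper, and the two-element vectors are made up to keep the example readable.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Tiny made-up vectors; real embeddings have hundreds of dimensions.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Values close to 1 indicate vectors pointing in nearly the same direction. Only compare vectors produced by the same model, since different models generally produce incompatible embedding spaces.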
Be sure to review the documentation for the model you are using. Many models will silently truncate content beyond a certain number of tokens. `all-mpnet-base-v2` says that "input text longer than 384 word pieces is truncated", for example.
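One way to reduce the risk of silent truncation is to split long text into smaller pieces before embedding each one. The sketch below is a rough illustration, not part of the plugin: `split_into_chunks` is a hypothetical helper, and the default of 200 words is an arbitrary assumption. It counts whitespace-separated words, which is only a loose proxy for word pieces, since one word can map to several word pieces.

```python
def split_into_chunks(text: str, max_words: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # Whitespace splitting only approximates the model's own tokenizer.
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Made-up input of 450 words to show how the chunk sizes come out.
long_text = "word " * 450
chunks = split_into_chunks(long_text, max_words=200)
print([len(chunk.split()) for chunk in chunks])
```

For an exact limit you would need to count tokens with the tokenizer of the model you are using, as described in that model's documentation.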
## Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-sentence-transformers
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```
Raw data
{
"_id": null,
"home_page": "",
"name": "llm-sentence-transformers",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Simon Willison",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/99/da/c1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1/llm-sentence-transformers-0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Use sentence-transformers for embeddings with LLM",
"version": "0.2",
"project_urls": {
"CI": "https://github.com/simonw/llm-sentence-transformers/actions",
"Changelog": "https://github.com/simonw/llm-sentence-transformers/releases",
"Homepage": "https://github.com/simonw/llm-sentence-transformers",
"Issues": "https://github.com/simonw/llm-sentence-transformers/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bc4205fd68dc43031ad9c40f8bf12b54558b1d0cb990d2c1274570f63596a90f",
"md5": "edf92876a67342114e96ffb42d8d1d1f",
"sha256": "48e10fe9312454355aa8865b946b67b6854763da9b79d85a7d385561f1ff7c51"
},
"downloads": -1,
"filename": "llm_sentence_transformers-0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "edf92876a67342114e96ffb42d8d1d1f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 8427,
"upload_time": "2024-02-04T18:58:04",
"upload_time_iso_8601": "2024-02-04T18:58:04.817835Z",
"url": "https://files.pythonhosted.org/packages/bc/42/05fd68dc43031ad9c40f8bf12b54558b1d0cb990d2c1274570f63596a90f/llm_sentence_transformers-0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "99dac1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1",
"md5": "01d7b01a94f4f31d4719163123ba675a",
"sha256": "67f6b5cb8cd57276d90ed976d729fbe6357aae6c8daa7c98435c086687c3da3b"
},
"downloads": -1,
"filename": "llm-sentence-transformers-0.2.tar.gz",
"has_sig": false,
"md5_digest": "01d7b01a94f4f31d4719163123ba675a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 8296,
"upload_time": "2024-02-04T18:58:06",
"upload_time_iso_8601": "2024-02-04T18:58:06.237432Z",
"url": "https://files.pythonhosted.org/packages/99/da/c1d669d338b86ddfd0c18bc5f0bfd1aed24c8a59c4bbad21231e6ff09fd1/llm-sentence-transformers-0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-04 18:58:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "llm-sentence-transformers",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "llm-sentence-transformers"
}