embedme


Name: embedme
Version: 0.1.1
Home page: https://github.com/morganpartee/embedme
Summary: Easily create and search text embeddings using OpenAI's API, with JSON for local storage. Just add dicts of info and search! Built for rapid prototyping.
Upload time: 2023-01-22 19:57:12
Author: John Partee
Requires Python: >=3.10,<4.0
License: GPL-3.0-or-later
Keywords: nlp, embeddings, search, openai, gpt3, ai, machine learning
Requirements: No requirements were recorded.
# Embedme

Embedme is a Python module that makes it easy to create embeddings for text fields with OpenAI's Embedding API and store them in a local folder.

It's like a lazy version of Pinecone - NumPy is actually pretty fast for embedding search at smaller scale, so why overthink it? We store the data and vectors as JSON and build the NumPy array before you search (and keep it around until you add more).
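
For intuition, here's a minimal sketch of that search pattern - not Embedme's actual internals, just plain NumPy cosine similarity over a stack of stored vectors (the item layout here is made up for illustration):

```py
import numpy as np

# Hypothetical stored items: each one carries its embedding plus metadata.
items = [
    {"name": "a", "vector": [0.1, 0.3, 0.5]},
    {"name": "b", "vector": [0.9, 0.1, 0.0]},
]

# Stack the vectors into one array once, then reuse it for every search.
matrix = np.array([item["vector"] for item in items])

def cosine_search(query_vector, matrix, top_k=5):
    """Return indices of the stored vectors most similar to the query."""
    query = np.asarray(query_vector, dtype=float)
    scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    return np.argsort(scores)[::-1][:top_k]

print(cosine_search([0.2, 0.2, 0.4], matrix))  # -> indices ranked by similarity
```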

## Installation

To install Embedme, you can use pip:

```sh
pip install embedme
```

## Setup

The only thing you _must_ do before you use `embedme` is set up auth with OpenAI. We use it to embed your items and your search queries, so it is required. I don't want to touch **any** of that code - just sign in however they tell you to, whether that's loading the key from a file in your script or setting an environment variable.

[OpenAI Python Module (With Auth Instructions)](https://github.com/openai/openai-python)
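
For example, one common setup with the OpenAI Python module (check their docs above for the current options) is to export your key as an environment variable and point `openai.api_key` at it:

```py
import os
import openai

# The key is read from the OPENAI_API_KEY environment variable here;
# never hard-code a real key into source control.
openai.api_key = os.environ["OPENAI_API_KEY"]
```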

## Usage

Embedme provides a simple interface for embedding text fields with OpenAI's Embedding API and storing the results in a local folder.

Check out the example notebook for a fuller example, but usage looks something like this:

```py
import openai  # must be authenticated before embedme can call the API (see Setup above)
import nltk
from more_itertools import chunked
from embedme import Embedme
from tqdm import tqdm

# Downloading the NLTK corpus
nltk.download('gutenberg')

# Creating an instance of the Embedme class
embedme = Embedme(data_folder='.embedme', model="text-embedding-ada-002")

# Getting the text
text = nltk.corpus.gutenberg.raw('melville-moby_dick.txt')

# Splitting the text into sentences
sentences = nltk.sent_tokenize(text)

input("Hey this call will cost you money and take a minute. Like, a few cents probably, but wanted to warn you.")

# Embedding the sentences in chunks of 20, with a name for each chunk
for i, chunk in enumerate(tqdm(chunked(sentences, 20))):
    data = {'name': f'moby_dick_chunk_{i}', 'text': ' '.join(chunk)}
    embedme.add(data, save=False)

# Saving once at the end instead of after every add
embedme.save()
```

And to search:

```py
embedme.search("lessons")
```

You can do anything you want with `.vectors` once you've called `.prepare_search()` (or just searched for something - it mostly happens automatically), like plotting clusters, etc.
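
For example, assuming `.vectors` ends up as an array-like with one row per added item (that's an assumption about its shape - check it in your own run), a quick clustering sketch might look like:

```py
import numpy as np
from sklearn.cluster import KMeans

embedme.prepare_search()  # builds the vector array if it isn't built already

# Assumed shape: (number_of_items, embedding_dimension)
vectors = np.asarray(embedme.vectors)

kmeans = KMeans(n_clusters=5, random_state=0).fit(vectors)
print(kmeans.labels_[:20])  # cluster assignment for the first 20 chunks
```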

## Follow Us

Some friends and I are writing about large language model stuff at [SensibleDefaults.io](https://sensibledefaults.io), honest to god free. Follow us (or star this repo!) if this helps you!

## Note

Embedme uses OpenAI's Embedding API to get embeddings for text fields, so an API key is required to use it. You can get one from https://beta.openai.com/signup/

The embedding model's token limit today is about 8k tokens per input, so... you're probably fine.
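
If you want to check a chunk before sending it, `tiktoken` can count tokens locally (`cl100k_base` is the encoding `text-embedding-ada-002` uses):

```py
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
chunk = "Call me Ishmael."
print(len(encoding.encode(chunk)))  # keep this under roughly 8k per item
```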

            
