pinecone-cli

Name	pinecone-cli
Version	1.1
home_page	https://github.com/tullytim/pinecone-cli
Summary	pinecone-cli is a command-line client for interacting with the pinecone vector embedding database.
upload_time	2023-07-25 19:16:33
author	Tim Tully
requires_python	>=3
license	MIT
keywords	pinecone vector vectors embeddings database transformers models
requirements	No requirements were recorded.

# pinecone-cli

pinecone-cli is a command-line interface for control and data plane interfacing with [Pinecone](https://pinecone.io).

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/pinecone-cli.svg)](https://badge.fury.io/py/pinecone-cli)
[![codecov](https://codecov.io/gh/tullytim/pinecone-cli/branch/main/graph/badge.svg?token=NMZ3YGNYQE)](https://codecov.io/gh/tullytim/pinecone-cli)

In addition to supporting ALL of the Pinecone "actions/verbs", pinecone-cli adds several features of its own that make Pinecone even more powerful, including:

* Upload vectors from CSV files.
* Upload embeddings of text from a given website URL.  Embeddings are generated by the OpenAI embeddings API.
* New "head" command to peek into a given index, similar to "head" in linux/unix.

# Install

Feel free to use the tool directly from source in this repository, or just install it from PyPI:

```console
pip install pinecone-cli
```
PyPI page: <https://pypi.org/project/pinecone-cli/>

# Usage

The CLI depends on two settings, normally supplied as environment variables:

* Your Pinecone API Key
* The region/environment of your Pinecone indexes

The CLI picks them up in a simple order:

   1. *.env* file in the current working dir.
   2. Environment variables set in your shell.
   3. Command line arguments, which override the above.

Let's look at a simple .env file:

```console
PINECONE_API_KEY=123456-9876-ABCDEF
PINECONE_ENVIRONMENT=us-west1-gcp
```

Setting the variables in your shell works much the same way:

```console
% export PINECONE_API_KEY=1234-4567-abc
```

Otherwise you have to pass the key on the command line, like so:

```console
% pinecli query --apikey=1234 ....
```
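The lookup order can be pictured with a small sketch. This is not pinecone-cli's actual code: the helper names ```parse_dotenv``` and ```resolve_setting``` are hypothetical, and the sketch assumes that each later source in the list overrides the earlier ones (the text above states this explicitly only for command line arguments).

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_setting(name, dotenv_values, environ, cli_value=None):
    """Return the effective value for one setting.

    Assumed precedence (an illustration, not verified against the tool):
    command line argument > shell environment > .env file.
    """
    if cli_value is not None:
        return cli_value
    if name in environ:
        return environ[name]
    return dotenv_values.get(name)


# Contents of the example .env file shown earlier.
dotenv_values = parse_dotenv(
    "PINECONE_API_KEY=123456-9876-ABCDEF\n"
    "PINECONE_ENVIRONMENT=us-west1-gcp\n"
)
# Stand-in for os.environ after the `export` shown earlier.
shell_env = {"PINECONE_API_KEY": "1234-4567-abc"}

api_key = resolve_setting("PINECONE_API_KEY", dotenv_values, shell_env)
region = resolve_setting("PINECONE_ENVIRONMENT", dotenv_values, shell_env)
overridden = resolve_setting(
    "PINECONE_API_KEY", dotenv_values, shell_env, cli_value="1234"
)
print(api_key, region, overridden)
```

In real use you would pass ```os.environ``` instead of the stand-in dictionary.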

The pattern for using the tool is to invoke 'pinecli' followed by a command.  The list of commands appears with --help:

```console

% pinecli.py --help
Usage: pinecli.py [OPTIONS] COMMAND [ARGS]...

  A command line interface for working with Pinecone.

Options:
  --help  Show this message and exit.

Commands:
  askquestion               Queries Pinecone with a given vector.
  configure-index-pod-type  Configures the given index to have a pod type.
  configure-index-replicas  Configures the number of replicas for a given
                            index.
  create-collection         Creates a Pinecone collection from the argument
                            'source_index'
  create-index              Creates a Pinecone Index.
  delete-all                Delete all vectors (note separate command [delete-
                            index] can completely delete an index)
  delete-collection         Deletes a collection.
  delete-index              Deletes an index.  You will be prompted to
                            confirm.
  describe-collection       Describes a collection.
  describe-index            Describes an index.
  describe-index-stats      Prints out index stats to stdout.
  fetch                     Fetches vectors from Pinecone specified by the
                            vectors' ids.
  head                      Shows a preview of vectors in the
                            <PINECONE_INDEX_NAME>
  list-collections          Lists collections for the given apikey.
  list-indexes              Lists the indexes for your api key.
  minimize-cluster          Minimizes everything for a cluster to lowest
                            settings.
  query                     Queries Pinecone with a given vector.
  update                    Updates the index based on the given id passed in.
  upsert                    Extracts text from url arg, vectorizes w/ openai
                            embedding api, and upserts to Pinecone.
  upsert-file               Upserts a file (csv) into the specified index.
  upsert-random             Upserts a vector(s) with random dimensions into
                            the specified vector.
  upsert-webpage            Extracts text from url arg, vectorizes w/ openai
                            embedding api, and upserts to Pinecone.
  version                   Prints version number.

```

# Commands With Examples

Before you can use Pinecone, you need an index.  You can now create one on the command line rather than in the UI.  (Not all of the options below are required; they are shown to demonstrate functionality and control.)

```console
% pinecli create-index myindex --dims=1536 --metric=cosine --pods=2 --replicas=2 --shards=1 --pod-type="p2.x1"
```

Note that for any command, if you want an exhaustive description of the command line options, run something like the following, with "create-index" replaced by the command you are interested in:

```console
% pinecli create-index --help
Usage: pinecli create-index [OPTIONS] PINECONE_INDEX_NAME

  Creates the Pinecone index named <PINECONE_INDEX_NAME>

Options:
  --apikey TEXT             Pinecone API Key
  --region TEXT             Pinecone Index Region
  --dims INTEGER            Number of dimensions for this index  [required]
  --metric TEXT             Distance metric to use.  [required]
  --pods INTEGER            Number of pods  [default: 1]
  --replicas INTEGER        Number of replicas  [default: 1]
  --shards INTEGER          Number of shards  [default: 1]
  --pod-type TEXT           Type of pods to create.  [required]
  --source_collection TEXT  Source collection to create index from
  --help                    Show this message and exit.
```

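### Create Collection From Index

You can also create a collection from an index (effectively an index backup).  The reverse direction, creating an index from a collection, is available through the ```--source_collection``` option of ```create-index```.  To create a collection: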

```console
% pinecli.py create-collection --collection_name='testercollection'  --source_index='mysourcecollection'
```

Let's try some commands showing two missing features I'd love to have had over the last year: a "head" command and a quick "stats" command:

## Index Stats Including Number of Vectors

```console
% pinecli describe-index-stats myindex
Dimensions: 1536
Vectors: 7745
Index_Fullness: 0.0
Namespace data:
        : 7745
```

## Head command to preview vectors

```console
% pinecli head kids-facenet
{'matches': [{'id': 'bubba_50.jpg.vec',
              'metadata': {},
              'score': 12.182938,
              'values': [-0.016061664,
                         -0.4495437,
                         -0.034082577,
                         .....
```

## Querying with a vector directly

Now, let's query some nonsensical data from the index named 'upsertfile'.

*Note the double quotes around the vector.*

```console
% pinecli query myindex  "[1.2, 1.0, 3.0]" --print-table  --include-meta=True
                      🌲 upsertfile ns=() Index Results                      
┏━━━━━━┳━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃   ID ┃ NS ┃ Values                   ┃                Meta ┃        Score ┃
┡━━━━━━╇━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ vec1 │    │ 0.1,0.2,0.3              │  {'genre': 'drama'} │    0.9640127 │
│ vec2 │    │ 0.2,0.3,0.4              │ {'genre': 'action'} │    0.9552943 │
│  abc │    │ 0.23223,-1.333,0.2222222 │      {'foo': 'bar'} │ -0.083585836 │
│  ghi │    │ 0.23223,-1.333,0.2222222 │      {'bar': 'baz'} │ -0.083585836 │
└──────┴────┴──────────────────────────┴─────────────────────┴──────────────┘
```

Markdown of course does a great job of mangling great terminal output, so here's a screenshot from using ```--print-table```:
![alt](https://github.com/tullytim/pinecone-cli/blob/main/head-print.png?raw=true)

To get the raw output instead of the pretty table, simply drop ```--print-table```:

```console
% pinecli query myindex "[1.2, 1.0, 3.0]" --include-meta=True
{'matches': [{'id': 'vec1',
              'metadata': {'genre': 'drama'},
              'score': 0.9640127,
              'values': [0.1, 0.2, 0.3]},
              ...
```
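The scores in the query output above are consistent with cosine similarity between the query vector and each stored vector, which is the metric passed to ```create-index``` in the earlier example. The following minimal sketch recomputes them locally; the helper name ```cosine_similarity``` is ours and is not part of pinecone-cli.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.2, 1.0, 3.0]
# Vector values copied from the table output above.
stored = {
    "vec1": [0.1, 0.2, 0.3],
    "vec2": [0.2, 0.3, 0.4],
    "abc": [0.23223, -1.333, 0.2222222],
}

scores = {vec_id: cosine_similarity(query, values) for vec_id, values in stored.items()}
for vec_id, score in scores.items():
    print(vec_id, round(score, 6))
```

Small differences in the last digits are expected, since the values in the table appear to be single-precision floats.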

## Upsert Vectors in Command Line Manually

Each vector follows the Pinecone tuple format:

```python
('vectorid', [vecdim1, vecdim2, vecdim3], {'metakey':'metaval'})
```

You can pass a comma-separated list of such tuples on the command line:

```console
% pinecli upsert myindex "[('vec1', [0.1, 0.2, 0.3], {'genre': 'drama'}), ('vec2', [0.2, 0.3, 0.4], {'foo': 'bar'}),]"
```
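If the vectors already live in Python, the argument string can be generated rather than typed by hand. This is a minimal sketch, assuming the CLI accepts any string that is a valid Python literal of the shape shown above; we have not verified how pinecli parses the argument, and ```ast.literal_eval``` is used here only to check that the string round-trips.

```python
import ast

# (id, values, metadata) tuples in the format described above.
vectors = [
    ("vec1", [0.1, 0.2, 0.3], {"genre": "drama"}),
    ("vec2", [0.2, 0.3, 0.4], {"foo": "bar"}),
]

# repr() of a list of tuples yields a Python-literal string.
upsert_arg = repr(vectors)
print(upsert_arg)

# Sanity check: the string parses back to the same data.
round_tripped = ast.literal_eval(upsert_arg)
```

The printed string is what goes inside the double quotes on the command line.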

## Upsert CSV file

Upserting a csv file is trivial.  Create your csv file with whatever headings you have; the only requirement is that there is at least a labeled id column and a labeled vector column.  Here's an example of a CSV file that is clearly a DataFrame dump (note the index column on the left), which works great w/ pinecone-cli:

```console
index,my_id_column,my_vectors_column,Metadata
1,abc,"[0.23223, -1.333, 0.2222222]",{'foo':'bar'}
2,ghi,"[0.23223, -1.333, 0.2222222]",{'bar':'baz'}
```

The column names in the header row can be arbitrary, or you can name them "id", "vectors" and "metadata", which is our default assumption.  If you have custom column names and don't want to change them, just pass in the ```--colmap``` argument, which takes a Python dictionary mapping "id" and "vectors" to the names used in your csv.  For example:
```"{'id':'my_id_column', 'vectors':'my_vectors_column'}"```

Note that, as with other CSV files produced from DataFrames, an index column is required, as in the example above.
Here's an example using the CSV headers and format above with the correct colmap argument:

```console
% pinecli upsert-file  embeddings.csv myindex "{'id':'my_id_column', 'vectors':'my_vectors_column'}"
```

### More on CSV Formatting

For now you will need to manually provide an index column (we are using dataframes under the hood).
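A file in this layout can be produced with a few lines of Python. The sketch below uses only the standard library ```csv``` module and writes to an in-memory buffer; the column names are the ones from the example above, and the exact spacing pinecone-cli tolerates inside the vector and metadata cells is an assumption we have not verified.

```python
import csv
import io

# (id, values, metadata) rows to export.
rows = [
    ("abc", [0.23223, -1.333, 0.2222222], {"foo": "bar"}),
    ("ghi", [0.23223, -1.333, 0.2222222], {"bar": "baz"}),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["index", "my_id_column", "my_vectors_column", "Metadata"])
for i, (vec_id, values, meta) in enumerate(rows, start=1):
    # The leading index column mirrors a DataFrame dump.
    # Cells containing commas are quoted automatically by the csv module.
    writer.writerow([i, vec_id, repr(values), repr(meta)])

csv_text = buffer.getvalue()
print(csv_text)

# Matching column-map string for the custom column names.
colmap = repr({"id": "my_id_column", "vectors": "my_vectors_column"})
print(colmap)
```

To write an actual file, replace the buffer with ```open("embeddings.csv", "w", newline="")```.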

## Upserting Vector Embeddings of Webpage Text

pinecone-cli was built to make using Pinecone extremely easy and fast.  We have integrated [OpenAI](https://openai.com/) (others coming), using its [embedding APIs](https://platform.openai.com/docs/guides/embeddings) to fetch embeddings.  The CLI then uploads them into your index for you, which makes uploading embeddings of an entire website's text trivial.

```console
% pinecli upsert-webpage https://menlovc.com lpfactset  --openaiapikey=12345-9876-abcdef
[nltk_data] Downloading package punkt to /Users/tim/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
100%|████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 61680.94it/s]
['About Us  Our Promise  Focus Areas   Consumer  Cloud Infrastructure  Cybersecurity  Fintech  Healthcare  SaaS  Supply Chain and Automation    Team  Portfolio  Perspective            When we invest, we’re invested. Our promise to founders        Building a business is a team sport. As investors, we don’t just sit on the sidelines but do whatever it takes to help our teams win. About Us    The founders we back don’t limit themselves to what is, but relentlessly pursue what could be. We invest in transformative technology companies that are changing the way we live and work. Portfolio    Menlo Labs starts companies. We work shoulder-to-shoulder 
....
100%|█████████████████████████| 1/1 [00:00<00:00,  1.11it/s]
```

## Upsert Random Vectors

One of the more useful things we can do with pinecone-cli is insert random vectors, primarily for testing.  Often we will create an index whose vectors have, say, 1,536 dimensions.  Instead of writing a bunch of code to generate such vectors, we can have pinecone-cli generate them and upsert them:

```console
% pinecli upsert-random  upsertfile  --num_vector_dims=1536 --num_vectors=10 --debug
upserted_count: 10
1it [00:00,  4.36it/s]  
```

The example above inserts 10 vectors that each have 1,536 random dimensions. Note that the ```id``` for each vector is simply ```f'id-{i}'``` where ```i``` is the ith row (vector) inserted.
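For intuition, the data this command produces can be sketched locally as follows. This is an illustration, not the tool's implementation: the function name ```random_vectors``` is hypothetical, and both the value distribution (uniform in [0, 1)) and the zero-based numbering of ids are assumptions.

```python
import random


def random_vectors(num_vectors, num_vector_dims, seed=None):
    """Build (id, values) pairs with ids of the form 'id-{i}'."""
    rng = random.Random(seed)
    return [
        # Assumption: i starts at 0 and values are uniform in [0, 1).
        (f"id-{i}", [rng.random() for _ in range(num_vector_dims)])
        for i in range(num_vectors)
    ]


vectors = random_vectors(num_vectors=10, num_vector_dims=1536, seed=0)
print(len(vectors), len(vectors[0][1]), vectors[0][0], vectors[-1][0])
```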

## Query Vectors

Querying can be done in two ways on the command line: pass in an actual vector string literal, or ask Pinecone to query randomly (maybe you just want to look at the vectors or at a t-SNE plot).  In the example below, the last argument (required) is either the string 'random' or an actual vector such as '[0.0, 1.0, 3.14569]'.  Let's try random:

```console
% pinecli query myindex random
```

You can also draw a t-SNE plot to view the clustering of your vectors by using the ```--show-tsne=True``` flag.  Note that this will pop up the plot window by default.

```console
% pinecli query lpfactset random --show-tsne=true --topk=2500 --num-clusters=4
```

![alt](https://github.com/tullytim/pinecone-cli/blob/main/tsne.png?raw=true)

## Fetching Vectors

Fetching is simple - just pass in the vector id(s) of the vectors you're looking for as a comma-separated list:

```console
% pinecli fetch myindex --vector_ids="05b4509ee655aacb10bfbb6ba212c65c,c626975ec096b9108f158a56a59b2fd6"

{'namespace': '',
 'vectors': {'05b4509ee655aacb10bfbb6ba212c65c': {'id': '05b4509ee655aacb10bfbb6ba212c65c',
                                                  'metadata': {'content': 'Chime '
                                                                          'Scholar '
                                                                          'spotlight: '
```

## Updating Vectors

Updating vectors is simple - pass the id of the vector and the updating vector as below:

```console
% pinecli update "id-9" myindex  "[0.0, 1.0, 3.0]"
```

## List Operations

pinecone-cli has all of the necessary 'list' operations as shown below:

### List Indexes

This gives you a list of all indexes under your api key:

```console
% pinecli list-indexes                                    
cli
cli2
cli3
drivertest
```

### List Indexes Fully

You can also view the index data much as you would on the Pinecone page by using the ```--print-table``` flag, which shows all data such as pods, metric type, shards, etc:

```console
% pinecli.py list-indexes  --print-table
```

![alt](https://github.com/tullytim/pinecone-cli/blob/main/list-indexes-table.png?raw=true)

### List Collections

This obviously lists the collections you've created:

```console
% pinecli list-collections                                    
cli
cli2
cli3
drivertest
```

### Other Meta Operations

We showed the "describe-index-stats" command at the top of this page.  There is also "describe-index" which provides the following:

```console
% pinecli describe-index lpfactset
Name: lpfactset
Dimensions: 1536
Metric: cosine
Pods: 1
PodType: p2.x1
Shards: 1
Replicas: 1
Ready: True
State: Ready
Metaconfig: None
Sourcecollection: 
```

### Describe Collections

```console
% pinecli describe-collection testcoll
Name: testcoll
Dimensions: 1536
Vectors: 124
Status: Ready
Size: 3917544
```

## Deleting Vectors From Index

This will basically do an `rm *` on the index and clear it out, but will *not* delete the index itself. In other words, the vector count will be 0 afterwards.

```console
% pinecli delete-all myindexname
```

## Deleting Indexes

Deleting an index is straightforward.  To prevent catastrophic accidents, you'll be prompted to type in the name of the index backwards:

```console
% pinecli delete-index myindex2
Type name of index backwards to confirm: : 2xedniym
```
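The confirmation check amounts to comparing what you type against the reversed index name. A minimal sketch of that idea follows; the function name ```confirm_delete``` is hypothetical and this is not the tool's actual code.

```python
def confirm_delete(index_name, typed):
    """Return True only if `typed` is the index name spelled backwards."""
    return typed == index_name[::-1]


print(confirm_delete("myindex2", "2xedniym"))
print(confirm_delete("myindex2", "myindex2"))
```

The first call matches the session shown above; the second shows that typing the name forwards is rejected.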

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/tullytim/pinecone-cli",
    "name": "pinecone-cli",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": "",
    "keywords": "pinecone vector vectors embeddings database transformers models",
    "author": "Tim Tully",
    "author_email": "tim@menlovc.com",
    "download_url": "",
    "platform": "any",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "pinecone-cli is a command-line client for interacting with the pinecone vector embedding database.",
    "version": "1.1",
    "project_urls": {
        "Homepage": "https://github.com/tullytim/pinecone-cli"
    },
    "split_keywords": [
        "pinecone",
        "vector",
        "vectors",
        "embeddings",
        "database",
        "transformers",
        "models"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c153e09654cacda0d22af73a3d82cb26e41917195bbd9ccd147b704405edd1b9",
                "md5": "1493935f422cbc4bc5d19b224fc03218",
                "sha256": "67729a45181dd853782f09975f0316c8f565fa9c3dca0289d3384a4735643c9a"
            },
            "downloads": -1,
            "filename": "pinecone_cli-1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1493935f422cbc4bc5d19b224fc03218",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3",
            "size": 17236,
            "upload_time": "2023-07-25T19:16:33",
            "upload_time_iso_8601": "2023-07-25T19:16:33.545019Z",
            "url": "https://files.pythonhosted.org/packages/c1/53/e09654cacda0d22af73a3d82cb26e41917195bbd9ccd147b704405edd1b9/pinecone_cli-1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-25 19:16:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tullytim",
    "github_project": "pinecone-cli",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pinecone-cli"
}
