# microvector
Lightweight local vector database with persistence to disk, supporting multiple similarity metrics and an easy-to-use API.
> A refactor and repackaging of [HyperDB](https://github.com/jdagdelen/hyperDB/tree/main) optimized for CPU-only environments with improved type safety and developer experience.
## Features
- 🚀 **Simple API**: Clean, intuitive interface with `PartitionStore` pattern
- 💾 **Persistent Storage**: Automatically caches vector stores to `.pickle.gz` files
- 🔍 **Multiple Similarity Metrics**: Choose from cosine, dot product, Euclidean, or Derrida distance
- 🎯 **Type Safe**: Full type annotations with strict pyright compliance
- ⚡ **CPU Optimized**: Designed for CPU-only environments (no CUDA required)
- 🔄 **Flexible Caching**: Use persistent stores or create temporary in-memory collections
- 📦 **Easy Installation**: One-command setup with automatic PyTorch CPU configuration
- ✨ **Partition-level Operations**: Add, remove, and search documents through dedicated store objects
## Installation
```bash
pip install microvector
```
Or for development:
```bash
git clone https://github.com/loganpowell/microvector.git
cd microvector
uv sync
```
## Quick Start
```python
from microvector import Client

# Initialize the client
client = Client()

# Save a collection (by default, in-memory only)
store = client.save(
    partition="my_documents",
    collection=[
        {"text": "Python is a popular programming language", "category": "tech"},
        {"text": "Machine learning models learn from data", "category": "ai"},
        {"text": "The quick brown fox jumps over the lazy dog", "category": "example"},
    ],
    key="text"
)

# Search using the store
results = store.search(
    term="artificial intelligence and ML",
)

for result in results:
    print(f"Score: {result['similarity_score']:.4f} - {result['text']}")

# Add more documents (also in-memory by default)
store.add([
    {"text": "Deep learning uses neural networks", "category": "ai"}
])

# Search again with the updated store
results = store.search("neural networks", top_k=3)

# Persist to disk when ready
client.save(
    partition="my_documents",
    collection=store.to_dict(),
    cache=True  # Now save to disk
)
```
## API Reference
### Client
The main interface for creating and managing vector stores.
```python
Client(
    model_cache: str = "./.cached_models",
    vector_cache: str = "./.vector_cache",
    embedding_model: str = "avsolatorio/GIST-small-Embedding-v0",
    search_algo: str = "cosine"
)
```
**Parameters:**
| Parameter | Description | Default |
| ----------------- | -------------------------------------------------- | --------------------------------------- |
| `model_cache` | Directory for caching downloaded embedding models | `"./.cached_models"` |
| `vector_cache` | Directory for persisting vector stores | `"./.vector_cache"` |
| `embedding_model` | HuggingFace model name for generating embeddings | `"avsolatorio/GIST-small-Embedding-v0"` |
| `search_algo` | `"cosine"`, `"dot"`, `"euclidean"`, or `"derrida"` | `"cosine"` |
**Note:** The `search_algo` is set at the client level and applies to all partitions created by that client. This ensures consistency and prevents issues with switching algorithms on already-normalized vectors.
### save()
Create or update a vector store and return a `PartitionStore` for operations.
```python
store = client.save(
    partition: str,
    collection: list[dict[str, Any]],
    key: str = "text",
    cache: bool = False,
    append: bool = False
) -> PartitionStore
```
**Parameters:**
| Parameter | Description | Default |
| ------------ | ------------------------------------------------------- | -------- |
| `partition` | Unique identifier for this vector store | - |
| `collection` | List of documents (dictionaries) to vectorize | - |
| `key` | Field name to use for embedding | `"text"` |
| `cache` | If True, persist to disk; if False, keep in-memory only | `False` |
| `append` | If True, add to existing store; if False, replace | `False` |
**Returns:** `PartitionStore` object with methods for searching and managing documents
**Example:**
```python
store = client.save(
    partition="products",
    collection=[
        {"description": "Wireless headphones", "price": 99.99},
        {"description": "Smart watch", "price": 299.99},
    ],
    key="description",
    cache=True  # Persist to disk
)

print(f"Partition: {store.partition}")
print(f"Size: {store.size}")
print(f"Algorithm: {store.algo}")
```
### PartitionStore
A partition-specific interface for vector operations returned by `client.save()`.
**Attributes:**
| Attribute | Type | Description |
| ----------- | ---- | -------------------------------------------------------------------------------- |
| `partition` | str | Name of the partition |
| `key` | str | Field being vectorized |
| `algo` | str | Similarity algorithm in use (`"cosine"`, `"dot"`, `"euclidean"`, or `"derrida"`) |
| `size` | int | Number of documents in the store (read-only property) |
**Methods:**
#### search()
Search this partition for similar documents.
```python
results = store.search(
    term: str,
    top_k: int = 5
) -> list[dict[str, Any]]
```
**Parameters:**
| Parameter | Description | Default |
| --------- | ----------------------------------- | ------- |
| `term` | Search query string | - |
| `top_k` | Maximum number of results to return | `5` |
**Returns:** List of documents with similarity scores
```python
[
    {
        "text": "Machine learning is awesome",
        "category": "ai",
        "similarity_score": 0.923
    },
    ...
]
```
**Example:**
```python
results = store.search("laptop computers", top_k=3)
for result in results:
print(f"{result['similarity_score']:.3f} - {result['description']}")
```
#### add()
Add new documents to this partition.
```python
success = store.add(
    collection: list[dict[str, Any]] | dict[str, Any],
    cache: bool = False
) -> bool
```
**Parameters:**
| Parameter | Description | Default |
| ------------ | -------------------------------------------------- | ------- |
| `collection` | Single document or list of documents to add | - |
| `cache` | If True, persist changes; if False, keep in-memory | `False` |
**Returns:** `True` if successful, `False` otherwise
**Example:**
```python
# Add a single document
store.add({"description": "USB-C cable", "price": 12.99})

# Add multiple documents
store.add([
    {"description": "Keyboard", "price": 79.99},
    {"description": "Mouse pad", "price": 19.99}
], cache=True)

print(f"Updated size: {store.size}")
```
#### remove()
Remove a document from this partition by index or content.
```python
success = store.remove(
    item: int | dict[str, Any],
    cache: bool = False
) -> bool
```
**Parameters:**
| Parameter | Description | Default |
| --------- | --------------------------------------------------------- | ------- |
| `item` | Document index (int) or document content (dict) to remove | - |
| `cache` | If True, persist changes; if False, keep in-memory | `False` |
**Returns:** `True` if successful, `False` otherwise
**Example:**
```python
# Remove by index
store.remove(0)

# Remove by document content
store.remove({"description": "Wireless headphones", "price": 99.99})
```
#### delete()
Delete this partition's cache file from disk.
```python
success = store.delete() -> bool
```
**Returns:** `True` if successful, `False` otherwise
**Example:**
```python
if store.delete():
    print(f"Partition '{store.partition}' deleted from disk")
```
## Similarity Algorithms
| Algorithm | Best For | Range |
| ----------- | --------------------------------- | ---------------------------- |
| `cosine`   | General text similarity (default) | −1 to 1 (higher is more similar) |
| `dot` | When magnitude matters | Unbounded |
| `euclidean` | Spatial distance | 0-∞ (lower is more similar) |
| `derrida` | Experimental alternative distance | 0-∞ (lower is more similar) |
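For intuition, here is a minimal NumPy sketch of how the first three metrics are conventionally computed. This is illustrative only, not microvector's internal code; the `derrida` metric comes from the library's HyperDB lineage and is omitted here.

```python
import numpy as np

def cosine(query: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    # Normalize both sides so the dot product equals cosine similarity
    # (higher is more similar).
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return v @ q

def dot(query: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    # Raw inner product: sensitive to vector magnitude, unbounded.
    return vectors @ query

def euclidean(query: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    # Straight-line distance in embedding space (lower is more similar).
    return np.linalg.norm(vectors - query, axis=1)
```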
## Advanced Usage
### Using Different Algorithms
The similarity algorithm is set at the client level and applies to all partitions:
```python
# Create clients with different algorithms
cosine_client = Client(search_algo="cosine")
dot_client = Client(search_algo="dot")
euclidean_client = Client(search_algo="euclidean")

# Each client's partitions use its algorithm
cosine_store = cosine_client.save("docs_cosine", documents)
dot_store = dot_client.save("docs_dot", documents)

# Different algorithms, different results
cosine_results = cosine_store.search("query")
dot_results = dot_store.search("query")
```
**Why client-level?** Vectors are normalized based on the algorithm. Switching algorithms on existing vectors would produce incorrect results, so we lock the algorithm at creation time for consistency.
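To see why, consider a vector that has been unit-normalized for cosine search (a toy illustration, not library code):

```python
import numpy as np

a = np.array([3.0, 4.0])           # original vector, norm 5
b = np.array([0.6, 0.8])           # same direction, norm 1
a_unit = a / np.linalg.norm(a)     # what a cosine store would keep

print(np.linalg.norm(a - b))       # 4.0 -> true Euclidean distance
print(np.linalg.norm(a_unit - b))  # 0.0 -> distance after normalization
```

Running Euclidean search against vectors normalized for cosine would report the wrong distances, which is why the algorithm is fixed per client.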
### Custom Embedding Models
Use any HuggingFace sentence-transformer model:
```python
client = Client(
    embedding_model="intfloat/e5-small-v2",
    search_algo="cosine"
)
```
### Nested Key Paths
Access nested fields using dot notation:
```python
collection = [
    {
        "product": {
            "name": "Laptop",
            "specs": {"cpu": "Intel i7"}
        }
    },
    {
        "product": {
            "name": "Mouse",
            "specs": {"cpu": None}
        }
    }
]

store = client.save(
    partition="products",
    collection=collection,
    key="product.name"  # Extract "Laptop", "Mouse" from nested structure
)

# Search works on the nested field
results = store.search("computer", top_k=1)
print(results[0]["product"]["name"])  # "Laptop"
```
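Internally, this kind of lookup amounts to walking the dictionary one key at a time. A minimal sketch of the idea (illustrative only; not microvector's actual helper):

```python
from typing import Any

def get_nested(doc: dict[str, Any], path: str) -> Any:
    # "product.name" -> doc["product"]["name"]; None if any hop is missing.
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value

assert get_nested({"product": {"name": "Laptop"}}, "product.name") == "Laptop"
```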
### Working with Multiple Partitions
Organize different datasets in separate partitions:
```python
# Create stores for different content types
news_store = client.save("news_articles", news_data, key="content")
review_store = client.save("product_reviews", review_data, key="review_text")
ticket_store = client.save("support_tickets", tickets, key="description")

# Search each independently
news_results = news_store.search("economy")
review_results = review_store.search("quality")
ticket_results = ticket_store.search("login issue")
```
### Incremental Updates
Add new documents to existing stores without replacing them:
```python
# Create initial store (in-memory by default)
# Create initial store (in-memory by default)
store = client.save(
    partition="knowledge_base",
    collection=[
        {"text": "Python is a programming language"},
        {"text": "JavaScript runs in browsers"},
    ]
)

print(f"Initial size: {store.size}")  # 2

# Add more documents using the store's add() method
store.add([
    {"text": "TypeScript adds types to JavaScript"},
    {"text": "Rust is memory-safe"},
])

print(f"Updated size: {store.size}")  # 4

# Or use save() with append=True
client.save(
    partition="knowledge_base",
    collection=[{"text": "Go is designed for concurrency"}],
    append=True
)

# Persist all changes to disk when ready
store.add([], cache=True)  # Flush to disk
```
### Persistent Storage
Enable caching to persist vector stores to disk:
```python
# Save directly to disk
# Save directly to disk
store = client.save(
    partition="permanent_docs",
    collection=documents,
    cache=True  # Persist immediately
)

# Add documents and persist
store.add(new_documents, cache=True)

# Remove documents and persist
store.remove(0, cache=True)

# Later, load from cache
loaded_store = client.save(
    partition="permanent_docs",
    collection=[],  # Empty collection loads from cache
    cache=True
)
```
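As noted under Features, stores persist as `.pickle.gz` files. A rough sketch of that gzip-plus-pickle pattern follows; the file naming and payload shape here are assumptions for illustration, not the library's exact scheme:

```python
import gzip
import pickle
from pathlib import Path
from typing import Any

def write_cache(cache_dir: str, partition: str, payload: dict[str, Any]) -> Path:
    # Compress the pickled store so cached partitions stay small on disk.
    path = Path(cache_dir) / f"{partition}.pickle.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as f:
        pickle.dump(payload, f)
    return path

def read_cache(cache_dir: str, partition: str) -> dict[str, Any]:
    with gzip.open(Path(cache_dir) / f"{partition}.pickle.gz", "rb") as f:
        return pickle.load(f)
```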
### In-Memory Operations
Work with documents without persisting to disk (default behavior):
```python
# Create a temporary store (cache=False is the default)
temp_store = client.save(
    partition="temp_analysis",
    collection=documents
)

# Add documents without caching (default behavior)
temp_store.add(more_documents)

# Search as normal
results = temp_store.search("query")

# Since cache=False (default), changes aren't persisted
```
### Document Management
```python
# Create store (in-memory by default)
store = client.save("products", initial_products)

# Add new products (in-memory)
store.add([
    {"name": "New Product", "price": 49.99}
])

# Remove by index (in-memory)
store.remove(0)

# Remove by content (in-memory)
store.remove({"name": "Old Product", "price": 19.99})

# Check current size
print(f"Current inventory: {store.size} items")

# Persist changes to disk
store.add([], cache=True)

# Or delete entire partition from disk
if store.delete():
    print("Partition deleted")
```
## Development Setup
This project uses `uv` for dependency management and automatically configures CPU-only PyTorch.
### Quick Start
1. **Install dependencies:**

   ```bash
   uv sync
   ```

2. **Verify setup:**

   ```bash
   uv run python setup_dev.py
   ```

3. **Run tests:**

   ```bash
   uv run pytest
   ```

4. **Type checking:**

   ```bash
   uv run pyright
   ```
### What Gets Installed
- **PyTorch (CPU-only)**: Automatically from PyTorch CPU index
- **Transformers**: HuggingFace transformers library
- **Sentence Transformers**: For embedding generation
- **NumPy**: Numerical computing
No special flags or manual PyTorch installation needed - just `uv sync` and go!
## Performance Tips
1. **Reuse Client instances** - Model loading is expensive (see the sketch after this list)
2. **Use persistent caching** - Vector computation is cached automatically
3. **Batch your saves** - Save collections together when possible
4. **Choose the right algorithm** - Cosine is fastest for most use cases
5. **Adjust top_k** - Lower values are faster
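On the first tip: keep one long-lived `Client` and hand out stores from it, since constructing a client loads the embedding model. A minimal sketch, assuming the `knowledge_base` partition was previously persisted with `cache=True`:

```python
from microvector import Client

client = Client()  # loads the embedding model once per process

def search_kb(term: str, top_k: int = 5):
    # An empty collection with cache=True loads the existing partition from disk.
    store = client.save(partition="knowledge_base", collection=[], cache=True)
    return store.search(term, top_k=top_k)
```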
## Architecture
```
microvector/
├── main.py            # Client API
├── partition_store.py # PartitionStore class for partition operations
├── store.py           # Vector storage and similarity search
├── cache.py           # Persistence layer
├── embed.py           # Embedding generation
├── search.py          # Search utilities
├── algos.py           # Similarity algorithms
└── utils.py           # Helper functions
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Credits
Based on [HyperDB](https://github.com/jdagdelen/hyperDB) by John Dagdelen.
Refactored and maintained by Logan Powell.