# Thoth Vector Database Manager v0.6.2
A high-performance, Haystack v2-based vector database manager with **external embedding providers** and centralized embedding management for 4 production-ready backends.
## 🤖 MCP Server Support
This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:
- **Context7**: Enhanced context management
- **Serena**: IDE assistance and development support
See [docs/MCP_SETUP.md](docs/MCP_SETUP.md) for details.
## 🚀 Features
### 🌐 **NEW in v0.6.0: External Embedding Providers**
- **OpenAI, Cohere, Mistral**: Support for major external embedding APIs
- **Cost-Effective**: Pay-per-use model with intelligent caching
- **High-Quality Embeddings**: State-of-the-art embedding models
- **Unified Management**: Centralized `ExternalEmbeddingManager`
### 🏗️ **Core Features**
- **Multi-backend support**: Qdrant, Chroma, PostgreSQL pgvector, Milvus
- **Haystack v2 integration**: Uses haystack-ai v2.12.0+ as an abstraction layer
- **Centralized embeddings**: No more client-side embedding management
- **Memory optimization**: Intelligent caching and lazy loading
- **API compatibility**: Backward compatible with existing APIs
- **Type safety**: Full type hints and Pydantic validation
- **Production-ready**: Comprehensive testing and robust error handling
## 📦 Installation
### 🚀 **Recommended: uv Package Manager**
This project uses [uv](https://docs.astral.sh/uv/) for fast, reliable Python package management. Install uv first:
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### ✅ **No Dependency Conflicts**
Version 0.4.0 resolves all dependency conflicts! All 4 supported databases can now be installed together:
#### All Databases (Recommended)
```bash
# Install all supported backends (Qdrant, Chroma, PgVector, Milvus)
# Quoting keeps shells such as zsh from expanding the square brackets
uv add "thoth-vdbmanager[all]"
```
#### Individual Backends
```bash
# Individual backend installation
uv add "thoth-vdbmanager[qdrant]"    # Qdrant support
uv add "thoth-vdbmanager[chroma]"    # Chroma support
uv add "thoth-vdbmanager[pgvector]"  # PostgreSQL pgvector support
uv add "thoth-vdbmanager[milvus]"    # Milvus support
```
#### Development Installation
```bash
# For development with all backends and testing tools
uv add "thoth-vdbmanager[all,test,dev]"
```
### 🔄 **pip Installation (Also Supported)**
If you prefer pip, all commands work by replacing `uv add` with `pip install`:
```bash
# Example with pip
pip install "thoth-vdbmanager[all]"
```
### 🔄 **Breaking Changes in v0.4.0**
- **Removed**: Weaviate and Pinecone support (no longer maintained)
- **Updated**: Now requires haystack-ai v2.12.0+ (not compatible with legacy haystack)
- **Improved**: All remaining databases work together without conflicts
## 🏗️ Architecture
The library is built on a clean architecture with:
- **Core**: Base interfaces and document types
- **Adapters**: Backend-specific implementations using Haystack
- **Factory**: Unified creation interface
- **Compatibility**: Legacy API support
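The layering above can be pictured with a small, self-contained sketch. This is not the library's code: `ToyStore`, `ToyFactory`, `InMemoryAdapter`, and the `register` decorator are hypothetical names, used only to show how a factory can hide backend-specific adapters behind one core interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class ToyStore(ABC):
    """Core layer: the interface every adapter must satisfy (hypothetical)."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[str]: ...


class ToyFactory:
    """Factory layer: maps a backend name to its adapter class (hypothetical)."""

    _registry: Dict[str, Type[ToyStore]] = {}

    @classmethod
    def register(cls, name: str):
        def decorator(adapter: Type[ToyStore]) -> Type[ToyStore]:
            cls._registry[name] = adapter
            return adapter
        return decorator

    @classmethod
    def create(cls, backend: str, **kwargs) -> ToyStore:
        try:
            adapter = cls._registry[backend]
        except KeyError:
            raise ValueError(f"Unsupported backend: {backend!r}") from None
        return adapter(**kwargs)

    @classmethod
    def list_backends(cls) -> List[str]:
        return sorted(cls._registry)


@ToyFactory.register("memory")
class InMemoryAdapter(ToyStore):
    """Adapter layer: one concrete backend, here a plain dict."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def get(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)


# Callers only ever talk to the factory and the core interface.
store = ToyFactory.create("memory", collection="demo")
store.add("1", "User email address")
```

With this shape, supporting another backend means registering one more adapter class; calling code does not change.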
## 🌐 External Embedding Providers (NEW in v0.6.0)
### Setup External Embeddings
Configure your external embedding provider using environment variables:
```bash
# Choose ONE provider block; later exports override earlier ones.

# OpenAI (recommended)
export EMBEDDING_PROVIDER=openai
export EMBEDDING_API_KEY=sk-your-openai-key
export EMBEDDING_MODEL=text-embedding-3-small

# Cohere
export EMBEDDING_PROVIDER=cohere
export EMBEDDING_API_KEY=your-cohere-key
export EMBEDDING_MODEL=embed-multilingual-v3.0

# Mistral
export EMBEDDING_PROVIDER=mistral
export EMBEDDING_API_KEY=your-mistral-key
export EMBEDDING_MODEL=mistral-embed
```
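To catch a missing or misspelled variable before any API call is made, the three settings can be checked up front. The sketch below is an illustration only: `load_embedding_config` is a hypothetical helper, not part of the package, and the library's own handling of these variables may differ. Only the variable names and provider names come from this README.

```python
import os
from typing import Mapping, Optional

# Variable and provider names as documented above; the helper itself is hypothetical.
REQUIRED_VARS = ("EMBEDDING_PROVIDER", "EMBEDDING_API_KEY", "EMBEDDING_MODEL")
KNOWN_PROVIDERS = {"openai", "cohere", "mistral"}


def load_embedding_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect and validate the embedding settings from an environment mapping."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    provider = env["EMBEDDING_PROVIDER"].strip().lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown embedding provider: {provider!r}")
    return {
        "provider": provider,
        "api_key": env["EMBEDDING_API_KEY"],
        "model": env["EMBEDDING_MODEL"],
    }
```

Calling `load_embedding_config()` with no argument reads `os.environ`; passing a plain dict makes the check easy to unit test.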
### Using External Embeddings
```python
from thoth_vdbmanager import ExternalVectorStoreFactory, ColumnNameDocument

# Create store with external embeddings
# (reads the EMBEDDING_* variables exported above)
store = ExternalVectorStoreFactory.create_from_env(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add document - embeddings generated via API
doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    column_description="User email address",
    value_description="Valid email format"
)
store.add_column_description(doc)

# Search - query embeddings generated via API
results = store.search_similar(
    query="user email address",
    doc_type="column_name",
    top_k=5
)
```
### Available External Providers
| Provider | Models | Dimensions | Features |
|----------|--------|------------|----------|
| **OpenAI** | text-embedding-3-small, text-embedding-3-large | 1536, 3072 | High quality, multilingual |
| **Cohere** | embed-multilingual-v3.0, embed-english-v3.0 | 1024 | Optimized for search |
| **Mistral** | mistral-embed | 1024 | European provider |
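The dimensions matter because a collection is generally created for one vector size. As a rough sketch, the table can be turned into a lookup that rejects vectors of the wrong length. `EMBEDDING_DIMENSIONS`, `expected_dimension`, and `check_vector` are hypothetical names for illustration and are not part of the package.

```python
from typing import Sequence

# Default output dimensions, copied from the table above.
EMBEDDING_DIMENSIONS = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("cohere", "embed-multilingual-v3.0"): 1024,
    ("cohere", "embed-english-v3.0"): 1024,
    ("mistral", "mistral-embed"): 1024,
}


def expected_dimension(provider: str, model: str) -> int:
    """Look up the vector size for a provider/model pair (hypothetical helper)."""
    try:
        return EMBEDDING_DIMENSIONS[(provider.lower(), model)]
    except KeyError:
        raise ValueError(f"No known dimension for {provider}/{model}") from None


def check_vector(provider: str, model: str, vector: Sequence[float]) -> None:
    """Raise ValueError if the vector length does not match the model."""
    expected = expected_dimension(provider, model)
    if len(vector) != expected:
        raise ValueError(
            f"Expected {expected} dimensions for {model}, got {len(vector)}"
        )
```

For example, a collection sized for a 384-dimensional local model would presumably need to be recreated before storing 1536-dimensional `text-embedding-3-small` vectors.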
### Cost Optimization with Caching
```python
# Enable intelligent caching to reduce API calls
embedding_config = {
    'provider': 'openai',
    'api_key': 'sk-your-key',
    'model': 'text-embedding-3-small',
    'enable_cache': True,  # Enable caching
    'cache_size': 10000    # Cache up to 10k embeddings
}

store = ExternalVectorStoreFactory.create(
    backend="qdrant",
    embedding_config=embedding_config,
    collection="cached_collection",
    host="localhost",
    port=6333
)
```
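How the cache is implemented internally is not documented here. One common approach, shown below purely as an assumption, is a bounded least-recently-used (LRU) cache keyed on the input text, so repeated texts never reach the paid API. `CachedEmbedder` and `fake_embed` are hypothetical names; `fake_embed` stands in for a real provider call.

```python
from collections import OrderedDict
from typing import Callable, List


class CachedEmbedder:
    """Wraps an embedding function with a bounded LRU cache (hypothetical sketch)."""

    def __init__(self, embed_fn: Callable[[str], List[float]], cache_size: int = 10000) -> None:
        self._embed_fn = embed_fn
        self._cache_size = cache_size
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()
        self.api_calls = 0  # counts calls that actually reached embed_fn

    def embed(self, text: str) -> List[float]:
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        self.api_calls += 1
        vector = self._embed_fn(text)
        self._cache[text] = vector
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vector


def fake_embed(text: str) -> List[float]:
    """Stand-in for a paid API call; returns a deterministic toy vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


embedder = CachedEmbedder(fake_embed, cache_size=2)
embedder.embed("user email")
embedder.embed("user email")  # second call is served from the cache
```

In this sketch `cache_size` bounds memory use: once the limit is exceeded, the entry that has gone unused the longest is dropped and would be fetched again on its next use.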
## 🚀 Quick Start
### External Embedding API (Recommended)
```python
import os
from thoth_vdbmanager import (
    ExternalVectorStoreFactory,
    ColumnNameDocument,
    SqlDocument,
    EvidenceDocument,
)

# Set up external embedding provider
os.environ['EMBEDDING_PROVIDER'] = 'openai'
os.environ['EMBEDDING_API_KEY'] = 'sk-your-openai-key'
os.environ['EMBEDDING_MODEL'] = 'text-embedding-3-small'

# Create a vector store with external embeddings
store = ExternalVectorStoreFactory.create_from_env(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents using external API embeddings
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)
```
### Available Classes
```python
from thoth_vdbmanager import (
    VectorStoreFactory,    # Main factory for creating stores
    ColumnNameDocument,    # Column metadata documents
    SqlDocument,           # SQL example documents
    EvidenceDocument,      # Evidence/hint documents
    ThothType,             # Document type enumeration
    VectorStoreInterface   # Base interface for all stores
)
```
## 🔧 Configuration
### Qdrant
```python
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,       # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)
```
### Chroma (Multiple Modes)
**Memory Mode (Recommended for Testing):**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)
```
**Filesystem Mode:**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)
```
**Server Mode (Production):**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)
```
> 📖 **See [Chroma Configuration Guide](docs/CHROMA_CONFIGURATION.md) for detailed setup instructions**
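Each Chroma mode in the examples above takes a different set of keyword arguments. The sketch below summarizes that mapping and checks it before a store is created. `CHROMA_MODE_PARAMS` and `validate_chroma_params` are hypothetical names, and whether the library validates arguments this way is an assumption.

```python
# Keyword arguments used by each mode in the three examples above.
CHROMA_MODE_PARAMS = {
    "memory": (),
    "filesystem": ("persist_path",),
    "server": ("host", "port"),
}


def validate_chroma_params(mode: str, **params) -> dict:
    """Return kwargs for a Chroma store, or raise if the mode's arguments are missing."""
    if mode not in CHROMA_MODE_PARAMS:
        raise ValueError(
            f"Unknown Chroma mode {mode!r}; expected one of {sorted(CHROMA_MODE_PARAMS)}"
        )
    missing = [name for name in CHROMA_MODE_PARAMS[mode] if name not in params]
    if missing:
        raise ValueError(f"Mode {mode!r} is missing: {', '.join(missing)}")
    return {"backend": "chroma", "mode": mode, **params}
```

The returned dict could then be unpacked into the factory call together with a `collection` name.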
### PostgreSQL pgvector
```python
store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
```
### Milvus (Multiple Modes)
**Lite Mode (Recommended for Testing):**
```python
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)
```
**Server Mode (Production):**
```python
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)
```
> 📖 **See [Milvus Configuration Guide](docs/MILVUS_CONFIGURATION.md) for detailed setup instructions**
## 📊 Performance Optimizations
### Memory Usage
- **Lazy initialization**: Embedders and connections are initialized on first use
- **Singleton pattern**: Same configuration reuses existing instances
- **Batch processing**: Efficient bulk operations
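The first two points can be combined in a few lines. The sketch below is an assumption about the general technique, not the library's implementation: `LazyRegistry` is a hypothetical class that builds an object only on first request and reuses it for any later request with the same configuration.

```python
import json
from typing import Any, Callable, Dict


class LazyRegistry:
    """Builds objects on first use and reuses them per configuration (hypothetical)."""

    def __init__(self, builder: Callable[..., Any]) -> None:
        self._builder = builder
        self._instances: Dict[str, Any] = {}
        self.builds = 0  # how many times the builder actually ran

    def get(self, **config: Any) -> Any:
        # Sorting keys makes the cache key independent of argument order.
        # Assumes config values are JSON-serializable.
        key = json.dumps(config, sort_keys=True)
        if key not in self._instances:
            self.builds += 1
            self._instances[key] = self._builder(**config)
        return self._instances[key]


# Nothing is built until the first get() call.
registry = LazyRegistry(lambda **cfg: dict(cfg))
first = registry.get(backend="qdrant", port=6333)
second = registry.get(port=6333, backend="qdrant")  # same config, same instance
```

Here `first is second` holds and the builder ran once, which is the behavior the "same configuration reuses existing instances" point describes.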
### Performance Tuning
```python
# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)
```
## 🧪 Testing
```bash
# Run all tests
pytest
# Run specific backend tests
pytest tests/test_qdrant.py -v
# Run with coverage
pytest --cov=vdbmanager tests/
```
## 📈 Migration Guide
### From v0.3.x to v0.4.0
#### Breaking Changes
- **Removed databases**: Weaviate and Pinecone are no longer supported
- **Haystack version**: Now requires haystack-ai v2.12.0+ (not compatible with legacy haystack)
- **Dependencies**: All remaining databases can now be installed together without conflicts
#### Migration Steps
**1. Update installation:**
```bash
# Old installation (v0.3.x)
pip install "thoth-vdbmanager[all-safe]"  # Avoided conflicts

# New installation (v0.4.0)
pip install "thoth-vdbmanager[all]"  # No conflicts!
```
**2. Update code (if using removed databases):**
```python
# If you were using Weaviate - migrate to Qdrant or Chroma
# Old code (v0.3.x)
store = VectorStoreFactory.create(
    backend="weaviate",  # No longer supported
    collection="MyCollection",
    url="http://localhost:8080"
)

# New code (v0.4.0) - migrate to similar database
store = VectorStoreFactory.create(
    backend="qdrant",  # Recommended alternative
    collection="my_collection",
    host="localhost",
    port=6333
)
```
**3. Existing supported databases work unchanged:**
```python
# This code works exactly the same in v0.4.0
store = VectorStoreFactory.create(
    backend="qdrant",  # ✅ Still supported
    collection="my_docs",
    host="localhost",
    port=6333
)
```
## 🔍 API Reference
### Core Classes
#### VectorStoreFactory
```python
# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)
# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)
# List backends
backends = VectorStoreFactory.list_backends()
```
#### Document Types
- `ColumnNameDocument`: Column metadata
- `SqlDocument`: SQL examples
- `EvidenceDocument`: General evidence/hints
### Methods
- `add_column_description(doc)`: Add column metadata
- `add_sql(doc)`: Add SQL example
- `add_evidence(doc)`: Add evidence/hint
- `search_similar(query, doc_type, top_k=5, score_threshold=0.7)`: Semantic search
- `get_document(doc_id)`: Retrieve by ID
- `bulk_add_documents(docs)`: Batch insert
- `get_collection_info()`: Get stats
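The `top_k` and `score_threshold` parameters of `search_similar` interact: results below the threshold are dropped, and at most `top_k` of the rest are returned. The toy function below illustrates that reading, assuming cosine similarity as the score. The actual metric and ordering depend on the backend, and `cosine` and `search_similar_toy` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar_toy(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 5,
    score_threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Score every document, drop low scores, return the best top_k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


docs = [("email", [1.0, 0.0]), ("price", [0.0, 1.0]), ("contact", [1.0, 1.0])]
hits = search_similar_toy([1.0, 0.0], docs)
```

In this example `"price"` is orthogonal to the query and is filtered out by the threshold, while `"email"` ranks ahead of `"contact"`. Raising `score_threshold` trades recall for precision.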
## 🐛 Troubleshooting
### Common Issues
#### Connection Errors
```python
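# Check service availability
import requests

# A timeout prevents the check from hanging if nothing is listening
response = requests.get("http://localhost:6333", timeout=5)  # Qdrant
response.raise_for_status()  # Raises if the service returns an error status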
```
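When `requests` is not installed, or the backend does not speak HTTP (PostgreSQL, for instance), a plain TCP check using only the standard library can tell whether anything is listening. This is a sketch: `port_is_open` and `check_backends` are hypothetical helpers, and the port numbers are the ones used in this README's configuration examples.

```python
import socket
from typing import Dict


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the configuration examples in this README.
DEFAULT_PORTS = {"qdrant": 6333, "chroma": 8000, "pgvector": 5432, "milvus": 19530}


def check_backends(host: str = "localhost", timeout: float = 2.0) -> Dict[str, bool]:
    """Probe each backend's port and report which ones accept connections."""
    return {
        name: port_is_open(host, port, timeout)
        for name, port in DEFAULT_PORTS.items()
    }
```

An open port only shows that something is listening; it does not confirm the service is healthy or that credentials are valid.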
#### Memory Issues
```python
# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)
```
#### Performance Issues
```python
# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## 📄 License
MIT License - see LICENSE file for details.
## 📁 Directory Structure
```
thoth_vdbmanager/
├── vdbmanager/
│   ├── core/                   # Base interfaces and document types
│   │   ├── base.py             # Core document classes and interfaces
│   │   └── __init__.py
│   ├── adapters/               # Backend-specific implementations
│   │   ├── haystack_adapter.py # Base Haystack adapter
│   │   ├── qdrant_adapter.py   # Qdrant implementation
│   │   ├── chroma_adapter.py   # Chroma implementation
│   │   ├── pgvector_adapter.py # PostgreSQL pgvector
│   │   └── milvus_adapter.py   # Milvus implementation
│   ├── factory.py              # Unified creation interface
│   └── __init__.py             # Public API exports
├── test_e2e_vectordb/          # End-to-end tests
├── pyproject.toml              # Project configuration
└── README.md                   # This file
```
## 🚀 Quick API Reference
### Main API
```python
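from thoth_vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

# Build a document (same fields as in the Quick Start example)
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

# Use the methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")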
```
---
**🎉 Ready to use with Haystack v2 and 4 production-ready vector databases!**
Raw data
{
"_id": null,
"home_page": null,
"name": "thoth-vdbmanager",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.13",
"maintainer_email": null,
"keywords": "ai, embeddings, machine-learning, similarity-search, vector-database",
"author": null,
"author_email": "Marco Pancotti <mp@tylconsulting.it>",
"download_url": "https://files.pythonhosted.org/packages/fa/f1/f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d/thoth_vdbmanager-0.7.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A vector database management module for ThothAI Project",
"version": "0.7.2",
"project_urls": {
"Bug Tracker": "https://github.com/mptyl/thoth_vdb2/issues",
"Documentation": "https://github.com/mptyl/thoth_vdb2#readme",
"Homepage": "https://github.com/mptyl/thoth_vdb2",
"Source Code": "https://github.com/mptyl/thoth_vdb2"
},
"split_keywords": [
"ai",
" embeddings",
" machine-learning",
" similarity-search",
" vector-database"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "94b79a57bed1004f5719ccf17cc54c11ea8e23f31d2034fcbcaa7878e49f7135",
"md5": "6ffbc81fd5049a53501d533fc7046d2c",
"sha256": "3c4d6e3be6483d85ff6263230f6c00dbb99b63572370707db105bd9005f7895f"
},
"downloads": -1,
"filename": "thoth_vdbmanager-0.7.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6ffbc81fd5049a53501d533fc7046d2c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.13",
"size": 44469,
"upload_time": "2025-08-12T09:19:52",
"upload_time_iso_8601": "2025-08-12T09:19:52.904111Z",
"url": "https://files.pythonhosted.org/packages/94/b7/9a57bed1004f5719ccf17cc54c11ea8e23f31d2034fcbcaa7878e49f7135/thoth_vdbmanager-0.7.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "faf1f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d",
"md5": "16826249ae3b995b0f889674c0e24f7e",
"sha256": "72a0a69e9be296edc57422a5cf0353d57077b11bb894b9ef080120ffee150ca7"
},
"downloads": -1,
"filename": "thoth_vdbmanager-0.7.2.tar.gz",
"has_sig": false,
"md5_digest": "16826249ae3b995b0f889674c0e24f7e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.13",
"size": 32248,
"upload_time": "2025-08-12T09:19:55",
"upload_time_iso_8601": "2025-08-12T09:19:55.144999Z",
"url": "https://files.pythonhosted.org/packages/fa/f1/f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d/thoth_vdbmanager-0.7.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-12 09:19:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mptyl",
"github_project": "thoth_vdb2",
"github_not_found": true,
"lcname": "thoth-vdbmanager"
}