thoth-vdbmanager

Name	thoth-vdbmanager
Version	0.7.2
home_page	None
Summary	A vector database management module for ThothAI Project
upload_time	2025-08-12 09:19:55
maintainer	None
docs_url	None
author	None
requires_python	>=3.13
license	None
keywords	ai embeddings machine-learning similarity-search vector-database
requirements	No requirements were recorded.

# Thoth Vector Database Manager v0.6.2

A high-performance, Haystack v2-based vector database manager with **external embedding providers** and centralized embedding management for 4 production-ready backends.

## 🤖 MCP Server Support

This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:
- **Context7**: Enhanced context management
- **Serena**: IDE assistance and development support

See [docs/MCP_SETUP.md](docs/MCP_SETUP.md) for details.

## 🚀 Features

### 🌐 **NEW in v0.6.0: External Embedding Providers**
- **OpenAI, Cohere, Mistral**: Support for major external embedding APIs
- **Cost-Effective**: Pay-per-use model with intelligent caching
- **High-Quality Embeddings**: State-of-the-art embedding models
- **Unified Management**: Centralized `ExternalEmbeddingManager`

### 🏗️ **Core Features**
- **Multi-backend support**: Qdrant, Chroma, PostgreSQL pgvector, Milvus
- **Haystack v2 integration**: Uses haystack-ai v2.12.0+ as an abstraction layer
- **Centralized embeddings**: No more client-side embedding management
- **Memory optimization**: Intelligent caching and lazy loading
- **API compatibility**: Backward compatible with existing APIs
- **Type safety**: Full type hints and Pydantic validation
- **Production-ready**: Comprehensive testing and robust error handling

## 📦 Installation

### 🚀 **Recommended: uv Package Manager**

This project uses [uv](https://docs.astral.sh/uv/) for fast, reliable Python package management. Install uv first:

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### ✅ **No Dependency Conflicts**

Since version 0.4.0, all four supported backends can be installed together without dependency conflicts:

#### All Databases (Recommended)

```bash
# Install all supported backends (Qdrant, Chroma, PgVector, Milvus)
# Quoting keeps shells such as zsh from treating the brackets as a glob pattern
uv add "thoth-vdbmanager[all]"
```

#### Individual Backends

```bash
# Individual backend installation
uv add "thoth-vdbmanager[qdrant]"    # Qdrant support
uv add "thoth-vdbmanager[chroma]"    # Chroma support
uv add "thoth-vdbmanager[pgvector]"  # PostgreSQL pgvector support
uv add "thoth-vdbmanager[milvus]"    # Milvus support
```

#### Development Installation

```bash
# For development with all backends and testing tools
uv add "thoth-vdbmanager[all,test,dev]"
```

### 🔄 **pip Installation (Also Supported)**

If you prefer pip, all commands work by replacing `uv add` with `pip install`:

```bash
# Example with pip
pip install "thoth-vdbmanager[all]"
```

### 🔄 **Breaking Changes in v0.4.0**

- **Removed**: Weaviate and Pinecone support (no longer maintained)
- **Updated**: Now requires haystack-ai v2.12.0+ (not compatible with legacy haystack)
- **Improved**: All remaining databases work together without conflicts

## 🏗️ Architecture

The library is built on a clean architecture with:

- **Core**: Base interfaces and document types
- **Adapters**: Backend-specific implementations using Haystack
- **Factory**: Unified creation interface
- **Compatibility**: Legacy API support
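
The split between a core interface, backend adapters, and a factory can be pictured with a small standalone sketch. Every name in it (`StoreInterface`, `register_backend`, `InMemoryStore`, `create_store`) is hypothetical and chosen for illustration; it shows the general pattern, not the library's actual classes.

```python
from abc import ABC, abstractmethod


class StoreInterface(ABC):
    """Hypothetical core interface that every adapter implements."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def count(self) -> int: ...


# Registry mapping a backend name to its adapter class.
_BACKENDS: dict[str, type[StoreInterface]] = {}


def register_backend(name: str):
    """Class decorator that records an adapter under a backend name."""
    def decorator(cls: type[StoreInterface]) -> type[StoreInterface]:
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryStore(StoreInterface):
    """Toy adapter that keeps documents in a dict."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def count(self) -> int:
        return len(self._docs)


def create_store(backend: str, collection: str, **kwargs) -> StoreInterface:
    """Factory: callers pick a backend by name and never import adapters."""
    try:
        adapter_cls = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; available: {sorted(_BACKENDS)}"
        ) from None
    return adapter_cls(collection, **kwargs)


store = create_store("memory", "my_collection")
store.add("doc-1", "User email address")
```

With this layout, adding a backend means registering one more adapter class, and calling code stays unchanged.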

## 🌐 External Embedding Providers (NEW in v0.6.0)

### Setup External Embeddings

Configure your external embedding provider using environment variables:

```bash
# OpenAI (recommended)
export EMBEDDING_PROVIDER=openai
export EMBEDDING_API_KEY=sk-your-openai-key
export EMBEDDING_MODEL=text-embedding-3-small

# Cohere
export EMBEDDING_PROVIDER=cohere  
export EMBEDDING_API_KEY=your-cohere-key
export EMBEDDING_MODEL=embed-multilingual-v3.0

# Mistral
export EMBEDDING_PROVIDER=mistral
export EMBEDDING_API_KEY=your-mistral-key
export EMBEDDING_MODEL=mistral-embed
```
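
Before creating a store, it can help to confirm that all three variables are set. The helper below is a minimal sketch using only the standard library; `read_embedding_env` is a hypothetical name, not part of thoth-vdbmanager, and the provider list is taken from the examples above.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("EMBEDDING_PROVIDER", "EMBEDDING_API_KEY", "EMBEDDING_MODEL")
KNOWN_PROVIDERS = {"openai", "cohere", "mistral"}


def read_embedding_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the embedding settings, failing early if any are unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    provider = env["EMBEDDING_PROVIDER"].strip().lower()
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unsupported EMBEDDING_PROVIDER: {provider!r}")

    # Keys mirror the embedding_config dictionary used elsewhere in this README.
    return {
        "provider": provider,
        "api_key": env["EMBEDDING_API_KEY"],
        "model": env["EMBEDDING_MODEL"],
    }
```

Calling `read_embedding_env()` at startup turns a missing key into an immediate, readable error instead of a failure on the first API call.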

### Using External Embeddings

```python
from thoth_vdbmanager import ExternalVectorStoreFactory, ColumnNameDocument

# Create store with external embeddings
store = ExternalVectorStoreFactory.create_from_env(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add document - embeddings generated via API
doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    column_description="User email address",
    value_description="Valid email format"
)
store.add_column_description(doc)

# Search - query embeddings generated via API
results = store.search_similar(
    query="user email address",
    doc_type="column_name", 
    top_k=5
)
```

### Available External Providers

| Provider | Models | Dimensions | Features |
|----------|--------|------------|----------|
| **OpenAI** | text-embedding-3-small, text-embedding-3-large | 1536, 3072 | High quality, multilingual |
| **Cohere** | embed-multilingual-v3.0, embed-english-v3.0 | 1024 | Optimized for search |
| **Mistral** | mistral-embed | 1024 | European provider |

### Cost Optimization with Caching

```python
# Enable intelligent caching to reduce API calls
embedding_config = {
    'provider': 'openai',
    'api_key': 'sk-your-key',
    'model': 'text-embedding-3-small',
    'enable_cache': True,    # Enable caching
    'cache_size': 10000      # Cache up to 10k embeddings
}

store = ExternalVectorStoreFactory.create(
    backend="qdrant",
    embedding_config=embedding_config,
    collection="cached_collection",
    host="localhost",
    port=6333
)
```
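
To make the effect of `cache_size` concrete, here is a minimal least-recently-used cache around an embedding function. It is an illustration of the idea only: the README does not describe the library's internal cache, and `EmbeddingCache` and `fake_embed` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class EmbeddingCache:
    """LRU cache: repeated texts are served locally instead of via the API."""

    def __init__(self, embed_fn: Callable[[str], list[float]], max_size: int = 10000) -> None:
        self._embed_fn = embed_fn
        self._max_size = max_size
        self._entries: OrderedDict[str, list[float]] = OrderedDict()
        self.api_calls = 0  # counts cache misses, i.e. billable requests

    def get(self, text: str) -> list[float]:
        if text in self._entries:
            self._entries.move_to_end(text)  # mark as most recently used
            return self._entries[text]
        vector = self._embed_fn(text)
        self.api_calls += 1
        self._entries[text] = vector
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used
        return vector


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real provider call; returns a deterministic vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(fake_embed, max_size=2)
cache.get("user email")
cache.get("user email")  # second lookup is a cache hit
print(cache.api_calls)
```

A larger `max_size` trades memory for fewer API calls, so the right value depends on how often the same texts are embedded.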

## 🚀 Quick Start

### External Embedding API (Recommended)

```python
import os
from thoth_vdbmanager import ExternalVectorStoreFactory, ColumnNameDocument, SqlDocument, EvidenceDocument

# Set up external embedding provider
os.environ['EMBEDDING_PROVIDER'] = 'openai'
os.environ['EMBEDDING_API_KEY'] = 'sk-your-openai-key'
os.environ['EMBEDDING_MODEL'] = 'text-embedding-3-small'

# Create a vector store with external embeddings
store = ExternalVectorStoreFactory.create_from_env(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents using external API embeddings
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)
```

### Available Classes

```python
from thoth_vdbmanager import (
    VectorStoreFactory,      # Main factory for creating stores
    ColumnNameDocument,      # Column metadata documents
    SqlDocument,            # SQL example documents
    EvidenceDocument,       # Evidence/hint documents
    ThothType,              # Document type enumeration
    VectorStoreInterface    # Base interface for all stores
)
```

## 🔧 Configuration

### Qdrant

```python
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)
```

### Chroma (Multiple Modes)

**Memory Mode (Recommended for Testing):**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)
```

**Filesystem Mode:**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)
```

**Server Mode (Production):**
```python
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)
```

> 📖 **See [Chroma Configuration Guide](docs/CHROMA_CONFIGURATION.md) for detailed setup instructions**
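
The three modes take different options, so a small pre-check can catch a missing one before the factory is called. The sketch below infers the required options from the examples above (an assumption; the configuration guide is authoritative), and `chroma_kwargs` is a hypothetical helper.

```python
# Options each mode needs, inferred from the three examples above.
REQUIRED_BY_MODE = {
    "memory": (),
    "filesystem": ("persist_path",),
    "server": ("host", "port"),
}


def chroma_kwargs(collection: str, mode: str = "memory", **options) -> dict:
    """Build keyword arguments for a Chroma store after checking the mode."""
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"Unknown Chroma mode {mode!r}; expected one of {sorted(REQUIRED_BY_MODE)}")
    missing = [name for name in REQUIRED_BY_MODE[mode] if name not in options]
    if missing:
        raise ValueError(f"Mode {mode!r} requires: {', '.join(missing)}")
    return {"backend": "chroma", "collection": collection, "mode": mode, **options}


kwargs = chroma_kwargs("my_collection", mode="filesystem", persist_path="./chroma_db")
# The result would then be unpacked into the factory call:
# store = VectorStoreFactory.create(**kwargs)
```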

### PostgreSQL pgvector
```python
store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
```

### Milvus (Multiple Modes)

**Lite Mode (Recommended for Testing):**
```python
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)
```

**Server Mode (Production):**
```python
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)
```

> 📖 **See [Milvus Configuration Guide](docs/MILVUS_CONFIGURATION.md) for detailed setup instructions**



## 📊 Performance Optimizations

### Memory Usage
- **Lazy initialization**: Embedders and connections are initialized on first use
- **Singleton pattern**: Same configuration reuses existing instances
- **Batch processing**: Efficient bulk operations
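
The first two points can be sketched together: a wrapper that builds an expensive object only on first use, and a registry that returns the same wrapper for the same configuration. This is a generic illustration with hypothetical names (`LazyResource`, `shared_resource`), not the library's code, and it assumes the configuration values are hashable.

```python
import threading
from typing import Any, Callable

_UNSET = object()  # sentinel, so a factory that returns None is still cached


class LazyResource:
    """Defers an expensive constructor until the first call to get()."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._instance: Any = _UNSET
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._instance is _UNSET:
            with self._lock:
                if self._instance is _UNSET:  # re-check inside the lock
                    self._instance = self._factory()
        return self._instance


_registry: dict[tuple, LazyResource] = {}
_registry_lock = threading.Lock()


def shared_resource(config: dict, factory: Callable[[dict], Any]) -> LazyResource:
    """Return one LazyResource per distinct configuration."""
    key = tuple(sorted(config.items()))  # key order in the dict does not matter
    with _registry_lock:
        if key not in _registry:
            _registry[key] = LazyResource(lambda: factory(dict(config)))
        return _registry[key]


created = []


def make_connection(config: dict) -> dict:
    created.append(config)  # record each real construction
    return {"connected_to": config["host"]}


first = shared_resource({"host": "localhost", "port": 6333}, make_connection)
second = shared_resource({"port": 6333, "host": "localhost"}, make_connection)
print(first is second, len(created))  # nothing is built until get() is called
```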

### Performance Tuning
```python
# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)
```

## 🧪 Testing

```bash
# Run all tests
pytest

# Run specific backend tests
pytest tests/test_qdrant.py -v

# Run with coverage
pytest --cov=vdbmanager tests/
```

## 📈 Migration Guide

### From v0.3.x to v0.4.0

#### Breaking Changes
- **Removed databases**: Weaviate and Pinecone are no longer supported
- **Haystack version**: Now requires haystack-ai v2.12.0+ (not compatible with legacy haystack)
- **Dependencies**: All remaining databases can now be installed together without conflicts

#### Migration Steps

**1. Update installation:**
```bash
# Old installation (v0.3.x)
pip install "thoth-vdbmanager[all-safe]"  # Avoided conflicts

# New installation (v0.4.0)
pip install "thoth-vdbmanager[all]"  # No conflicts!
```

**2. Update code (if using removed databases):**
```python
# If you were using Weaviate - migrate to Qdrant or Chroma
# Old code (v0.3.x)
store = VectorStoreFactory.create(
    backend="weaviate",  # No longer supported
    collection="MyCollection",
    url="http://localhost:8080"
)

# New code (v0.4.0) - migrate to similar database
store = VectorStoreFactory.create(
    backend="qdrant",  # Recommended alternative
    collection="my_collection",
    host="localhost",
    port=6333
)
```

**3. Existing supported databases work unchanged:**
```python
# This code works exactly the same in v0.4.0
store = VectorStoreFactory.create(
    backend="qdrant",  # ✅ Still supported
    collection="my_docs",
    host="localhost",
    port=6333
)
```
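
During a migration, a guard like the following can flag configuration that still names a removed backend. It is a sketch built from the backend lists in this guide; `check_backend` is a hypothetical helper, not a library function.

```python
SUPPORTED_BACKENDS = ("qdrant", "chroma", "pgvector", "milvus")
REMOVED_BACKENDS = ("weaviate", "pinecone")  # dropped in v0.4.0


def check_backend(name: str) -> str:
    """Return the normalized backend name or explain why it cannot be used."""
    normalized = name.strip().lower()
    if normalized in REMOVED_BACKENDS:
        raise ValueError(
            f"Backend {normalized!r} was removed in v0.4.0; "
            f"migrate to one of: {', '.join(SUPPORTED_BACKENDS)}"
        )
    if normalized not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown backend {normalized!r}")
    return normalized


print(check_backend("Qdrant"))
```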

## 🔍 API Reference

### Core Classes

#### VectorStoreFactory
```python
# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)

# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)

# List backends
backends = VectorStoreFactory.list_backends()
```

#### Document Types
- `ColumnNameDocument`: Column metadata
- `SqlDocument`: SQL examples
- `EvidenceDocument`: General evidence/hints

### Methods
- `add_column_description(doc)`: Add column metadata
- `add_sql(doc)`: Add SQL example
- `add_evidence(doc)`: Add evidence/hint
- `search_similar(query, doc_type, top_k=5, score_threshold=0.7)`: Semantic search
- `get_document(doc_id)`: Retrieve by ID
- `bulk_add_documents(docs)`: Batch insert
- `get_collection_info()`: Get stats
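
The interaction between `top_k` and `score_threshold` can be shown with plain cosine similarity. This sketch assumes the threshold is a minimum similarity applied before the top-k cut; actual scoring depends on the backend, and `rank_similar` is a hypothetical function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(
    query_vec: list[float],
    candidates: dict[str, list[float]],
    top_k: int = 5,
    score_threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Drop candidates below the threshold, then keep the best top_k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in candidates.items()]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


vectors = {"email": [1.0, 0.0], "phone": [0.0, 1.0], "contact": [1.0, 1.0]}
for doc_id, score in rank_similar([1.0, 0.0], vectors):
    print(doc_id, round(score, 3))
```

Under this reading, a high threshold can return fewer than `top_k` results, or none at all.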

## 🐛 Troubleshooting

### Common Issues

#### Connection Errors
```python
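# Check service availability (6333 is the Qdrant port used in the examples above)
import requests

response = requests.get("http://localhost:6333", timeout=5)  # fail fast instead of hanging
response.raise_for_status()  # raises if the service answered with an error status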
```
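
For backends that do not expose an HTTP endpoint, a plain TCP check works with the standard library alone. This is a sketch: `is_port_open` is a hypothetical helper, and the ports are the ones used in the configuration examples in this README.

```python
import socket

# Ports taken from the configuration examples in this README.
EXAMPLE_PORTS = {"qdrant": 6333, "chroma": 8000, "pgvector": 5432, "milvus": 19530}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


for backend, port in EXAMPLE_PORTS.items():
    status = "reachable" if is_port_open("localhost", port, timeout=0.5) else "not reachable"
    print(f"{backend}: {status}")
```

An open port only shows that something is listening; it does not confirm that the service is healthy or that credentials are valid.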

#### Memory Issues
```python
# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)
```

#### Performance Issues
```python
# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details.

## 📁 Directory Structure

```
thoth_vdbmanager/
├── vdbmanager/
│   ├── core/                    # Base interfaces and document types
│   │   ├── base.py             # Core document classes and interfaces
│   │   └── __init__.py
│   ├── adapters/               # Backend-specific implementations
│   │   ├── haystack_adapter.py # Base Haystack adapter
│   │   ├── qdrant_adapter.py   # Qdrant implementation
│   │   ├── chroma_adapter.py   # Chroma implementation
│   │   ├── pgvector_adapter.py # PostgreSQL pgvector
│   │   └── milvus_adapter.py   # Milvus implementation
│   ├── factory.py              # Unified creation interface
│   └── __init__.py            # Public API exports
├── test_e2e_vectordb/          # End-to-end tests
├── pyproject.toml              # Project configuration
└── README.md                   # This file
```

## 🚀 Quick API Reference

### Main API

```python
from thoth_vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

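# Build a document, then use the store methods
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)
doc_id = store.add_column_description(column_doc)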
results = store.search_similar("user email", "column_name")
```

---

**🎉 Ready to use with Haystack v2 and 4 production-ready vector databases!**

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "thoth-vdbmanager",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.13",
    "maintainer_email": null,
    "keywords": "ai, embeddings, machine-learning, similarity-search, vector-database",
    "author": null,
    "author_email": "Marco Pancotti <mp@tylconsulting.it>",
    "download_url": "https://files.pythonhosted.org/packages/fa/f1/f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d/thoth_vdbmanager-0.7.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A vector database management module for ThothAI Project",
    "version": "0.7.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/mptyl/thoth_vdb2/issues",
        "Documentation": "https://github.com/mptyl/thoth_vdb2#readme",
        "Homepage": "https://github.com/mptyl/thoth_vdb2",
        "Source Code": "https://github.com/mptyl/thoth_vdb2"
    },
    "split_keywords": [
        "ai",
        " embeddings",
        " machine-learning",
        " similarity-search",
        " vector-database"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "94b79a57bed1004f5719ccf17cc54c11ea8e23f31d2034fcbcaa7878e49f7135",
                "md5": "6ffbc81fd5049a53501d533fc7046d2c",
                "sha256": "3c4d6e3be6483d85ff6263230f6c00dbb99b63572370707db105bd9005f7895f"
            },
            "downloads": -1,
            "filename": "thoth_vdbmanager-0.7.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6ffbc81fd5049a53501d533fc7046d2c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.13",
            "size": 44469,
            "upload_time": "2025-08-12T09:19:52",
            "upload_time_iso_8601": "2025-08-12T09:19:52.904111Z",
            "url": "https://files.pythonhosted.org/packages/94/b7/9a57bed1004f5719ccf17cc54c11ea8e23f31d2034fcbcaa7878e49f7135/thoth_vdbmanager-0.7.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "faf1f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d",
                "md5": "16826249ae3b995b0f889674c0e24f7e",
                "sha256": "72a0a69e9be296edc57422a5cf0353d57077b11bb894b9ef080120ffee150ca7"
            },
            "downloads": -1,
            "filename": "thoth_vdbmanager-0.7.2.tar.gz",
            "has_sig": false,
            "md5_digest": "16826249ae3b995b0f889674c0e24f7e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.13",
            "size": 32248,
            "upload_time": "2025-08-12T09:19:55",
            "upload_time_iso_8601": "2025-08-12T09:19:55.144999Z",
            "url": "https://files.pythonhosted.org/packages/fa/f1/f7a167cf110e48774e0fbf4e4e6668e7d3a914f9ef2cf8151516a1afec0d/thoth_vdbmanager-0.7.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-12 09:19:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mptyl",
    "github_project": "thoth_vdb2",
    "github_not_found": true,
    "lcname": "thoth-vdbmanager"
}
