# NusterDB - High-Performance Vector Database with Enterprise Security
[![PyPI version](https://badge.fury.io/py/nusterdb.svg)](https://badge.fury.io/py/nusterdb)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
## The Complete Vector Database Solution
NusterDB is a high-performance vector database designed for production workloads, combining enterprise-grade security, durable persistence, and a comprehensive feature set. It is built for AI/ML applications that need fast similarity search without sacrificing reliability or security.
### Core Features
| Feature | Description |
|---------|-------------|
| **Advanced Algorithms** | Multiple search algorithms: IVF, PQ, LSH, SQ, Flat, HNSW |
| **Enterprise Security** | FIPS 140-2 compliance, quantum-resistant encryption |
| **Production APIs** | Complete REST APIs and Python SDK |
| **Data Persistence** | Built-in durable storage with transaction support |
| **Full CRUD Operations** | Complete database operations: Create, Read, Update, Delete |
| **Multiple Storage Modes** | Memory, Persistent, Cache, and API modes |
## Quick Start
### Installation
```bash
pip install nusterdb
```
### Simple Usage
```python
import nusterdb
# Create database - choose your mode
db = nusterdb.NusterDB(mode="memory", dimension=128)
# Add vectors
db.add(1, [0.1, 0.2, 0.3, ...]) # Single vector
db.bulk_add([1,2,3], vectors, metadata) # Multiple vectors
# Search
results = db.search([0.1, 0.2, 0.3, ...], k=5)
for result in results:
print(f"ID: {result['id']}, Distance: {result['distance']}")
```
## Complete API Reference
### Core Classes
#### `NusterDB`
The main database class supporting all storage modes and algorithms.
```python
class NusterDB:
"""
Unified NusterDB interface supporting all storage modes and algorithms.
"""
def __init__(
self,
mode: Union[str, StorageMode] = "memory",
dimension: Optional[int] = None,
path: Optional[str] = None,
url: Optional[str] = None,
algorithm: Union[str, Algorithm] = "flat",
security_level: Union[str, SecurityLevel] = "none",
distance_metric: Union[str, DistanceMetric] = "l2",
use_simd: bool = True,
use_gpu: bool = True,
parallel_processing: bool = True,
cache_size: str = "512MB",
compression: bool = False,
**kwargs
):
```
**Parameters:**
- `mode`: Storage mode ("memory", "persistent", "cache", "api")
- `dimension`: Vector dimension (required for new databases)
- `path`: Path for persistent storage
- `url`: Server URL for API mode
- `algorithm`: Indexing algorithm ("flat", "ivf", "pq", "lsh", "sq", "hnsw")
- `security_level`: Security level ("none", "basic", "enterprise", "government")
- `distance_metric`: Distance metric ("l2", "cosine", "inner_product", "l1")
- `use_simd`: Enable SIMD optimizations
- `use_gpu`: Enable GPU acceleration
- `parallel_processing`: Enable parallel processing
- `cache_size`: Cache size (e.g., "1GB", "512MB")
- `compression`: Enable compression
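For example, these parameters combine as follows (a configuration sketch using only the options documented above; tune the values for your workload):

```python
import nusterdb

# Persistent, cosine-metric index built from the documented parameters
db = nusterdb.NusterDB(
    mode="persistent",
    path="./embeddings",
    dimension=768,
    algorithm="hnsw",
    distance_metric="cosine",
    cache_size="1GB",
    compression=False
)
```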
#### Core Database Operations
##### `add(id, vector, metadata=None)`
Add a single vector to the database.
```python
def add(
    self,
    id: Union[int, str],
    vector: Union[List[float], np.ndarray],
    metadata: Optional[Dict[str, Any]] = None
) -> bool:
```
**Parameters:**
- `id`: Unique identifier
- `vector`: Vector data (list or numpy array)
- `metadata`: Optional metadata dictionary
**Returns:** `bool` - Success status
**Example:**
```python
# Add vector with metadata
success = db.add(1, [0.1, 0.2, 0.3], {"category": "document", "type": "text"})
# Add numpy vector
import numpy as np
vector = np.random.random(128)
db.add("doc_001", vector, {"source": "research_paper"})
```
##### `bulk_add(ids, vectors, metadata=None)`
Add multiple vectors efficiently.
```python
def bulk_add(
    self,
    ids: List[Union[int, str]],
    vectors: Union[List[List[float]], np.ndarray],
    metadata: Optional[List[Dict[str, Any]]] = None
) -> int:
```
**Parameters:**
- `ids`: List of unique identifiers
- `vectors`: List of vectors or 2D numpy array
- `metadata`: Optional list of metadata dictionaries
**Returns:** `int` - Number of vectors successfully added
**Example:**
```python
# Bulk add with metadata
ids = [1, 2, 3, 4, 5]
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
metadata = [
    {"category": "A", "score": 0.95},
    {"category": "B", "score": 0.87},
    {"category": "A", "score": 0.92},
    {"category": "C", "score": 0.78},
    {"category": "B", "score": 0.89}
]
added_count = db.bulk_add(ids, vectors, metadata)
print(f"Added {added_count} vectors")
# Bulk add numpy arrays
import numpy as np
vectors = np.random.random((1000, 128))
ids = [f"vec_{i}" for i in range(1000)]
db.bulk_add(ids, vectors)
```
##### `search(query, k=10, filters=None, include_metadata=True, include_distances=True)`
Search for similar vectors.
```python
def search(
    self,
    query: Union[List[float], np.ndarray],
    k: int = 10,
    filters: Optional[Dict[str, Any]] = None,
    include_metadata: bool = True,
    include_distances: bool = True
) -> List[Dict[str, Any]]:
```
**Parameters:**
- `query`: Query vector
- `k`: Number of results to return
- `filters`: Optional metadata filters
- `include_metadata`: Include metadata in results
- `include_distances`: Include distances in results
**Returns:** `List[Dict[str, Any]]` - List of search results
**Example:**
```python
# Basic search
results = db.search([0.1, 0.2, 0.3], k=5)
for result in results:
    print(f"ID: {result['id']}, Distance: {result['distance']}")

# Search with filters
results = db.search(
    query=[0.1, 0.2, 0.3],
    k=10,
    filters={"category": "document"},
    include_metadata=True,
    include_distances=True
)

# Process results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Distance: {result['distance']:.4f}")
    print(f"Metadata: {result['metadata']}")
```
##### `update(id, vector=None, metadata=None)`
Update an existing vector.
```python
def update(
    self,
    id: Union[int, str],
    vector: Optional[Union[List[float], np.ndarray]] = None,
    metadata: Optional[Dict[str, Any]] = None
) -> bool:
```
**Parameters:**
- `id`: Vector ID to update
- `vector`: New vector data (optional)
- `metadata`: New metadata (optional)
**Returns:** `bool` - Success status
**Example:**
```python
# Update vector only
db.update(1, [0.2, 0.3, 0.4])
# Update metadata only
db.update(1, metadata={"category": "updated", "version": 2})
# Update both
db.update(1, [0.2, 0.3, 0.4], {"category": "updated", "version": 2})
```
##### `delete(id)`
Delete a vector by ID.
```python
def delete(self, id: Union[int, str]) -> bool:
```
**Parameters:**
- `id`: Vector ID to delete
**Returns:** `bool` - Success status
**Example:**
```python
# Delete by ID
success = db.delete(1)
if success:
    print("Vector deleted successfully")

# Delete multiple vectors
ids_to_delete = [1, 2, 3, 4, 5]
for vec_id in ids_to_delete:
    db.delete(vec_id)
```
##### `get(id, include_metadata=True)`
Get a vector by ID.
```python
def get(
    self,
    id: Union[int, str],
    include_metadata: bool = True
) -> Optional[Dict[str, Any]]:
```
**Parameters:**
- `id`: Vector ID
- `include_metadata`: Include metadata in result
**Returns:** `Optional[Dict[str, Any]]` - Vector data or None if not found
**Example:**
```python
# Get vector with metadata
vector_data = db.get(1)
if vector_data:
    print(f"Vector: {vector_data['vector']}")
    print(f"Metadata: {vector_data['metadata']}")
# Get vector without metadata
vector_data = db.get(1, include_metadata=False)
```
##### `batch_search(queries, k=10, filters=None)`
Search with multiple queries efficiently.
```python
def batch_search(
    self,
    queries: Union[List[List[float]], np.ndarray],
    k: int = 10,
    filters: Optional[List[Dict[str, Any]]] = None
) -> List[List[Dict[str, Any]]]:
```
**Parameters:**
- `queries`: List of query vectors
- `k`: Number of results per query
- `filters`: Optional filters per query
**Returns:** `List[List[Dict[str, Any]]]` - List of result lists
**Example:**
```python
# Batch search multiple queries
queries = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9]
]
all_results = db.batch_search(queries, k=5)

for i, results in enumerate(all_results):
    print(f"Query {i} results:")
    for result in results:
        print(f"  ID: {result['id']}, Distance: {result['distance']}")
```
#### Management Operations
##### `train(training_vectors=None)`
Train the index (required for some algorithms).
```python
def train(self, training_vectors: Optional[Union[List[List[float]], np.ndarray]] = None) -> bool:
```
**Parameters:**
- `training_vectors`: Training data (optional, uses existing data if None)
**Returns:** `bool` - Success status
**Example:**
```python
# Train with existing data
db.train()
# Train with specific training data
training_data = np.random.random((10000, 128))
db.train(training_data)
```
##### `optimize()`
Optimize the index for better performance.
```python
def optimize(self) -> bool:
```
**Returns:** `bool` - Success status
**Example:**
```python
# Optimize after bulk inserts
db.bulk_add(ids, vectors)
db.optimize() # Rebuild index for better performance
```
##### `save(path=None)`
Save the database to disk.
```python
def save(self, path: Optional[str] = None) -> bool:
```
**Parameters:**
- `path`: Save path (uses default if None)
**Returns:** `bool` - Success status
##### `load(path)`
Load database from disk.
```python
def load(self, path: str) -> bool:
```
**Parameters:**
- `path`: Load path
**Returns:** `bool` - Success status
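**Example** (a minimal sketch based on the signatures above):
```python
# Persist the current state to an explicit path
if db.save("./backups/vectors_snapshot"):
    print("Snapshot written")

# Later, or in another process: restore from disk
db.load("./backups/vectors_snapshot")
print(f"Restored {db.count()} vectors")
```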
##### `clear()`
Clear all vectors from the database.
```python
def clear(self) -> bool:
```
**Returns:** `bool` - Success status
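**Example:**
```python
db.clear()
print(f"Vectors remaining: {db.count()}")  # 0
```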
#### Information and Statistics
##### `count()` / `size()`
Get total number of vectors.
```python
def count(self) -> int:
def size(self) -> int: # Alias for count()
```
**Returns:** `int` - Number of vectors
**Example:**
```python
total_vectors = db.count()
print(f"Database contains {total_vectors} vectors")
```
##### `stats()`
Get database statistics.
```python
def stats(self) -> Dict[str, Any]:
```
**Returns:** `Dict[str, Any]` - Statistics dictionary
**Example:**
```python
stats = db.stats()
print(f"Vector count: {stats['vector_count']}")
print(f"Average query time: {stats['avg_query_time']:.3f}ms")
print(f"Algorithm: {stats['algorithm']}")
print(f"Security level: {stats['security_level']}")
```
##### `info()`
Get database information.
```python
def info(self) -> Dict[str, Any]:
```
**Returns:** `Dict[str, Any]` - Information dictionary
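**Example** (a sketch; the exact keys in the returned dictionary may vary by version, hence the defensive `.get()` calls):
```python
db_info = db.info()
print(f"Mode: {db_info.get('mode')}")
print(f"Dimension: {db_info.get('dimension')}")
```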
##### `health_check()`
Perform health check.
```python
def health_check(self) -> Dict[str, Any]:
```
**Returns:** `Dict[str, Any]` - Health status
**Example:**
```python
health = db.health_check()
if health['healthy']:
    print("Database is healthy")
    print(f"Vector count: {health['vector_count']}")
else:
    print(f"Database issue: {health.get('error')}")
```
#### Context Manager Support
```python
# Automatic resource management
with nusterdb.NusterDB(mode="persistent", path="./vectors") as db:
    db.add(1, [0.1, 0.2, 0.3])
    results = db.search([0.1, 0.2, 0.3])
    # Database automatically saved and closed
```
#### Iterator Support
```python
# Iterate over all vectors (if supported by backend)
for vector_data in db:
print(f"ID: {vector_data['id']}")
print(f"Vector: {vector_data['vector']}")
# Check if ID exists
if 1 in db:
print("Vector with ID 1 exists")
# Get length
print(f"Database has {len(db)} vectors")
```
### Configuration Classes
#### `NusterConfig`
Complete configuration for all NusterDB aspects.
```python
@dataclass
class NusterConfig:
    # Core settings
    algorithm: Algorithm = Algorithm.FLAT
    security_level: SecurityLevel = SecurityLevel.NONE
    distance_metric: DistanceMetric = DistanceMetric.L2

    # Performance settings
    use_simd: bool = True
    use_gpu: bool = True
    parallel_processing: bool = True
    cache_size: str = "512MB"
    compression: bool = False
```
**Methods:**
- `to_dict()` - Convert to dictionary
- `to_json()` - Convert to JSON string
- `from_dict(config_dict)` - Create from dictionary
- `from_json(json_str)` - Create from JSON
- `update(**kwargs)` - Create updated configuration
- `optimize_for_speed()` - Speed-optimized configuration
- `optimize_for_accuracy()` - Accuracy-optimized configuration
- `optimize_for_memory()` - Memory-optimized configuration
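A sketch of round-tripping a configuration with these helpers (it assumes `from_json` is a constructor-style helper, as the list above suggests):

```python
import nusterdb

config = nusterdb.NusterConfig(cache_size="1GB", compression=True)

# Serialize, then rebuild the same configuration
as_json = config.to_json()
restored = nusterdb.NusterConfig.from_json(as_json)

# update() creates a modified copy rather than mutating in place
tuned = restored.update(use_gpu=False, cache_size="2GB")
```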
#### Configuration Enums
```python
class Algorithm(Enum):
FLAT = "flat" # Exact search
IVF = "ivf" # Inverted File Index
PQ = "pq" # Product Quantization
LSH = "lsh" # Locality Sensitive Hashing
SQ = "sq" # Scalar Quantization
HNSW = "hnsw" # Hierarchical NSW
HYBRID = "hybrid" # Multi-algorithm approach
class SecurityLevel(Enum):
NONE = "none" # No special security
BASIC = "basic" # Basic encryption
ENTERPRISE = "enterprise" # Enterprise-grade security
GOVERNMENT = "government" # Government-grade (FIPS 140-2)
class StorageMode(Enum):
MEMORY = "memory" # In-memory only (fastest)
PERSISTENT = "persistent" # Disk-based storage (durable)
CACHE = "cache" # Memory + disk caching
API = "api" # Remote API connection
class DistanceMetric(Enum):
L2 = "l2" # Euclidean distance
COSINE = "cosine" # Cosine similarity
INNER_PRODUCT = "inner_product" # Inner product
L1 = "l1" # Manhattan distance
HAMMING = "hamming" # Hamming distance
```
### Client Class
#### `NusterClient`
Client for connecting to NusterDB server instances.
```python
class NusterClient:
    def __init__(
        self,
        url: str = "http://localhost:7878",
        timeout: int = 30,
        retry_attempts: int = 3,
        api_key: Optional[str] = None,
        verify_ssl: bool = True
    ):
```
**Parameters:**
- `url`: Server URL
- `timeout`: Request timeout in seconds
- `retry_attempts`: Number of retry attempts
- `api_key`: Optional API key for authentication
- `verify_ssl`: Verify SSL certificates
**Methods:** Same interface as `NusterDB` but operates over HTTP/REST API.
**Example:**
```python
# Connect to server
client = nusterdb.NusterClient("http://localhost:7878", api_key="your-key")
# Use same interface as local database
client.add(1, [0.1, 0.2, 0.3])
results = client.search([0.1, 0.2, 0.3], k=5)
```
### Utility Functions
#### `create_random_vectors(count, dimension, distribution="normal", seed=None)`
Create random vectors for testing.
```python
def create_random_vectors(
    count: int,
    dimension: int,
    distribution: str = "normal",
    seed: Optional[int] = None
) -> np.ndarray:
```
**Parameters:**
- `count`: Number of vectors
- `dimension`: Vector dimension
- `distribution`: Distribution type ("normal", "uniform", "clustered")
- `seed`: Random seed for reproducibility
**Example:**
```python
# Create test vectors
vectors = nusterdb.create_random_vectors(1000, 128, distribution="normal", seed=42)
# Create clustered data
clustered = nusterdb.create_random_vectors(500, 64, distribution="clustered")
```
#### `benchmark_performance(db, num_vectors=1000, dimension=128, num_queries=100, k=10)`
Benchmark database performance.
```python
def benchmark_performance(
    db,
    num_vectors: int = 1000,
    dimension: int = 128,
    num_queries: int = 100,
    k: int = 10
) -> Dict[str, float]:
```
**Returns:** Performance metrics dictionary
**Example:**
```python
# Benchmark your database
metrics = nusterdb.benchmark_performance(db, num_vectors=5000, dimension=768)
print(f"Insert rate: {metrics['insert_rate_per_sec']:.0f} vectors/sec")
print(f"Search rate: {metrics['search_rate_qps']:.0f} QPS")
print(f"Average search time: {metrics['avg_search_time_ms']:.2f} ms")
```
#### `validate_vectors(vectors, expected_dimension=None)`
Validate and normalize vector data.
```python
def validate_vectors(
    vectors: Union[List[List[float]], np.ndarray],
    expected_dimension: Optional[int] = None
) -> np.ndarray:
```
**Example:**
```python
# Validate vectors before adding
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
validated = nusterdb.validate_vectors(vectors, expected_dimension=2)
```
#### `load_vectors_from_file(file_path, format="auto")` / `save_vectors_to_file(...)`
Load and save vectors from various file formats.
```python
# Load from file
ids, vectors, metadata = nusterdb.load_vectors_from_file("data.json")
db.bulk_add(ids, vectors, metadata)
# Save to file
ids = list(range(100))
vectors = nusterdb.create_random_vectors(100, 128)
nusterdb.save_vectors_to_file("backup.json", ids, vectors)
```
**Supported formats:** JSON, NumPy (.npy), CSV, HDF5 (.h5)
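The format should follow from the file extension when `format="auto"` (a sketch; the exact detection rules are an assumption):

```python
# NumPy round trip: the .npy extension selects the NumPy format
nusterdb.save_vectors_to_file("backup.npy", ids, vectors)
ids, vectors, metadata = nusterdb.load_vectors_from_file("backup.npy")
```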
### Exception Classes
```python
class NusterDBError(Exception):
    """Base exception for all NusterDB errors."""

class ConnectionError(NusterDBError):
    """Connection to server failed."""

class SecurityError(NusterDBError):
    """Security validation failed."""

class IndexError(NusterDBError):
    """Index operations failed."""

class ConfigurationError(NusterDBError):
    """Configuration is invalid."""

class ValidationError(NusterDBError):
    """Input validation failed."""
```
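A sketch of handling these in application code (it assumes the exception classes are importable from the package root; the failure cases shown are illustrative):

```python
import nusterdb

try:
    db = nusterdb.NusterDB(mode="memory", dimension=128)
    db.add(1, [0.1, 0.2])  # Deliberately wrong dimension
except nusterdb.ValidationError as e:
    print(f"Invalid input: {e}")
except nusterdb.ConfigurationError as e:
    print(f"Bad configuration: {e}")
except nusterdb.NusterDBError as e:
    print(f"Other NusterDB error: {e}")  # Base class catches the rest
```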
### Module-Level Convenience Functions
#### `quick_start(dimension, mode="memory")`
Quick start helper for common use cases.
```python
# Quick setup
db = nusterdb.quick_start(128, "memory")
db.add(1, [0.1, 0.2, ...])
```
#### `connect(url="http://localhost:7878", **kwargs)`
Connect to NusterDB server.
```python
# Connect to server
client = nusterdb.connect("http://localhost:7878", api_key="key")
```
#### `create_database(path, dimension, **kwargs)`
Create a new persistent database.
```python
# Create persistent database
db = nusterdb.create_database("./vectors", 128, algorithm="ivf")
```
#### Configuration Helpers
```python
# Get optimized configurations
speed_config = nusterdb.optimize_for_speed()
accuracy_config = nusterdb.optimize_for_accuracy()
memory_config = nusterdb.optimize_for_memory()
# Use with database
db = nusterdb.NusterDB(dimension=128, **speed_config)
```
#### `info()`
Get package information.
```python
package_info = nusterdb.info()
print(f"Version: {package_info['version']}")
print(f"Algorithms: {package_info['algorithms']}")
print(f"Features: {package_info['features']}")
```
## Storage Modes
### Memory Mode (Fastest)
```python
# For temporary, high-speed operations
db = nusterdb.NusterDB(mode="memory", dimension=128)
# Best for: Testing, temporary data, maximum speed
```
### Persistent Mode (Production Ready)
```python
# For production with data persistence
db = nusterdb.NusterDB(
mode="persistent",
path="./my_vectors",
dimension=768
)
# Best for: Production data, long-term storage, reliability
```
### Cache Mode (Balanced Performance)
```python
# For large datasets with intelligent caching
db = nusterdb.NusterDB(
mode="cache",
cache_size="2GB",
dimension=512
)
# Best for: Large datasets, memory optimization, balanced performance
```
### API Mode (Distributed)
```python
# Connect to NusterDB server
db = nusterdb.NusterDB(mode="api", url="http://localhost:7878")
# Best for: Microservices, distributed systems, scalability
```
## Enterprise Security Features
### Standard Security
```python
# Basic encryption and security
db = nusterdb.NusterDB(
mode="persistent",
path="./secure_vectors",
security_level="basic", # Standard security
encryption_at_rest=True # Data encryption
)
```
### Advanced Security (Enterprise)
```python
# Maximum security for sensitive data
db = nusterdb.NusterDB(
mode="persistent",
path="./classified_vectors",
security_level="enterprise", # Enhanced security
encryption_at_rest=True, # AES-256 encryption
audit_logging=True, # Security event tracking
access_control=True, # Role-based permissions
quantum_resistant=True # Future-proof encryption
)
```
### Security Features Available
- **FIPS 140-2 Ready** - Federal cryptographic standards compliance
- **AES-256 Encryption** - Industry-standard data protection
- **Quantum-Resistant** - Post-quantum cryptography algorithms
- **Audit Logging** - Comprehensive security event tracking
- **Access Control** - Multi-level security permissions
- **Key Management** - Secure key derivation and rotation
## Vector Search Algorithms
### Algorithm Details
- **Flat**: Exact brute-force search for highest accuracy
- **IVF**: Inverted file structure for balanced performance
- **LSH**: Locality-sensitive hashing for speed
- **PQ**: Product quantization for memory efficiency
- **HNSW**: Hierarchical navigable small world graphs
- **SQ**: Scalar quantization for reduced memory usage
```python
# Algorithm-specific configuration
db = nusterdb.NusterDB(
    dimension=768,
    algorithm="ivf",
    # IVF-specific parameters
    ivf_clusters=256,
    ivf_probe_lists=32
)

# PQ configuration
db = nusterdb.NusterDB(
    dimension=768,
    algorithm="pq",
    pq_subvectors=8,
    pq_centroids=256
)
```
## Performance Benchmarks
Based on internal benchmarking on enterprise hardware:
### NusterDB Performance
- **Memory Mode**: 15K-30K queries per second
- **Persistent Mode**: 8K-15K QPS with full durability
- **Insertion Rate**: 10K+ vectors/sec with persistence
- **Memory Efficiency**: Zero-copy access, optimized storage
- **Latency**: Sub-millisecond response times for most queries
### Distance Metrics Supported
- **L2 (Euclidean)**: Standard Euclidean distance
- **Cosine**: Cosine similarity for normalized vectors
- **Inner Product**: Dot product similarity
- **L1 (Manhattan)**: Manhattan distance
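The metric is chosen at construction time; for example (normalizing vectors for cosine is common practice, not a library requirement):

```python
import numpy as np
import nusterdb

# Cosine similarity works best on normalized vectors
db = nusterdb.NusterDB(mode="memory", dimension=3, distance_metric="cosine")

v = np.array([0.3, 0.4, 0.5])
db.add(1, (v / np.linalg.norm(v)).tolist())
```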
## Configuration Management
### Predefined Configurations
```python
from nusterdb import create_config
# Predefined configurations for common use cases
config = create_config("production_speed") # Speed-optimized
config = create_config("production_accuracy") # Accuracy-optimized
config = create_config("memory_constrained") # Memory-optimized
config = create_config("secure") # Security-focused
db = nusterdb.NusterDB(config=config, dimension=768)
```
### Available Presets
- `"development"` / `"dev"` - Development and testing
- `"production_speed"` / `"prod_speed"` - Speed-optimized production
- `"production_accuracy"` / `"prod_accuracy"` - Accuracy-optimized production
- `"government"` / `"secure"` - Government-grade security
- `"memory_constrained"` / `"low_memory"` - Memory-optimized
- `"high_throughput"` / `"throughput"` - High-throughput applications
### Custom Configurations
```python
# Create custom configuration
config = nusterdb.NusterConfig(
    algorithm=nusterdb.Algorithm.IVF,
    security_level=nusterdb.SecurityLevel.ENTERPRISE,
    distance_metric=nusterdb.DistanceMetric.COSINE,
    use_gpu=True,
    cache_size="4GB"
)
# Use configuration
db = nusterdb.NusterDB(config=config, dimension=768)
# Update configuration
updated_config = config.update(use_simd=False, parallel_processing=False)
```
## Advanced Examples
### Large-Scale Production Setup
```python
import nusterdb
import numpy as np
# Production configuration with security
db = nusterdb.NusterDB(
    mode="persistent",
    path="/secure/vectors",
    dimension=1536,  # OpenAI embeddings
    algorithm="ivf",
    security_level="enterprise",
    distance_metric="cosine",
    use_gpu=True,
    parallel_processing=True,
    cache_size="8GB",

    # IVF-specific tuning
    ivf_clusters=1024,
    ivf_probe_lists=64,

    # Security settings
    encryption_at_rest=True,
    audit_logging=True,
    access_control=True
)

# Bulk data loading with progress tracking
def load_embeddings(file_path, batch_size=1000):
    ids, vectors, metadata = nusterdb.load_vectors_from_file(file_path)

    total_batches = (len(ids) + batch_size - 1) // batch_size
    for i in range(0, len(ids), batch_size):
        batch_ids = ids[i:i+batch_size]
        batch_vectors = vectors[i:i+batch_size]
        batch_metadata = metadata[i:i+batch_size] if metadata else None

        added = db.bulk_add(batch_ids, batch_vectors, batch_metadata)
        print(f"Batch {i//batch_size + 1}/{total_batches}: Added {added} vectors")

    # Optimize after bulk loading
    print("Optimizing index...")
    db.optimize()

# Advanced search with multiple filters
def semantic_search(query_text, filters=None, k=10):
    # Convert text to embedding (pseudo-code)
    query_embedding = get_text_embedding(query_text)

    results = db.search(
        query=query_embedding,
        k=k,
        filters=filters or {},
        include_metadata=True,
        include_distances=True
    )

    # Post-process results
    processed_results = []
    for result in results:
        processed_results.append({
            'id': result['id'],
            'similarity': 1 - result['distance'],   # Convert distance to similarity
            'metadata': result['metadata'],
            'confidence': result['distance'] < 0.5  # Confidence threshold
        })

    return processed_results

# Usage
results = semantic_search(
    "machine learning algorithms",
    filters={"category": "research", "year": 2023},
    k=20
)
```
### Multi-Modal Search System
```python
import nusterdb
class MultiModalSearchSystem:
    def __init__(self, base_path: str):
        # Separate databases for different modalities
        self.text_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/text",
            dimension=768,
            algorithm="hnsw",
            distance_metric="cosine"
        )
        self.image_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/images",
            dimension=2048,
            algorithm="ivf",
            distance_metric="l2"
        )
        self.audio_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/audio",
            dimension=512,
            algorithm="lsh",
            distance_metric="cosine"
        )

    def add_content(self, content_id: str, embeddings: dict, metadata: dict):
        """Add multi-modal content."""
        if 'text' in embeddings:
            self.text_db.add(content_id, embeddings['text'], metadata)
        if 'image' in embeddings:
            self.image_db.add(content_id, embeddings['image'], metadata)
        if 'audio' in embeddings:
            self.audio_db.add(content_id, embeddings['audio'], metadata)

    def search_all_modalities(self, query_embeddings: dict, k: int = 10):
        """Search across all modalities and combine results."""
        all_results = {}
        if 'text' in query_embeddings:
            text_results = self.text_db.search(query_embeddings['text'], k=k)
            all_results['text'] = text_results
        if 'image' in query_embeddings:
            image_results = self.image_db.search(query_embeddings['image'], k=k)
            all_results['image'] = image_results
        if 'audio' in query_embeddings:
            audio_results = self.audio_db.search(query_embeddings['audio'], k=k)
            all_results['audio'] = audio_results
        return self._combine_results(all_results)

    def _combine_results(self, results_by_modality):
        """Combine and rank results from multiple modalities."""
        # Implementation depends on your fusion strategy
        combined = {}
        for modality, results in results_by_modality.items():
            for result in results:
                content_id = result['id']
                if content_id not in combined:
                    combined[content_id] = {
                        'id': content_id,
                        'metadata': result['metadata'],
                        'scores': {}
                    }
                combined[content_id]['scores'][modality] = 1 - result['distance']

        # Sort by combined score
        for item in combined.values():
            item['combined_score'] = sum(item['scores'].values()) / len(item['scores'])

        return sorted(combined.values(), key=lambda x: x['combined_score'], reverse=True)

# Usage
search_system = MultiModalSearchSystem("/data/multimodal")

# Add content
search_system.add_content(
    "doc_001",
    embeddings={
        'text': text_embedding,
        'image': image_embedding
    },
    metadata={'title': 'Research Paper', 'type': 'academic'}
)

# Search
results = search_system.search_all_modalities({
    'text': query_text_embedding,
    'image': query_image_embedding
})
```
### Real-Time Recommendation System
```python
import nusterdb
import numpy as np
from collections import defaultdict
import time

class RecommendationSystem:
    def __init__(self):
        self.user_db = nusterdb.NusterDB(
            mode="cache",
            dimension=256,
            algorithm="lsh",
            cache_size="1GB"
        )
        self.item_db = nusterdb.NusterDB(
            mode="persistent",
            path="./items",
            dimension=256,
            algorithm="ivf"
        )
        # Track user interactions
        self.user_interactions = defaultdict(list)

    def add_user_profile(self, user_id: str, profile_vector: list, metadata: dict):
        """Add or update user profile."""
        self.user_db.add(user_id, profile_vector, metadata)

    def add_item(self, item_id: str, feature_vector: list, metadata: dict):
        """Add item to catalog."""
        self.item_db.add(item_id, feature_vector, metadata)

    def record_interaction(self, user_id: str, item_id: str, interaction_type: str, rating: float = None):
        """Record user-item interaction."""
        interaction = {
            'item_id': item_id,
            'type': interaction_type,
            'rating': rating,
            'timestamp': time.time()
        }
        self.user_interactions[user_id].append(interaction)
        # Update user profile based on interaction
        self._update_user_profile(user_id, item_id, interaction_type, rating)

    def get_recommendations(self, user_id: str, k: int = 10, exclude_seen: bool = True):
        """Get personalized recommendations."""
        # Get user profile
        user_profile = self.user_db.get(user_id)
        if not user_profile:
            return self._get_popular_items(k)

        # Find similar items
        recommendations = self.item_db.search(
            user_profile['vector'],
            k=k * 2,  # Get more to account for filtering
            include_metadata=True
        )

        # Filter out already seen items
        if exclude_seen:
            seen_items = {interaction['item_id'] for interaction in self.user_interactions[user_id]}
            recommendations = [r for r in recommendations if r['id'] not in seen_items]

        return recommendations[:k]

    def get_similar_users(self, user_id: str, k: int = 5):
        """Find similar users for collaborative filtering."""
        user_profile = self.user_db.get(user_id)
        if not user_profile:
            return []

        similar_users = self.user_db.search(
            user_profile['vector'],
            k=k + 1,  # +1 to exclude self
            include_metadata=True
        )
        # Remove self from results
        return [u for u in similar_users if u['id'] != user_id]

    def _update_user_profile(self, user_id: str, item_id: str, interaction_type: str, rating: float):
        """Update user profile based on interaction."""
        # Get current profile and item features
        user_profile = self.user_db.get(user_id)
        item_data = self.item_db.get(item_id)
        if not user_profile or not item_data:
            return

        # Simple profile update (weighted average)
        weight = self._get_interaction_weight(interaction_type, rating)
        current_vector = np.array(user_profile['vector'])
        item_vector = np.array(item_data['vector'])

        # Update with exponential moving average
        alpha = 0.1  # Learning rate
        updated_vector = (1 - alpha) * current_vector + alpha * weight * item_vector

        # Update user profile
        self.user_db.update(user_id, updated_vector.tolist())

    def _get_interaction_weight(self, interaction_type: str, rating: float = None) -> float:
        """Convert interaction type to weight."""
        weights = {
            'view': 0.1,
            'click': 0.3,
            'like': 0.7,
            'purchase': 1.0,
            'rating': rating or 0.5
        }
        return weights.get(interaction_type, 0.1)

    def _get_popular_items(self, k: int):
        """Fallback for new users - return popular items."""
        # Simple implementation - could be enhanced with actual popularity metrics
        all_items = list(self.item_db)[:k]
        return all_items

# Usage
rec_system = RecommendationSystem()

# Add items
rec_system.add_item("item_1", feature_vector, {"category": "electronics", "price": 299.99})

# Add users
rec_system.add_user_profile("user_1", profile_vector, {"age": 25, "location": "NY"})

# Record interactions
rec_system.record_interaction("user_1", "item_1", "purchase", rating=4.5)

# Get recommendations
recommendations = rec_system.get_recommendations("user_1", k=10)
for rec in recommendations:
    print(f"Recommended: {rec['id']} (similarity: {1 - rec['distance']:.3f})")
```
## Why Choose NusterDB?
### Complete Database Solution
- Full CRUD operations with transaction support
- Built-in persistence and data durability
- Comprehensive APIs for production use
- Multiple storage modes for different use cases
### Enterprise Security
- Industry-leading security features
- FIPS 140-2 compliance ready
- Quantum-resistant cryptography
- Comprehensive audit logging and access control
### High Performance
- Advanced algorithms optimized for different workloads
- Hardware acceleration (SIMD, multi-threading)
- Memory-efficient with zero-copy access
- Intelligent caching for large datasets
### Developer Friendly
- Single unified API for all storage modes
- Simple installation with pip
- Extensive documentation and examples
- Type hints and comprehensive error handling
## Use Cases
### Recommended For:
- **AI/ML Applications** requiring fast similarity search
- **Production Systems** needing reliability and persistence
- **Enterprise Environments** with security requirements
- **Large-Scale Deployments** requiring monitoring and ops tools
- **Sensitive Data** needing encryption and compliance
- **Microservices** architecture with API-first design
### Common Applications:
- Semantic search and document retrieval
- Image and video similarity search
- Recommendation systems
- Anomaly detection
- Content-based filtering
- Knowledge base search
## Links & Resources
- [Documentation](https://docs.nusterai.com/nusterdb)
- [Issues](https://github.com/NusterAI/nusterdb/issues)
- [Discussions](https://github.com/NusterAI/nusterdb/discussions)
- [Performance Benchmarks](https://github.com/NusterAI/nusterdb/blob/main/docs/PERFORMANCE_ANALYSIS.md)
- [Security Guide](https://github.com/NusterAI/nusterdb/blob/main/docs/SECURITY.md)
## License
MIT License - see [LICENSE](LICENSE) file for details.
---
**Ready to build with high-performance vector search?**
```bash
pip install nusterdb
```
Get an enterprise-grade vector database with security, persistence, and production features!
Raw data
{
"_id": null,
"home_page": "https://github.com/NusterAI/nusterdb",
"name": "nusterdb",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "NusterAI Team <info@nusterai.com>",
"keywords": "vector database, similarity search, machine learning, AI, government-grade, security, FIPS, quantum-resistant",
"author": "NusterAI Team",
"author_email": "NusterAI Team <info@nusterai.com>",
"download_url": null,
"platform": null,
"description": "# NusterDB - High-Performance Vector Database with Enterprise Security\n\n[](https://badge.fury.io/py/nusterdb)\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/MIT)\n\n## The Complete Vector Database Solution\n\nNusterDB is a high-performance vector database designed for production workloads with enterprise-grade security, persistence, and comprehensive features. Built for AI/ML applications requiring fast similarity search with reliability and security.\n\n### Core Features\n\n| Feature | Description |\n|---------|-------------|\n| **Advanced Algorithms** | Multiple search algorithms: IVF, PQ, LSH, SQ, Flat, HNSW |\n| **Enterprise Security** | FIPS 140-2 compliance, quantum-resistant encryption |\n| **Production APIs** | Complete REST APIs and Python SDK |\n| **Data Persistence** | Built-in durable storage with transaction support |\n| **Full CRUD Operations** | Complete database operations: Create, Read, Update, Delete |\n| **Multiple Storage Modes** | Memory, Persistent, Cache, and API modes |\n\n## Quick Start\n\n### Installation\n\n```bash\npip install nusterdb\n```\n\n### Simple Usage\n\n```python\nimport nusterdb\n\n# Create database - choose your mode\ndb = nusterdb.NusterDB(mode=\"memory\", dimension=128)\n\n# Add vectors\ndb.add(1, [0.1, 0.2, 0.3, ...]) # Single vector\ndb.bulk_add([1,2,3], vectors, metadata) # Multiple vectors\n\n# Search\nresults = db.search([0.1, 0.2, 0.3, ...], k=5)\nfor result in results:\n print(f\"ID: {result['id']}, Distance: {result['distance']}\")\n```\n\n## Complete API Reference\n\n### Core Classes\n\n#### `NusterDB`\n\nThe main database class supporting all storage modes and algorithms.\n\n```python\nclass NusterDB:\n \"\"\"\n Unified NusterDB interface supporting all storage modes and algorithms.\n \"\"\"\n \n def __init__(\n self,\n mode: Union[str, StorageMode] = \"memory\",\n dimension: Optional[int] = None,\n path: Optional[str] = None,\n url: Optional[str] = None,\n algorithm: Union[str, Algorithm] = \"flat\",\n security_level: Union[str, SecurityLevel] = \"none\",\n distance_metric: Union[str, DistanceMetric] = \"l2\",\n use_simd: bool = True,\n use_gpu: bool = True,\n parallel_processing: bool = True,\n cache_size: str = \"512MB\",\n compression: bool = False,\n **kwargs\n ):\n```\n\n**Parameters:**\n- `mode`: Storage mode (\"memory\", \"persistent\", \"cache\", \"api\")\n- `dimension`: Vector dimension (required for new databases)\n- `path`: Path for persistent storage\n- `url`: Server URL for API mode\n- `algorithm`: Indexing algorithm (\"flat\", \"ivf\", \"pq\", \"lsh\", \"sq\", \"hnsw\")\n- `security_level`: Security level (\"none\", \"basic\", \"enterprise\", \"government\")\n- `distance_metric`: Distance metric (\"l2\", \"cosine\", \"inner_product\", \"l1\")\n- `use_simd`: Enable SIMD optimizations\n- `use_gpu`: Enable GPU acceleration\n- `parallel_processing`: Enable parallel processing\n- `cache_size`: Cache size (e.g., \"1GB\", \"512MB\")\n- `compression`: Enable compression\n\n#### Core Database Operations\n\n##### `add(id, vector, metadata=None)`\n\nAdd a single vector to the database.\n\n```python\ndef add(\n self, \n id: Union[int, str], \n vector: Union[List[float], np.ndarray],\n metadata: Optional[Dict[str, Any]] = None\n) -> bool:\n```\n\n**Parameters:**\n- `id`: Unique identifier\n- `vector`: Vector data (list or numpy array)\n- `metadata`: Optional metadata dictionary\n\n**Returns:** `bool` - Success status\n\n**Example:**\n```python\n# Add vector with metadata\nsuccess = db.add(1, 
[0.1, 0.2, 0.3], {\"category\": \"document\", \"type\": \"text\"})\n\n# Add numpy vector\nimport numpy as np\nvector = np.random.random(128)\ndb.add(\"doc_001\", vector, {\"source\": \"research_paper\"})\n```\n\n##### `bulk_add(ids, vectors, metadata=None)`\n\nAdd multiple vectors efficiently.\n\n```python\ndef bulk_add(\n self,\n ids: List[Union[int, str]],\n vectors: Union[List[List[float]], np.ndarray],\n metadata: Optional[List[Dict[str, Any]]] = None\n) -> int:\n```\n\n**Parameters:**\n- `ids`: List of unique identifiers\n- `vectors`: List of vectors or 2D numpy array\n- `metadata`: Optional list of metadata dictionaries\n\n**Returns:** `int` - Number of vectors successfully added\n\n**Example:**\n```python\n# Bulk add with metadata\nids = [1, 2, 3, 4, 5]\nvectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]\nmetdata = [\n {\"category\": \"A\", \"score\": 0.95},\n {\"category\": \"B\", \"score\": 0.87},\n {\"category\": \"A\", \"score\": 0.92},\n {\"category\": \"C\", \"score\": 0.78},\n {\"category\": \"B\", \"score\": 0.89}\n]\nadded_count = db.bulk_add(ids, vectors, metadata)\nprint(f\"Added {added_count} vectors\")\n\n# Bulk add numpy arrays\nimport numpy as np\nvectors = np.random.random((1000, 128))\nids = [f\"vec_{i}\" for i in range(1000)]\ndb.bulk_add(ids, vectors)\n```\n\n##### `search(query, k=10, filters=None, include_metadata=True, include_distances=True)`\n\nSearch for similar vectors.\n\n```python\ndef search(\n self,\n query: Union[List[float], np.ndarray],\n k: int = 10,\n filters: Optional[Dict[str, Any]] = None,\n include_metadata: bool = True,\n include_distances: bool = True\n) -> List[Dict[str, Any]]:\n```\n\n**Parameters:**\n- `query`: Query vector\n- `k`: Number of results to return\n- `filters`: Optional metadata filters\n- `include_metadata`: Include metadata in results\n- `include_distances`: Include distances in results\n\n**Returns:** `List[Dict[str, Any]]` - List of search results\n\n**Example:**\n```python\n# Basic search\nresults = db.search([0.1, 0.2, 0.3], k=5)\nfor result in results:\n print(f\"ID: {result['id']}, Distance: {result['distance']}\")\n\n# Search with filters\nresults = db.search(\n query=[0.1, 0.2, 0.3],\n k=10,\n filters={\"category\": \"document\"},\n include_metadata=True,\n include_distances=True\n)\n\n# Process results\nfor result in results:\n print(f\"ID: {result['id']}\")\n print(f\"Distance: {result['distance']:.4f}\")\n print(f\"Metadata: {result['metadata']}\")\n```\n\n##### `update(id, vector=None, metadata=None)`\n\nUpdate an existing vector.\n\n```python\ndef update(\n self,\n id: Union[int, str],\n vector: Optional[Union[List[float], np.ndarray]] = None,\n metadata: Optional[Dict[str, Any]] = None\n) -> bool:\n```\n\n**Parameters:**\n- `id`: Vector ID to update\n- `vector`: New vector data (optional)\n- `metadata`: New metadata (optional)\n\n**Returns:** `bool` - Success status\n\n**Example:**\n```python\n# Update vector only\ndb.update(1, [0.2, 0.3, 0.4])\n\n# Update metadata only\ndb.update(1, metadata={\"category\": \"updated\", \"version\": 2})\n\n# Update both\ndb.update(1, [0.2, 0.3, 0.4], {\"category\": \"updated\", \"version\": 2})\n```\n\n##### `delete(id)`\n\nDelete a vector by ID.\n\n```python\ndef delete(self, id: Union[int, str]) -> bool:\n```\n\n**Parameters:**\n- `id`: Vector ID to delete\n\n**Returns:** `bool` - Success status\n\n**Example:**\n```python\n# Delete by ID\nsuccess = db.delete(1)\nif success:\n print(\"Vector deleted successfully\")\n\n# Delete multiple vectors\nids_to_delete 
= [1, 2, 3, 4, 5]\nfor vec_id in ids_to_delete:\n db.delete(vec_id)\n```\n\n##### `get(id, include_metadata=True)`\n\nGet a vector by ID.\n\n```python\ndef get(\n self, \n id: Union[int, str],\n include_metadata: bool = True\n) -> Optional[Dict[str, Any]]:\n```\n\n**Parameters:**\n- `id`: Vector ID\n- `include_metadata`: Include metadata in result\n\n**Returns:** `Optional[Dict[str, Any]]` - Vector data or None if not found\n\n**Example:**\n```python\n# Get vector with metadata\nvector_data = db.get(1)\nif vector_data:\n print(f\"Vector: {vector_data['vector']}\")\n print(f\"Metadata: {vector_data['metadata']}\")\n\n# Get vector without metadata\nvector_data = db.get(1, include_metadata=False)\n```\n\n##### `batch_search(queries, k=10, filters=None)`\n\nSearch with multiple queries efficiently.\n\n```python\ndef batch_search(\n self,\n queries: Union[List[List[float]], np.ndarray],\n k: int = 10,\n filters: Optional[List[Dict[str, Any]]] = None\n) -> List[List[Dict[str, Any]]]:\n```\n\n**Parameters:**\n- `queries`: List of query vectors\n- `k`: Number of results per query\n- `filters`: Optional filters per query\n\n**Returns:** `List[List[Dict[str, Any]]]` - List of result lists\n\n**Example:**\n```python\n# Batch search multiple queries\nqueries = [\n [0.1, 0.2, 0.3],\n [0.4, 0.5, 0.6],\n [0.7, 0.8, 0.9]\n]\nall_results = db.batch_search(queries, k=5)\n\nfor i, results in enumerate(all_results):\n print(f\"Query {i} results:\")\n for result in results:\n print(f\" ID: {result['id']}, Distance: {result['distance']}\")\n```\n\n#### Management Operations\n\n##### `train(training_vectors=None)`\n\nTrain the index (required for some algorithms).\n\n```python\ndef train(self, training_vectors: Optional[Union[List[List[float]], np.ndarray]] = None) -> bool:\n```\n\n**Parameters:**\n- `training_vectors`: Training data (optional, uses existing data if None)\n\n**Returns:** `bool` - Success status\n\n**Example:**\n```python\n# Train with existing data\ndb.train()\n\n# Train with specific training data\ntraining_data = np.random.random((10000, 128))\ndb.train(training_data)\n```\n\n##### `optimize()`\n\nOptimize the index for better performance.\n\n```python\ndef optimize(self) -> bool:\n```\n\n**Returns:** `bool` - Success status\n\n**Example:**\n```python\n# Optimize after bulk inserts\ndb.bulk_add(ids, vectors)\ndb.optimize() # Rebuild index for better performance\n```\n\n##### `save(path=None)`\n\nSave the database to disk.\n\n```python\ndef save(self, path: Optional[str] = None) -> bool:\n```\n\n**Parameters:**\n- `path`: Save path (uses default if None)\n\n**Returns:** `bool` - Success status\n\n##### `load(path)`\n\nLoad database from disk.\n\n```python\ndef load(self, path: str) -> bool:\n```\n\n**Parameters:**\n- `path`: Load path\n\n**Returns:** `bool` - Success status\n\n##### `clear()`\n\nClear all vectors from the database.\n\n```python\ndef clear(self) -> bool:\n```\n\n**Returns:** `bool` - Success status\n\n#### Information and Statistics\n\n##### `count()` / `size()`\n\nGet total number of vectors.\n\n```python\ndef count(self) -> int:\ndef size(self) -> int: # Alias for count()\n```\n\n**Returns:** `int` - Number of vectors\n\n**Example:**\n```python\ntotal_vectors = db.count()\nprint(f\"Database contains {total_vectors} vectors\")\n```\n\n##### `stats()`\n\nGet database statistics.\n\n```python\ndef stats(self) -> Dict[str, Any]:\n```\n\n**Returns:** `Dict[str, Any]` - Statistics dictionary\n\n**Example:**\n```python\nstats = db.stats()\nprint(f\"Vector count: 
{stats['vector_count']}\")\nprint(f\"Average query time: {stats['avg_query_time']:.3f}ms\")\nprint(f\"Algorithm: {stats['algorithm']}\")\nprint(f\"Security level: {stats['security_level']}\")\n```\n\n##### `info()`\n\nGet database information.\n\n```python\ndef info(self) -> Dict[str, Any]:\n```\n\n**Returns:** `Dict[str, Any]` - Information dictionary\n\n##### `health_check()`\n\nPerform health check.\n\n```python\ndef health_check(self) -> Dict[str, Any]:\n```\n\n**Returns:** `Dict[str, Any]` - Health status\n\n**Example:**\n```python\nhealth = db.health_check()\nif health['healthy']:\n print(\"Database is healthy\")\n print(f\"Vector count: {health['vector_count']}\")\nelse:\n print(f\"Database issue: {health.get('error')}\")\n```\n\n#### Context Manager Support\n\n```python\n# Automatic resource management\nwith nusterdb.NusterDB(mode=\"persistent\", path=\"./vectors\") as db:\n db.add(1, [0.1, 0.2, 0.3])\n results = db.search([0.1, 0.2, 0.3])\n # Database automatically saved and closed\n```\n\n#### Iterator Support\n\n```python\n# Iterate over all vectors (if supported by backend)\nfor vector_data in db:\n print(f\"ID: {vector_data['id']}\")\n print(f\"Vector: {vector_data['vector']}\")\n\n# Check if ID exists\nif 1 in db:\n print(\"Vector with ID 1 exists\")\n\n# Get length\nprint(f\"Database has {len(db)} vectors\")\n```\n\n### Configuration Classes\n\n#### `NusterConfig`\n\nComplete configuration for all NusterDB aspects.\n\n```python\n@dataclass\nclass NusterConfig:\n # Core settings\n algorithm: Algorithm = Algorithm.FLAT\n security_level: SecurityLevel = SecurityLevel.NONE\n distance_metric: DistanceMetric = DistanceMetric.L2\n \n # Performance settings\n use_simd: bool = True\n use_gpu: bool = True\n parallel_processing: bool = True\n cache_size: str = \"512MB\"\n compression: bool = False\n```\n\n**Methods:**\n- `to_dict()` - Convert to dictionary\n- `to_json()` - Convert to JSON string\n- `from_dict(config_dict)` - Create from dictionary\n- `from_json(json_str)` - Create from JSON\n- `update(**kwargs)` - Create updated configuration\n- `optimize_for_speed()` - Speed-optimized configuration\n- `optimize_for_accuracy()` - Accuracy-optimized configuration\n- `optimize_for_memory()` - Memory-optimized configuration\n\n#### Configuration Enums\n\n```python\nclass Algorithm(Enum):\n FLAT = \"flat\" # Exact search\n IVF = \"ivf\" # Inverted File Index\n PQ = \"pq\" # Product Quantization\n LSH = \"lsh\" # Locality Sensitive Hashing\n SQ = \"sq\" # Scalar Quantization\n HNSW = \"hnsw\" # Hierarchical NSW\n HYBRID = \"hybrid\" # Multi-algorithm approach\n\nclass SecurityLevel(Enum):\n NONE = \"none\" # No special security\n BASIC = \"basic\" # Basic encryption\n ENTERPRISE = \"enterprise\" # Enterprise-grade security\n GOVERNMENT = \"government\" # Government-grade (FIPS 140-2)\n\nclass StorageMode(Enum):\n MEMORY = \"memory\" # In-memory only (fastest)\n PERSISTENT = \"persistent\" # Disk-based storage (durable)\n CACHE = \"cache\" # Memory + disk caching\n API = \"api\" # Remote API connection\n\nclass DistanceMetric(Enum):\n L2 = \"l2\" # Euclidean distance\n COSINE = \"cosine\" # Cosine similarity\n INNER_PRODUCT = \"inner_product\" # Inner product\n L1 = \"l1\" # Manhattan distance\n HAMMING = \"hamming\" # Hamming distance\n```\n\n### Client Class\n\n#### `NusterClient`\n\nClient for connecting to NusterDB server instances.\n\n```python\nclass NusterClient:\n def __init__(\n self,\n url: str = \"http://localhost:7878\",\n timeout: int = 30,\n retry_attempts: int = 3,\n api_key: 
Optional[str] = None,\n verify_ssl: bool = True\n ):\n```\n\n**Parameters:**\n- `url`: Server URL\n- `timeout`: Request timeout in seconds\n- `retry_attempts`: Number of retry attempts\n- `api_key`: Optional API key for authentication\n- `verify_ssl`: Verify SSL certificates\n\n**Methods:** Same interface as `NusterDB` but operates over HTTP/REST API.\n\n**Example:**\n```python\n# Connect to server\nclient = nusterdb.NusterClient(\"http://localhost:7878\", api_key=\"your-key\")\n\n# Use same interface as local database\nclient.add(1, [0.1, 0.2, 0.3])\nresults = client.search([0.1, 0.2, 0.3], k=5)\n```\n\n### Utility Functions\n\n#### `create_random_vectors(count, dimension, distribution=\"normal\", seed=None)`\n\nCreate random vectors for testing.\n\n```python\ndef create_random_vectors(\n count: int, \n dimension: int, \n distribution: str = \"normal\",\n seed: Optional[int] = None\n) -> np.ndarray:\n```\n\n**Parameters:**\n- `count`: Number of vectors\n- `dimension`: Vector dimension\n- `distribution`: Distribution type (\"normal\", \"uniform\", \"clustered\")\n- `seed`: Random seed for reproducibility\n\n**Example:**\n```python\n# Create test vectors\nvectors = nusterdb.create_random_vectors(1000, 128, distribution=\"normal\", seed=42)\n\n# Create clustered data\nclustered = nusterdb.create_random_vectors(500, 64, distribution=\"clustered\")\n```\n\n#### `benchmark_performance(db, num_vectors=1000, dimension=128, num_queries=100, k=10)`\n\nBenchmark database performance.\n\n```python\ndef benchmark_performance(\n db,\n num_vectors: int = 1000,\n dimension: int = 128,\n num_queries: int = 100,\n k: int = 10\n) -> Dict[str, float]:\n```\n\n**Returns:** Performance metrics dictionary\n\n**Example:**\n```python\n# Benchmark your database\nmetrics = nusterdb.benchmark_performance(db, num_vectors=5000, dimension=768)\nprint(f\"Insert rate: {metrics['insert_rate_per_sec']:.0f} vectors/sec\")\nprint(f\"Search rate: {metrics['search_rate_qps']:.0f} QPS\")\nprint(f\"Average search time: {metrics['avg_search_time_ms']:.2f} ms\")\n```\n\n#### `validate_vectors(vectors, expected_dimension=None)`\n\nValidate and normalize vector data.\n\n```python\ndef validate_vectors(\n vectors: Union[List[List[float]], np.ndarray], \n expected_dimension: Optional[int] = None\n) -> np.ndarray:\n```\n\n**Example:**\n```python\n# Validate vectors before adding\nvectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]\nvalidated = nusterdb.validate_vectors(vectors, expected_dimension=2)\n```\n\n#### `load_vectors_from_file(file_path, format=\"auto\")` / `save_vectors_to_file(...)`\n\nLoad and save vectors from various file formats.\n\n```python\n# Load from file\nids, vectors, metadata = nusterdb.load_vectors_from_file(\"data.json\")\ndb.bulk_add(ids, vectors, metadata)\n\n# Save to file\nids = list(range(100))\nvectors = nusterdb.create_random_vectors(100, 128)\nnusterdb.save_vectors_to_file(\"backup.json\", ids, vectors)\n```\n\n**Supported formats:** JSON, NumPy (.npy), CSV, HDF5 (.h5)\n\n### Exception Classes\n\n```python\nclass NusterDBError(Exception):\n \"\"\"Base exception for all NusterDB errors.\"\"\"\n \nclass ConnectionError(NusterDBError):\n \"\"\"Connection to server failed.\"\"\"\n \nclass SecurityError(NusterDBError):\n \"\"\"Security validation failed.\"\"\"\n \nclass IndexError(NusterDBError):\n \"\"\"Index operations failed.\"\"\"\n \nclass ConfigurationError(NusterDBError):\n \"\"\"Configuration is invalid.\"\"\"\n \nclass ValidationError(NusterDBError):\n \"\"\"Input validation failed.\"\"\"\n```\n\n### 
Module-Level Convenience Functions\n\n#### `quick_start(dimension, mode=\"memory\")`\n\nQuick start helper for common use cases.\n\n```python\n# Quick setup\ndb = nusterdb.quick_start(128, \"memory\")\ndb.add(1, [0.1, 0.2, ...])\n```\n\n#### `connect(url=\"http://localhost:7878\", **kwargs)`\n\nConnect to NusterDB server.\n\n```python\n# Connect to server\nclient = nusterdb.connect(\"http://localhost:7878\", api_key=\"key\")\n```\n\n#### `create_database(path, dimension, **kwargs)`\n\nCreate a new persistent database.\n\n```python\n# Create persistent database\ndb = nusterdb.create_database(\"./vectors\", 128, algorithm=\"ivf\")\n```\n\n#### Configuration Helpers\n\n```python\n# Get optimized configurations\nspeed_config = nusterdb.optimize_for_speed()\naccuracy_config = nusterdb.optimize_for_accuracy()\nmemory_config = nusterdb.optimize_for_memory()\n\n# Use with database\ndb = nusterdb.NusterDB(dimension=128, **speed_config)\n```\n\n#### `info()`\n\nGet package information.\n\n```python\npackage_info = nusterdb.info()\nprint(f\"Version: {package_info['version']}\")\nprint(f\"Algorithms: {package_info['algorithms']}\")\nprint(f\"Features: {package_info['features']}\")\n```\n\n## Storage Modes\n\n### Memory Mode (Fastest)\n```python\n# For temporary, high-speed operations\ndb = nusterdb.NusterDB(mode=\"memory\", dimension=128)\n# Best for: Testing, temporary data, maximum speed\n```\n\n### Persistent Mode (Production Ready)\n```python\n# For production with data persistence\ndb = nusterdb.NusterDB(\n mode=\"persistent\", \n path=\"./my_vectors\",\n dimension=768\n)\n# Best for: Production data, long-term storage, reliability\n```\n\n### Cache Mode (Balanced Performance)\n```python\n# For large datasets with intelligent caching\ndb = nusterdb.NusterDB(\n mode=\"cache\",\n cache_size=\"2GB\",\n dimension=512\n)\n# Best for: Large datasets, memory optimization, balanced performance\n```\n\n### API Mode (Distributed)\n```python\n# Connect to NusterDB server\ndb = nusterdb.NusterDB(mode=\"api\", url=\"http://localhost:7878\")\n# Best for: Microservices, distributed systems, scalability\n```\n\n## Enterprise Security Features\n\n### Standard Security\n```python\n# Basic encryption and security\ndb = nusterdb.NusterDB(\n mode=\"persistent\",\n path=\"./secure_vectors\",\n security_level=\"basic\", # Standard security\n encryption_at_rest=True # Data encryption\n)\n```\n\n### Advanced Security (Enterprise)\n```python\n# Maximum security for sensitive data\ndb = nusterdb.NusterDB(\n mode=\"persistent\",\n path=\"./classified_vectors\",\n security_level=\"enterprise\", # Enhanced security\n encryption_at_rest=True, # AES-256 encryption\n audit_logging=True, # Security event tracking\n access_control=True, # Role-based permissions\n quantum_resistant=True # Future-proof encryption\n)\n```\n\n### Security Features Available\n- **FIPS 140-2 Ready** - Federal cryptographic standards compliance\n- **AES-256 Encryption** - Industry-standard data protection\n- **Quantum-Resistant** - Post-quantum cryptography algorithms\n- **Audit Logging** - Comprehensive security event tracking\n- **Access Control** - Multi-level security permissions\n- **Key Management** - Secure key derivation and rotation\n\n## Vector Search Algorithms\n\n### Algorithm Details\n\n- **Flat**: Exact brute-force search for highest accuracy\n- **IVF**: Inverted file structure for balanced performance\n- **LSH**: Locality-sensitive hashing for speed\n- **PQ**: Product quantization for memory efficiency\n- **HNSW**: Hierarchical navigable 
small world graphs\n- **SQ**: Scalar quantization for reduced memory usage\n\n```python\n# Algorithm-specific configuration\ndb = nusterdb.NusterDB(\n dimension=768,\n algorithm=\"ivf\",\n # IVF-specific parameters\n ivf_clusters=256,\n ivf_probe_lists=32\n)\n\n# PQ configuration\ndb = nusterdb.NusterDB(\n dimension=768,\n algorithm=\"pq\",\n pq_subvectors=8,\n pq_centroids=256\n)\n```\n\n## Performance Benchmarks\n\nBased on internal benchmarking on enterprise hardware:\n\n### NusterDB Performance\n- **Memory Mode**: 15K-30K queries per second\n- **Persistent Mode**: 8K-15K QPS with full durability\n- **Insertion Rate**: 10K+ vectors/sec with persistence\n- **Memory Efficiency**: Zero-copy access, optimized storage\n- **Latency**: Sub-millisecond response times for most queries\n\n### Distance Metrics Supported\n- **L2 (Euclidean)**: Standard Euclidean distance\n- **Cosine**: Cosine similarity for normalized vectors\n- **Inner Product**: Dot product similarity\n- **L1 (Manhattan)**: Manhattan distance\n\n## Configuration Management\n\n### Predefined Configurations\n\n```python\nfrom nusterdb import create_config\n\n# Predefined configurations for common use cases\nconfig = create_config(\"production_speed\") # Speed-optimized\nconfig = create_config(\"production_accuracy\") # Accuracy-optimized \nconfig = create_config(\"memory_constrained\") # Memory-optimized\nconfig = create_config(\"secure\") # Security-focused\n\ndb = nusterdb.NusterDB(config=config, dimension=768)\n```\n\n### Available Presets\n- `\"development\"` / `\"dev\"` - Development and testing\n- `\"production_speed\"` / `\"prod_speed\"` - Speed-optimized production\n- `\"production_accuracy\"` / `\"prod_accuracy\"` - Accuracy-optimized production\n- `\"government\"` / `\"secure\"` - Government-grade security\n- `\"memory_constrained\"` / `\"low_memory\"` - Memory-optimized\n- `\"high_throughput\"` / `\"throughput\"` - High-throughput applications\n\n### Custom Configurations\n\n```python\n# Create custom configuration\nconfig = nusterdb.NusterConfig(\n algorithm=nusterdb.Algorithm.IVF,\n security_level=nusterdb.SecurityLevel.ENTERPRISE,\n distance_metric=nusterdb.DistanceMetric.COSINE,\n use_gpu=True,\n cache_size=\"4GB\"\n)\n\n# Use configuration\ndb = nusterdb.NusterDB(config=config, dimension=768)\n\n# Update configuration\nupdated_config = config.update(use_simd=False, parallel_processing=False)\n```\n\n## Advanced Examples\n\n### Large-Scale Production Setup\n\n```python\nimport nusterdb\nimport numpy as np\n\n# Production configuration with security\ndb = nusterdb.NusterDB(\n mode=\"persistent\",\n path=\"/secure/vectors\",\n dimension=1536, # OpenAI embeddings\n algorithm=\"ivf\",\n security_level=\"enterprise\",\n distance_metric=\"cosine\",\n use_gpu=True,\n parallel_processing=True,\n cache_size=\"8GB\",\n \n # IVF-specific tuning\n ivf_clusters=1024,\n ivf_probe_lists=64,\n \n # Security settings\n encryption_at_rest=True,\n audit_logging=True,\n access_control=True\n)\n\n# Bulk data loading with progress tracking\ndef load_embeddings(file_path, batch_size=1000):\n ids, vectors, metadata = nusterdb.load_vectors_from_file(file_path)\n \n total_batches = len(ids) // batch_size + 1\n for i in range(0, len(ids), batch_size):\n batch_ids = ids[i:i+batch_size]\n batch_vectors = vectors[i:i+batch_size]\n batch_metadata = metadata[i:i+batch_size] if metadata else None\n \n added = db.bulk_add(batch_ids, batch_vectors, batch_metadata)\n print(f\"Batch {i//batch_size + 1}/{total_batches}: Added {added} vectors\")\n \n # 
### Distance Metrics Supported
- **L2 (Euclidean)**: Standard Euclidean distance
- **Cosine**: Cosine similarity for normalized vectors
- **Inner Product**: Dot product similarity
- **L1 (Manhattan)**: Manhattan distance

## Configuration Management

### Predefined Configurations

```python
from nusterdb import create_config

# Predefined configurations for common use cases
config = create_config("production_speed")     # Speed-optimized
config = create_config("production_accuracy")  # Accuracy-optimized
config = create_config("memory_constrained")   # Memory-optimized
config = create_config("secure")               # Security-focused

db = nusterdb.NusterDB(config=config, dimension=768)
```

### Available Presets
- `"development"` / `"dev"` - Development and testing
- `"production_speed"` / `"prod_speed"` - Speed-optimized production
- `"production_accuracy"` / `"prod_accuracy"` - Accuracy-optimized production
- `"government"` / `"secure"` - Government-grade security
- `"memory_constrained"` / `"low_memory"` - Memory-optimized
- `"high_throughput"` / `"throughput"` - High-throughput applications

### Custom Configurations

```python
# Create custom configuration
config = nusterdb.NusterConfig(
    algorithm=nusterdb.Algorithm.IVF,
    security_level=nusterdb.SecurityLevel.ENTERPRISE,
    distance_metric=nusterdb.DistanceMetric.COSINE,
    use_gpu=True,
    cache_size="4GB"
)

# Use configuration
db = nusterdb.NusterDB(config=config, dimension=768)

# Update configuration
updated_config = config.update(use_simd=False, parallel_processing=False)
```

## Advanced Examples

### Large-Scale Production Setup

```python
import nusterdb
import numpy as np

# Production configuration with security
db = nusterdb.NusterDB(
    mode="persistent",
    path="/secure/vectors",
    dimension=1536,  # OpenAI embeddings
    algorithm="ivf",
    security_level="enterprise",
    distance_metric="cosine",
    use_gpu=True,
    parallel_processing=True,
    cache_size="8GB",

    # IVF-specific tuning
    ivf_clusters=1024,
    ivf_probe_lists=64,

    # Security settings
    encryption_at_rest=True,
    audit_logging=True,
    access_control=True
)

# Bulk data loading with progress tracking
def load_embeddings(file_path, batch_size=1000):
    ids, vectors, metadata = nusterdb.load_vectors_from_file(file_path)

    total_batches = (len(ids) + batch_size - 1) // batch_size  # ceiling division
    for i in range(0, len(ids), batch_size):
        batch_ids = ids[i:i + batch_size]
        batch_vectors = vectors[i:i + batch_size]
        batch_metadata = metadata[i:i + batch_size] if metadata else None

        added = db.bulk_add(batch_ids, batch_vectors, batch_metadata)
        print(f"Batch {i // batch_size + 1}/{total_batches}: Added {added} vectors")

    # Optimize after bulk loading
    print("Optimizing index...")
    db.optimize()

# Advanced search with multiple filters
def semantic_search(query_text, filters=None, k=10):
    # Convert text to embedding (pseudo-code: bring your own model)
    query_embedding = get_text_embedding(query_text)

    results = db.search(
        query=query_embedding,
        k=k,
        filters=filters or {},
        include_metadata=True,
        include_distances=True
    )

    # Post-process results
    processed_results = []
    for result in results:
        processed_results.append({
            'id': result['id'],
            'similarity': 1 - result['distance'],   # Convert distance to similarity
            'metadata': result['metadata'],
            'confidence': result['distance'] < 0.5  # Simple confidence threshold
        })

    return processed_results

# Usage
results = semantic_search(
    "machine learning algorithms",
    filters={"category": "research", "year": 2023},
    k=20
)
```

### Multi-Modal Search System

```python
import nusterdb

class MultiModalSearchSystem:
    def __init__(self, base_path: str):
        # Separate databases for different modalities
        self.text_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/text",
            dimension=768,
            algorithm="hnsw",
            distance_metric="cosine"
        )

        self.image_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/images",
            dimension=2048,
            algorithm="ivf",
            distance_metric="l2"
        )

        self.audio_db = nusterdb.NusterDB(
            mode="persistent",
            path=f"{base_path}/audio",
            dimension=512,
            algorithm="lsh",
            distance_metric="cosine"
        )

    def add_content(self, content_id: str, embeddings: dict, metadata: dict):
        """Add multi-modal content."""
        if 'text' in embeddings:
            self.text_db.add(content_id, embeddings['text'], metadata)

        if 'image' in embeddings:
            self.image_db.add(content_id, embeddings['image'], metadata)

        if 'audio' in embeddings:
            self.audio_db.add(content_id, embeddings['audio'], metadata)

    def search_all_modalities(self, query_embeddings: dict, k: int = 10):
        """Search across all modalities and combine results."""
        all_results = {}

        if 'text' in query_embeddings:
            all_results['text'] = self.text_db.search(query_embeddings['text'], k=k)

        if 'image' in query_embeddings:
            all_results['image'] = self.image_db.search(query_embeddings['image'], k=k)

        if 'audio' in query_embeddings:
            all_results['audio'] = self.audio_db.search(query_embeddings['audio'], k=k)

        return self._combine_results(all_results)

    def _combine_results(self, results_by_modality):
        """Combine and rank results from multiple modalities."""
        # Implementation depends on your fusion strategy
        combined = {}
        for modality, results in results_by_modality.items():
            for result in results:
                content_id = result['id']
                if content_id not in combined:
                    combined[content_id] = {
                        'id': content_id,
                        'metadata': result['metadata'],
                        'scores': {}
                    }
                combined[content_id]['scores'][modality] = 1 - result['distance']

        # Sort by mean per-modality score
        for item in combined.values():
            item['combined_score'] = sum(item['scores'].values()) / len(item['scores'])

        return sorted(combined.values(), key=lambda x: x['combined_score'], reverse=True)

# Usage
search_system = MultiModalSearchSystem("/data/multimodal")

# Add content
search_system.add_content(
    "doc_001",
    embeddings={
        'text': text_embedding,
        'image': image_embedding
    },
    metadata={'title': 'Research Paper', 'type': 'academic'}
)

# Search
results = search_system.search_all_modalities({
    'text': query_text_embedding,
    'image': query_image_embedding
})
```
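`_combine_results` above averages per-modality similarity scores, which assumes distances from different metrics (cosine vs. L2) are directly comparable. A common alternative that sidesteps that assumption is reciprocal rank fusion (RRF), which uses only each result's rank within its own list. A minimal sketch that could be swapped into `_combine_results`; the constant `k=60` is the conventional default from the RRF literature, not a NusterDB parameter.

```python
def reciprocal_rank_fusion(results_by_modality, k=60):
    """Fuse ranked result lists with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    fused = {}
    for modality, results in results_by_modality.items():
        for rank, result in enumerate(results, start=1):
            entry = fused.setdefault(
                result["id"],
                {"id": result["id"], "metadata": result.get("metadata"), "rrf_score": 0.0},
            )
            # Ranks, not raw distances, drive the fused score
            entry["rrf_score"] += 1.0 / (k + rank)
    return sorted(fused.values(), key=lambda item: item["rrf_score"], reverse=True)
```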
### Real-Time Recommendation System

```python
import time
from collections import defaultdict
from typing import Optional

import numpy as np
import nusterdb

class RecommendationSystem:
    def __init__(self):
        self.user_db = nusterdb.NusterDB(
            mode="cache",
            dimension=256,
            algorithm="lsh",
            cache_size="1GB"
        )

        self.item_db = nusterdb.NusterDB(
            mode="persistent",
            path="./items",
            dimension=256,
            algorithm="ivf"
        )

        # Track user interactions
        self.user_interactions = defaultdict(list)

    def add_user_profile(self, user_id: str, profile_vector: list, metadata: dict):
        """Add or update a user profile."""
        self.user_db.add(user_id, profile_vector, metadata)

    def add_item(self, item_id: str, feature_vector: list, metadata: dict):
        """Add an item to the catalog."""
        self.item_db.add(item_id, feature_vector, metadata)

    def record_interaction(self, user_id: str, item_id: str, interaction_type: str, rating: Optional[float] = None):
        """Record a user-item interaction."""
        interaction = {
            'item_id': item_id,
            'type': interaction_type,
            'rating': rating,
            'timestamp': time.time()
        }
        self.user_interactions[user_id].append(interaction)

        # Update the user profile based on the interaction
        self._update_user_profile(user_id, item_id, interaction_type, rating)

    def get_recommendations(self, user_id: str, k: int = 10, exclude_seen: bool = True):
        """Get personalized recommendations."""
        # Get the user profile
        user_profile = self.user_db.get(user_id)
        if not user_profile:
            return self._get_popular_items(k)

        # Find similar items
        recommendations = self.item_db.search(
            user_profile['vector'],
            k=k * 2,  # Fetch extra to account for filtering
            include_metadata=True
        )

        # Filter out already-seen items
        if exclude_seen:
            seen_items = {interaction['item_id'] for interaction in self.user_interactions[user_id]}
            recommendations = [r for r in recommendations if r['id'] not in seen_items]

        return recommendations[:k]

    def get_similar_users(self, user_id: str, k: int = 5):
        """Find similar users for collaborative filtering."""
        user_profile = self.user_db.get(user_id)
        if not user_profile:
            return []

        similar_users = self.user_db.search(
            user_profile['vector'],
            k=k + 1,  # +1 to exclude self
            include_metadata=True
        )

        # Remove self from results
        return [u for u in similar_users if u['id'] != user_id]

    def _update_user_profile(self, user_id: str, item_id: str, interaction_type: str, rating: Optional[float]):
        """Update the user profile based on an interaction."""
        # Get the current profile and item features
        user_profile = self.user_db.get(user_id)
        item_data = self.item_db.get(item_id)

        if not user_profile or not item_data:
            return

        # Simple profile update (weighted average)
        weight = self._get_interaction_weight(interaction_type, rating)
        current_vector = np.array(user_profile['vector'])
        item_vector = np.array(item_data['vector'])

        # Update with an exponential moving average
        alpha = 0.1  # Learning rate
        updated_vector = (1 - alpha) * current_vector + alpha * weight * item_vector

        # Store the updated profile
        self.user_db.update(user_id, updated_vector.tolist())

    def _get_interaction_weight(self, interaction_type: str, rating: Optional[float] = None) -> float:
        """Convert an interaction type to a weight."""
        weights = {
            'view': 0.1,
            'click': 0.3,
            'like': 0.7,
            'purchase': 1.0,
            'rating': rating or 0.5
        }
        return weights.get(interaction_type, 0.1)

    def _get_popular_items(self, k: int):
        """Fallback for new users - return popular items."""
        # Simple implementation - could be enhanced with actual popularity metrics
        return list(self.item_db)[:k]

# Usage
rec_system = RecommendationSystem()

# Add items
rec_system.add_item("item_1", feature_vector, {"category": "electronics", "price": 299.99})

# Add users
rec_system.add_user_profile("user_1", profile_vector, {"age": 25, "location": "NY"})

# Record interactions
rec_system.record_interaction("user_1", "item_1", "purchase", rating=4.5)

# Get recommendations
recommendations = rec_system.get_recommendations("user_1", k=10)
for rec in recommendations:
    print(f"Recommended: {rec['id']} (similarity: {1 - rec['distance']:.3f})")
```
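The profile update in `_update_user_profile` is an exponential moving average, so `alpha` controls how quickly a profile drifts toward recently interacted items: the gap to the item vector shrinks by a factor of `(1 - alpha)` per interaction. A standalone numpy sketch (independent of NusterDB) illustrating that convergence:

```python
import numpy as np

alpha, weight = 0.1, 1.0
profile = np.zeros(4)
item = np.array([1.0, 0.0, 0.0, 0.0])

for step in range(1, 31):
    profile = (1 - alpha) * profile + alpha * weight * item
    if step in (1, 10, 30):
        # Distance to the item is (1 - alpha) ** step: 0.9000, 0.3487, 0.0424
        print(f"step {step:2d}: distance to item = {np.linalg.norm(profile - item):.4f}")
```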
## Why Choose NusterDB?

### Complete Database Solution
- Full CRUD operations with transaction support
- Built-in persistence and data durability
- Comprehensive APIs for production use
- Multiple storage modes for different use cases

### Enterprise Security
- Industry-leading security features
- FIPS 140-2 compliance ready
- Quantum-resistant cryptography
- Comprehensive audit logging and access control

### High Performance
- Advanced algorithms optimized for different workloads
- Hardware acceleration (SIMD, multi-threading)
- Memory-efficient with zero-copy access
- Intelligent caching for large datasets

### Developer Friendly
- Single unified API for all storage modes
- Simple installation with pip
- Extensive documentation and examples
- Type hints and comprehensive error handling

## Use Cases

### Recommended For:
- **AI/ML Applications** requiring fast similarity search
- **Production Systems** needing reliability and persistence
- **Enterprise Environments** with security requirements
- **Large-Scale Deployments** requiring monitoring and ops tools
- **Sensitive Data** needing encryption and compliance
- **Microservices** architectures with API-first design

### Common Applications:
- Semantic search and document retrieval
- Image and video similarity search
- Recommendation systems
- Anomaly detection
- Content-based filtering
- Knowledge base search

## Links & Resources

- [Documentation](https://docs.nusterai.com/nusterdb)
- [Issues](https://github.com/NusterAI/nusterdb/issues)
- [Discussions](https://github.com/NusterAI/nusterdb/discussions)
- [Performance Benchmarks](https://github.com/NusterAI/nusterdb/blob/main/docs/PERFORMANCE_ANALYSIS.md)
- [Security Guide](https://github.com/NusterAI/nusterdb/blob/main/docs/SECURITY.md)

## License

MIT License - see [LICENSE](LICENSE) file for details.

---

**Ready to build with high-performance vector search?**

```bash
pip install nusterdb
```

Get an enterprise-grade vector database with security, persistence, and production features!
"bugtrack_url": null,
"license": null,
"summary": "High-performance, government-grade vector database with advanced indexing algorithms",
"version": "2.1.3",
"project_urls": {
"Bug Tracker": "https://github.com/NusterAI/nusterdb/issues",
"Changelog": "https://github.com/NusterAI/nusterdb/blob/main/CHANGELOG.md",
"Documentation": "https://docs.nusterai.com/nusterdb",
"Homepage": "https://github.com/NusterAI/nusterdb",
"Repository": "https://github.com/NusterAI/nusterdb"
},
"split_keywords": [
"vector database",
" similarity search",
" machine learning",
" ai",
" government-grade",
" security",
" fips",
" quantum-resistant"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "17aca280926362ac3adbe971d715403e6d2da436a6d8f280e4b4cbba7fc2f035",
"md5": "fb5c6faf88785b937b03ee2445c2f586",
"sha256": "326f6b2d97b39a64033f2169306e8084b8c55c4151e1f2a57a9b9070d4377950"
},
"downloads": -1,
"filename": "nusterdb-2.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fb5c6faf88785b937b03ee2445c2f586",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 38119,
"upload_time": "2025-08-04T01:12:17",
"upload_time_iso_8601": "2025-08-04T01:12:17.877882Z",
"url": "https://files.pythonhosted.org/packages/17/ac/a280926362ac3adbe971d715403e6d2da436a6d8f280e4b4cbba7fc2f035/nusterdb-2.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-04 01:12:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NusterAI",
"github_project": "nusterdb",
"github_not_found": true,
"lcname": "nusterdb"
}