# MatrixOne Python SDK
A comprehensive, high-level Python SDK for MatrixOne that provides a SQLAlchemy-like interface for database operations, vector similarity search, fulltext search, snapshot management, PITR, restore operations, table cloning, and more.
---
## 📚 Documentation
**[📖 Complete Documentation on ReadTheDocs](https://matrixone.readthedocs.io/)** ⭐
**Quick Links:**
- 🚀 [Quick Start Guide](https://matrixone.readthedocs.io/en/latest/quickstart.html)
- 🧠 [Vector Search & IVF Index Monitoring](https://matrixone.readthedocs.io/en/latest/vector_guide.html)
- 📋 [Best Practices](https://matrixone.readthedocs.io/en/latest/best_practices.html)
- 📖 [API Reference](https://matrixone.readthedocs.io/en/latest/api/index.html)
---
## ✨ Features
- 🚀 **High Performance**: Optimized for MatrixOne database operations with connection pooling
- 🔄 **Async Support**: Full async/await support with AsyncClient for non-blocking operations
- 🧠 **Vector Search**: Advanced vector similarity search with HNSW and IVF indexing
  - Support for f32 and f64 precision vectors
  - Multiple distance metrics (L2, Cosine, Inner Product)
  - ⭐ **IVF Index Health Monitoring** with `get_ivf_stats()` - Critical for production!
  - High-performance indexing for AI/ML applications
- 🔍 **Fulltext Search**: Powerful fulltext indexing and search with BM25 and TF-IDF
  - Natural language and boolean search modes
  - Multi-column indexes with relevance scoring
- 📊 **Metadata Analysis**: Table and column metadata analysis with statistics
- 📸 **Snapshot Management**: Create and manage database snapshots at multiple levels
- ⏰ **Point-in-Time Recovery**: PITR functionality for precise data recovery
- 🔄 **Table Cloning**: Clone databases and tables efficiently
- 👥 **Account Management**: Comprehensive user and role management
- 📊 **Pub/Sub**: Real-time publication and subscription support
- 🔧 **Version Management**: Automatic backend version detection and compatibility
- 🛡️ **Type Safety**: Full type hints support with comprehensive documentation
- 📚 **SQLAlchemy Integration**: Seamless SQLAlchemy ORM integration with enhanced features
## 🚀 Installation
### Using pip (Recommended)
```bash
pip install matrixone-python-sdk
```
### Install from test.pypi (Latest Pre-release)
```bash
pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    matrixone-python-sdk
```
**Note**: The `--extra-index-url` is required to install dependencies from the official PyPI.
### Using Virtual Environment (Best Practice)
```bash
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
# Install MatrixOne SDK
pip install matrixone-python-sdk
# Verify installation
python -c "import matrixone; print('MatrixOne SDK installed successfully')"
```
### Using Conda
```bash
# Create conda environment
conda create -n matrixone python=3.10
conda activate matrixone
# Install MatrixOne SDK
pip install matrixone-python-sdk
```
## Quick Start
### Basic Usage
```python
from matrixone import Client
# Create and connect to MatrixOne
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)
# Execute queries
result = client.execute("SELECT 1 as test")
print(result.fetchall())
# Get backend version (auto-detected)
version = client.get_backend_version()
print(f"MatrixOne version: {version}")
client.disconnect()
```
> **📝 Connection Parameters**
>
> The `connect()` method requires **keyword arguments** (not positional):
> - `database` - **Required**, no default value
> - `host` - Default: `'localhost'`
> - `port` - Default: `6001`
> - `user` - Default: `'root'`
> - `password` - Default: `'111'`
>
> **Minimal connection** (uses all defaults):
> ```python
> client.connect(database='test')
> ```
>
> By default, all features (IVF, HNSW, fulltext) are automatically enabled via `on_connect=[ConnectionAction.ENABLE_ALL]`.
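
In practice the connection settings listed above are often read from the environment rather than hard-coded. The following is a minimal sketch of that idea, not part of the SDK: the helper `connection_kwargs` and the `MO_*` variable names are hypothetical, and the defaults are simply the ones documented in the note above.

```python
import os

# Defaults copied from the connection-parameter list in this README.
DEFAULTS = {"host": "localhost", "port": 6001, "user": "root", "password": "111"}


def connection_kwargs(env=None):
    """Build keyword arguments for client.connect() from environment variables.

    The MO_* variable names are hypothetical; use whatever fits your deployment.
    """
    env = os.environ if env is None else env
    database = env.get("MO_DATABASE")
    if not database:
        # `database` has no default, so fail early with a clear message.
        raise ValueError("MO_DATABASE must be set: connect() requires a database")
    return {
        "host": env.get("MO_HOST", DEFAULTS["host"]),
        "port": int(env.get("MO_PORT", DEFAULTS["port"])),
        "user": env.get("MO_USER", DEFAULTS["user"]),
        "password": env.get("MO_PASSWORD", DEFAULTS["password"]),
        "database": database,
    }
```

With a client at hand, the result can be passed straight through as `client.connect(**connection_kwargs())`, which also satisfies the keyword-only calling convention.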
### Async Usage
```python
import asyncio
from matrixone import AsyncClient
async def main():
    client = AsyncClient()
    await client.connect(
        host='localhost',
        port=6001,
        user='root',
        password='111',
        database='test'
    )
    result = await client.execute("SELECT 1 as test")
    print(result.fetchall())
    await client.disconnect()
asyncio.run(main())
```
### Snapshot Management
```python
# Create a snapshot
snapshot = client.snapshots.create(
    'my_snapshot',
    'cluster',
    description='Backup before migration'
)
# List snapshots
snapshots = client.snapshots.list()
for snap in snapshots:
    print(f"Snapshot: {snap.name}, Created: {snap.created_at}")
# Clone database from snapshot
client.clone.clone_database(
    'new_database',
    'old_database',
    snapshot_name='my_snapshot'
)
```
### Version Management
```python
# Check if feature is available
if client.is_feature_available('snapshot_creation'):
    snapshot = client.snapshots.create('my_snapshot', 'cluster')
else:
    hint = client.get_version_hint('snapshot_creation')
    print(f"Feature not available: {hint}")
# Check version compatibility
if client.check_version_compatibility('3.0.0', '>='):
    print("Backend supports 3.0.0+ features")
```
## MatrixOne Version Support
The SDK automatically detects MatrixOne backend versions and handles compatibility:
- **Development Version**: `8.0.30-MatrixOne-v` → `999.0.0` (highest priority)
- **Release Version**: `8.0.30-MatrixOne-v3.0.0` → `3.0.0`
- **Legacy Format**: `MatrixOne 3.0.1` → `3.0.1`
```python
# Check if running development version
if client.is_development_version():
    print("Running development version - all features available")
else:
    print(f"Running release version: {client.get_backend_version()}")
```
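The three version formats listed above can be illustrated with a small standalone parser. This is a sketch of the mapping as described in this section, not the SDK's actual implementation; `parse_backend_version` and `is_compatible` are hypothetical names, and the regular expressions only cover the three example formats.

```python
import operator
import re

DEV_VERSION = (999, 0, 0)  # development builds sort above every release

_OPS = {
    ">=": operator.ge, ">": operator.gt,
    "<=": operator.le, "<": operator.lt,
    "==": operator.eq, "!=": operator.ne,
}


def parse_backend_version(raw):
    """Map a raw version string to a (major, minor, patch) tuple."""
    # Release format "8.0.30-MatrixOne-v3.0.0"; the development format stops at "-v".
    m = re.search(r"MatrixOne-v(\d+\.\d+\.\d+)?$", raw.strip())
    if m:
        if m.group(1) is None:
            return DEV_VERSION
        return tuple(int(p) for p in m.group(1).split("."))
    # Legacy format "MatrixOne 3.0.1".
    m = re.search(r"MatrixOne\s+(\d+)\.(\d+)\.(\d+)", raw)
    if m:
        return tuple(int(p) for p in m.groups())
    raise ValueError(f"unrecognized version string: {raw!r}")


def is_compatible(raw, required, op=">="):
    """Compare a raw backend version against a required 'x.y.z' string."""
    required_tuple = tuple(int(p) for p in required.split("."))
    return _OPS[op](parse_backend_version(raw), required_tuple)
```

Comparing integer tuples rather than strings avoids the classic pitfall where `'10.0.0' < '9.0.0'` lexicographically, and mapping development builds to `999.0.0` makes every `>=` check pass for them.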
## Advanced Features
### PITR (Point-in-Time Recovery)
```python
# Create PITR for cluster
pitr = client.pitr.create_cluster_pitr(
    'cluster_pitr',
    range_value=7,
    range_unit='d'
)
# Restore cluster from snapshot
client.restore.restore_cluster('my_snapshot')
```
### Account Management
```python
from matrixone.account import AccountManager
# Initialize account manager
account_manager = AccountManager(client)
# Create user
user = account_manager.create_user('newuser', 'password123')
print(f"Created user: {user.name}")
# Create role
role = account_manager.create_role('analyst')
print(f"Created role: {role.name}")
# Grant privileges on specific table (optional)
# Note: table must exist first
account_manager.grant_privilege(
    'SELECT',  # privilege
    'TABLE',   # object_type
    'users',   # object_name (database.table format)
    to_role='analyst'
)
# Grant role to user
account_manager.grant_role('analyst', 'newuser')
print("Granted role to user")
# List users
users = account_manager.list_users()
for user in users:
    print(f"User: {user.name}")
```
### Vector Search Operations
```python
from matrixone import Client
from matrixone.sqlalchemy_ext import create_vector_column
from matrixone.orm import declarative_base
from sqlalchemy import Column, BigInteger, String, Text
import numpy as np
# Create client and connect
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)
# Define vector table using MatrixOne ORM
Base = declarative_base()
class Document(Base):
    __tablename__ = 'documents'
    # IMPORTANT: HNSW index requires BigInteger (BIGINT) primary key
    id = Column(BigInteger, primary_key=True, autoincrement=True)
    title = Column(String(200))
    content = Column(Text)
    embedding = create_vector_column(384, precision='f32')
# Create table using client API (not Base.metadata.create_all)
client.create_table(Document)
# Create HNSW index using SDK (not SQL)
client.vector_ops.enable_hnsw()
client.vector_ops.create_hnsw(
    'documents',  # table name or model - positional argument
    name='idx_embedding',
    column='embedding',
    m=16,
    ef_construction=200
)
# Insert vector data using client API
client.insert(Document, {
    'title': 'Machine Learning Guide',
    'content': 'Comprehensive ML tutorial...',
    'embedding': np.random.rand(384).tolist()
})
# Search similar documents using SDK
query_vector = np.random.rand(384).tolist()
results = client.vector_ops.similarity_search(
    'documents',  # table name or model - positional argument
    vector_column='embedding',
    query_vector=query_vector,
    limit=5,
    distance_type='cosine'
)
for row in results:
    print(f"Document: {row[1]}, Similarity: {row[-1]}")
# Cleanup
client.drop_table(Document)  # Use client API
client.disconnect()
```
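To make the `distance_type` choice more concrete, the three metrics named in the feature list (L2, cosine, inner product) can be written out in plain Python. This is only an illustration of the math on small lists. It does not show how MatrixOne computes them, and whether the last result column holds a distance or a similarity is not specified in this README.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar for comparable vector norms."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

Cosine distance ignores vector length, so `[1, 2]` and `[2, 4]` are treated as identical, while L2 distance separates them. That difference is usually what drives the choice of metric for a given embedding model.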
### ⭐ IVF Index Health Monitoring (Production Critical)
**Monitor your IVF indexes to ensure optimal performance!**
```python
from matrixone import Client
client = Client()
client.connect(host='localhost', port=6001, user='root', password='111', database='test')
# After creating IVF index and inserting data...
# Get IVF index statistics
stats = client.vector_ops.get_ivf_stats("documents", "embedding")
# Analyze index balance
counts = stats['distribution']['centroid_count']
total_centroids = len(counts)
total_vectors = sum(counts)
min_count = min(counts) if counts else 0
max_count = max(counts) if counts else 0
balance_ratio = max_count / min_count if min_count > 0 else float('inf')
print("📊 IVF Index Health Report:")
print(f" - Total centroids: {total_centroids}")
print(f" - Total vectors: {total_vectors}")
print(f" - Balance ratio: {balance_ratio:.2f}")
print(f" - Min vectors in centroid: {min_count}")
print(f" - Max vectors in centroid: {max_count}")
# Check if index needs rebuilding
if balance_ratio > 2.5:
    print("⚠️ WARNING: Index is imbalanced and needs rebuilding!")
    print(" Rebuild the index for optimal performance:")
    # Rebuild process
    client.vector_ops.drop("documents", "idx_embedding")
    client.vector_ops.create_ivf(
        "documents",
        name="idx_embedding",
        column="embedding",
        lists=100
    )
    print("✅ Index rebuilt successfully")
else:
    print("✅ Index is healthy and well-balanced")
client.disconnect()
```
**Why IVF Stats Matter:**
- 🎯 **Performance**: Unbalanced indexes lead to slow searches
- 📊 **Load Distribution**: Identify hot spots and imbalances
- 🔄 **Rebuild Timing**: Know when to rebuild for optimal performance
- 📈 **Capacity Planning**: Understand data distribution patterns
**When to Rebuild:**
- Balance ratio > 2.5 (moderate imbalance)
- Balance ratio > 3.0 (severe imbalance - rebuild immediately)
- After bulk inserts (>20% of data)
- Performance degradation in searches
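The balance-ratio thresholds above can be wrapped in a small helper that is easy to call from a monitoring job. This is a sketch under stated assumptions: `classify_ivf_balance` is a hypothetical name, and its input is the list of per-centroid counts as read from `stats['distribution']['centroid_count']` in the example above.

```python
import math


def classify_ivf_balance(counts, moderate=2.5, severe=3.0):
    """Return (balance_ratio, status) for a list of per-centroid vector counts."""
    if not counts:
        return math.inf, "empty"
    smallest, largest = min(counts), max(counts)
    # An empty centroid makes the ratio infinite, which is treated as severe.
    ratio = largest / smallest if smallest > 0 else math.inf
    if ratio > severe:
        status = "severe"
    elif ratio > moderate:
        status = "moderate"
    else:
        status = "healthy"
    return ratio, status
```

Keeping the thresholds as parameters makes it straightforward to tighten or relax them per table once you have observed how imbalance correlates with search latency in your own workload.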
### Fulltext Search Operations
```python
from matrixone import Client
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match
from matrixone.orm import declarative_base
from sqlalchemy import Column, Integer, String, Text
# Create client and connect
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)
# Define model using MatrixOne ORM
Base = declarative_base()
class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(200), nullable=False)
    content = Column(Text, nullable=False)
    category = Column(String(100))
# Create table using client API (not Base.metadata.create_all)
client.create_table(Article)
# Insert some data using client API
articles = [
    {'title': 'Machine Learning Guide',
     'content': 'Comprehensive machine learning tutorial...',
     'category': 'AI'},
    {'title': 'Python Programming',
     'content': 'Learn Python programming basics',
     'category': 'Programming'},
]
client.batch_insert(Article, articles)
# Create fulltext index using SDK (not SQL)
client.fulltext_index.create(
    'articles',  # table name - positional argument
    name='ftidx_content',
    columns=['title', 'content']
)
# Boolean search with encourage (like natural language)
results = client.query(
    Article.title,
    Article.content,
    boolean_match('title', 'content').encourage('machine learning tutorial')
).execute()
# Boolean search with must/must_not operators
results = client.query(
    Article.title,
    Article.content,
    boolean_match('title', 'content')
        .must('machine')
        .must('learning')
        .must_not('basics')
).execute()
# Results is a ResultSet object
for row in results.rows:
    print(f"Title: {row[0]}, Content: {row[1][:50]}...")
# Cleanup
client.drop_table(Article)  # Use client API
client.disconnect()
```
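For readers used to writing boolean fulltext queries by hand, the `must` / `must_not` / `encourage` calls above correspond conceptually to the `+term`, `-term`, and bare-term operators of MySQL-style boolean mode. The sketch below renders term lists in that syntax. It is an illustration only: `boolean_mode_expression` is a hypothetical helper, and whether the SDK emits exactly this string internally is an assumption.

```python
def boolean_mode_expression(must=(), must_not=(), encourage=()):
    """Render term lists as a MySQL-style boolean-mode search string."""
    parts = [f"+{term}" for term in must]        # term must be present
    parts += [f"-{term}" for term in must_not]   # term must be absent
    parts += list(encourage)                     # optional, but raises relevance
    return " ".join(parts)
```

For the second query above, `boolean_mode_expression(must=['machine', 'learning'], must_not=['basics'])` yields `+machine +learning -basics`.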
### Metadata Analysis
```python
from matrixone import Client
# Create client and connect
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)
# Analyze table metadata - returns structured MetadataRow objects
metadata_rows = client.metadata.scan(
    dbname='test',
    tablename='documents',
    columns='*'  # Get all columns
)
for row in metadata_rows:
    print(f"Column: {row.col_name}")
    print(f" Rows count: {row.rows_cnt}")
    print(f" Null count: {row.null_cnt}")
    print(f" Size: {row.origin_size}")
# Get table brief statistics
brief_stats = client.metadata.get_table_brief_stats(
    dbname='test',
    tablename='documents'
)
table_stats = brief_stats['documents']
print(f"Total rows: {table_stats['row_cnt']}")
print(f"Total nulls: {table_stats['null_cnt']}")
print(f"Original size: {table_stats['original_size']}")
print(f"Compressed size: {table_stats['compress_size']}")
client.disconnect()
```
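The brief statistics are more useful once turned into derived figures such as a compression ratio. The helper below is a minimal sketch that works on a dictionary with the same keys as `table_stats` above. `compression_summary` is a hypothetical name, and it assumes the size fields are plain numbers (or numeric strings) in the same unit, which this README does not confirm.

```python
def compression_summary(table_stats):
    """Derive compression ratio and average row size from brief table stats."""
    original = float(table_stats["original_size"])
    compressed = float(table_stats["compress_size"])
    rows = int(table_stats["row_cnt"])
    return {
        # Guard against empty tables, where both sizes may be zero.
        "compression_ratio": original / compressed if compressed else None,
        "bytes_per_row": original / rows if rows else None,
    }
```

Tracking these two numbers over time gives a quick signal for capacity planning: a falling compression ratio or a rising per-row size often points to a change in the data being written.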
### Pub/Sub Operations
```python
# List publications
publications = client.pubsub.list_publications()
for pub in publications:
    print(f"Publication: {pub}")
# List subscriptions
subscriptions = client.pubsub.list_subscriptions()
for sub in subscriptions:
    print(f"Subscription: {sub}")
# Drop publication/subscription when needed
try:
    client.pubsub.drop_publication("test_publication")
    client.pubsub.drop_subscription("test_subscription")
except Exception as e:
    print(f"Cleanup: {e}")
```
## Configuration
### Connection Parameters
```python
client = Client(
    connection_timeout=30,
    query_timeout=300,
    auto_commit=True,
    charset='utf8mb4',
    sql_log_mode='auto',  # 'off', 'simple', 'auto', 'full'
    slow_query_threshold=1.0
)
```
### Logging Configuration
```python
from matrixone import Client
from matrixone.logger import create_default_logger
import logging
# Create custom logger
logger = create_default_logger(
    level=logging.INFO,
    sql_log_mode='auto',  # 'off', 'simple', 'auto', 'full'
    slow_query_threshold=1.0,
    max_sql_display_length=500
)
# Use custom logger with client
client = Client(logger=logger)
```
## Error Handling
The SDK provides comprehensive error handling with helpful messages:
```python
from matrixone.exceptions import (
    ConnectionError,
    QueryError,
    VersionError,
    SnapshotError
)
try:
    snapshot = client.snapshots.create('test', 'cluster')
except VersionError as e:
    print(f"Version compatibility error: {e}")
except SnapshotError as e:
    print(f"Snapshot operation failed: {e}")
```
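When an operation raises, the connection should still be closed. One way to guarantee that is a small context manager around `connect()` and `disconnect()`. This is a sketch, not an SDK feature: `connected` is a hypothetical helper, and it works with any object exposing those two methods. Whether `Client` itself supports the `with` statement is not stated in this README, so the wrapper makes the cleanup explicit.

```python
from contextlib import contextmanager


@contextmanager
def connected(client, **connect_kwargs):
    """Connect on entry and always disconnect on exit, even if the body raises."""
    client.connect(**connect_kwargs)
    try:
        yield client
    finally:
        client.disconnect()
```

Used as `with connected(Client(), database='test') as client:`, the body can raise `SnapshotError` or any other exception and the connection is still released before the exception propagates.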
## 🔗 Links
- **📚 Full Documentation**: https://matrixone.readthedocs.io/
- **📦 PyPI Package**: https://pypi.org/project/matrixone-python-sdk/
- **💻 GitHub Repository**: https://github.com/matrixorigin/matrixone/tree/main/clients/python
- **🌐 MatrixOne Docs**: https://docs.matrixorigin.cn/
### Online Examples
The SDK includes 25+ comprehensive examples covering all features:
**Getting Started:**
- Basic connection and database operations
- Async/await operations
- Transaction management
- SQLAlchemy ORM integration
**Vector Search:**
- Vector data types and distance functions
- IVF and HNSW index creation and tuning
- ⭐ **IVF Index Health Monitoring** - Essential for production systems
- Similarity search operations
- Advanced vector optimizations and index rebuilding
**Advanced Features:**
- Fulltext search with BM25/TF-IDF
- Table metadata analysis
- Snapshot and restore operations
- Account and permission management
- Pub/Sub operations
- Connection hooks and logging
### Quick Examples
Clone the repository to access all examples:
```bash
git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python/examples
# Run basic example
python example_01_basic_connection.py
# Run vector search example
python example_12_vector_basics.py
# Run metadata analysis example
python example_25_metadata_operations.py
```
## Support
- 📧 Email: contact@matrixorigin.cn
- 🐛 Issues: [GitHub Issues](https://github.com/matrixorigin/matrixone/issues)
- 💬 Discussions: [GitHub Discussions](https://github.com/matrixorigin/matrixone/discussions)
- 📖 Documentation:
  - [MatrixOne Docs (English)](https://docs.matrixorigin.cn/en)
  - [MatrixOne Docs (中文)](https://docs.matrixorigin.cn/)
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
---
**MatrixOne Python SDK** - Making MatrixOne database operations simple and powerful in Python.
Raw data
{
"_id": null,
"home_page": "https://github.com/matrixorigin/matrixone",
"name": "matrixone-python-sdk",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "MatrixOne Team <dev@matrixone.io>",
"keywords": "matrixone, database, vector, search, sqlalchemy, python",
"author": "MatrixOne Team",
"author_email": "MatrixOne Team <dev@matrixone.io>",
"download_url": "https://files.pythonhosted.org/packages/d3/69/832fd8d2d42e6add4b78270e7327f2393fefea24913cac75913c723ab290/matrixone_python_sdk-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "A comprehensive Python SDK for MatrixOne database operations with vector search, fulltext search, and advanced features",
"version": "0.1.4",
"project_urls": {
"Changelog": "https://github.com/matrixorigin/matrixone/blob/main/clients/python/CHANGELOG.md",
"Documentation": "https://matrixone.readthedocs.io/",
"Homepage": "https://github.com/matrixorigin/matrixone",
"Issues": "https://github.com/matrixorigin/matrixone/issues",
"Repository": "https://github.com/matrixorigin/matrixone"
},
"split_keywords": [
"matrixone",
" database",
" vector",
" search",
" sqlalchemy",
" python"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8c78344a4f787d90b258803413b43d567d3fd2a866153d5c29f31af09f783c30",
"md5": "9b8a307bcaffbd2b3a7cbe4874ec30ac",
"sha256": "2d4206fb692c5f32aa6c426cb5c22c9b43246d918140a9d59c601c2906d59843"
},
"downloads": -1,
"filename": "matrixone_python_sdk-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9b8a307bcaffbd2b3a7cbe4874ec30ac",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 483471,
"upload_time": "2025-10-15T03:34:23",
"upload_time_iso_8601": "2025-10-15T03:34:23.703168Z",
"url": "https://files.pythonhosted.org/packages/8c/78/344a4f787d90b258803413b43d567d3fd2a866153d5c29f31af09f783c30/matrixone_python_sdk-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d369832fd8d2d42e6add4b78270e7327f2393fefea24913cac75913c723ab290",
"md5": "1f98bd66e924955d763435839bb8bb78",
"sha256": "462136662beace530b448705a3627cb456d490943910d8c4b97705141475d475"
},
"downloads": -1,
"filename": "matrixone_python_sdk-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "1f98bd66e924955d763435839bb8bb78",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 253278,
"upload_time": "2025-10-15T03:34:25",
"upload_time_iso_8601": "2025-10-15T03:34:25.495210Z",
"url": "https://files.pythonhosted.org/packages/d3/69/832fd8d2d42e6add4b78270e7327f2393fefea24913cac75913c723ab290/matrixone_python_sdk-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-15 03:34:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "matrixorigin",
"github_project": "matrixone",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "matrixone-python-sdk"
}