Name | thoth-dbmanager JSON |
Version |
0.4.2
JSON |
| download |
home_page | None |
Summary | A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture. |
upload_time | 2025-07-26 05:53:13 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | MIT |
keywords |
database
sql
lsh
similarity-search
orm
|
VCS |
 |
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# Thoth Database Manager
A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
## Features
- **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
- **Plugin Architecture**: Extensible design for adding new database types
- **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns
- **Type Safety**: Pydantic-based document models for structured data
- **Backward Compatibility**: Maintains compatibility with existing code
## Installation
```bash
pip install thoth-dbmanager
```
## Quick Start
### Basic Usage
```python
from thoth_dbmanager import ThothDbManager
# Create a database manager instance
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
db_mode="dev",
host="localhost",
port=5432,
database="mydb",
user="username",
password="password"
)
# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()
# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")
# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)
```
### LSH Similarity Search
```python
# Query for similar values across all database columns
similar_values = manager.query_lsh(
keyword="john doe",
top_n=10,
signature_size=30,
n_gram=3
)
# Results are organized by table and column
for table_name, columns in similar_values.items():
for column_name, values in columns.items():
print(f"Similar values in {table_name}.{column_name}: {values}")
```
## Supported Databases
| Database | Connection Parameters |
|----------|----------------------|
| **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |
| **MySQL** | `host`, `port`, `database`, `user`, `password` |
| **MariaDB** | `host`, `port`, `database`, `user`, `password` |
| **SQLite** | `database_path` |
| **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |
| **SQL Server** | `server`, `database`, `user`, `password` |
| **Oracle** | `host`, `port`, `service_name`, `user`, `password` |
| **Informix** | `server`, `database`, `host`, `user`, `password` |
## API Reference
### ThothDbManager
The main class for database operations.
#### Class Methods
##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`
Creates or retrieves a singleton instance of the database manager.
**Parameters:**
- `db_type` (str): Database type identifier
- `db_root_path` (str): Path to store database-related files
- `db_mode` (str): Operating mode ("dev", "prod", etc.)
- `**kwargs`: Database-specific connection parameters
**Returns:** Database manager instance
#### Instance Methods
##### `execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any`
Execute SQL queries against the database.
**Parameters:**
- `sql` (str): SQL query string
- `params` (dict, optional): Query parameters for prepared statements
- `fetch` (str): How to fetch results ("all", "one", or number)
- `timeout` (int): Query timeout in seconds
**Returns:** Query results
##### `get_tables() -> List[Dict[str, str]]`
Get list of tables in the database.
**Returns:** List of dictionaries with `name` and `comment` keys
##### `get_columns(table_name: str) -> List[Dict[str, Any]]`
Get column information for a specific table.
**Parameters:**
- `table_name` (str): Name of the table
**Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)
##### `get_foreign_keys() -> List[Dict[str, str]]`
Get foreign key relationships in the database.
**Returns:** List of dictionaries with foreign key information
##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`
Get the most frequent values for each column in a table.
**Parameters:**
- `table_name` (str): Name of the table
- `number_of_rows` (int): Maximum number of example values per column
**Returns:** Dictionary mapping column names to lists of example values
##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`
Search for similar values using LSH (Locality-Sensitive Hashing).
**Parameters:**
- `keyword` (str): Search term
- `signature_size` (int): MinHash signature size
- `n_gram` (int): N-gram size for text processing
- `top_n` (int): Number of similar values to return
**Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`
### Factory Pattern (Alternative API)
For more advanced usage, you can use the factory pattern:
```python
from thoth_dbmanager import ThothDbFactory
# Create manager using factory
manager = ThothDbFactory.create_manager(
db_type="postgresql",
db_root_path="./data",
db_mode="dev",
host="localhost",
port=5432,
database="mydb",
user="username",
password="password"
)
# List available database types
available_dbs = ThothDbFactory.list_available_databases()
# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")
```
## Document-Based API (Advanced)
The library also provides a document-based API for structured data access:
```python
# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
table_docs = manager.get_tables_as_documents()
for doc in table_docs:
print(f"Table: {doc.table_name}")
print(f"Schema: {doc.schema_name}")
print(f"Comment: {doc.comment}")
# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
column_docs = manager.get_columns_as_documents("my_table")
for doc in column_docs:
print(f"Column: {doc.column_name} ({doc.data_type})")
print(f"Nullable: {doc.is_nullable}")
print(f"Primary Key: {doc.is_pk}")
```
## Configuration Examples
### PostgreSQL
```python
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
db_mode="production",
host="localhost",
port=5432,
database="myapp",
user="dbuser",
password="dbpass"
)
```
### SQLite
```python
manager = ThothDbManager.get_instance(
db_type="sqlite",
db_root_path="./data",
db_mode="dev",
database_path="./data/myapp.db"
)
```
### MySQL/MariaDB
```python
manager = ThothDbManager.get_instance(
db_type="mysql", # or "mariadb"
db_root_path="./data",
db_mode="production",
host="localhost",
port=3306,
database="myapp",
user="dbuser",
password="dbpass"
)
```
## Error Handling
The library provides clear error messages for common issues:
```python
try:
manager = ThothDbManager.get_instance(
db_type="postgresql",
db_root_path="./data",
# Missing required parameters
)
except ValueError as e:
print(f"Configuration error: {e}")
try:
results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
print(f"Query error: {e}")
```
## LSH Search Details
The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:
1. **Automatic Setup**: LSH indexes are created automatically from your database content
2. **Cross-Column Search**: Search across all text columns in all tables
3. **Fuzzy Matching**: Find similar values even with typos or variations
4. **Configurable**: Adjust similarity sensitivity with parameters
### LSH Use Cases
- **Data Deduplication**: Find duplicate or near-duplicate records
- **Data Quality**: Identify inconsistent data entry
- **Search Enhancement**: Provide "did you mean?" functionality
- **Data Exploration**: Discover related content across tables
## Architecture
The library uses a modern plugin architecture:
- **Plugins**: Database-specific implementations
- **Adapters**: Low-level database operations
- **Factory**: Plugin instantiation and management
- **Documents**: Type-safe data models using Pydantic
- **Registry**: Plugin discovery and registration
This design makes it easy to:
- Add support for new database types
- Maintain consistent APIs across databases
- Extend functionality without breaking existing code
## Contributing
To add support for a new database type:
1. Create an adapter class implementing `DbAdapter`
2. Create a plugin class implementing `DbPlugin`
3. Register the plugin with `@register_plugin("db_type")`
4. Add connection parameter validation
5. Implement required abstract methods
## License
This project is licensed under the MIT License.
## Support
For issues, questions, or contributions, please visit the project repository.
Raw data
{
"_id": null,
"home_page": null,
"name": "thoth-dbmanager",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "database, sql, lsh, similarity-search, orm",
"author": null,
"author_email": "Marco Pancotti <mp@tylconsulting.it>",
"download_url": "https://files.pythonhosted.org/packages/39/3a/e5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df/thoth_dbmanager-0.4.2.tar.gz",
"platform": null,
"description": "# Thoth Database Manager\n\nA Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.\n\n## Features\n\n- **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase\n- **Plugin Architecture**: Extensible design for adding new database types\n- **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns\n- **Type Safety**: Pydantic-based document models for structured data\n- **Backward Compatibility**: Maintains compatibility with existing code\n\n## Installation\n\n```bash\npip install thoth-dbmanager\n```\n\n## Quick Start\n\n### Basic Usage\n\n```python\nfrom thoth_dbmanager import ThothDbManager\n\n# Create a database manager instance\nmanager = ThothDbManager.get_instance(\n db_type=\"postgresql\",\n db_root_path=\"./data\",\n db_mode=\"dev\",\n host=\"localhost\",\n port=5432,\n database=\"mydb\",\n user=\"username\",\n password=\"password\"\n)\n\n# Get database information\ntables = manager.get_tables()\ncolumns = manager.get_columns(\"my_table\")\nforeign_keys = manager.get_foreign_keys()\n\n# Execute SQL queries\nresults = manager.execute_sql(\"SELECT * FROM my_table LIMIT 10\")\n\n# Get example data for understanding table contents\nexample_data = manager.get_example_data(\"my_table\", number_of_rows=20)\n```\n\n### LSH Similarity Search\n\n```python\n# Query for similar values across all database columns\nsimilar_values = manager.query_lsh(\n keyword=\"john doe\",\n top_n=10,\n signature_size=30,\n n_gram=3\n)\n\n# Results are organized by table and column\nfor table_name, columns in similar_values.items():\n for column_name, values in columns.items():\n print(f\"Similar values in {table_name}.{column_name}: {values}\")\n```\n\n## Supported Databases\n\n| Database | Connection Parameters |\n|----------|----------------------|\n| **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |\n| **MySQL** | `host`, `port`, `database`, `user`, `password` |\n| **MariaDB** | `host`, `port`, `database`, `user`, `password` |\n| **SQLite** | `database_path` |\n| **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |\n| **SQL Server** | `server`, `database`, `user`, `password` |\n| **Oracle** | `host`, `port`, `service_name`, `user`, `password` |\n| **Informix** | `server`, `database`, `host`, `user`, `password` |\n\n## API Reference\n\n### ThothDbManager\n\nThe main class for database operations.\n\n#### Class Methods\n\n##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`\n\nCreates or retrieves a singleton instance of the database manager.\n\n**Parameters:**\n- `db_type` (str): Database type identifier\n- `db_root_path` (str): Path to store database-related files\n- `db_mode` (str): Operating mode (\"dev\", \"prod\", etc.)\n- `**kwargs`: Database-specific connection parameters\n\n**Returns:** Database manager instance\n\n#### Instance Methods\n\n##### `execute_sql(sql: str, params: dict = None, fetch: str = \"all\", timeout: int = 60) -> Any`\n\nExecute SQL queries against the database.\n\n**Parameters:**\n- `sql` (str): SQL query string\n- `params` (dict, optional): Query parameters for prepared statements\n- `fetch` (str): How to fetch results (\"all\", \"one\", or number)\n- `timeout` (int): Query timeout in seconds\n\n**Returns:** Query results\n\n##### `get_tables() -> List[Dict[str, str]]`\n\nGet list of tables in the database.\n\n**Returns:** List of dictionaries with `name` and `comment` keys\n\n##### `get_columns(table_name: str) -> List[Dict[str, Any]]`\n\nGet column information for a specific table.\n\n**Parameters:**\n- `table_name` (str): Name of the table\n\n**Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)\n\n##### `get_foreign_keys() -> List[Dict[str, str]]`\n\nGet foreign key relationships in the database.\n\n**Returns:** List of dictionaries with foreign key information\n\n##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`\n\nGet the most frequent values for each column in a table.\n\n**Parameters:**\n- `table_name` (str): Name of the table\n- `number_of_rows` (int): Maximum number of example values per column\n\n**Returns:** Dictionary mapping column names to lists of example values\n\n##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`\n\nSearch for similar values using LSH (Locality-Sensitive Hashing).\n\n**Parameters:**\n- `keyword` (str): Search term\n- `signature_size` (int): MinHash signature size\n- `n_gram` (int): N-gram size for text processing\n- `top_n` (int): Number of similar values to return\n\n**Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`\n\n### Factory Pattern (Alternative API)\n\nFor more advanced usage, you can use the factory pattern:\n\n```python\nfrom thoth_dbmanager import ThothDbFactory\n\n# Create manager using factory\nmanager = ThothDbFactory.create_manager(\n db_type=\"postgresql\",\n db_root_path=\"./data\",\n db_mode=\"dev\",\n host=\"localhost\",\n port=5432,\n database=\"mydb\",\n user=\"username\",\n password=\"password\"\n)\n\n# List available database types\navailable_dbs = ThothDbFactory.list_available_databases()\n\n# Get required parameters for a database type\nparams = ThothDbFactory.get_required_parameters(\"postgresql\")\n```\n\n## Document-Based API (Advanced)\n\nThe library also provides a document-based API for structured data access:\n\n```python\n# Get tables as structured documents\nif hasattr(manager, 'get_tables_as_documents'):\n table_docs = manager.get_tables_as_documents()\n for doc in table_docs:\n print(f\"Table: {doc.table_name}\")\n print(f\"Schema: {doc.schema_name}\")\n print(f\"Comment: {doc.comment}\")\n\n# Get columns as structured documents\nif hasattr(manager, 'get_columns_as_documents'):\n column_docs = manager.get_columns_as_documents(\"my_table\")\n for doc in column_docs:\n print(f\"Column: {doc.column_name} ({doc.data_type})\")\n print(f\"Nullable: {doc.is_nullable}\")\n print(f\"Primary Key: {doc.is_pk}\")\n```\n\n## Configuration Examples\n\n### PostgreSQL\n```python\nmanager = ThothDbManager.get_instance(\n db_type=\"postgresql\",\n db_root_path=\"./data\",\n db_mode=\"production\",\n host=\"localhost\",\n port=5432,\n database=\"myapp\",\n user=\"dbuser\",\n password=\"dbpass\"\n)\n```\n\n### SQLite\n```python\nmanager = ThothDbManager.get_instance(\n db_type=\"sqlite\",\n db_root_path=\"./data\",\n db_mode=\"dev\",\n database_path=\"./data/myapp.db\"\n)\n```\n\n### MySQL/MariaDB\n```python\nmanager = ThothDbManager.get_instance(\n db_type=\"mysql\", # or \"mariadb\"\n db_root_path=\"./data\",\n db_mode=\"production\",\n host=\"localhost\",\n port=3306,\n database=\"myapp\",\n user=\"dbuser\",\n password=\"dbpass\"\n)\n```\n\n## Error Handling\n\nThe library provides clear error messages for common issues:\n\n```python\ntry:\n manager = ThothDbManager.get_instance(\n db_type=\"postgresql\",\n db_root_path=\"./data\",\n # Missing required parameters\n )\nexcept ValueError as e:\n print(f\"Configuration error: {e}\")\n\ntry:\n results = manager.execute_sql(\"SELECT * FROM nonexistent_table\")\nexcept Exception as e:\n print(f\"Query error: {e}\")\n```\n\n## LSH Search Details\n\nThe LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:\n\n1. **Automatic Setup**: LSH indexes are created automatically from your database content\n2. **Cross-Column Search**: Search across all text columns in all tables\n3. **Fuzzy Matching**: Find similar values even with typos or variations\n4. **Configurable**: Adjust similarity sensitivity with parameters\n\n### LSH Use Cases\n\n- **Data Deduplication**: Find duplicate or near-duplicate records\n- **Data Quality**: Identify inconsistent data entry\n- **Search Enhancement**: Provide \"did you mean?\" functionality\n- **Data Exploration**: Discover related content across tables\n\n## Architecture\n\nThe library uses a modern plugin architecture:\n\n- **Plugins**: Database-specific implementations\n- **Adapters**: Low-level database operations\n- **Factory**: Plugin instantiation and management\n- **Documents**: Type-safe data models using Pydantic\n- **Registry**: Plugin discovery and registration\n\nThis design makes it easy to:\n- Add support for new database types\n- Maintain consistent APIs across databases\n- Extend functionality without breaking existing code\n\n## Contributing\n\nTo add support for a new database type:\n\n1. Create an adapter class implementing `DbAdapter`\n2. Create a plugin class implementing `DbPlugin`\n3. Register the plugin with `@register_plugin(\"db_type\")`\n4. Add connection parameter validation\n5. Implement required abstract methods\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Support\n\nFor issues, questions, or contributions, please visit the project repository.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.",
"version": "0.4.2",
"project_urls": {
"Bug Tracker": "https://github.com/mptyl/thoth_dbmanager/issues",
"Documentation": "https://github.com/mptyl/thoth_dbmanager#readme",
"Homepage": "https://github.com/mptyl/thoth_dbmanager",
"Source Code": "https://github.com/mptyl/thoth_dbmanager"
},
"split_keywords": [
"database",
" sql",
" lsh",
" similarity-search",
" orm"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b2687be0eeb93291d16fa7a5874ad5bd466a08a6ec45569785d5debb1dc6691e",
"md5": "f83e67eff698495e1376218eec7b9362",
"sha256": "7a592b9644dcf8084b96d0acc194ba34a7326d4d1b491f3bd7b5382f1a56e56c"
},
"downloads": -1,
"filename": "thoth_dbmanager-0.4.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f83e67eff698495e1376218eec7b9362",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 86053,
"upload_time": "2025-07-26T05:53:09",
"upload_time_iso_8601": "2025-07-26T05:53:09.169701Z",
"url": "https://files.pythonhosted.org/packages/b2/68/7be0eeb93291d16fa7a5874ad5bd466a08a6ec45569785d5debb1dc6691e/thoth_dbmanager-0.4.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "393ae5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df",
"md5": "26ffc3e4255c8dac3ca40373a0b2170e",
"sha256": "31430509cc36f44236d825968686c75edaae918684e6939392a16a241b218ad6"
},
"downloads": -1,
"filename": "thoth_dbmanager-0.4.2.tar.gz",
"has_sig": false,
"md5_digest": "26ffc3e4255c8dac3ca40373a0b2170e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 73794,
"upload_time": "2025-07-26T05:53:13",
"upload_time_iso_8601": "2025-07-26T05:53:13.250246Z",
"url": "https://files.pythonhosted.org/packages/39/3a/e5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df/thoth_dbmanager-0.4.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-26 05:53:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mptyl",
"github_project": "thoth_dbmanager",
"github_not_found": true,
"lcname": "thoth-dbmanager"
}