thoth-dbmanager


Namethoth-dbmanager JSON
Version 0.4.2 PyPI version JSON
download
home_pageNone
SummaryA Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.
upload_time2025-07-26 05:53:13
maintainerNone
docs_urlNone
authorNone
requires_python>=3.8
licenseMIT
keywords database sql lsh similarity-search orm
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Thoth Database Manager

A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.

## Features

- **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase
- **Plugin Architecture**: Extensible design for adding new database types
- **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns
- **Type Safety**: Pydantic-based document models for structured data
- **Backward Compatibility**: Maintains compatibility with existing code

## Installation

```bash
pip install thoth-dbmanager
```

## Quick Start

### Basic Usage

```python
from thoth_dbmanager import ThothDbManager

# Create a database manager instance
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# Get database information
tables = manager.get_tables()
columns = manager.get_columns("my_table")
foreign_keys = manager.get_foreign_keys()

# Execute SQL queries
results = manager.execute_sql("SELECT * FROM my_table LIMIT 10")

# Get example data for understanding table contents
example_data = manager.get_example_data("my_table", number_of_rows=20)
```

### LSH Similarity Search

```python
# Query for similar values across all database columns
similar_values = manager.query_lsh(
    keyword="john doe",
    top_n=10,
    signature_size=30,
    n_gram=3
)

# Results are organized by table and column
for table_name, columns in similar_values.items():
    for column_name, values in columns.items():
        print(f"Similar values in {table_name}.{column_name}: {values}")
```

## Supported Databases

| Database | Connection Parameters |
|----------|----------------------|
| **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |
| **MySQL** | `host`, `port`, `database`, `user`, `password` |
| **MariaDB** | `host`, `port`, `database`, `user`, `password` |
| **SQLite** | `database_path` |
| **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |
| **SQL Server** | `server`, `database`, `user`, `password` |
| **Oracle** | `host`, `port`, `service_name`, `user`, `password` |
| **Informix** | `server`, `database`, `host`, `user`, `password` |

## API Reference

### ThothDbManager

The main class for database operations.

#### Class Methods

##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`

Creates or retrieves a singleton instance of the database manager.

**Parameters:**
- `db_type` (str): Database type identifier
- `db_root_path` (str): Path to store database-related files
- `db_mode` (str): Operating mode ("dev", "prod", etc.)
- `**kwargs`: Database-specific connection parameters

**Returns:** Database manager instance

#### Instance Methods

##### `execute_sql(sql: str, params: dict = None, fetch: str = "all", timeout: int = 60) -> Any`

Execute SQL queries against the database.

**Parameters:**
- `sql` (str): SQL query string
- `params` (dict, optional): Query parameters for prepared statements
- `fetch` (str): How to fetch results ("all", "one", or number)
- `timeout` (int): Query timeout in seconds

**Returns:** Query results

##### `get_tables() -> List[Dict[str, str]]`

Get list of tables in the database.

**Returns:** List of dictionaries with `name` and `comment` keys

##### `get_columns(table_name: str) -> List[Dict[str, Any]]`

Get column information for a specific table.

**Parameters:**
- `table_name` (str): Name of the table

**Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)

##### `get_foreign_keys() -> List[Dict[str, str]]`

Get foreign key relationships in the database.

**Returns:** List of dictionaries with foreign key information

##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`

Get the most frequent values for each column in a table.

**Parameters:**
- `table_name` (str): Name of the table
- `number_of_rows` (int): Maximum number of example values per column

**Returns:** Dictionary mapping column names to lists of example values

##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`

Search for similar values using LSH (Locality-Sensitive Hashing).

**Parameters:**
- `keyword` (str): Search term
- `signature_size` (int): MinHash signature size
- `n_gram` (int): N-gram size for text processing
- `top_n` (int): Number of similar values to return

**Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`

### Factory Pattern (Alternative API)

For more advanced usage, you can use the factory pattern:

```python
from thoth_dbmanager import ThothDbFactory

# Create manager using factory
manager = ThothDbFactory.create_manager(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="dev",
    host="localhost",
    port=5432,
    database="mydb",
    user="username",
    password="password"
)

# List available database types
available_dbs = ThothDbFactory.list_available_databases()

# Get required parameters for a database type
params = ThothDbFactory.get_required_parameters("postgresql")
```

## Document-Based API (Advanced)

The library also provides a document-based API for structured data access:

```python
# Get tables as structured documents
if hasattr(manager, 'get_tables_as_documents'):
    table_docs = manager.get_tables_as_documents()
    for doc in table_docs:
        print(f"Table: {doc.table_name}")
        print(f"Schema: {doc.schema_name}")
        print(f"Comment: {doc.comment}")

# Get columns as structured documents
if hasattr(manager, 'get_columns_as_documents'):
    column_docs = manager.get_columns_as_documents("my_table")
    for doc in column_docs:
        print(f"Column: {doc.column_name} ({doc.data_type})")
        print(f"Nullable: {doc.is_nullable}")
        print(f"Primary Key: {doc.is_pk}")
```

## Configuration Examples

### PostgreSQL
```python
manager = ThothDbManager.get_instance(
    db_type="postgresql",
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=5432,
    database="myapp",
    user="dbuser",
    password="dbpass"
)
```

### SQLite
```python
manager = ThothDbManager.get_instance(
    db_type="sqlite",
    db_root_path="./data",
    db_mode="dev",
    database_path="./data/myapp.db"
)
```

### MySQL/MariaDB
```python
manager = ThothDbManager.get_instance(
    db_type="mysql",  # or "mariadb"
    db_root_path="./data",
    db_mode="production",
    host="localhost",
    port=3306,
    database="myapp",
    user="dbuser",
    password="dbpass"
)
```

## Error Handling

The library provides clear error messages for common issues:

```python
try:
    manager = ThothDbManager.get_instance(
        db_type="postgresql",
        db_root_path="./data",
        # Missing required parameters
    )
except ValueError as e:
    print(f"Configuration error: {e}")

try:
    results = manager.execute_sql("SELECT * FROM nonexistent_table")
except Exception as e:
    print(f"Query error: {e}")
```

## LSH Search Details

The LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:

1. **Automatic Setup**: LSH indexes are created automatically from your database content
2. **Cross-Column Search**: Search across all text columns in all tables
3. **Fuzzy Matching**: Find similar values even with typos or variations
4. **Configurable**: Adjust similarity sensitivity with parameters

### LSH Use Cases

- **Data Deduplication**: Find duplicate or near-duplicate records
- **Data Quality**: Identify inconsistent data entry
- **Search Enhancement**: Provide "did you mean?" functionality
- **Data Exploration**: Discover related content across tables

## Architecture

The library uses a modern plugin architecture:

- **Plugins**: Database-specific implementations
- **Adapters**: Low-level database operations
- **Factory**: Plugin instantiation and management
- **Documents**: Type-safe data models using Pydantic
- **Registry**: Plugin discovery and registration

This design makes it easy to:
- Add support for new database types
- Maintain consistent APIs across databases
- Extend functionality without breaking existing code

## Contributing

To add support for a new database type:

1. Create an adapter class implementing `DbAdapter`
2. Create a plugin class implementing `DbPlugin`
3. Register the plugin with `@register_plugin("db_type")`
4. Add connection parameter validation
5. Implement required abstract methods

## License

This project is licensed under the MIT License.

## Support

For issues, questions, or contributions, please visit the project repository.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "thoth-dbmanager",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "database, sql, lsh, similarity-search, orm",
    "author": null,
    "author_email": "Marco Pancotti <mp@tylconsulting.it>",
    "download_url": "https://files.pythonhosted.org/packages/39/3a/e5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df/thoth_dbmanager-0.4.2.tar.gz",
    "platform": null,
    "description": "# Thoth Database Manager\n\nA Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.\n\n## Features\n\n- **Multi-Database Support**: PostgreSQL, MySQL, MariaDB, SQLite, SQL Server, Oracle, Informix, and Supabase\n- **Plugin Architecture**: Extensible design for adding new database types\n- **LSH Search**: Locality-Sensitive Hashing for finding similar values across database columns\n- **Type Safety**: Pydantic-based document models for structured data\n- **Backward Compatibility**: Maintains compatibility with existing code\n\n## Installation\n\n```bash\npip install thoth-dbmanager\n```\n\n## Quick Start\n\n### Basic Usage\n\n```python\nfrom thoth_dbmanager import ThothDbManager\n\n# Create a database manager instance\nmanager = ThothDbManager.get_instance(\n    db_type=\"postgresql\",\n    db_root_path=\"./data\",\n    db_mode=\"dev\",\n    host=\"localhost\",\n    port=5432,\n    database=\"mydb\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# Get database information\ntables = manager.get_tables()\ncolumns = manager.get_columns(\"my_table\")\nforeign_keys = manager.get_foreign_keys()\n\n# Execute SQL queries\nresults = manager.execute_sql(\"SELECT * FROM my_table LIMIT 10\")\n\n# Get example data for understanding table contents\nexample_data = manager.get_example_data(\"my_table\", number_of_rows=20)\n```\n\n### LSH Similarity Search\n\n```python\n# Query for similar values across all database columns\nsimilar_values = manager.query_lsh(\n    keyword=\"john doe\",\n    top_n=10,\n    signature_size=30,\n    n_gram=3\n)\n\n# Results are organized by table and column\nfor table_name, columns in similar_values.items():\n    for column_name, values in columns.items():\n        print(f\"Similar values in {table_name}.{column_name}: {values}\")\n```\n\n## Supported Databases\n\n| Database | Connection Parameters |\n|----------|----------------------|\n| **PostgreSQL** | `host`, `port`, `database`, `user`, `password` |\n| **MySQL** | `host`, `port`, `database`, `user`, `password` |\n| **MariaDB** | `host`, `port`, `database`, `user`, `password` |\n| **SQLite** | `database_path` |\n| **Supabase** | `host`, `port`, `database`, `user`, `password`, `project_url`, `api_key` |\n| **SQL Server** | `server`, `database`, `user`, `password` |\n| **Oracle** | `host`, `port`, `service_name`, `user`, `password` |\n| **Informix** | `server`, `database`, `host`, `user`, `password` |\n\n## API Reference\n\n### ThothDbManager\n\nThe main class for database operations.\n\n#### Class Methods\n\n##### `get_instance(db_type: str, **kwargs) -> ThothDbManager`\n\nCreates or retrieves a singleton instance of the database manager.\n\n**Parameters:**\n- `db_type` (str): Database type identifier\n- `db_root_path` (str): Path to store database-related files\n- `db_mode` (str): Operating mode (\"dev\", \"prod\", etc.)\n- `**kwargs`: Database-specific connection parameters\n\n**Returns:** Database manager instance\n\n#### Instance Methods\n\n##### `execute_sql(sql: str, params: dict = None, fetch: str = \"all\", timeout: int = 60) -> Any`\n\nExecute SQL queries against the database.\n\n**Parameters:**\n- `sql` (str): SQL query string\n- `params` (dict, optional): Query parameters for prepared statements\n- `fetch` (str): How to fetch results (\"all\", \"one\", or number)\n- `timeout` (int): Query timeout in seconds\n\n**Returns:** Query results\n\n##### `get_tables() -> List[Dict[str, str]]`\n\nGet list of tables in the database.\n\n**Returns:** List of dictionaries with `name` and `comment` keys\n\n##### `get_columns(table_name: str) -> List[Dict[str, Any]]`\n\nGet column information for a specific table.\n\n**Parameters:**\n- `table_name` (str): Name of the table\n\n**Returns:** List of dictionaries with column metadata (`name`, `data_type`, `comment`, `is_pk`)\n\n##### `get_foreign_keys() -> List[Dict[str, str]]`\n\nGet foreign key relationships in the database.\n\n**Returns:** List of dictionaries with foreign key information\n\n##### `get_example_data(table_name: str, number_of_rows: int = 30) -> Dict[str, List[Any]]`\n\nGet the most frequent values for each column in a table.\n\n**Parameters:**\n- `table_name` (str): Name of the table\n- `number_of_rows` (int): Maximum number of example values per column\n\n**Returns:** Dictionary mapping column names to lists of example values\n\n##### `query_lsh(keyword: str, signature_size: int = 30, n_gram: int = 3, top_n: int = 10) -> Dict[str, Dict[str, List[str]]]`\n\nSearch for similar values using LSH (Locality-Sensitive Hashing).\n\n**Parameters:**\n- `keyword` (str): Search term\n- `signature_size` (int): MinHash signature size\n- `n_gram` (int): N-gram size for text processing\n- `top_n` (int): Number of similar values to return\n\n**Returns:** Nested dictionary: `{table_name: {column_name: [similar_values]}}`\n\n### Factory Pattern (Alternative API)\n\nFor more advanced usage, you can use the factory pattern:\n\n```python\nfrom thoth_dbmanager import ThothDbFactory\n\n# Create manager using factory\nmanager = ThothDbFactory.create_manager(\n    db_type=\"postgresql\",\n    db_root_path=\"./data\",\n    db_mode=\"dev\",\n    host=\"localhost\",\n    port=5432,\n    database=\"mydb\",\n    user=\"username\",\n    password=\"password\"\n)\n\n# List available database types\navailable_dbs = ThothDbFactory.list_available_databases()\n\n# Get required parameters for a database type\nparams = ThothDbFactory.get_required_parameters(\"postgresql\")\n```\n\n## Document-Based API (Advanced)\n\nThe library also provides a document-based API for structured data access:\n\n```python\n# Get tables as structured documents\nif hasattr(manager, 'get_tables_as_documents'):\n    table_docs = manager.get_tables_as_documents()\n    for doc in table_docs:\n        print(f\"Table: {doc.table_name}\")\n        print(f\"Schema: {doc.schema_name}\")\n        print(f\"Comment: {doc.comment}\")\n\n# Get columns as structured documents\nif hasattr(manager, 'get_columns_as_documents'):\n    column_docs = manager.get_columns_as_documents(\"my_table\")\n    for doc in column_docs:\n        print(f\"Column: {doc.column_name} ({doc.data_type})\")\n        print(f\"Nullable: {doc.is_nullable}\")\n        print(f\"Primary Key: {doc.is_pk}\")\n```\n\n## Configuration Examples\n\n### PostgreSQL\n```python\nmanager = ThothDbManager.get_instance(\n    db_type=\"postgresql\",\n    db_root_path=\"./data\",\n    db_mode=\"production\",\n    host=\"localhost\",\n    port=5432,\n    database=\"myapp\",\n    user=\"dbuser\",\n    password=\"dbpass\"\n)\n```\n\n### SQLite\n```python\nmanager = ThothDbManager.get_instance(\n    db_type=\"sqlite\",\n    db_root_path=\"./data\",\n    db_mode=\"dev\",\n    database_path=\"./data/myapp.db\"\n)\n```\n\n### MySQL/MariaDB\n```python\nmanager = ThothDbManager.get_instance(\n    db_type=\"mysql\",  # or \"mariadb\"\n    db_root_path=\"./data\",\n    db_mode=\"production\",\n    host=\"localhost\",\n    port=3306,\n    database=\"myapp\",\n    user=\"dbuser\",\n    password=\"dbpass\"\n)\n```\n\n## Error Handling\n\nThe library provides clear error messages for common issues:\n\n```python\ntry:\n    manager = ThothDbManager.get_instance(\n        db_type=\"postgresql\",\n        db_root_path=\"./data\",\n        # Missing required parameters\n    )\nexcept ValueError as e:\n    print(f\"Configuration error: {e}\")\n\ntry:\n    results = manager.execute_sql(\"SELECT * FROM nonexistent_table\")\nexcept Exception as e:\n    print(f\"Query error: {e}\")\n```\n\n## LSH Search Details\n\nThe LSH (Locality-Sensitive Hashing) feature allows you to find similar text values across your database:\n\n1. **Automatic Setup**: LSH indexes are created automatically from your database content\n2. **Cross-Column Search**: Search across all text columns in all tables\n3. **Fuzzy Matching**: Find similar values even with typos or variations\n4. **Configurable**: Adjust similarity sensitivity with parameters\n\n### LSH Use Cases\n\n- **Data Deduplication**: Find duplicate or near-duplicate records\n- **Data Quality**: Identify inconsistent data entry\n- **Search Enhancement**: Provide \"did you mean?\" functionality\n- **Data Exploration**: Discover related content across tables\n\n## Architecture\n\nThe library uses a modern plugin architecture:\n\n- **Plugins**: Database-specific implementations\n- **Adapters**: Low-level database operations\n- **Factory**: Plugin instantiation and management\n- **Documents**: Type-safe data models using Pydantic\n- **Registry**: Plugin discovery and registration\n\nThis design makes it easy to:\n- Add support for new database types\n- Maintain consistent APIs across databases\n- Extend functionality without breaking existing code\n\n## Contributing\n\nTo add support for a new database type:\n\n1. Create an adapter class implementing `DbAdapter`\n2. Create a plugin class implementing `DbPlugin`\n3. Register the plugin with `@register_plugin(\"db_type\")`\n4. Add connection parameter validation\n5. Implement required abstract methods\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Support\n\nFor issues, questions, or contributions, please visit the project repository.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python library for managing SQL databases with support for multiple database types, LSH-based similarity search, and a modern plugin architecture.",
    "version": "0.4.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/mptyl/thoth_dbmanager/issues",
        "Documentation": "https://github.com/mptyl/thoth_dbmanager#readme",
        "Homepage": "https://github.com/mptyl/thoth_dbmanager",
        "Source Code": "https://github.com/mptyl/thoth_dbmanager"
    },
    "split_keywords": [
        "database",
        " sql",
        " lsh",
        " similarity-search",
        " orm"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b2687be0eeb93291d16fa7a5874ad5bd466a08a6ec45569785d5debb1dc6691e",
                "md5": "f83e67eff698495e1376218eec7b9362",
                "sha256": "7a592b9644dcf8084b96d0acc194ba34a7326d4d1b491f3bd7b5382f1a56e56c"
            },
            "downloads": -1,
            "filename": "thoth_dbmanager-0.4.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f83e67eff698495e1376218eec7b9362",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 86053,
            "upload_time": "2025-07-26T05:53:09",
            "upload_time_iso_8601": "2025-07-26T05:53:09.169701Z",
            "url": "https://files.pythonhosted.org/packages/b2/68/7be0eeb93291d16fa7a5874ad5bd466a08a6ec45569785d5debb1dc6691e/thoth_dbmanager-0.4.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "393ae5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df",
                "md5": "26ffc3e4255c8dac3ca40373a0b2170e",
                "sha256": "31430509cc36f44236d825968686c75edaae918684e6939392a16a241b218ad6"
            },
            "downloads": -1,
            "filename": "thoth_dbmanager-0.4.2.tar.gz",
            "has_sig": false,
            "md5_digest": "26ffc3e4255c8dac3ca40373a0b2170e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 73794,
            "upload_time": "2025-07-26T05:53:13",
            "upload_time_iso_8601": "2025-07-26T05:53:13.250246Z",
            "url": "https://files.pythonhosted.org/packages/39/3a/e5d6dbc360187cde39623ea672753200a61c6d296d6ee505ad15298403df/thoth_dbmanager-0.4.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-26 05:53:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mptyl",
    "github_project": "thoth_dbmanager",
    "github_not_found": true,
    "lcname": "thoth-dbmanager"
}
        
Elapsed time: 0.41810s