sqldeps

Name: sqldeps
Version: 0.0.8
Summary: SQL Dependency Extractor
Upload time: 2025-03-19 03:24:12
Requires Python: >=3.10
Keywords: dependency, extractor, llm, parser, sql

# SQLDeps: SQL Dependency Extractor

<p align="center">
  <img src="https://github.com/glue-lab/sqldeps/blob/main/docs/assets/images/sqldeps_logo.png?raw=true" alt="SQLDeps Logo" width="300">
</p>

<p align="left">
<a href="https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml" target="_blank">
    <img src="https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml/badge.svg" alt="Test">
</a>
<a href="https://sqldeps.readthedocs.io/en/latest/" target="_blank">
    <img src="https://readthedocs.org/projects/sqldeps/badge/?version=latest" alt="Documentation">
</a>
<a href="https://pypi.org/project/sqldeps" target="_blank">
    <img src="https://img.shields.io/pypi/pyversions/sqldeps.svg?color=%2334D058" alt="Supported Python versions">
</a>
<a href="https://pypi.org/project/sqldeps" target="_blank">
    <img src="https://img.shields.io/pypi/v/sqldeps?color=%2334D058&label=pypi%20package" alt="Package version">
</a>
<a href="https://opensource.org/licenses/MIT" target="_blank">
    <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License">
</a>
</p>

A tool that automatically extracts and maps SQL dependencies and outputs using Large Language Models (LLMs).

---

- **Documentation**: [https://sqldeps.readthedocs.io/](https://sqldeps.readthedocs.io/)
- **Code repository**: [https://github.com/glue-lab/sqldeps](https://github.com/glue-lab/sqldeps)

---

## Overview

SQLDeps analyzes SQL scripts to identify:

1. **Dependencies**: Tables and columns that must exist BEFORE query execution
2. **Outputs**: Tables and columns permanently CREATED or MODIFIED by the query

It intelligently filters out temporary constructs like CTEs and derived tables, focusing only on the real database objects that matter.

### Benefits

- 🛠️ **Change Management:** Safely modify schemas by identifying true dependencies
- 💾 **Storage Optimization:** Focus resources on essential tables and columns
- 🚢 **Migration Planning:** Precisely determine what needs to be migrated
- 📝 **Project Documentation:** Create comprehensive maps of database dependencies

## Installation

```bash
pip install sqldeps
```

## Quick Start

SQLDeps provides both API and CLI interfaces:
- **API**: Enables flexibility for Python developers to integrate with their applications
- **CLI**: Offers an easy-to-use command-line interface for quick analysis

### API Usage

```python
from sqldeps.llm_parsers import create_extractor

# Create extractor with default settings (framework="groq", model="llama-3.3-70b-versatile")
extractor = create_extractor()

# Extract dependencies and outputs from a SQL query
sql_query = """
CREATE TABLE transactions.user_order_summary AS
WITH user_orders AS (
    SELECT o.user_id, COUNT(*) AS order_count
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.status = 'active'
    GROUP BY o.user_id
)
SELECT * FROM user_orders;
"""
result = extractor.extract_from_query(sql_query)

# Print the results
print("Dependencies:")
print(result.dependencies)
print("\nOutputs:")
print(result.outputs)

# Or extract from a file
result = extractor.extract_from_file('path/to/query.sql')

# Convert to dictionary or DataFrame
dict_format = result.to_dict()
df_format = result.to_dataframe()
```
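
Because `to_dataframe()` returns a pandas DataFrame, results can be persisted with standard pandas I/O. A minimal sketch (the output file name is illustrative):

```python
# Save the extracted dependencies/outputs as CSV for later inspection
result.to_dataframe().to_csv("query_dependencies.csv", index=False)
```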

### CLI Usage

```bash
# Basic example with default settings
sqldeps path/to/query.sql

# Specify framework and output format
sqldeps path/to/query.sql --framework=openai --model=gpt-4o-mini -o query_deps.csv

# Process a folder recursively with database validation
sqldeps data/sql_folder \
    --recursive \
    --framework=deepseek \
    --db-match-schema \
    --db-target-schemas public,sales \
    --db-credentials configs/database.yml \
    -o folder_deps.csv
```

## Example

Given this SQL query:

```sql
-- Create a new table from a CTE that counts orders for active users
CREATE TABLE transactions.user_order_summary AS
WITH user_orders AS (
    SELECT o.user_id, COUNT(*) AS order_count
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.status = 'active'
    GROUP BY o.user_id
)
SELECT * FROM user_orders;

-- Truncate an existing table before repopulating
TRUNCATE TABLE order_summary;
```

SQLDeps will extract:

```json
{
  "dependencies": {
    "orders": ["user_id"],
    "users": ["id", "status"],
    "order_summary": []
  },
  "outputs": {
    "transactions.user_order_summary": ["*"],
    "order_summary": []
  }
}
```

Note how:
- The CTE (`user_orders`) is correctly excluded
- The real source tables (`orders`, `users`) are included as dependencies
- The target tables (`transactions.user_order_summary`, `order_summary`) are correctly identified as outputs
- For `TRUNCATE` operations, the table appears in both dependencies and outputs: it must exist before it can be truncated, and the operation modifies it
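
Programmatically, the same information should be available via `result.to_dict()`. The sketch below assumes the dictionary mirrors the JSON shape shown above; the README implies this but does not state it explicitly:

```python
# `extractor` is created as in the Quick Start; `example_sql` (a placeholder name)
# holds the SQL script from this example.
result = extractor.extract_from_query(example_sql)
extracted = result.to_dict()

# Assumed to mirror the JSON output shown above
print(extracted["dependencies"]["orders"])   # expected: ["user_id"]
print(sorted(extracted["outputs"]))          # expected: ["order_summary", "transactions.user_order_summary"]
```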

## Supported Models

SQLDeps supports all models available on [Groq](https://console.groq.com/docs/models), [OpenAI](https://platform.openai.com/docs/models), and [DeepSeek](https://api-docs.deepseek.com/).  
For up-to-date pricing details, please check [Groq](https://groq.com/pricing/), [OpenAI](https://platform.openai.com/docs/pricing), [DeepSeek](https://api-docs.deepseek.com/quick_start/pricing).
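
To target a specific provider, pass its framework name and a model identifier to `create_extractor`. The identifiers below are illustrative; check the provider documentation linked above for current model names:

```python
from sqldeps.llm_parsers import create_extractor

# Framework/model pairs are examples only; any model listed by the
# providers above should work.
groq_extractor = create_extractor(framework="groq", model="llama-3.3-70b-versatile")
openai_extractor = create_extractor(framework="openai", model="gpt-4o-mini")
deepseek_extractor = create_extractor(framework="deepseek", model="deepseek-chat")
```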

## API Keys / Configuration

You'll need to set up API keys for your chosen LLM provider. Create a `.env` file in your project root (see the provided `.env.example`):

```
# LLM API Keys
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key

# Database credentials (for schema validation)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=username
DB_PASSWORD=password
```
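
If your entry point does not pick up the `.env` file automatically, you can load it explicitly with `python-dotenv` before creating an extractor. A minimal sketch (sqldeps may already do this for you):

```python
from dotenv import load_dotenv  # pip install python-dotenv

from sqldeps.llm_parsers import create_extractor

# Load GROQ_API_KEY / OPENAI_API_KEY / DEEPSEEK_API_KEY (and DB_* variables)
# from the .env file into the process environment.
load_dotenv()

extractor = create_extractor()  # reads the provider key from the environment
```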

Alternatively, database credentials can be provided in a YAML configuration file (optional):

```yaml
# database.yml
database:
  host: localhost
  port: 5432
  database: mydatabase
  username: username
  password: password
```

## Advanced Usage

### Database Schema Validation

```python
from sqldeps.database import PostgreSQLConnector
from sqldeps.llm_parsers import create_extractor

# Extract dependencies
extractor = create_extractor(framework="openai", model="gpt-4o")
result = extractor.extract_from_file('query.sql')

# Connect to database and validate
conn = PostgreSQLConnector(
    host="localhost",
    port=5432,
    database="mydatabase",
    username="username"
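    # password omitted here; it can presumably be supplied via DB_PASSWORD
    # in the environment (see the API Keys / Configuration section)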
)

# Match extracted dependencies against database schema
validated_schema = extractor.match_database_schema(
    result,
    db_connection=conn,
    target_schemas=["public", "sales"]
)

# View validation results (pandas DataFrame)
print(validated_schema)
```
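
Since the validation result is a regular pandas DataFrame, standard pandas operations work on it; the exact columns depend on the sqldeps version:

```python
# Inspect the validation report with ordinary pandas calls
print(validated_schema.columns.tolist())
print(validated_schema.head())

# Persist the report for later review (file name is illustrative)
validated_schema.to_csv("schema_validation.csv", index=False)
```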

### Processing Multiple Files

```python
# Extract dependencies from all SQL files in a folder
result = extractor.extract_from_folder('/path/to/sql_folder', recursive=True)
```
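
If you need per-file control (for example, custom filtering or error handling), a minimal alternative sketch is to walk the folder yourself and call `extract_from_file` on each script:

```python
from pathlib import Path

# Collect per-file results so a failure in one file does not stop the run
results = {}
for sql_file in Path("/path/to/sql_folder").rglob("*.sql"):
    try:
        results[sql_file.name] = extractor.extract_from_file(str(sql_file))
    except Exception as exc:
        print(f"Failed to process {sql_file}: {exc}")
```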

### Using Custom Prompts

You can customize the prompts used to instruct the LLM:

```python
# Create extractor with custom prompt
extractor = create_extractor(
    framework="groq",
    model="llama-3.3-70b-versatile",
    prompt_path="path/to/custom_prompt.yml"
)
```

The custom prompt YAML should include:

```yaml
system_prompt: |
  Detailed instructions to the model...

user_prompt: |
  Extract SQL dependencies and outputs from this query:
  {sql}
```

## Web Application

SQLDeps includes a Streamlit-based web interface for interactive exploration of single SQL files:

```bash
# Install with web app dependencies
pip install "sqldeps[app]"

# Run the app
streamlit run app/main.py
```

**Note**: The web application is currently designed for single-file extraction and demonstration purposes. For processing multiple files or entire folders, use the API or CLI.

## Documentation

For comprehensive documentation, including API reference and examples, visit [https://sqldeps.readthedocs.io](https://sqldeps.readthedocs.io/).

## Contributing

Contributions are welcome! 

- Found a bug? Please open an issue with detailed information.
- Missing a feature? Feel free to suggest enhancements or submit a pull request.

Check out the [issues page](https://github.com/glue-lab/sqldeps/issues) or submit a pull request.

## License

MIT
            
