Name | sqldeps JSON |
Version |
0.0.8
JSON |
| download |
home_page | None |
Summary | SQL Dependency Extractor |
upload_time | 2025-03-19 03:24:12 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | None |
keywords |
dependency
extractor
llm
parser
sql
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# SQLDeps: SQL Dependency Extractor
<p align="center">
<img src="https://github.com/glue-lab/sqldeps/blob/main/docs/assets/images/sqldeps_logo.png?raw=true" alt="SQLDeps Logo" width="300">
</p>
<p align="left">
<a href="https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml" target="_blank">
<img src="https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml/badge.svg" alt="Test">
</a>
<a href="https://sqldeps.readthedocs.io/en/latest/" target="_blank">
<img src="https://readthedocs.org/projects/sqldeps/badge/?version=latest" alt="Documentation">
</a>
<a href="https://pypi.org/project/sqldeps" target="_blank">
<img src="https://img.shields.io/pypi/pyversions/sqldeps.svg?color=%2334D058" alt="Supported Python versions">
</a>
<a href="https://pypi.org/project/sqldeps" target="_blank">
<img src="https://img.shields.io/pypi/v/sqldeps?color=%2334D058&label=pypi%20package" alt="Package version">
</a>
<a href="https://opensource.org/licenses/MIT" target="_blank">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License">
</a>
</p>
A tool that automatically extracts and maps SQL dependencies and outputs using Large Language Models (LLMs).
---
- **Documentation**: [https://sqldeps.readthedocs.io/](https://sqldeps.readthedocs.io/)
- **Code repositoty**: [https://github.com/glue-lab/sqldeps](https://sqldeps.readthedocs.io/)
---
## Overview
SQLDeps analyzes SQL scripts to identify:
1. **Dependencies**: Tables and columns that must exist BEFORE query execution
2. **Outputs**: Tables and columns permanently CREATED or MODIFIED by the query
It intelligently filters out temporary constructs like CTEs and derived tables, focusing only on the real database objects that matter.
### Benefits
- 🛠️ **Change Management:** Safely modify schemas by identifying true dependencies
- 💾 **Storage Optimization:** Focus resources on essential tables and columns
- 🚢 **Migration Planning:** Precisely determine what needs to be migrated
- 📝 **Project Documentation:** Create comprehensive maps of database dependencies
## Installation
```bash
pip install sqldeps
```
## Quick Start
SQLDeps provides both API and CLI interfaces:
- **API**: Enables flexibility for Python developers to integrate with their applications
- **CLI**: Offers an easy-to-use command-line interface for quick analysis
### API Usage
```python
from sqldeps.llm_parsers import create_extractor
# Create extractor with default settings (framework="groq", model="llama-3.3-70b-versatile")
extractor = create_extractor()
# Extract dependencies and outputs from a SQL query
sql_query = """
WITH user_orders AS (
SELECT o.user_id, COUNT(*) AS order_count
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY o.user_id
)
CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;
"""
result = extractor.extract_from_query(sql_query)
# Print the results
print("Dependencies:")
print(result.dependencies)
print("\nOutputs:")
print(result.outputs)
# Or extract from a file
result = extractor.extract_from_file('path/to/query.sql')
# Convert to dictionary or DataFrame
dict_format = result.to_dict()
df_format = result.to_dataframe()
```
### CLI Usage
```bash
# Basic example with default settings
sqldeps path/to/query.sql
# Specify framework and output format
sqldeps path/to/query.sql --framework=openai --model=gpt-4o-mini -o query_deps.csv
# Process a folder recursively with database validation
sqldeps data/sql_folder \
--recursive \
--framework=deepseek \
--db-match-schema \
--db-target-schemas public,sales \
--db-credentials configs/database.yml \
-o folder_deps.csv
```
## Example
Given this SQL query:
```sql
-- Common Table Expression (CTE) to count user orders for active users
WITH user_orders AS (
SELECT o.user_id, COUNT(*) AS order_count
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY o.user_id
)
-- Create a new table from the CTE
CREATE TABLE transactions.user_order_summary AS
SELECT * FROM user_orders;
-- Truncate an existing table before repopulating
TRUNCATE TABLE order_summary;
```
SQLDeps will extract:
```json
{
"dependencies": {
"orders": ["user_id"],
"users": ["id", "status"],
"order_summary": []
},
"outputs": {
"transactions.user_order_summary": ["*"],
"order_summary": []
}
}
```
Note how:
- CTE (`user_orders`) is correctly excluded
- Real source tables (`orders`, `users`) are included as dependencies
- Target tables (`transactions.user_order_summary`, `order_summary`) are correctly identified as outputs
- For `TRUNCATE` operations, the table appears in both dependencies and outputs because it must exist before truncating and the operation modifies the table
## Supported Models
All models available on [Groq](https://console.groq.com/docs/models), [OpenAI](https://platform.openai.com/docs/models), and [DeepSeek](https://api-docs.deepseek.com/).
For up-to-date pricing details, please check [Groq](https://groq.com/pricing/), [OpenAI](https://platform.openai.com/docs/pricing), [DeepSeek](https://api-docs.deepseek.com/quick_start/pricing).
## API Keys / Configuration
You'll need to set up API keys for your chosen LLM provider. Create a `.env` file in your project root (see the provided `.env.example`):
```
# LLM API Keys
GROQ_API_KEY=your_groq_api_key
OPENAI_API_KEY=your_openai_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
# Database credentials (for schema validation)
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mydatabase
DB_USER=username
DB_PASSWORD=password
```
For custom database YAML configuration file (optional):
```yaml
# database.yml
database:
host: localhost
port: 5432
database: mydatabase
username: username
password: password
```
## Advanced Usage
### Database Schema Validation
```python
from sqldeps.database import PostgreSQLConnector
from sqldeps.llm_parsers import create_extractor
# Extract dependencies
extractor = create_extractor(framework="openai", model="gpt-4o")
result = extractor.extract_from_file('query.sql')
# Connect to database and validate
conn = PostgreSQLConnector(
host="localhost",
port=5432,
database="mydatabase",
username="username"
)
# Match extracted dependencies against database schema
validated_schema = extractor.match_database_schema(
result,
db_connection=conn,
target_schemas=["public", "sales"]
)
# View validation results (pandas DataFrame)
print(validated_schema)
```
### Processing Multiple Files
```python
# Extract dependencies from all SQL files in a folder
result = extractor.extract_from_folder('/path/to/sql_folder', recursive=True)
```
### Using Custom Prompts
You can customize the prompts used to instruct the LLM:
```python
# Create extractor with custom prompt
extractor = create_extractor(
framework="groq",
model="llama-3.3-70b-versatile",
prompt_path="path/to/custom_prompt.yml"
)
```
The custom prompt YAML should include:
```yaml
system_prompt: |
Detailed instructions to the model...
user_prompt: |
Extract SQL dependencies and outputs from this query:
{sql}
```
## Web Application
SQLDeps includes a Streamlit-based web interface for interactive exploration of single SQL files:
```bash
# Install with web app dependencies
pip install "sqldeps[app]"
# Run the app
streamlit run app/main.py
```
**Note**: The web application is currently designed for single-file extraction and demonstration purposes. For processing multiple files or entire folders, use the API or CLI.
## Documentation
For comprehensive documentation, including API reference and examples, visit [https://sqldeps.readthedocs.io](https://sqldeps.readthedocs.io/).
## Contributing
Contributions are welcome!
- Found a bug? Please open an issue with detailed information.
- Missing a feature? Feel free to suggest enhancements or submit a pull request.
Check out the [issues page](https://github.com/glue-lab/sqldeps/issues) or submit a pull request.
## License
MIT
Raw data
{
"_id": null,
"home_page": null,
"name": "sqldeps",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "dependency, extractor, llm, parser, sql",
"author": null,
"author_email": "Cain\u00e3 Silva <coutodasilva@wisc.edu>, Matt Christie <mjchristie@wisc.edu>",
"download_url": "https://files.pythonhosted.org/packages/9c/6d/d5d07da96af46ae98c59260e623fc3442e19ca1a7bcd1c2ad09b7c989120/sqldeps-0.0.8.tar.gz",
"platform": null,
"description": "# SQLDeps: SQL Dependency Extractor\n\n<p align=\"center\">\n <img src=\"https://github.com/glue-lab/sqldeps/blob/main/docs/assets/images/sqldeps_logo.png?raw=true\" alt=\"SQLDeps Logo\" width=\"300\">\n</p>\n\n<p align=\"left\">\n<a href=\"https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml\" target=\"_blank\">\n <img src=\"https://github.com/glue-lab/sqldeps/actions/workflows/ci.yml/badge.svg\" alt=\"Test\">\n</a>\n<a href=\"https://sqldeps.readthedocs.io/en/latest/\" target=\"_blank\">\n <img src=\"https://readthedocs.org/projects/sqldeps/badge/?version=latest\" alt=\"Documentation\">\n</a>\n<a href=\"https://pypi.org/project/sqldeps\" target=\"_blank\">\n <img src=\"https://img.shields.io/pypi/pyversions/sqldeps.svg?color=%2334D058\" alt=\"Supported Python versions\">\n</a>\n<a href=\"https://pypi.org/project/sqldeps\" target=\"_blank\">\n <img src=\"https://img.shields.io/pypi/v/sqldeps?color=%2334D058&label=pypi%20package\" alt=\"Package version\">\n</a>\n<a href=\"https://opensource.org/licenses/MIT\" target=\"_blank\">\n <img src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" alt=\"License\">\n</a>\n</p>\n\nA tool that automatically extracts and maps SQL dependencies and outputs using Large Language Models (LLMs).\n\n---\n\n- **Documentation**: [https://sqldeps.readthedocs.io/](https://sqldeps.readthedocs.io/)\n- **Code repositoty**: [https://github.com/glue-lab/sqldeps](https://sqldeps.readthedocs.io/)\n\n---\n\n## Overview\n\nSQLDeps analyzes SQL scripts to identify:\n\n1. **Dependencies**: Tables and columns that must exist BEFORE query execution\n2. **Outputs**: Tables and columns permanently CREATED or MODIFIED by the query\n\nIt intelligently filters out temporary constructs like CTEs and derived tables, focusing only on the real database objects that matter.\n\n### Benefits\n\n- \ud83d\udee0\ufe0f **Change Management:** Safely modify schemas by identifying true dependencies\n- \ud83d\udcbe **Storage Optimization:** Focus resources on essential tables and columns\n- \ud83d\udea2 **Migration Planning:** Precisely determine what needs to be migrated\n- \ud83d\udcdd **Project Documentation:** Create comprehensive maps of database dependencies\n\n## Installation\n\n```bash\npip install sqldeps\n```\n\n## Quick Start\n\nSQLDeps provides both API and CLI interfaces:\n- **API**: Enables flexibility for Python developers to integrate with their applications\n- **CLI**: Offers an easy-to-use command-line interface for quick analysis\n\n### API Usage\n\n```python\nfrom sqldeps.llm_parsers import create_extractor\n\n# Create extractor with default settings (framework=\"groq\", model=\"llama-3.3-70b-versatile\")\nextractor = create_extractor()\n\n# Extract dependencies and outputs from a SQL query\nsql_query = \"\"\"\nWITH user_orders AS (\n SELECT o.user_id, COUNT(*) AS order_count\n FROM orders o\n JOIN users u ON o.user_id = u.id\n WHERE u.status = 'active'\n GROUP BY o.user_id\n)\n\nCREATE TABLE transactions.user_order_summary AS\nSELECT * FROM user_orders;\n\"\"\"\nresult = extractor.extract_from_query(sql_query)\n\n# Print the results\nprint(\"Dependencies:\")\nprint(result.dependencies)\nprint(\"\\nOutputs:\")\nprint(result.outputs)\n\n# Or extract from a file\nresult = extractor.extract_from_file('path/to/query.sql')\n\n# Convert to dictionary or DataFrame\ndict_format = result.to_dict()\ndf_format = result.to_dataframe()\n```\n\n### CLI Usage\n\n```bash\n# Basic example with default settings\nsqldeps path/to/query.sql\n\n# Specify framework and output format\nsqldeps path/to/query.sql --framework=openai --model=gpt-4o-mini -o query_deps.csv\n\n# Process a folder recursively with database validation\nsqldeps data/sql_folder \\\n --recursive \\\n --framework=deepseek \\\n --db-match-schema \\\n --db-target-schemas public,sales \\\n --db-credentials configs/database.yml \\\n -o folder_deps.csv\n```\n\n## Example\n\nGiven this SQL query:\n\n```sql\n-- Common Table Expression (CTE) to count user orders for active users\nWITH user_orders AS (\n SELECT o.user_id, COUNT(*) AS order_count\n FROM orders o\n JOIN users u ON o.user_id = u.id\n WHERE u.status = 'active'\n GROUP BY o.user_id\n)\n\n-- Create a new table from the CTE\nCREATE TABLE transactions.user_order_summary AS\nSELECT * FROM user_orders;\n\n-- Truncate an existing table before repopulating\nTRUNCATE TABLE order_summary;\n```\n\nSQLDeps will extract:\n\n```json\n{\n \"dependencies\": {\n \"orders\": [\"user_id\"],\n \"users\": [\"id\", \"status\"],\n \"order_summary\": []\n },\n \"outputs\": {\n \"transactions.user_order_summary\": [\"*\"],\n \"order_summary\": []\n }\n}\n```\n\nNote how:\n- CTE (`user_orders`) is correctly excluded\n- Real source tables (`orders`, `users`) are included as dependencies\n- Target tables (`transactions.user_order_summary`, `order_summary`) are correctly identified as outputs\n- For `TRUNCATE` operations, the table appears in both dependencies and outputs because it must exist before truncating and the operation modifies the table\n\n## Supported Models\n\nAll models available on [Groq](https://console.groq.com/docs/models), [OpenAI](https://platform.openai.com/docs/models), and [DeepSeek](https://api-docs.deepseek.com/). \nFor up-to-date pricing details, please check [Groq](https://groq.com/pricing/), [OpenAI](https://platform.openai.com/docs/pricing), [DeepSeek](https://api-docs.deepseek.com/quick_start/pricing).\n\n## API Keys / Configuration\n\nYou'll need to set up API keys for your chosen LLM provider. Create a `.env` file in your project root (see the provided `.env.example`):\n\n```\n# LLM API Keys\nGROQ_API_KEY=your_groq_api_key\nOPENAI_API_KEY=your_openai_api_key\nDEEPSEEK_API_KEY=your_deepseek_api_key\n\n# Database credentials (for schema validation)\nDB_HOST=localhost\nDB_PORT=5432\nDB_NAME=mydatabase\nDB_USER=username\nDB_PASSWORD=password\n```\n\nFor custom database YAML configuration file (optional): \n\n```yaml\n# database.yml\ndatabase:\n host: localhost\n port: 5432\n database: mydatabase\n username: username\n password: password\n```\n\n## Advanced Usage\n\n### Database Schema Validation\n\n```python\nfrom sqldeps.database import PostgreSQLConnector\nfrom sqldeps.llm_parsers import create_extractor\n\n# Extract dependencies\nextractor = create_extractor(framework=\"openai\", model=\"gpt-4o\")\nresult = extractor.extract_from_file('query.sql')\n\n# Connect to database and validate\nconn = PostgreSQLConnector(\n host=\"localhost\",\n port=5432,\n database=\"mydatabase\",\n username=\"username\"\n)\n\n# Match extracted dependencies against database schema\nvalidated_schema = extractor.match_database_schema(\n result,\n db_connection=conn,\n target_schemas=[\"public\", \"sales\"]\n)\n\n# View validation results (pandas DataFrame)\nprint(validated_schema)\n```\n\n### Processing Multiple Files\n\n```python\n# Extract dependencies from all SQL files in a folder\nresult = extractor.extract_from_folder('/path/to/sql_folder', recursive=True)\n```\n\n### Using Custom Prompts\n\nYou can customize the prompts used to instruct the LLM:\n\n```python\n# Create extractor with custom prompt\nextractor = create_extractor(\n framework=\"groq\",\n model=\"llama-3.3-70b-versatile\",\n prompt_path=\"path/to/custom_prompt.yml\"\n)\n```\n\nThe custom prompt YAML should include:\n\n```yaml\nsystem_prompt: |\n Detailed instructions to the model...\n\nuser_prompt: |\n Extract SQL dependencies and outputs from this query:\n {sql}\n```\n\n## Web Application\n\nSQLDeps includes a Streamlit-based web interface for interactive exploration of single SQL files:\n\n```bash\n# Install with web app dependencies\npip install \"sqldeps[app]\"\n\n# Run the app\nstreamlit run app/main.py\n```\n\n**Note**: The web application is currently designed for single-file extraction and demonstration purposes. For processing multiple files or entire folders, use the API or CLI.\n\n## Documentation\n\nFor comprehensive documentation, including API reference and examples, visit [https://sqldeps.readthedocs.io](https://sqldeps.readthedocs.io/).\n\n## Contributing\n\nContributions are welcome! \n\n- Found a bug? Please open an issue with detailed information.\n- Missing a feature? Feel free to suggest enhancements or submit a pull request.\n\nCheck out the [issues page](https://github.com/glue-lab/sqldeps/issues) or submit a pull request.\n\n## License\n\nMIT",
"bugtrack_url": null,
"license": null,
"summary": "SQL Dependency Extractor",
"version": "0.0.8",
"project_urls": null,
"split_keywords": [
"dependency",
" extractor",
" llm",
" parser",
" sql"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9afd04d8c1f52a78782010a3a4a6201a6e35534e082ca002203864b7f2112c74",
"md5": "d70982c9a79ba8200a5a2fb575f333ab",
"sha256": "c5243486214e3f9861769d771f66af3289f589b0f46522f9f6fef0fdb3033541"
},
"downloads": -1,
"filename": "sqldeps-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d70982c9a79ba8200a5a2fb575f333ab",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 28253,
"upload_time": "2025-03-19T03:24:11",
"upload_time_iso_8601": "2025-03-19T03:24:11.320716Z",
"url": "https://files.pythonhosted.org/packages/9a/fd/04d8c1f52a78782010a3a4a6201a6e35534e082ca002203864b7f2112c74/sqldeps-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9c6dd5d07da96af46ae98c59260e623fc3442e19ca1a7bcd1c2ad09b7c989120",
"md5": "e62968b1029243f667c91afb1f3a763b",
"sha256": "addb19cd1deb531e4498395775258baceff01ffbc7d45644f08b347f2195f8f5"
},
"downloads": -1,
"filename": "sqldeps-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "e62968b1029243f667c91afb1f3a763b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 23492,
"upload_time": "2025-03-19T03:24:12",
"upload_time_iso_8601": "2025-03-19T03:24:12.363857Z",
"url": "https://files.pythonhosted.org/packages/9c/6d/d5d07da96af46ae98c59260e623fc3442e19ca1a7bcd1c2ad09b7c989120/sqldeps-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-03-19 03:24:12",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "sqldeps"
}