excelsql

Name	excelsql JSON
Version	0.1.35 JSON
	download
home_page	https://github.com/chris17453/xlsql
Summary	A Python driver to use SQL on Excel workbooks with CRUD, using SQLite as a middleman
upload_time	2024-09-20 14:44:28
maintainer	None
docs_url	None
author	Charles Watkins
requires_python	>=3.6
license	BSD 3
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # excelsql

`excelsql` is a lightweight Python package that allows you to load Excel workbooks (`.xls` and `.xlsx` formats) into an SQLite database, perform SQL operations, and save changes back into the original format. It's designed to combine the ease of Excel with the power of SQL, streamlining data manipulation and analysis.

## Table of Contents

1. [Features](#features)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Features in Detail](#features-in-detail)
   - [SQL Query Interface with Pandas Integration](#sql-query-interface-with-pandas-integration)
   - [Schema Validation and Modification](#schema-validation-and-modification)
   - [Batch Processing and Multi-file Support](#batch-processing-and-multi-file-support)
   - [Data Cleaning Functions](#data-cleaning-functions)
   - [Joins and Merging Data Across Sheets](#joins-and-merging-data-across-sheets)
   - [Advanced Reporting & Visualization Support](#advanced-reporting--visualization-support)
   - [Caching Queries for Speed Optimization](#caching-queries-for-speed-optimization)
   - [Export to Multiple Formats](#export-to-multiple-formats)
   - [Jupyter Notebook Integration](#jupyter-notebook-integration)
5. [Contributing](#contributing)
6. [License](#license)

---

## Features

- Load Excel files into an SQLite database for SQL querying.
- Perform SQL operations on the data (SELECT, INSERT, JOIN, etc.).
- Return SQL query results as Pandas DataFrames.
- Clean and normalize data in Excel sheets.
- Validate and modify the schema (add/remove columns).
- Support for batch processing and merging multiple Excel workbooks.
- Generate reports and visualizations.
- Caching for repeated queries.
- Export data to multiple formats (CSV, JSON, etc.).
- Jupyter Notebook integration for displaying data.

---

## Installation

To install `excelsql`, run the following command:

```bash
pip install excelsql
```

---

## Usage

### Loading an Excel File

```python
import excelsql

# Path to the Excel file
xls_file = './data/file_example_XLS_5000.xls'
db_file = './data/excel_db.sqlite'

# Initialize the driver
driver = excelsql(xls_file, db_path=db_file)

# Show worksheets
worksheets = driver.show_worksheets()

# Perform SQL query and return results as a DataFrame
df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1')

# Save changes back to an Excel file
driver.save_to_file('./data/modified_file.xlsx')

# Close the connection
driver.close()
```

---

## Features in Detail

### SQL Query Interface with Pandas Integration

Run SQL queries and return the results as a Pandas DataFrame.

```python
df = driver.execute_query_to_dataframe('SELECT * FROM Sheet1 WHERE Age > 30')
```

### Schema Validation and Modification

- **`validate_schema`**: Validate the schema for a worksheet.
- **`add_column`**: Add new columns to a worksheet.

```python
driver.add_column('Sheet1', 'NewColumn', datatype='TEXT')
driver.validate_schema('Sheet1', validations={'Age': 'numeric', 'Date': 'date'})
```

### Batch Processing and Multi-file Support

Load and merge multiple Excel workbooks.

```python
driver.load_multiple_workbooks(['file1.xlsx', 'file2.xlsx'])
driver.merge_workbooks('Sheet1', 'Sheet2', on='Id', how='inner')
```

### Data Cleaning Functions

- **`clean_data`**: Remove or fill missing values.
- **`normalize`**: Normalize the values of numeric columns.

```python
driver.clean_data('Sheet1', strategy='dropna')
driver.normalize('Sheet1', columns=['Age'])
```

### Joins and Merging Data Across Sheets

Join multiple worksheets using SQL-style operations.

```python
driver.join_sheets('Sheet1', 'Sheet2', on='Id', how='inner')
```

### Advanced Reporting & Visualization Support

- **`generate_report`**: Generate descriptive statistics and save as an Excel file.
- **`export_visualization`**: Create visualizations (bar, line, area charts) from data.

```python
driver.generate_report('Sheet1', output='./data/summary.xlsx')
driver.export_visualization('Sheet1', x_col='Age', y_col='Salary', plot_type='bar', output_path='./data/visuals')
```

Below is an example of a bar chart visualization generated from the `Sheet1` dataset:

![Bar Chart Visualization](./data/visuals/Sheet1_bar_Age_Salary.png)

### Caching Queries for Speed Optimization

Enable caching of frequent SQL queries to speed up repeated requests.

```python
driver.enable_query_cache()
```

### Export to Multiple Formats

- **`export_to_csv`**: Export worksheet data to CSV.
- **`export_to_json`**: Export worksheet data to JSON.

```python
driver.export_to_csv('Sheet1', './data/output.csv')
driver.export_to_json('Sheet1', './data/output.json')
```

You can find example exported data formats below:

- [CSV Export Example](./data/output.csv)
- [JSON Export Example](./data/output.json)

### Jupyter Notebook Integration

Display data directly in Jupyter notebooks for quick data inspection.

```python
driver.display_in_notebook('Sheet1')
```

---

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request on [GitHub](https://github.com/chris17453/xlsql).

---

## License

This project is licensed under the BSD 3-Clause License. See the [LICENSE](./LICENSE) file for details.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/chris17453/xlsql",
    "name": "excelsql",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Charles Watkins",
    "author_email": "chris@watkinslabs.com",
    "download_url": "https://files.pythonhosted.org/packages/0c/80/c508a0a4737e321f8ddeb379e5b3eb6a47eab0837572145a53a2e7e574dc/excelsql-0.1.35.tar.gz",
    "platform": null,
    "description": "# excelsql\n\n`excelsql` is a lightweight Python package that allows you to load Excel workbooks (`.xls` and `.xlsx` formats) into an SQLite database, perform SQL operations, and save changes back into the original format. It's designed to combine the ease of Excel with the power of SQL, streamlining data manipulation and analysis.\n\n## Table of Contents\n\n1. [Features](#features)\n2. [Installation](#installation)\n3. [Usage](#usage)\n4. [Features in Detail](#features-in-detail)\n   - [SQL Query Interface with Pandas Integration](#sql-query-interface-with-pandas-integration)\n   - [Schema Validation and Modification](#schema-validation-and-modification)\n   - [Batch Processing and Multi-file Support](#batch-processing-and-multi-file-support)\n   - [Data Cleaning Functions](#data-cleaning-functions)\n   - [Joins and Merging Data Across Sheets](#joins-and-merging-data-across-sheets)\n   - [Advanced Reporting & Visualization Support](#advanced-reporting--visualization-support)\n   - [Caching Queries for Speed Optimization](#caching-queries-for-speed-optimization)\n   - [Export to Multiple Formats](#export-to-multiple-formats)\n   - [Jupyter Notebook Integration](#jupyter-notebook-integration)\n5. [Contributing](#contributing)\n6. [License](#license)\n\n---\n\n## Features\n\n- Load Excel files into an SQLite database for SQL querying.\n- Perform SQL operations on the data (SELECT, INSERT, JOIN, etc.).\n- Return SQL query results as Pandas DataFrames.\n- Clean and normalize data in Excel sheets.\n- Validate and modify the schema (add/remove columns).\n- Support for batch processing and merging multiple Excel workbooks.\n- Generate reports and visualizations.\n- Caching for repeated queries.\n- Export data to multiple formats (CSV, JSON, etc.).\n- Jupyter Notebook integration for displaying data.\n\n---\n\n## Installation\n\nTo install `excelsql`, run the following command:\n\n```bash\npip install excelsql\n```\n\n---\n\n## Usage\n\n### Loading an Excel File\n\n```python\nimport excelsql\n\n# Path to the Excel file\nxls_file = './data/file_example_XLS_5000.xls'\ndb_file = './data/excel_db.sqlite'\n\n# Initialize the driver\ndriver = excelsql(xls_file, db_path=db_file)\n\n# Show worksheets\nworksheets = driver.show_worksheets()\n\n# Perform SQL query and return results as a DataFrame\ndf = driver.execute_query_to_dataframe('SELECT * FROM Sheet1')\n\n# Save changes back to an Excel file\ndriver.save_to_file('./data/modified_file.xlsx')\n\n# Close the connection\ndriver.close()\n```\n\n---\n\n## Features in Detail\n\n### SQL Query Interface with Pandas Integration\n\nRun SQL queries and return the results as a Pandas DataFrame.\n\n```python\ndf = driver.execute_query_to_dataframe('SELECT * FROM Sheet1 WHERE Age > 30')\n```\n\n### Schema Validation and Modification\n\n- **`validate_schema`**: Validate the schema for a worksheet.\n- **`add_column`**: Add new columns to a worksheet.\n\n```python\ndriver.add_column('Sheet1', 'NewColumn', datatype='TEXT')\ndriver.validate_schema('Sheet1', validations={'Age': 'numeric', 'Date': 'date'})\n```\n\n### Batch Processing and Multi-file Support\n\nLoad and merge multiple Excel workbooks.\n\n```python\ndriver.load_multiple_workbooks(['file1.xlsx', 'file2.xlsx'])\ndriver.merge_workbooks('Sheet1', 'Sheet2', on='Id', how='inner')\n```\n\n### Data Cleaning Functions\n\n- **`clean_data`**: Remove or fill missing values.\n- **`normalize`**: Normalize the values of numeric columns.\n\n```python\ndriver.clean_data('Sheet1', strategy='dropna')\ndriver.normalize('Sheet1', columns=['Age'])\n```\n\n### Joins and Merging Data Across Sheets\n\nJoin multiple worksheets using SQL-style operations.\n\n```python\ndriver.join_sheets('Sheet1', 'Sheet2', on='Id', how='inner')\n```\n\n### Advanced Reporting & Visualization Support\n\n- **`generate_report`**: Generate descriptive statistics and save as an Excel file.\n- **`export_visualization`**: Create visualizations (bar, line, area charts) from data.\n\n```python\ndriver.generate_report('Sheet1', output='./data/summary.xlsx')\ndriver.export_visualization('Sheet1', x_col='Age', y_col='Salary', plot_type='bar', output_path='./data/visuals')\n```\n\nBelow is an example of a bar chart visualization generated from the `Sheet1` dataset:\n\n![Bar Chart Visualization](./data/visuals/Sheet1_bar_Age_Salary.png)\n\n### Caching Queries for Speed Optimization\n\nEnable caching of frequent SQL queries to speed up repeated requests.\n\n```python\ndriver.enable_query_cache()\n```\n\n### Export to Multiple Formats\n\n- **`export_to_csv`**: Export worksheet data to CSV.\n- **`export_to_json`**: Export worksheet data to JSON.\n\n```python\ndriver.export_to_csv('Sheet1', './data/output.csv')\ndriver.export_to_json('Sheet1', './data/output.json')\n```\n\nYou can find example exported data formats below:\n\n- [CSV Export Example](./data/output.csv)\n- [JSON Export Example](./data/output.json)\n\n### Jupyter Notebook Integration\n\nDisplay data directly in Jupyter notebooks for quick data inspection.\n\n```python\ndriver.display_in_notebook('Sheet1')\n```\n\n---\n\n## Contributing\n\nContributions are welcome! Feel free to open an issue or submit a pull request on [GitHub](https://github.com/chris17453/xlsql).\n\n---\n\n## License\n\nThis project is licensed under the BSD 3-Clause License. See the [LICENSE](./LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "BSD 3",
    "summary": "A Python driver to use SQL on Excel workbooks with CRUD, using SQLite as a middleman",
    "version": "0.1.35",
    "project_urls": {
        "Homepage": "https://github.com/chris17453/xlsql"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6e2cc87cfadda1ce6953a0e69749800bb8302bf7fd14fb7d88603fe4773dfe6e",
                "md5": "5c477b642b6e753494244bf049add43c",
                "sha256": "c05d1206bbd8ff5fa7753101f5946beb0ed4d9b03ddfe02d8af194a1a950c55b"
            },
            "downloads": -1,
            "filename": "excelsql-0.1.35-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c477b642b6e753494244bf049add43c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 3299,
            "upload_time": "2024-09-20T14:44:26",
            "upload_time_iso_8601": "2024-09-20T14:44:26.819170Z",
            "url": "https://files.pythonhosted.org/packages/6e/2c/c87cfadda1ce6953a0e69749800bb8302bf7fd14fb7d88603fe4773dfe6e/excelsql-0.1.35-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c80c508a0a4737e321f8ddeb379e5b3eb6a47eab0837572145a53a2e7e574dc",
                "md5": "0cc2c526148e5d190e73a445891dfa47",
                "sha256": "42b35c8e8e99d3fcc0fa212a3d17130ab612d3fcf1110f9ed95401de27832845"
            },
            "downloads": -1,
            "filename": "excelsql-0.1.35.tar.gz",
            "has_sig": false,
            "md5_digest": "0cc2c526148e5d190e73a445891dfa47",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 5209,
            "upload_time": "2024-09-20T14:44:28",
            "upload_time_iso_8601": "2024-09-20T14:44:28.055425Z",
            "url": "https://files.pythonhosted.org/packages/0c/80/c508a0a4737e321f8ddeb379e5b3eb6a47eab0837572145a53a2e7e574dc/excelsql-0.1.35.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-20 14:44:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chris17453",
    "github_project": "xlsql",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "excelsql"
}

Charles Watkins