polars-mssql


Namepolars-mssql JSON
Version 0.3.2 PyPI version JSON
download
home_pagehttps://github.com/drosenman/py_mssql
SummaryEffortlessly connect to SQL Server to import data into Polars DataFrames and export data back to SQL Server.
upload_time2025-01-05 19:46:12
maintainerNone
docs_urlNone
authorDave Rosenman
requires_python>=3.7
licenseMIT
keywords sql server polars sqlalchemy database
VCS
bugtrack_url
requirements polars sqlalchemy pyodbc
Travis-CI No Travis.
coveralls test coverage No coveralls.
            
# polars_mssql

`polars_mssql` is a Python package designed to simplify working with Microsoft SQL Server databases using the high-performance `polars` DataFrame library. It provides an intuitive and efficient interface for running SQL queries, reading tables, and writing data to SQL Server.

## Features

- **Seamless SQL Server Integration**: Easily connect to SQL Server with options for Windows Authentication or SQL Authentication.
- **Query Execution**: Execute SQL queries and retrieve results as `polars.DataFrame` objects. Use `read_query` for simple query execution or `polars.read_database` for advanced functionality like batch processing and schema customization.
- **Parameterization Support**: Securely execute parameterized queries to prevent accidental SQL injection.
- **Table Operations**: Read and write tables with flexibility and performance.
- **Context Management**: Supports Python's context manager for automatic connection handling.

## Installation

Install the package using pip:

```bash
pip install polars_mssql
```

Ensure the following dependencies are installed:

- `polars` for high-performance DataFrame operations.
- `sqlalchemy` for database connectivity.
- An appropriate ODBC driver for SQL Server (e.g., ODBC Driver 17 or 18).

## Usage

Here is an example of how to use `polars_mssql` to connect to SQL Server and perform various operations:

### 1. Connecting to SQL Server

```python
from polars_mssql import Connection

# Initialize a connection
conn = Connection(
  server="my_server",
    database="my_database",
    # If not specified, driver defaults to "SQL Server"
    # driver = 'ODBC Driver 17 for SQL Server'
)
```
**Driver Defaults**
By default, the driver parameter is set to `"SQL Server"`, which often comes preinstalled on Windows. If you don't have `"SQL Server"` installed or prefer a more recent driver, specify any compatible driver you have installed (e.g., `"ODBC Driver 17 for SQL Server"`) for the database you are trying to connect to. 

### 2. Read Data from SQL Server

#### Execute a SQL Query and Get Results as a DataFrame

```python
query = "SELECT * FROM my_table WHERE col1 = 'a'"
df = conn.read_query(query)
```

For advanced functionality (e.g., batch processing or schema customization), use the polars.read_database function with the engine:

```python
import polars as pl

df = pl.read_database(
    query="SELECT * FROM users",
    connection=conn.engine,
    iter_batches=True,
    batch_size=1000
)

for batch in df:
    print(batch)
```

#### Read an Entire Table

```python
df = conn.read_table("my_table")
```

### 3. Save DataFrame to SQL Server

#### Write a Polars DataFrame to a Table

```python
import polars as pl

# Example DataFrame
data = pl.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
conn.write_table(data, name="my_table", if_exists="replace")
```

### 4. Execute Queries

The `execute_query` method allows you to run any SQL query on your database. It supports parameterized queries to prevent accidental SQL injection and can be used for both retrieval and modification operations such as `INSERT`, `DELETE`, and `DROP`.

#### Example: Run a Simple Query
```python
query = "DELETE FROM users WHERE id = 1"
conn.execute_query(query)
```

#### Example: Insert Data Securely
```python
query = "INSERT INTO users (id, name, email) VALUES (:id, :name, :email)"
params = {"id": 1, "name": "John Doe", "email": "john.doe@example.com"}
conn.execute_query(query, params)
```

#### Example: Drop a Table
```python
query = "DROP TABLE users"
conn.execute_query(query)
```

#### Example: Prevent SQL Injection
```python
query = "SELECT * FROM users WHERE name = :name"
params = {"name": "John'; DROP TABLE users; --"}
conn.execute_query(query, params)
```
This safely executes the query without executing malicious SQL commands.

### 5. Using Context Management

```python
with Connection(server="my_server", database="my_database") as conn:
    df = conn.read_query("SELECT * FROM my_table")
    print(df)
```

### 6. Closing the Connection

```python
conn.close()
```

## API Reference

### `Connection` Class

#### Constructor
```python
Connection(server: Optional[str] = None, database: Optional[str] = None, driver: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None)
```

#### Attributes

- **`server`** (str):
  The name or address of the SQL Server instance.

  - **`database`** (str):
  The name of the connected database.

- **`driver`** (str):
  The ODBC driver being used for the connection (e.g., "ODBC Driver 17 for SQL Server").

- **`connection_string`** (str):
  The full SQLAlchemy connection string used to create the engine. This can be useful for debugging or passing to other tools.

- **`engine`** (`sqlalchemy.engine.base.Engine`):
  The SQLAlchemy engine used for database interactions. Advanced users can use this attribute for custom SQLAlchemy operations or to pass it to functions like `polars.read_database`.

#### Methods

- **`read_query(query: str) -> pl.DataFrame`**:
  Execute a query and return results as a Polars DataFrame.
  - **Parameters:**
    - `query (str)`: The SQL query to execute.
  - **Returns**: pl.DataFrame: The result of the query as a Polars DataFrame.

  - **Example**:
    ```python
    query = "SELECT * FROM my_table WHERE col1 = 'a'"
    df = conn.read_query(query)
    print(df)
    ```

- **`read_table(name: str) -> pl.DataFrame`**:
  Read all rows from a table.
  - **Parameters:**
    - `name (str)`: The name of the table to read from.
  - **Returns**: pl.DataFrame: All rows from the specified table as a Polars DataFrame.

  - **Example**:
    ```python
    df = conn.read_table('my_table')
    print(df)
    ```

- **`write_table(df: pl.DataFrame, name: str, if_exists: str = "fail") -> None`**:
  Save a Polars DataFrame to a specified table in SQL Server.
  - **Parameters**:
    - `df` (pl.DataFrame): The Polars DataFrame to be written.
    - `name` (str): The name of the target table in the database.
    - `if_exists` (str): What to do if the target table already exists. Options:
      - `'fail'` (default): Raise an error.
      - `'append'`: Append the data to the existing table.
      - `'replace'`: Drop the existing table, recreate it, and insert the data.
  - **Raises**:
    - `ValueError`: If `if_exists` is not one of `'fail'`, `'append'`, or `'replace'`.
    - `RuntimeError`: If the write operation fails.
  - **Examples**:
    ```python
    import polars as pl

    # Create a Polars DataFrame
    df = pl.DataFrame({
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"]
    })

    # Write the DataFrame to the database
    conn.write_table(df, name="users", if_exists="replace")
    ```

- **`execute_query(query: str, params: Optional[Dict[str, Any]] = None) -> None`**:
  Execute any SQL query. Supports parameterized queries to prevent SQL injection.
  - **Parameters**:
    - `query` (str): The SQL query to execute. Can include placeholders for parameterized queries (e.g., `:param_name`).
    - `params` (dict, optional): A dictionary of parameters to bind to the query.
  - **Examples**:
    ```python
    query = "DELETE FROM users WHERE id = 1"
    conn.execute_query(query)
    ```

    ```python
    query = "INSERT INTO users (id, name) VALUES (:id, :name)"
    params = {"id": 1, "name": "Jane"}
    conn.execute_query(query, params)
    ```

- **`close() -> None`**:
  Dispose of the SQLAlchemy engine and close the connection.
  - **Example**:
    ```python
    conn.close()
    ```

## Requirements

- Python 3.7 or higher
- `polars`
- `sqlalchemy`
- ODBC Driver for SQL Server (17 or 18 recommended)

### Installing the ODBC Driver

#### Windows
Download and install the ODBC Driver from [Microsoft's website](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server).

#### macOS
Install via Homebrew:
```bash
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install --no-sandbox msodbcsql18
```

#### Linux
Install using the following commands:
```bash
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
sudo apt-get update
sudo apt-get install -y mssql-tools unixodbc-dev
```

## Contributing

Contributions are welcome! If you encounter issues or have feature requests, please open an issue or submit a pull request on [GitHub](https://github.com/drosenman/polars_mssql).

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

This package integrates the efficiency of polars with the versatility of SQL Server, inspired by real-world data engineering needs. As a data engineer, I often need to pull data from SQL Server into polars and export data from polars back to SQL Server. I created this package to streamline these workflows and make the process more efficient.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/drosenman/py_mssql",
    "name": "polars-mssql",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "sql server polars sqlalchemy database",
    "author": "Dave Rosenman",
    "author_email": "daverosenman@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/e5/d6/fe7cac731d68a2602f4e1865a3b14bb48c320ec630914eb9d26b6f0a7b5e/polars_mssql-0.3.2.tar.gz",
    "platform": null,
    "description": "\r\n# polars_mssql\r\n\r\n`polars_mssql` is a Python package designed to simplify working with Microsoft SQL Server databases using the high-performance `polars` DataFrame library. It provides an intuitive and efficient interface for running SQL queries, reading tables, and writing data to SQL Server.\r\n\r\n## Features\r\n\r\n- **Seamless SQL Server Integration**: Easily connect to SQL Server with options for Windows Authentication or SQL Authentication.\r\n- **Query Execution**: Execute SQL queries and retrieve results as `polars.DataFrame` objects. Use `read_query` for simple query execution or `polars.read_database` for advanced functionality like batch processing and schema customization.\r\n- **Parameterization Support**: Securely execute parameterized queries to prevent accidental SQL injection.\r\n- **Table Operations**: Read and write tables with flexibility and performance.\r\n- **Context Management**: Supports Python's context manager for automatic connection handling.\r\n\r\n## Installation\r\n\r\nInstall the package using pip:\r\n\r\n```bash\r\npip install polars_mssql\r\n```\r\n\r\nEnsure the following dependencies are installed:\r\n\r\n- `polars` for high-performance DataFrame operations.\r\n- `sqlalchemy` for database connectivity.\r\n- An appropriate ODBC driver for SQL Server (e.g., ODBC Driver 17 or 18).\r\n\r\n## Usage\r\n\r\nHere is an example of how to use `polars_mssql` to connect to SQL Server and perform various operations:\r\n\r\n### 1. Connecting to SQL Server\r\n\r\n```python\r\nfrom polars_mssql import Connection\r\n\r\n# Initialize a connection\r\nconn = Connection(\r\n  server=\"my_server\",\r\n    database=\"my_database\",\r\n    # If not specified, driver defaults to \"SQL Server\"\r\n    # driver = 'ODBC Driver 17 for SQL Server'\r\n)\r\n```\r\n**Driver Defaults**\r\nBy default, the driver parameter is set to `\"SQL Server\"`, which often comes preinstalled on Windows. If you don't have `\"SQL Server\"` installed or prefer a more recent driver, specify any compatible driver you have installed (e.g., `\"ODBC Driver 17 for SQL Server\"`) for the database you are trying to connect to. \r\n\r\n### 2. Read Data from SQL Server\r\n\r\n#### Execute a SQL Query and Get Results as a DataFrame\r\n\r\n```python\r\nquery = \"SELECT * FROM my_table WHERE col1 = 'a'\"\r\ndf = conn.read_query(query)\r\n```\r\n\r\nFor advanced functionality (e.g., batch processing or schema customization), use the polars.read_database function with the engine:\r\n\r\n```python\r\nimport polars as pl\r\n\r\ndf = pl.read_database(\r\n    query=\"SELECT * FROM users\",\r\n    connection=conn.engine,\r\n    iter_batches=True,\r\n    batch_size=1000\r\n)\r\n\r\nfor batch in df:\r\n    print(batch)\r\n```\r\n\r\n#### Read an Entire Table\r\n\r\n```python\r\ndf = conn.read_table(\"my_table\")\r\n```\r\n\r\n### 3. Save DataFrame to SQL Server\r\n\r\n#### Write a Polars DataFrame to a Table\r\n\r\n```python\r\nimport polars as pl\r\n\r\n# Example DataFrame\r\ndata = pl.DataFrame({\"col1\": [1, 2, 3], \"col2\": [\"a\", \"b\", \"c\"]})\r\nconn.write_table(data, name=\"my_table\", if_exists=\"replace\")\r\n```\r\n\r\n### 4. Execute Queries\r\n\r\nThe `execute_query` method allows you to run any SQL query on your database. It supports parameterized queries to prevent accidental SQL injection and can be used for both retrieval and modification operations such as `INSERT`, `DELETE`, and `DROP`.\r\n\r\n#### Example: Run a Simple Query\r\n```python\r\nquery = \"DELETE FROM users WHERE id = 1\"\r\nconn.execute_query(query)\r\n```\r\n\r\n#### Example: Insert Data Securely\r\n```python\r\nquery = \"INSERT INTO users (id, name, email) VALUES (:id, :name, :email)\"\r\nparams = {\"id\": 1, \"name\": \"John Doe\", \"email\": \"john.doe@example.com\"}\r\nconn.execute_query(query, params)\r\n```\r\n\r\n#### Example: Drop a Table\r\n```python\r\nquery = \"DROP TABLE users\"\r\nconn.execute_query(query)\r\n```\r\n\r\n#### Example: Prevent SQL Injection\r\n```python\r\nquery = \"SELECT * FROM users WHERE name = :name\"\r\nparams = {\"name\": \"John'; DROP TABLE users; --\"}\r\nconn.execute_query(query, params)\r\n```\r\nThis safely executes the query without executing malicious SQL commands.\r\n\r\n### 5. Using Context Management\r\n\r\n```python\r\nwith Connection(server=\"my_server\", database=\"my_database\") as conn:\r\n    df = conn.read_query(\"SELECT * FROM my_table\")\r\n    print(df)\r\n```\r\n\r\n### 6. Closing the Connection\r\n\r\n```python\r\nconn.close()\r\n```\r\n\r\n## API Reference\r\n\r\n### `Connection` Class\r\n\r\n#### Constructor\r\n```python\r\nConnection(server: Optional[str] = None, database: Optional[str] = None, driver: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None)\r\n```\r\n\r\n#### Attributes\r\n\r\n- **`server`** (str):\r\n  The name or address of the SQL Server instance.\r\n\r\n  - **`database`** (str):\r\n  The name of the connected database.\r\n\r\n- **`driver`** (str):\r\n  The ODBC driver being used for the connection (e.g., \"ODBC Driver 17 for SQL Server\").\r\n\r\n- **`connection_string`** (str):\r\n  The full SQLAlchemy connection string used to create the engine. This can be useful for debugging or passing to other tools.\r\n\r\n- **`engine`** (`sqlalchemy.engine.base.Engine`):\r\n  The SQLAlchemy engine used for database interactions. Advanced users can use this attribute for custom SQLAlchemy operations or to pass it to functions like `polars.read_database`.\r\n\r\n#### Methods\r\n\r\n- **`read_query(query: str) -> pl.DataFrame`**:\r\n  Execute a query and return results as a Polars DataFrame.\r\n  - **Parameters:**\r\n    - `query (str)`: The SQL query to execute.\r\n  - **Returns**: pl.DataFrame: The result of the query as a Polars DataFrame.\r\n\r\n  - **Example**:\r\n    ```python\r\n    query = \"SELECT * FROM my_table WHERE col1 = 'a'\"\r\n    df = conn.read_query(query)\r\n    print(df)\r\n    ```\r\n\r\n- **`read_table(name: str) -> pl.DataFrame`**:\r\n  Read all rows from a table.\r\n  - **Parameters:**\r\n    - `name (str)`: The name of the table to read from.\r\n  - **Returns**: pl.DataFrame: All rows from the specified table as a Polars DataFrame.\r\n\r\n  - **Example**:\r\n    ```python\r\n    df = conn.read_table('my_table')\r\n    print(df)\r\n    ```\r\n\r\n- **`write_table(df: pl.DataFrame, name: str, if_exists: str = \"fail\") -> None`**:\r\n  Save a Polars DataFrame to a specified table in SQL Server.\r\n  - **Parameters**:\r\n    - `df` (pl.DataFrame): The Polars DataFrame to be written.\r\n    - `name` (str): The name of the target table in the database.\r\n    - `if_exists` (str): What to do if the target table already exists. Options:\r\n      - `'fail'` (default): Raise an error.\r\n      - `'append'`: Append the data to the existing table.\r\n      - `'replace'`: Drop the existing table, recreate it, and insert the data.\r\n  - **Raises**:\r\n    - `ValueError`: If `if_exists` is not one of `'fail'`, `'append'`, or `'replace'`.\r\n    - `RuntimeError`: If the write operation fails.\r\n  - **Examples**:\r\n    ```python\r\n    import polars as pl\r\n\r\n    # Create a Polars DataFrame\r\n    df = pl.DataFrame({\r\n        \"id\": [1, 2, 3],\r\n        \"name\": [\"Alice\", \"Bob\", \"Charlie\"]\r\n    })\r\n\r\n    # Write the DataFrame to the database\r\n    conn.write_table(df, name=\"users\", if_exists=\"replace\")\r\n    ```\r\n\r\n- **`execute_query(query: str, params: Optional[Dict[str, Any]] = None) -> None`**:\r\n  Execute any SQL query. Supports parameterized queries to prevent SQL injection.\r\n  - **Parameters**:\r\n    - `query` (str): The SQL query to execute. Can include placeholders for parameterized queries (e.g., `:param_name`).\r\n    - `params` (dict, optional): A dictionary of parameters to bind to the query.\r\n  - **Examples**:\r\n    ```python\r\n    query = \"DELETE FROM users WHERE id = 1\"\r\n    conn.execute_query(query)\r\n    ```\r\n\r\n    ```python\r\n    query = \"INSERT INTO users (id, name) VALUES (:id, :name)\"\r\n    params = {\"id\": 1, \"name\": \"Jane\"}\r\n    conn.execute_query(query, params)\r\n    ```\r\n\r\n- **`close() -> None`**:\r\n  Dispose of the SQLAlchemy engine and close the connection.\r\n  - **Example**:\r\n    ```python\r\n    conn.close()\r\n    ```\r\n\r\n## Requirements\r\n\r\n- Python 3.7 or higher\r\n- `polars`\r\n- `sqlalchemy`\r\n- ODBC Driver for SQL Server (17 or 18 recommended)\r\n\r\n### Installing the ODBC Driver\r\n\r\n#### Windows\r\nDownload and install the ODBC Driver from [Microsoft's website](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server).\r\n\r\n#### macOS\r\nInstall via Homebrew:\r\n```bash\r\nbrew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release\r\nbrew update\r\nbrew install --no-sandbox msodbcsql18\r\n```\r\n\r\n#### Linux\r\nInstall using the following commands:\r\n```bash\r\ncurl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -\r\ncurl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list\r\nsudo apt-get update\r\nsudo apt-get install -y mssql-tools unixodbc-dev\r\n```\r\n\r\n## Contributing\r\n\r\nContributions are welcome! If you encounter issues or have feature requests, please open an issue or submit a pull request on [GitHub](https://github.com/drosenman/polars_mssql).\r\n\r\n## License\r\n\r\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\r\n\r\n## Acknowledgments\r\n\r\nThis package integrates the efficiency of polars with the versatility of SQL Server, inspired by real-world data engineering needs. As a data engineer, I often need to pull data from SQL Server into polars and export data from polars back to SQL Server. I created this package to streamline these workflows and make the process more efficient.\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Effortlessly connect to SQL Server to import data into Polars DataFrames and export data back to SQL Server.",
    "version": "0.3.2",
    "project_urls": {
        "Bug Reports": "https://github.com/drosenman/py_mssql/issues",
        "Homepage": "https://github.com/drosenman/py_mssql",
        "Source": "https://github.com/drosenman/py_mssql"
    },
    "split_keywords": [
        "sql",
        "server",
        "polars",
        "sqlalchemy",
        "database"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "47200c5381798bb64b80c29ce2ac69fde90e8757e3f36413dcf77ca9a8f06cb0",
                "md5": "3a1054405bdb99a13c06007be20e6561",
                "sha256": "9138a1ed2e1928984e44e9724947b9c5a548e21e943183750babaa8c4ae8953d"
            },
            "downloads": -1,
            "filename": "polars_mssql-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3a1054405bdb99a13c06007be20e6561",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 10786,
            "upload_time": "2025-01-05T19:46:10",
            "upload_time_iso_8601": "2025-01-05T19:46:10.255529Z",
            "url": "https://files.pythonhosted.org/packages/47/20/0c5381798bb64b80c29ce2ac69fde90e8757e3f36413dcf77ca9a8f06cb0/polars_mssql-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e5d6fe7cac731d68a2602f4e1865a3b14bb48c320ec630914eb9d26b6f0a7b5e",
                "md5": "6b9eefc3c5df82f5f124654d74ad22e2",
                "sha256": "15d63be65107f48a60600bce6ae3a5ab976752d2275bbdaf2386c4cab170fcc6"
            },
            "downloads": -1,
            "filename": "polars_mssql-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "6b9eefc3c5df82f5f124654d74ad22e2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 13204,
            "upload_time": "2025-01-05T19:46:12",
            "upload_time_iso_8601": "2025-01-05T19:46:12.661591Z",
            "url": "https://files.pythonhosted.org/packages/e5/d6/fe7cac731d68a2602f4e1865a3b14bb48c320ec630914eb9d26b6f0a7b5e/polars_mssql-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-05 19:46:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "drosenman",
    "github_project": "py_mssql",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "polars",
            "specs": []
        },
        {
            "name": "sqlalchemy",
            "specs": []
        },
        {
            "name": "pyodbc",
            "specs": []
        }
    ],
    "lcname": "polars-mssql"
}
        
Elapsed time: 1.56784s