# Snowflake Snowpark Python and Snowpark pandas APIs
[![Build and Test](https://github.com/snowflakedb/snowpark-python/actions/workflows/precommit.yml/badge.svg)](https://github.com/snowflakedb/snowpark-python/actions/workflows/precommit.yml)
[![codecov](https://codecov.io/gh/snowflakedb/snowpark-python/branch/main/graph/badge.svg)](https://codecov.io/gh/snowflakedb/snowpark-python)
[![PyPi](https://img.shields.io/pypi/v/snowflake-snowpark-python.svg)](https://pypi.org/project/snowflake-snowpark-python/)
[![License Apache-2.0](https://img.shields.io/:license-Apache%202-brightgreen.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt)
[![Codestyle Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline.
Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.
[Source code][source code] | [Snowpark Python developer guide][Snowpark Python developer guide] | [Snowpark Python API reference][Snowpark Python api references] | [Snowpark pandas developer guide][Snowpark pandas developer guide] | [Snowpark pandas API reference][Snowpark pandas api references] | [Product documentation][snowpark] | [Samples][samples]
## Getting started
### Have your Snowflake account ready
If you don't have a Snowflake account yet, you can [sign up for a 30-day free trial account][sign up trial].
### Create a Python virtual environment
You can use [miniconda][miniconda], [anaconda][anaconda], or [virtualenv][virtualenv]
to create a Python 3.8, 3.9, 3.10, or 3.11 virtual environment.
For Snowpark pandas, only Python 3.9, 3.10, and 3.11 are supported.
For the best experience when using Snowpark with UDFs, [creating a local conda environment with the Snowflake channel][use snowflake channel] is recommended.
### Install the library to the Python virtual environment
```bash
pip install snowflake-snowpark-python
```
To use the [Snowpark pandas API][Snowpark pandas developer guide], optionally install the `modin` extra shown below, which installs [modin][modin] in the same environment. The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake.
```bash
pip install "snowflake-snowpark-python[modin]"
```
### Create a session and use the Snowpark Python API
```python
from snowflake.snowpark import Session
connection_parameters = {
    "account": "<your snowflake account>",
    "user": "<your snowflake user>",
    "password": "<your snowflake password>",
    "role": "<snowflake user role>",
    "warehouse": "<snowflake warehouse>",
    "database": "<snowflake database>",
    "schema": "<snowflake schema>"
}
session = Session.builder.configs(connection_parameters).create()
# Create a Snowpark dataframe from input data
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
result = df.collect()
df.show()
# -------------
# |"A" |"B" |
# -------------
# |3 |4 |
# -------------
```
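Beyond local data, the same session can lazily query existing tables and run raw SQL. A minimal sketch, continuing from the session above (the table name `SAMPLE_TABLE` is hypothetical):
```python
from snowflake.snowpark.functions import col

# Lazily reference an existing table; no data is moved to the client.
table_df = session.table("SAMPLE_TABLE")
table_df.filter(col("A") > 1).show()

# Raw SQL is also available when needed.
session.sql("SELECT CURRENT_VERSION()").collect()
```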
### Create a session and use the Snowpark pandas API
```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session
CONNECTION_PARAMETERS = {
    'account': '<myaccount>',
    'user': '<myuser>',
    'password': '<mypassword>',
    'role': '<myrole>',
    'database': '<mydatabase>',
    'schema': '<myschema>',
    'warehouse': '<mywarehouse>',
}
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
# Create a Snowpark pandas dataframe from input data
df = pd.DataFrame([['a', 2.0, 1], ['b', 4.0, 2], ['c', 6.0, None]], columns=["COL_STR", "COL_FLOAT", "COL_INT"])
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1.0
# 1 b 4.0 2.0
# 2 c 6.0 NaN
df.shape
# (3, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.dropna(subset=["COL_INT"], inplace=True)
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.shape
# (2, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
# Save the result back to Snowflake with a row_pos column.
df.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])
```
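You can also read a Snowflake table directly into a Snowpark pandas DataFrame. A minimal sketch, assuming the `pandas_test2` table written above exists in the current schema:
```python
# Read the table written above back into a lazy Snowpark pandas DataFrame.
df2 = pd.read_snowflake('pandas_test2')
df2.shape
# (2, 4)  # the extra column is the saved row_pos index
```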
## Samples
The [Snowpark Python developer guide][Snowpark Python developer guide], [Snowpark Python API references][Snowpark Python api references], [Snowpark pandas developer guide][Snowpark pandas developer guide], and [Snowpark pandas api references][Snowpark pandas api references] have basic sample code.
[Snowflake-Labs][snowflake lab sample code] has more curated demos.
## Logging
Configure the logging level for `snowflake.snowpark` to capture Snowpark Python API logs.
Snowpark uses the [Snowflake Python Connector][python connector],
so you may also want to configure the logging level for `snowflake.connector` when the error originates in the Python Connector.
For instance,
```python
import logging
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
```
## Reading and writing to pandas DataFrame
The Snowpark Python API supports reading from and writing to a pandas DataFrame via the [to_pandas][to_pandas] and [write_pandas][write_pandas] commands.
To use these operations, ensure that pandas is installed in the same environment. You can install pandas alongside Snowpark Python by executing the following command:
```bash
pip install "snowflake-snowpark-python[pandas]"
```
Once pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows:
```python
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Convert Snowpark DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame
snowpark_df = session.write_pandas(pandas_df, "new_table", auto_create_table=True)
```
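For results that are too large to fit in memory at once, `to_pandas_batches` returns an iterator of pandas DataFrames. A minimal sketch:
```python
# Process the result in pandas-DataFrame-sized batches.
for batch in df.to_pandas_batches():
    print(batch.shape)
```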
The Snowpark pandas API also supports converting to pandas:
```python
import modin.pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
# Convert Snowpark pandas DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
```
Note that the above Snowpark pandas commands will work if Snowpark is installed with the `[modin]` option; the additional `[pandas]` installation is not required.
## Contributing
Please refer to [CONTRIBUTING.md][contributing].
[add other sample code repo links]: # (Developer advocacy is open-sourcing a repo that has excellent sample code. The link will be added here.)
[Snowpark Python developer guide]: https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html
[Snowpark Python api references]: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/index.html
[Snowpark pandas developer guide]: https://docs.snowflake.com/developer-guide/snowpark/python/snowpark-pandas
[Snowpark pandas api references]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/modin/index
[snowpark]: https://www.snowflake.com/snowpark
[sign up trial]: https://signup.snowflake.com
[source code]: https://github.com/snowflakedb/snowpark-python
[miniconda]: https://docs.conda.io/en/latest/miniconda.html
[anaconda]: https://www.anaconda.com/
[virtualenv]: https://docs.python.org/3/tutorial/venv.html
[config pycharm interpreter]: https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html
[python connector]: https://pypi.org/project/snowflake-connector-python/
[use snowflake channel]: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#local-development-and-testing
[snowflake lab sample code]: https://github.com/Snowflake-Labs/snowpark-python-demos
[samples]: https://github.com/snowflakedb/snowpark-python/blob/main/README.md#samples
[contributing]: https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md
[to_pandas]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.DataFrame.to_pandas
[write_pandas]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Session.write_pandas
[modin]: https://github.com/modin-project/modin
# Release History
## 1.26.0 (2024-12-05)
### Snowpark Python API Updates
#### New Features
- Added support for the property `version` and the class method `get_active_session` for the `Session` class.
- Added new methods and variables to enhance data type handling and JSON serialization/deserialization:
- To `DataType`, its derived classes, and `StructField`:
- `type_name`: Returns the type name of the data.
- `simple_string`: Provides a simple string representation of the data.
- `json_value`: Returns the data as a JSON-compatible value.
- `json`: Converts the data to a JSON string.
- To `ArrayType`, `MapType`, `StructField`, `PandasSeriesType`, `PandasDataFrameType` and `StructType`:
- `from_json`: Enables these types to be created from JSON data.
- To `MapType`:
- `keyType`: the key type of the map.
- `valueType`: the value type of the map.
- Added support for method `appName` in `SessionBuilder`.
- Added support for `include_nulls` argument in `DataFrame.unpivot`.
- Added support for the following functions in `functions.py`:
- `size` to get the size of array, object, or map columns.
- `collect_list`, an alias of `array_agg`.
- `substring` now makes the `len` argument optional.
- Added parameter `ast_enabled` to session for internal usage (default: `False`).
#### Improvements
- Added support for specifying the following to `DataFrame.create_or_replace_dynamic_table`:
- `iceberg_config`: a dictionary that can hold the following Iceberg configuration options:
- `external_volume`
- `catalog`
- `base_location`
- `catalog_sync`
- `storage_serialization_policy`
- Added support for nested data types to `DataFrame.print_schema`.
- Added support for the `level` parameter to `DataFrame.print_schema`.
- Improved flexibility of `DataFrameReader` and `DataFrameWriter` API by adding support for the following:
- Added a `format` method to `DataFrameReader` and `DataFrameWriter` to specify the file format when loading or unloading results (see the sketch after this list).
- Added a `load` method to `DataFrameReader` to work in conjunction with `format`.
- Added a `save` method to `DataFrameWriter` to work in conjunction with `format`.
- Added support for passing keyword arguments to the `options` method of `DataFrameReader` and `DataFrameWriter`.
- Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, `cloudpickle==2.2.1` remains the only supported version.
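A minimal sketch of the `format`/`load`/`save` chaining described above (the stage paths and options are hypothetical):
```python
# Read a staged CSV file using the new format/load chaining.
df = session.read.format("csv").option("skip_header", 1).load("@my_stage/input.csv")

# Unload results with the new format/save chaining.
df.write.format("json").save("@my_stage/output/")
```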
#### Bug Fixes
- Removed warnings that dynamic pivot features were in private preview, because
dynamic pivot is now generally available.
- Fixed a bug in `session.read.options` where `False` Boolean values were incorrectly parsed as `True` in the generated file format.
#### Dependency Updates
- Added a runtime dependency on `python-dateutil`.
### Snowpark pandas API Updates
#### New Features
- Added partial support for `Series.map` when `arg` is a pandas `Series` or a
`collections.abc.Mapping`. No support for instances of `dict` that implement
`__missing__` but are not instances of `collections.defaultdict`.
- Added support for `DataFrame.align` and `Series.align` for `axis=1` and `axis=None`.
- Added support for `pd.json_normalize`.
- Added support for `GroupBy.pct_change` with `axis=0`, `freq=None`, and `limit=None`.
- Added support for `DataFrameGroupBy.__iter__` and `SeriesGroupBy.__iter__`.
- Added support for `np.sqrt`, `np.trunc`, `np.floor`, numpy trig functions, `np.exp`, `np.abs`, `np.positive` and `np.negative`.
- Added partial support for the dataframe interchange protocol method
`DataFrame.__dataframe__()`.
#### Bug Fixes
- Fixed a bug in `df.loc` where setting a single column from a series results in unexpected `None` values.
#### Improvements
- Used `UNPIVOT INCLUDE NULLS` for unpivot operations in pandas instead of sentinel values.
- Improved documentation for `pd.read_excel`.
## 1.25.0 (2024-11-14)
### Snowpark Python API Updates
#### New Features
- Added the following new functions in `snowflake.snowpark.dataframe`:
- `map`
- Added support for passing parameter `include_error` to `Session.query_history` to record queries that raise errors during execution.
#### Improvements
- When target stage is not set in profiler, a default stage from `Session.get_session_stage` is used instead of raising `SnowparkSQLException`.
- Allowed lower case or mixed case input when calling `Session.stored_procedure_profiler.set_active_profiler`.
- Added distributed tracing using open telemetry APIs for action function in `DataFrame`:
- `cache_result`
- Removed opentelemetry warning from logging.
#### Bug Fixes
- Fixed the pre-action and post-action query propagation when `In` expressions were used in selects.
- Fixed a bug that raised error `AttributeError` while calling `Session.stored_procedure_profiler.get_output` when `Session.stored_procedure_profiler` is disabled.
#### Dependency Updates
- Added a dependency on `protobuf>=5.28` and `tzlocal` at runtime.
- Added a dependency on `protoc-wheel-0` for the development profile.
- Require `snowflake-connector-python>=3.12.0, <4.0.0` (was `>=3.10.0`).
### Snowpark pandas API Updates
#### Dependency Updates
- Updated `modin` from 0.28.1 to 0.30.1.
- Added support for all `pandas` 2.2.x versions.
#### New Features
- Added support for `Index.to_numpy`.
- Added support for `DataFrame.align` and `Series.align` for `axis=0`.
- Added support for `size` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for `snowflake.snowpark.functions.window`
- Added support for `pd.read_pickle` (Uses native pandas for processing).
- Added support for `pd.read_html` (Uses native pandas for processing).
- Added support for `pd.read_xml` (Uses native pandas for processing).
- Added support for aggregation functions `"size"` and `len` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for list values in `Series.str.len`.
#### Bug Fixes
- Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. `pd.DataFrame([0]).agg(np.mean)`) would fail to transpose the result.
- Fixed bugs where `DataFrame.dropna()` would:
- Treat an empty `subset` (e.g. `[]`) as if it specified all columns instead of no columns.
- Raise a `TypeError` for a scalar `subset` instead of filtering on just that column.
- Raise a `ValueError` for a `subset` of type `pandas.Index` instead of filtering on the columns in the index.
- Disabled creation of scoped read-only tables to mitigate `TableNotFoundError` when using dynamic pivot in notebook environments.
- Fixed a bug when concatenating DataFrame or Series objects that come from the same DataFrame when `axis=1`.
#### Improvements
- Improved `np.where` with a scalar `x` value by eliminating an unnecessary join and temp table creation.
- Improved `get_dummies` performance by flattening the pivot with join.
- Improved `align` performance when aligning on a row position column by removing unnecessary window functions.
### Snowpark Local Testing Updates
#### New Features
- Added support for patching functions that are unavailable in the `snowflake.snowpark.functions` module.
- Added support for `snowflake.snowpark.functions.any_value`
#### Bug Fixes
- Fixed a bug where `Table.update` could not handle `VariantType`, `MapType`, and `ArrayType` data types.
- Fixed a bug where column aliases were incorrectly resolved in `DataFrame.join`, causing errors when selecting columns from a joined DataFrame.
- Fixed a bug where `Table.update` and `Table.merge` could fail if the target table's index was not the default `RangeIndex`.
## 1.24.0 (2024-10-28)
### Snowpark Python API Updates
#### New Features
- Updated `Session` class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same `Session` object (see the sketch after this list).
- The feature is disabled by default and can be enabled by setting `FEATURE_THREAD_SAFE_PYTHON_SESSION` to `True` for the account.
- Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
- When enabled, some internally created temporary table names returned from `DataFrame.queries` API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
- Added support for 'Service' domain to `session.lineage.trace` API.
- Added support for `copy_grants` parameter when registering UDxF and stored procedures.
- Added support for the following methods in `DataFrameWriter` to support daisy-chaining:
- `option`
- `options`
- `partition_by`
- Added support for `snowflake_cortex_summarize`.
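A minimal sketch of concurrent DataFrame actions sharing one `Session`, assuming the thread-safety parameter above is enabled for the account (the table names are hypothetical):
```python
from concurrent.futures import ThreadPoolExecutor

def row_count(table_name: str) -> int:
    # Each worker reuses the same thread-safe Session object.
    return session.table(table_name).count()

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(row_count, ["T1", "T2", "T3"]))
```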
#### Improvements
- Added support for using the function `snowflake.snowpark.functions.array_remove` in Python.
- Disabled SQL simplification when a sort is performed after a limit (see the sketch after this list).
- Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query, with the sort in front of the limit. Now, `df.limit().sort()` generates a query that reads `df.limit().sort()`.
- This improves the performance of the generated query for `df.limit().sort()`, because the limit stops table scanning as soon as the number of records is satisfied.
- Added a client side error message for when an invalid stage location is passed to DataFrame read functions.
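A sketch of the `limit`/`sort` ordering difference described above:
```python
from snowflake.snowpark.functions import col

# Sorts the full input, then limits: sort appears before limit in the SQL.
df.sort(col("a")).limit(5)

# Limits first, then sorts only those rows: the table scan can stop early.
df.limit(5).sort(col("a"))
```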
#### Bug Fixes
- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in `DataFrame.analytics.time_series_agg` function to handle multiple data points in same sliding interval.
- Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.
#### Deprecations
- Deprecated warnings will be triggered when using snowpark-python with Python 3.8. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
### Snowpark pandas API Updates
#### New Features
- Added support for `np.subtract`, `np.multiply`, `np.divide`, and `np.true_divide`.
- Added support for tracking usages of `__array_ufunc__`.
- Added numpy compatibility support for `np.float_power`, `np.mod`, `np.remainder`, `np.greater`, `np.greater_equal`, `np.less`, `np.less_equal`, `np.not_equal`, and `np.equal`.
- Added numpy compatibility support for `np.log`, `np.log2`, and `np.log10`
- Added support for `DataFrameGroupBy.bfill`, `SeriesGroupBy.bfill`, `DataFrameGroupBy.ffill`, and `SeriesGroupBy.ffill`.
- Added support for `on` parameter with `Resampler`.
- Added support for timedelta inputs in `value_counts()`.
- Added support for applying Snowpark Python function `snowflake_cortex_summarize`.
- Added support for `DataFrame.attrs` and `Series.attrs`.
- Added support for `DataFrame.style`.
- Added numpy compatibility support for `np.full_like`
#### Improvements
- Improved generated SQL query for `head` and `iloc` when the row key is a slice.
- Improved error message when passing an unknown timezone to `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex`.
- Improved documentation for `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex` to specify the supported timezone formats.
- Added additional kwargs support for `df.apply` and `series.apply` (as well as `map` and `applymap`) when using Snowpark functions. This allows for some position-independent compatibility between `apply` and functions where the first argument is not a pandas object.
- Improved generated SQL query for `iloc` and `iat` when the row key is a scalar.
- Removed all joins in `iterrows`.
- Improved documentation for `Series.map` to reflect the unsupported features.
- Added support for `np.may_share_memory`, which is used internally by many scikit-learn functions. This method will always return `False` when called with a Snowpark pandas object.
#### Bug Fixes
- Fixed a bug where `DataFrame` and `Series` `pct_change()` would raise `TypeError` when input contained timedelta columns.
- Fixed a bug where `replace()` would sometimes propagate `Timedelta` types incorrectly. `replace()` now raises `NotImplementedError` for `Timedelta` inputs instead.
- Fixed a bug where `DataFrame.round()` and `Series.round()` would raise `AssertionError` for `Timedelta` columns. `round()` now raises `NotImplementedError` for `Timedelta` inputs instead.
- Fixed a bug where `reindex` fails when the new index is a Series with non-overlapping types from the original index.
- Fixed a bug where calling `__getitem__` on a DataFrameGroupBy object always returned a DataFrameGroupBy object if `as_index=False`.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising `NotImplementedError`.
- Fixed a bug where `DataFrame.shift()` on axis=0 and axis=1 would fail to propagate timedelta types.
- `DataFrame.abs()`, `DataFrame.__neg__()`, `DataFrame.stack()`, and `DataFrame.unstack()` now raise `NotImplementedError` for timedelta inputs instead of failing to propagate timedelta types.
### Snowpark Local Testing Updates
#### Bug Fixes
- Fixed a bug where `DataFrame.alias` raised `KeyError` for the input column name.
- Fixed a bug where `to_csv` to a Snowflake stage failed when data contained empty strings.
## 1.23.0 (2024-10-09)
### Snowpark Python API Updates
#### New Features
- Added the following new functions in `snowflake.snowpark.functions`:
- `make_interval`
- Added support for using Snowflake Interval constants with `Window.range_between()` when the order by column is TIMESTAMP or DATE type.
- Added support for file writes. This feature is currently in private preview.
- Added `thread_id` to `QueryRecord` to track the thread id submitting the query history.
- Added support for `Session.stored_procedure_profiler`.
#### Bug Fixes
- Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning `'NoneType' has no len() when trying to read default values from function`.
### Snowpark pandas API Updates
#### New Features
- Added support for `TimedeltaIndex.mean` method.
- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
- Added support for `by`, `left_by`, `right_by`, `left_index`, and `right_index` for `pd.merge_asof`.
- Added support for passing parameter `include_describe` to `Session.query_history`.
- Added support for `DatetimeIndex.mean` and `DatetimeIndex.std` methods.
- Added support for `Resampler.asfreq`, `Resampler.indices`, `Resampler.nunique`, and `Resampler.quantile`.
- Added support for `resample` frequency `W`, `ME`, `YE` with `closed = "left"`.
- Added support for `DataFrame.rolling.corr` and `Series.rolling.corr` for `pairwise = False` and int `window`.
- Added support for string time-based `window` and `min_periods = None` for `Rolling`.
- Added support for `DataFrameGroupBy.fillna` and `SeriesGroupBy.fillna`.
- Added support for constructing `Series` and `DataFrame` objects with the lazy `Index` object as `data`, `index`, and `columns` arguments.
- Added support for constructing `Series` and `DataFrame` objects with `index` and `column` values not present in `DataFrame`/`Series` `data`.
- Added support for `pd.read_sas` (Uses native pandas for processing).
- Added support for applying `rolling().count()` and `expanding().count()` to `Timedelta` series and columns.
- Added support for `tz` in both `pd.date_range` and `pd.bdate_range`.
- Added support for `Series.items`.
- Added support for `errors="ignore"` in `pd.to_datetime`.
- Added support for `DataFrame.tz_localize` and `Series.tz_localize`.
- Added support for `DataFrame.tz_convert` and `Series.tz_convert`.
- Added support for applying Snowpark Python functions (e.g., `sin`) in `Series.map`, `Series.apply`, `DataFrame.apply` and `DataFrame.applymap`.
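A minimal sketch of applying a Snowpark Python function in `Series.apply`, per the last item above (assumes an active Snowpark session):
```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401
from snowflake.snowpark.functions import sin

s = pd.Series([0.0, 1.0, 2.0])
s.apply(sin)  # runs Snowflake's SIN function in SQL rather than a Python UDF
```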
#### Improvements
- Improved `to_pandas` to persist the original timezone offset for TIMESTAMP_TZ type.
- Improved `dtype` results for TIMESTAMP_TZ type to show correct timezone offset.
- Improved `dtype` results for TIMESTAMP_LTZ type to show correct timezone.
- Improved error message when passing non-bool value to `numeric_only` for groupby aggregations.
- Removed unnecessary warning about sort algorithm in `sort_values`.
- Used SCOPED objects for internally created temp tables. A SCOPED object is stored-procedure scoped if created within a stored procedure and session scoped otherwise, and the object is automatically cleaned up at the end of the scope.
- Improved warning messages for operations that lead to materialization with inadvertent slowness.
- Removed unnecessary warning message about `convert_dtype` in `Series.apply`.
#### Bug Fixes
- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updates the `Series`/`DataFrame`'s index name after an inplace update has been applied to the original `Series`/`DataFrame`.
- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.
- Fixed `inplace` argument for `Series` objects derived from other `Series` objects.
- Fixed a bug where `Series.sort_values` failed if series name overlapped with index column name.
- Fixed a bug where transposing a dataframe would map `Timedelta` index levels to integer column levels.
- Fixed a bug where `Resampler` methods on timedelta columns would produce integer results.
- Fixed a bug where `pd.to_numeric()` would leave `Timedelta` inputs as `Timedelta` instead of converting them to integers.
- Fixed `loc` set when setting a single row, or multiple rows, of a DataFrame with a Series value.
### Snowpark Local Testing Updates
#### Bug Fixes
- Fixed a bug where nullable columns were annotated wrongly.
- Fixed a bug where the `date_add` and `date_sub` functions failed for `NULL` values.
- Fixed a bug where `equal_null` could fail inside a merge statement.
- Fixed a bug where `row_number` could fail inside a Window function.
- Fixed a bug where updates could fail when the source is the result of a join.
## 1.22.1 (2024-09-11)
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
## 1.22.0 (2024-09-10)
### Snowpark Python API Updates
#### New Features
- Added the following new functions in `snowflake.snowpark.functions`:
- `array_remove`
- `ln`
#### Improvements
- Improved documentation for `Session.write_pandas` by making `use_logical_type` option more explicit.
- Added support for specifying the following to `DataFrameWriter.save_as_table`:
- `enable_schema_evolution`
- `data_retention_time`
- `max_data_extension_time`
- `change_tracking`
- `copy_grants`
- `iceberg_config`: a dictionary that can hold the following Iceberg configuration options:
- `external_volume`
- `catalog`
- `base_location`
- `catalog_sync`
- `storage_serialization_policy`
- Added support for specifying the following to `DataFrameWriter.copy_into_table`:
- `iceberg_config`: a dictionary that can hold the following Iceberg configuration options:
- `external_volume`
- `catalog`
- `base_location`
- `catalog_sync`
- `storage_serialization_policy`
- Added support for specifying the following parameters to `DataFrame.create_or_replace_dynamic_table`:
- `mode`
- `refresh_mode`
- `initialize`
- `clustering_keys`
- `is_transient`
- `data_retention_time`
- `max_data_extension_time`
#### Bug Fixes
- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.
### Snowpark Local Testing Updates
#### New Features
- Added support for type coercion when passing columns as input to UDF calls.
- Added support for `Index.identical`.
#### Bug Fixes
- Fixed a bug where the truncate mode in `DataFrameWriter.save_as_table` incorrectly handled DataFrames containing only a subset of columns from the existing table.
- Fixed a bug where the function `to_timestamp` did not set the default timezone of the column datatype.
### Snowpark pandas API Updates
#### New Features
- Added limited support for the `Timedelta` type, including the following features (see the sketch after this list). Snowpark pandas will raise `NotImplementedError` for unsupported `Timedelta` use cases.
- support for tracking the `Timedelta` type through `copy`, `cache_result`, `shift`, `sort_index`, `assign`, `bfill`, `ffill`, `fillna`, `compare`, `diff`, `drop`, `dropna`, `duplicated`, `empty`, `equals`, `insert`, `isin`, `isna`, `items`, `iterrows`, `join`, `len`, `mask`, `melt`, `merge`, `nlargest`, `nsmallest`, `to_pandas`.
- converting non-timedelta to timedelta via `astype`.
- `NotImplementedError` will be raised for the rest of the methods that do not support `Timedelta`.
- support for subtracting two timestamps to get a Timedelta.
- support indexing with Timedelta data columns.
- support for adding or subtracting timestamps and `Timedelta`.
- support for binary arithmetic between two `Timedelta` values.
- support for binary arithmetic and comparisons between `Timedelta` values and numeric values.
- support for lazy `TimedeltaIndex`.
- support for `pd.to_timedelta`.
- support for `GroupBy` aggregations `min`, `max`, `mean`, `idxmax`, `idxmin`, `std`, `sum`, `median`, `count`, `any`, `all`, `size`, `nunique`, `head`, `tail`, `aggregate`.
- support for `GroupBy` filtrations `first` and `last`.
- support for `TimedeltaIndex` attributes: `days`, `seconds`, `microseconds` and `nanoseconds`.
- support for `diff` with timestamp columns on `axis=0` and `axis=1`
- support for `TimedeltaIndex` methods: `ceil`, `floor` and `round`.
- support for `TimedeltaIndex.total_seconds` method.
- Added support for index's arithmetic and comparison operators.
- Added support for `Series.dt.round`.
- Added documentation pages for `DatetimeIndex`.
- Added support for `Index.name`, `Index.names`, `Index.rename`, and `Index.set_names`.
- Added support for `Index.__repr__`.
- Added support for `DatetimeIndex.month_name` and `DatetimeIndex.day_name`.
- Added support for `Series.dt.weekday`, `Series.dt.time`, and `DatetimeIndex.time`.
- Added support for `Index.min` and `Index.max`.
- Added support for `pd.merge_asof`.
- Added support for `Series.dt.normalize` and `DatetimeIndex.normalize`.
- Added support for `Index.is_boolean`, `Index.is_integer`, `Index.is_floating`, `Index.is_numeric`, and `Index.is_object`.
- Added support for `DatetimeIndex.round`, `DatetimeIndex.floor` and `DatetimeIndex.ceil`.
- Added support for `Series.dt.days_in_month` and `Series.dt.daysinmonth`.
- Added support for `DataFrameGroupBy.value_counts` and `SeriesGroupBy.value_counts`.
- Added support for `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`.
- Added support for `Index.is_monotonic_increasing` and `Index.is_monotonic_decreasing`.
- Added support for `pd.crosstab`.
- Added support for `pd.bdate_range` and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both `pd.date_range` and `pd.bdate_range`.
- Added support for lazy `Index` objects as `labels` in `DataFrame.reindex` and `Series.reindex`.
- Added support for `Series.dt.days`, `Series.dt.seconds`, `Series.dt.microseconds`, and `Series.dt.nanoseconds`.
- Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
- Added support for string indexing with `Timedelta` objects.
- Added support for `Series.dt.total_seconds` method.
- Added support for `DataFrame.apply(axis=0)`.
- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.
- Added support for `DatetimeIndex.tz_convert` and `DatetimeIndex.tz_localize`.
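A minimal sketch of the `Timedelta` support above: subtracting two timestamp columns yields a lazy `Timedelta` column (assumes an active Snowpark session):
```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401

df = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "end": pd.to_datetime(["2024-01-01 06:00", "2024-01-03"]),
})
elapsed = df["end"] - df["start"]  # a Timedelta column
elapsed.dt.total_seconds()
```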
#### Improvements
- Improved `concat` and `join` performance when operations are performed on series coming from the same dataframe by avoiding unnecessary joins.
- Refactored `quoted_identifier_to_snowflake_type` to avoid making metadata queries if the types have been cached locally.
- Improved `pd.to_datetime` to handle all local input cases.
- Created a lazy index from another lazy index without pulling data to the client.
- Raised `NotImplementedError` for Index bitwise operators.
- Displayed a clearer error message when `Index.names` is set to a non-list-like object.
- Raised a warning whenever MultiIndex values are pulled in locally.
- Improved the warning message for `pd.read_snowflake` to include the creation reason when temp table creation is triggered.
- Improved performance for `DataFrame.set_index`, and for setting `DataFrame.index` or `Series.index`, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
- Properly raised `NotImplementedError` when the `ambiguous`/`nonexistent` arguments are not strings in `ceil`/`floor`/`round`.
#### Bug Fixes
- Stopped ignoring nanoseconds in `pd.Timedelta` scalars.
- Fixed an `AssertionError` in trees of binary operations.
- Fixed a bug in `Series.dt.isocalendar` when using a named Series.
- Fixed `inplace` argument for Series objects derived from DataFrame columns.
- Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
- Fixed a bug where `Series.take` did not error when `axis=1` was specified.
## 1.21.1 (2024-09-05)
### Snowpark Python API Updates
#### Bug Fixes
- Fixed a bug where using `to_pandas_batches` with async jobs caused an error due to improper handling of waiting for asynchronous query completion.
## 1.21.0 (2024-08-19)
### Snowpark Python API Updates
#### New Features
- Added support for `snowflake.snowpark.testing.assert_dataframe_equal`, a utility function to check the equality of two Snowpark DataFrames (see the sketch below).
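A minimal sketch of the new testing utility, assuming an existing `session`:
```python
from snowflake.snowpark.testing import assert_dataframe_equal

actual = session.create_dataframe([[1, "a"]], schema=["id", "name"])
expected = session.create_dataframe([[1, "a"]], schema=["id", "name"])
assert_dataframe_equal(actual, expected)  # raises AssertionError on mismatch
```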
#### Improvements
- Added support for server-side string size limitations.
- Added support to create and invoke stored procedures, UDFs and UDTFs with optional arguments.
- Added support for column lineage in the `DataFrame.lineage.trace` API.
- Added support for passing `INFER_SCHEMA` options to `DataFrameReader` via `INFER_SCHEMA_OPTIONS`.
- Added support for passing the `parameters` parameter to `Column.rlike` and `Column.regexp`.
- Added support for automatically cleaning up temporary tables created by `df.cache_result()` in the current session when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting `session.auto_clean_up_temp_table_enabled` to `True` (see the sketch after this list).
- Added support for string literals to the `fmt` parameter of `snowflake.snowpark.functions.to_date`.
- Added support for the `SYSTEM$REFERENCE` function.
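A sketch of enabling the experimental temporary-table cleanup described above (the table name is hypothetical):
```python
session.auto_clean_up_temp_table_enabled = True  # experimental, off by default

df = session.table("MY_TABLE").cache_result()  # backed by a temp table
del df  # once garbage collected, the temp table is dropped in this session
```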
#### Bug Fixes
- Fixed a bug where SQL generated for selecting `*` column has an incorrect subquery.
- Fixed a bug in `DataFrame.to_pandas_batches` where the iterator could throw an error if certain transformations were made to the pandas dataframe due to a wrong isolation level.
- Fixed a bug in `DataFrame.lineage.trace` to split the quoted feature view's name and version correctly.
- Fixed a bug in `Column.isin` that caused invalid sql generation when passed an empty list.
- Fixed a bug that failed to raise `NotImplementedError` when setting a cell with a list-like item.
### Snowpark Local Testing Updates
#### New Features
- Added support for the following APIs:
- snowflake.snowpark.functions
- `rank`
- `dense_rank`
- `percent_rank`
- `cume_dist`
- `ntile`
- `datediff`
- `array_agg`
- snowflake.snowpark.column.Column.within_group
- Added support for parsing flags in regex statements for mocked plans. This maintains parity with the `rlike` and `regexp` changes above.
#### Bug Fixes
- Fixed a bug where Window Functions LEAD and LAG do not handle option `ignore_nulls` properly.
- Fixed a bug where values were not populated into the result DataFrame during the insertion of table merge operation.
#### Improvements
- Fixed a pandas `FutureWarning` about integer indexing.
### Snowpark pandas API Updates
#### New Features
- Added support for `DataFrame.backfill`, `DataFrame.bfill`, `Series.backfill`, and `Series.bfill`.
- Added support for `DataFrame.compare` and `Series.compare` with default parameters.
- Added support for `Series.dt.microsecond` and `Series.dt.nanosecond`.
- Added support for `Index.is_unique` and `Index.has_duplicates`.
- Added support for `Index.equals`.
- Added support for `Index.value_counts`.
- Added support for `Series.dt.day_name` and `Series.dt.month_name`.
- Added support for indexing on Index, e.g., `df.index[:10]`.
- Added support for `DataFrame.unstack` and `Series.unstack`.
- Added support for `DataFrame.asfreq` and `Series.asfreq`.
- Added support for `Series.dt.is_month_start` and `Series.dt.is_month_end`.
- Added support for `Index.all` and `Index.any`.
- Added support for `Series.dt.is_year_start` and `Series.dt.is_year_end`.
- Added support for `Series.dt.is_quarter_start` and `Series.dt.is_quarter_end`.
- Added support for lazy `DatetimeIndex`.
- Added support for `Series.argmax` and `Series.argmin`.
- Added support for `Series.dt.is_leap_year`.
- Added support for `DataFrame.items`.
- Added support for `Series.dt.floor` and `Series.dt.ceil`.
- Added support for `Index.reindex`.
- Added support for `DatetimeIndex` properties: `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`,
`nanosecond`, `date`, `dayofyear`, `day_of_year`, `dayofweek`, `day_of_week`, `weekday`, `quarter`,
`is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`
and `is_leap_year`.
- Added support for `Resampler.fillna` and `Resampler.bfill`.
- Added limited support for the `Timedelta` type, including creating `Timedelta` columns and `to_pandas`.
- Added support for `Index.argmax` and `Index.argmin`.
#### Improvements
- Removed the public preview warning message when importing Snowpark pandas.
- Removed unnecessary count query from `SnowflakeQueryCompiler.is_series_like` method.
- `DataFrame.columns` now returns a native pandas `Index` object instead of a Snowpark `Index` object.
- Refactored and introduced a `query_compiler` argument in the `Index` constructor to create an `Index` from a query compiler.
- `pd.to_datetime` now returns a `DatetimeIndex` object instead of a `Series` object.
- `pd.date_range` now returns a `DatetimeIndex` object instead of a `Series` object.
#### Bug Fixes
- Made passing an unsupported aggregation function to `pivot_table` raise `NotImplementedError` instead of `KeyError`.
- Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
- Fixed AssertionError in `Series.drop_duplicates` and `DataFrame.drop_duplicates` when called after `sort_values`.
- Fixed a bug in `Index.to_frame` where the result frame's column name may be wrong when the name is unspecified.
- Fixed a bug where some Index docstrings were ignored.
- Fixed a bug in `Series.reset_index(drop=True)` where the result name may be wrong.
- Fixed a bug in `GroupBy.first`/`GroupBy.last` so that the underlying window expression orders by the correct columns.
## 1.20.0 (2024-07-17)
### Snowpark Python API Updates
#### Improvements
- Added distributed tracing using open telemetry APIs for table stored procedure function in `DataFrame`:
- `_execute_and_get_query_id`
- Added support for the `arrays_zip` function.
- Improved performance for binary column expressions and `df._in` by avoiding unnecessary casts for numeric values. You can enable this optimization by setting `session.eliminate_numeric_sql_value_cast_enabled = True`.
- Improved error message for `write_pandas` when the target table does not exist and `auto_create_table=False`.
- Added open telemetry tracing on UDxF functions in Snowpark.
- Added open telemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called `format_json` to the `Session.SessionBuilder.app_name` function that sets the app name in the `Session.query_tag` in JSON format. By default, this parameter is set to `False`.
#### Bug Fixes
- Fixed a bug where SQL generated for `lag(x, 0)` was incorrect and failed with error message `argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'`.
### Snowpark Local Testing Updates
#### New Features
- Added support for the following APIs:
- snowflake.snowpark.functions
- random
- Added new parameters to the `patch` function when registering a mocked function (see the sketch after this list):
- `distinct` allows an alternate function to be specified for when a SQL function should be distinct.
- `pass_column_index` passes a named parameter `column_index` to the mocked function that contains the pandas.Index for the input data.
- `pass_row_index` passes a named parameter `row_index` to the mocked function that is the 0-indexed row number the function is currently operating on.
- `pass_input_data` passes a named parameter `input_data` to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the `column_order` parameter to method `DataFrameWriter.save_as_table`.
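A sketch of the new `patch` parameters above; the mock body is hypothetical and only illustrates receiving the row index:
```python
from snowflake.snowpark import functions
from snowflake.snowpark.mock import patch

# Hypothetical mock of RANDOM: with pass_row_index=True, the mocked
# function receives the 0-indexed row number it is operating on.
@patch(functions.random, pass_row_index=True)
def mock_random(row_index):
    return row_index
```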
#### Bug Fixes
- Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.
### Snowpark pandas API Updates
#### New Features
- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (Uses local pandas for processing)
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added partial support for `Series.str.translate` where the values in the `table` are single-codepoint strings.
- Added support for `DataFrame.corr`.
- Allowed `df.plot()` and `series.plot()` to be called, materializing the data into the local client.
- Added support for `DataFrameGroupBy` and `SeriesGroupBy` aggregations `first` and `last`.
- Added support for `DataFrameGroupBy.get_group`.
- Added support for `limit` parameter when `method` parameter is used in `fillna`.
- Added support for `DataFrame.equals` and `Series.equals`.
- Added support for `DataFrame.reindex` and `Series.reindex`.
- Added support for `Index.astype`.
- Added support for `Index.unique` and `Index.nunique`.
- Added support for `Index.sort_values`.
#### Bug Fixes
- Fixed an issue when using `np.where` and `df.where` when the scalar `other` is the literal 0.
- Fixed a bug regarding precision loss when converting to Snowpark pandas `DataFrame` or `Series` with `dtype=np.uint64`.
- Fixed a bug where `values` was set to `index` when `index` and `columns` contain all columns in the DataFrame during `pivot_table`.
#### Improvements
- Added support for `Index.copy()`.
- Added support for Index APIs: `dtype`, `values`, `item()`, `tolist()`, `to_series()` and `to_frame()`.
- Expanded support for DataFrames with no rows in `pd.pivot_table` and `DataFrame.pivot_table`.
- Added support for `inplace` parameter in `DataFrame.sort_index` and `Series.sort_index`.
## 1.19.0 (2024-06-25)
### Snowpark Python API Updates
#### New Features
- Added support for `to_boolean` function.
- Added documentation pages for Index and its APIs.
#### Bug Fixes
- Fixed a bug where a Python stored procedure with a table return type failed when run in a task.
- Fixed a bug where `df.dropna` failed due to `RecursionError: maximum recursion depth exceeded` when the DataFrame had more than 500 columns.
- Fixed a bug where `AsyncJob.result("no_result")` didn't wait for the query to finish execution.
### Snowpark Local Testing Updates
#### New Features
- Added support for the `strict` parameter when registering UDFs and Stored Procedures.
#### Bug Fixes
- Fixed a bug in `convert_timezone` that made setting the `source_timezone` parameter return an error.
- Fixed a bug where creating a DataFrame with empty data of type `DateType` raised `AttributeError`.
- Fixed a bug where table merge failed when an update clause existed but no update took place.
- Fixed a bug in the mock implementation of `to_char` that raised `IndexError` when the incoming column had a nonconsecutive row index.
- Fixed a bug in the handling of `CaseExpr` expressions that raised `IndexError` when the incoming column had a nonconsecutive row index.
- Fixed a bug in the implementation of `Column.like` that raised `IndexError` when the incoming column had a nonconsecutive row index.
#### Improvements
- Added support for type coercion in the implementation of `DataFrame.replace`, `DataFrame.dropna` and the mock function `iff`.
### Snowpark pandas API Updates
#### New Features
- Added partial support for `DataFrame.pct_change` and `Series.pct_change` without the `freq` and `limit` parameters.
- Added support for `Series.str.get`.
- Added support for `Series.dt.dayofweek`, `Series.dt.day_of_week`, `Series.dt.dayofyear`, and `Series.dt.day_of_year`.
- Added support for `Series.str.__getitem__` (`Series.str[...]`).
- Added support for `Series.str.lstrip` and `Series.str.rstrip`.
- Added support for `DataFrameGroupBy.size` and `SeriesGroupBy.size`.
- Added support for `DataFrame.expanding` and `Series.expanding` for aggregations `count`, `sum`, `min`, `max`, `mean`, `std`, `var`, and `sem` with `axis=0`.
- Added support for `DataFrame.rolling` and `Series.rolling` for aggregation `count` with `axis=0`.
- Added support for `Series.str.match`.
- Added support for `DataFrame.resample` and `Series.resample` for aggregations `size`, `first`, and `last`.
- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (Uses local pandas for processing)
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added support for `Index.T`.
#### Bug Fixes
- Fixed a bug that caused the output columns of `GroupBy.aggregate` to be ordered incorrectly.
- Fixed a bug where `DataFrame.describe` on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
- Fixed a bug in `DataFrame.rolling` and `Series.rolling` so `window=0` now throws `NotImplementedError` instead of `ValueError`.
#### Improvements
- Added support for named aggregations in `DataFrame.aggregate` and `Series.aggregate` with `axis=0`.
- `pd.read_csv` reads using the native pandas CSV parser, then uploads data to Snowflake using Parquet. This enables most of the parameters supported by `read_csv`, including date parsing and numeric conversions. Uploading via Parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a `pd.Index` directly in Snowpark pandas. Support for `pd.Index` as a first-class component of Snowpark pandas is coming soon.
- Added a lazy index constructor and support for `len`, `shape`, `size`, `empty`, `to_pandas()` and `names`. For `df.index`, Snowpark pandas creates a lazy index object.
- For `df.columns`, Snowpark pandas supports a non-lazy version of an `Index` since the data is already stored locally.
## 1.18.0 (2024-05-28)
### Snowpark Python API Updates
#### Improvements
- Improved the error message to remind users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.
- Improved error handling for `Session.create_dataframe` when called with more than 512 rows and using `format` or `pyformat` `paramstyle`.
### Snowpark pandas API Updates
#### New Features
- Added `DataFrame.cache_result` and `Series.cache_result` methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations (see the sketch below).
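A sketch of persisting an expensive intermediate result (the table name is hypothetical):
```python
df = pd.read_snowflake("BIG_TABLE").dropna()
df = df.cache_result(inplace=False)  # materialize into a session-lived temp table
```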
#### Improvements
- Added partial support for `DataFrame.pivot_table` with no `index` parameter, as well as for `margins` parameter.
- Updated the signature of `DataFrame.shift`/`Series.shift`/`DataFrameGroupBy.shift`/`SeriesGroupBy.shift` to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added `suffix` argument, or sequence values of `periods`.
- Re-added support for `Series.str.split`.
#### Bug Fixes
- Fixed how we support mixed columns for string methods (`Series.str.*`).
### Snowpark Local Testing Updates
#### New Features
- Added support for the following DataFrameReader read options to file formats `csv` and `json`:
- PURGE
- PATTERN
- INFER_SCHEMA with value being `False`
- ENCODING with value being `UTF8`
- Added support for `DataFrame.analytics.moving_agg` and `DataFrame.analytics.cumulative_agg`.
- Added support for `if_not_exists` parameter during UDF and stored procedure registration.
#### Bug Fixes
- Fixed a bug where, when processing time formats, the fractional second part was not handled properly.
- Fixed a bug that caused function calls on `*` to fail.
- Fixed a bug that prevented creation of map and struct type objects.
- Fixed a bug where the function `date_add` was unable to handle some numeric types.
- Fixed a bug where `TimestampType` casting resulted in incorrect data.
- Fixed a bug that caused `DecimalType` data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raised a confusing `IndexError`.
- Fixed a bug where the mocked function `to_timestamp_ntz` could not handle `None` data.
- Fixed a bug where mocked UDFs handled `None` output data improperly.
- Fixed a bug where `DataFrame.with_column_renamed` ignored attributes from parent DataFrames after join operations.
- Fixed a bug where the integer precision of large values was lost when converting to a pandas DataFrame.
- Fixed a bug where the schema of datetime objects was wrong when creating a DataFrame from a pandas DataFrame.
- Fixed a bug in the implementation of `Column.equal_nan` where null data was handled incorrectly.
- Fixed a bug where `DataFrame.drop` ignored attributes from parent DataFrames after join operations.
- Fixed a bug in the mocked function `date_part` where the Column type was set incorrectly.
- Fixed a bug where `DataFrameWriter.save_as_table` did not raise exceptions when inserting null data into non-nullable columns.
- Fixed bugs in the implementation of `DataFrameWriter.save_as_table` where
- Append or Truncate failed when incoming data had a different schema than the existing table.
- Truncate failed when incoming data did not specify columns that are nullable.
#### Improvements
- Removed dependency check for `pyarrow` as it is not used.
- Improved target type coverage of `Column.cast`, adding support for casting to boolean and all integral types.
- Aligned error experience when calling UDFs and stored procedures.
- Added appropriate error messages for `is_permanent` and `anonymous` options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.
- File read operation with unsupported options and values now raises `NotImplementedError` instead of warnings and unclear error information.
## 1.17.0 (2024-05-21)
### Snowpark Python API Updates
#### New Features
- Added support to add a comment on tables and views using the functions listed below:
- `DataFrameWriter.save_as_table`
- `DataFrame.create_or_replace_view`
- `DataFrame.create_or_replace_temp_view`
- `DataFrame.create_or_replace_dynamic_table`
#### Improvements
- Improved error message to remind users to set `{"infer_schema": True}` when reading CSV file without specifying its schema.
### Snowpark pandas API Updates
#### New Features
- Start of Public Preview of Snowpark pandas API. Refer to the [Snowpark pandas API Docs](https://docs.snowflake.com/developer-guide/snowpark/python/snowpark-pandas) for more details.
### Snowpark Local Testing Updates
#### New Features
- Added support for NumericType and VariantType data conversion in the mocked functions `to_timestamp_ltz`, `to_timestamp_ntz`, `to_timestamp_tz` and `to_timestamp`.
- Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function `to_char`.
- Added support for the following APIs:
- snowflake.snowpark.functions:
- to_varchar
- snowflake.snowpark.DataFrame:
- pivot
- snowflake.snowpark.Session:
- cancel_all
- Introduced a new exception class `snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException`.
- Added support for casting to FloatType
#### Bug Fixes
- Fixed a bug where stored procedures and UDFs removed imports already in `sys.path` during the clean-up step.
- Fixed a bug where, when processing datetime formats, the fractional second part was not handled properly.
- Fixed a bug on the Windows platform where file operations were unable to properly handle file separators in directory names.
- Fixed a bug on the Windows platform where an `IntervalType` column with integer data could not be processed when reading a pandas DataFrame.
- Fixed a bug that prevented users from being able to select multiple columns with the same alias.
- Fixed a bug where `Session.get_current_[schema|database|role|user|account|warehouse]` returned upper-cased identifiers when identifiers were quoted.
- Fixed a bug where the functions `substr` and `substring` could not handle a 0-based `start_expr`.
#### Improvements
- Standardized the error experience by raising `SnowparkLocalTestingException` in error cases which is on par with `SnowparkSQLException` raised in non-local execution.
- Improved the error experience of the `Session.write_pandas` method so that `NotImplementedError` is raised when it is called.
- Aligned error experience with reusing a closed session in non-local execution.
## 1.16.0 (2024-05-07)
### New Features
- Added support for registering stored procedures with packages given as Python modules.
- Added `snowflake.snowpark.Session.lineage.trace` to explore the data lineage of Snowflake objects.
- Added support for structured type schema parsing.
### Bug Fixes
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
### Local Testing Updates
#### New Features
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_date`.
- Added support for the following APIs:
- snowflake.snowpark.functions
- get
- concat
- concat_ws
#### Bug Fixes
- Fixed a bug that caused `NaT` and `NaN` values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug so that when there is a `None` value in an arithmetic calculation, the output remains `None` instead of `math.nan`.
- Fixed a bug in the functions `sum` and `covar_pop` so that when there is `math.nan` in the data, the output is also `math.nan`.
- Fixed a bug where stage operations could not handle directories.
- Fixed a bug so that `DataFrame.to_pandas` takes Snowflake numeric types with precision 38 as `int64`.
## 1.15.0 (2024-04-24)
### New Features
- Added a `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it (see the sketch after this list).
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
- `DataFrame.write.json`
- `DataFrame.write.csv`
- `DataFrame.write.parquet`
- Added distributed tracing using open telemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
- snowflake.snowpark.DataFrame:
- collect
- collect_nowait
- to_pandas
- count
- show
- snowflake.snowpark.DataFrameWriter:
- save_as_table
- Added support for snow:// URLs to `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`
- Added support to register stored procedures and UDxFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.
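A minimal sketch of the `truncate` save mode and the new unload helpers, assuming an existing `session`, a pre-existing table `MY_TABLE`, and a stage named `mystage`:

```python
df = session.create_dataframe([[1, "a"], [2, "b"]], schema=["id", "val"])

# Overwrite MY_TABLE by truncating the underlying table instead of dropping it.
df.write.mode("truncate").save_as_table("MY_TABLE")

# Unload the same rows into one or more CSV files in the stage.
df.write.csv("@mystage/unloaded/")
```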
### Improvements
- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature not enabled by default, and can be enabled by setting `session.cte_optimization_enabled` to `True`.
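A minimal sketch of opting in to the experimental CTE optimization, assuming an existing `session`:

```python
# Enable the experimental CTE optimization before building queries.
session.cte_optimization_enabled = True

df = session.create_dataframe([[1], [2]], schema=["a"])
unioned = df.union_all(df)  # the repeated subquery can now be emitted as a CTE
unioned.collect()
```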
### Bug Fixes
- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug where an internal type hint in `utils.py` could raise an `AttributeError` when the underlying module could not be found.
### Local Testing Updates
#### New Features
- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
- snowflake.snowpark.Session:
- file.put
- file.put_stream
- file.get
- file.get_stream
- read.json
- add_import
- remove_import
- get_imports
- clear_imports
- add_packages
- add_requirements
- clear_packages
- remove_package
- udf.register
- udf.register_from_file
- sproc.register
- sproc.register_from_file
- snowflake.snowpark.functions
- current_database
- current_session
- date_trunc
- object_construct
- object_construct_keep_null
- pow
- sqrt
- udf
- sproc
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_time`.
#### Bug Fixes
- Fixed a bug where columns produced by constant functions were filled with nulls.
- Fixed the implementations of `to_object`, `to_array` and `to_binary` to better handle null inputs.
- Fixed a bug where timestamp comparisons could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.
## 1.14.0 (2024-03-20)
### New Features
- Added support for creating vectorized UDTFs with a `process` method.
- Added support for dataframe functions:
- to_timestamp_ltz
- to_timestamp_ntz
- to_timestamp_tz
- locate
- Added support for the ASOF JOIN type (see the sketch after this list).
- Added support for the following local testing APIs:
- snowflake.snowpark.functions:
- to_double
- to_timestamp
- to_timestamp_ltz
- to_timestamp_ntz
- to_timestamp_tz
- greatest
- least
- convert_timezone
- dateadd
- date_part
- snowflake.snowpark.Session:
- get_current_account
- get_current_warehouse
- get_current_role
- use_schema
- use_warehouse
- use_database
- use_role
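A minimal sketch of an ASOF join, assuming an existing `session` and that `DataFrame.join` accepts `how="asof"` together with a `match_condition`:

```python
quotes = session.create_dataframe([[1, 9.9], [5, 10.1]], schema=["t", "price"])
trades = session.create_dataframe([[3, 100], [6, 200]], schema=["t", "qty"])

# Each trade matches the closest earlier quote per the match condition.
joined = trades.join(
    quotes,
    how="asof",
    match_condition=trades["t"] >= quotes["t"],
)
joined.show()
```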
### Bug Fixes
- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` did not correctly filter columns whose names start with '$' followed by a number.
- Fixed a bug where statement parameters might have no effect when resolving imports and packages.
- Fixed bugs in local testing:
- LEFT ANTI and LEFT SEMI joins drop rows with null values.
- DataFrameReader.csv incorrectly parses data when the optional parameter `field_optionally_enclosed_by` is specified.
- Column.regexp only considers the first entry when `pattern` is a `Column`.
- Table.update raises `KeyError` when updating null values in the rows.
- VARIANT columns raise errors at `DataFrame.collect`.
- `count_distinct` does not work correctly.
- Null values in integer columns raise `TypeError`.
### Improvements
- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise `FileNotFound` error when reading a path that does not exist or when there are no files under the path.
## 1.13.0 (2024-02-26)
### New Features
- Added support for an optional `date_part` argument in function `last_day`.
- `SessionBuilder.app_name` now sets the query tag after the session is created (see the sketch after this list).
- Added support for the following local testing functions:
- current_timestamp
- current_date
- current_time
- strip_null_value
- upper
- lower
- length
- initcap
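A minimal sketch of `SessionBuilder.app_name`, assuming `connection_parameters` as in the Getting started section; the application name is illustrative:

```python
from snowflake.snowpark import Session

session = (
    Session.builder
    .app_name("nightly_etl")  # recorded in the session's query tag
    .configs(connection_parameters)
    .create()
)
print(session.query_tag)
```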
### Improvements
- Added cleanup logic at interpreter shutdown to close all active sessions.
- Closing a session within a stored procedure is now a no-op that logs a warning instead of raising an error.
### Bug Fixes
- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level. For details, please see #945.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where `Session.range` returned an empty result when the range was large.
## 1.12.1 (2024-02-08)
### Improvements
- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables `PyArrow` to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.
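An illustration of the underlying PyArrow option (plain `pyarrow`, not a Snowpark call): with `split_blocks=True`, each column gets its own block rather than one consolidated allocation.

```python
import pyarrow as pa

table = pa.table({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
pdf = table.to_pandas(split_blocks=True)  # per-column blocks, smaller peak memory
```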
### Bug Fixes
- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating a DataFrame with an `IntegerType` column containing null values.
## 1.12.0 (2024-01-30)
### New Features
- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`.
- `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
- `whole_file_hash`: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to `True`, each uploaded file is fully hashed instead.
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`. Allows an additional tag to be added to the current query tag by appending it as a comma-separated value (see the sketch after this list).
- Added a new method `Session.update_query_tag`. Allows updates to a JSON encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in `snowflake.snowpark.functions`:
- `array_except`
- `create_map`
- `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
- Added the `moving_agg` function in `DataFrame.analytics` to enable moving aggregations like sums and averages with multiple window sizes.
- Added the `cumulative_agg` function in `DataFrame.analytics` to enable cumulative aggregations like sums and averages on multiple columns.
- Added the `compute_lag` and `compute_lead` functions in `DataFrame.analytics` for enabling lead and lag calculations on multiple columns.
- Added the `time_series_agg` function in `DataFrame.analytics` to enable time series aggregations like sums and averages with multiple time windows.
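A minimal sketch of the query-tag helper and `moving_agg`, assuming an existing `session`; the column names and aggregations are illustrative:

```python
session.append_query_tag("step=moving_aggregates")  # appended as a comma-separated value

df = session.create_dataframe(
    [["g1", 1, 10], ["g1", 2, 20], ["g1", 3, 30]],
    schema=["grp", "ord", "val"],
)
# Moving SUM and AVG of `val` over windows of 2 and 3 rows within each group.
res = df.analytics.moving_agg(
    aggs={"val": ["SUM", "AVG"]},
    window_sizes=[2, 3],
    order_by=["ord"],
    group_by=["grp"],
)
res.show()
```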
### Bug Fixes
- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
- Earlier timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`, but will now be correctly maintained as timestamp values and be inferred as `TimestampType(TimestampTimeZone.NTZ)`.
- Earlier timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information, but will now be correctly inferred as `TimestampType(TimestampTimeZone.LTZ)` with timezone information retained.
- Set the session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert to the old behavior. It is recommended that you update your code to align with the correct behavior because the parameter will be removed in the future.
- Fixed a bug where `DataFrame.to_pandas` created an `object` dtype in pandas for decimal types with non-zero scale. The value is now cast to `float64`.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
- `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
- `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column.
For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
- a window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance,
```python
df = df.select(col("a").alias("b"))
df = copy(df)
df.select(col("b").alias("c")) # threw an error. Now it's fixed.
```
- Fixed a bug in `Session.create_dataframe` that the non-nullable field in a schema is not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
- Fixed a bug in SQL simplifier where non-select statements in `session.sql` dropped a SQL query when used with `limit()`.
- Fixed a bug that raised an exception when session parameter `ERROR_ON_NONDETERMINISTIC_UPDATE` is true.
### Behavior Changes (API Compatible)
- When parsing data types during a `to_pandas` operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned the behavior of `Session.call` for table stored procedures, where running `Session.call` would not trigger the stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` will now automatically add `snowflake-snowpark-python` as a package dependency. The added dependency will be on the client's local version of the library and an error is thrown if the server cannot support that version.
## 1.11.1 (2023-12-07)
### Bug Fixes
- Fixed a bug where numpy was imported at the top level of the mock module.
- Added support for these new functions in `snowflake.snowpark.functions`:
- `from_utc_timestamp`
- `to_utc_timestamp`
## 1.11.0 (2023-12-05)
### New Features
- Added the `conn_error` attribute to `SnowflakeSQLException` that stores the whole underlying exception from `snowflake-connector-python`.
- Added support for `RelationalGroupedDataFrame.pivot()` to access `pivot` in the following pattern: `DataFrame.group_by(...).pivot(...)`.
- Added experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account (see the sketch after this list).
- Added support for the new function `arrays_to_object` in `snowflake.snowpark.functions`.
- Added support for the vector data type.
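A minimal sketch of Local Testing Mode: the session below never contacts a Snowflake account.

```python
from snowflake.snowpark import Session

session = Session.builder.config("local_testing", True).create()

df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
assert df.filter(df["a"] > 1).count() == 1  # evaluated entirely in memory
```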
### Dependency Updates
- Bumped cloudpickle dependency to work with `cloudpickle==2.2.1`
- Updated ``snowflake-connector-python`` to `3.4.0`.
### Bug Fixes
- DataFrame column names quoting check now supports newline characters.
- Fixed a bug where a DataFrame generated by `session.read.with_metadata` created an inconsistent table when calling `df.write.save_as_table`.
## 1.10.0 (2023-11-03)
### New Features
- Added support for managing case sensitivity in `DataFrame.to_local_iterator()`.
- Added support for specifying vectorized UDTF's input column names by using the optional parameter `input_names` in `UDTFRegistration.register/register_file` and `functions.pandas_udtf`. By default, `RelationalGroupedDataFrame.applyInPandas` will infer the column names from current dataframe schema.
- Added `sql_error_code` and `raw_message` attributes to `SnowflakeSQLException` when it is caused by a SQL exception.
### Bug Fixes
- Fixed a bug in `DataFrame.to_pandas()` where converting snowpark dataframes to pandas dataframes was losing precision on integers with more than 19 digits.
- Fixed a bug where `session.add_packages` could not handle a requirement specifier containing a project name with an underscore and a version.
- Fixed a bug in `DataFrame.limit()` when `offset` is used and the parent `DataFrame` uses `limit`. Now the `offset` won't impact the parent DataFrame's `limit`.
- Fixed a bug in `DataFrame.write.save_as_table` where DataFrames created from the read API could not save data into Snowflake because of an invalid column name `$1`.
### Behavior change
- Changed the behavior of `date_format`:
- The `format` argument changed from optional to required.
- The returned result changed from a date object to a date-formatted string.
- When a window function, or a sequence-dependent data generator (`normal`, `zipf`, `uniform`, `seq1`, `seq2`, `seq4`, `seq8`) function is used, the sort and filter operation will no longer be flattened when generating the query.
## 1.9.0 (2023-10-13)
### New Features
- Added support for the Python 3.11 runtime environment.
- Support `PythonObjJSONEncoder` json-serializable objects for `ARRAY` and `OBJECT` literals.
### Dependency updates
- Added back the dependency of `typing-extensions`.
### Bug Fixes
- Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
- Reverted to using a CTAS (create table as select) statement for `DataFrame.write.save_as_table`, which does not need insert permission for writing tables.
## 1.8.0 (2023-09-14)
### New Features
- Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
- Added support for specifying clustering keys when saving dataframes using `DataFrame.save_as_table`.
- Accept `Iterable` objects input for `schema` when creating dataframes using `Session.create_dataframe`.
- Added the property `DataFrame.session` to return a `Session` object.
- Added the property `Session.session_id` to return an integer that represents session ID.
- Added the property `Session.connection` to return a `SnowflakeConnection` object.
- Added support for creating a Snowpark session from a configuration file or environment variables.
### Dependency updates
- Updated ``snowflake-connector-python`` to 3.2.0.
### Bug Fixes
- Fixed a bug where automatic package upload would raise `ValueError` even when compatible package versions were added in `session.add_packages`.
- Fixed a bug where table stored procedures were not registered correctly when using `register_from_file`.
- Fixed a bug where dataframe joins failed with `invalid_identifier` error.
- Fixed a bug where `DataFrame.copy` disabled the SQL simplifier for the returned copy.
- Fixed a bug where `session.sql().select()` would fail if any parameters were specified to `session.sql()`.
## 1.7.0 (2023-08-28)
### New Features
- Added parameters `external_access_integrations` and `secrets` when creating a UDF, UDTF or Stored Procedure from Snowpark Python to allow integration with external access.
- Added support for these new functions in `snowflake.snowpark.functions`:
- `array_flatten`
- `flatten`
- Added support for `apply_in_pandas` in `snowflake.snowpark.relational_grouped_dataframe`.
- Added support for replicating your local Python environment on Snowflake via `Session.replicate_local_environment`.
### Bug Fixes
- Fixed a bug where `session.create_dataframe` failed to properly set nullable columns when nullability was affected by the order in which data was given.
- Fixed a bug where `DataFrame.select` could not identify and alias columns in the presence of table functions when the output columns of the table function overlapped with the columns in the dataframe.
### Behavior Changes
- Creating stored procedures, UDFs, UDTFs, and UDAFs with the parameter `is_permanent=False` now creates temporary objects even when `stage_name` is provided. The default value of `is_permanent` is `False`, so users who do not explicitly set it to `True` for permanent objects will notice a change in behavior.
- `types.StructField` now enquotes column identifier by default.
## 1.6.1 (2023-08-02)
### New Features
- Added support for these new functions in `snowflake.snowpark.functions`:
- `array_sort`
- `sort_array`
- `array_min`
- `array_max`
- `explode_outer`
- Added support for pure Python packages specified via `Session.add_requirements` or `Session.add_packages`. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
- Added Session parameter `custom_packages_upload_enabled` and `custom_packages_force_upload_enabled` to enable the support for pure Python packages feature mentioned above. Both parameters default to `False`.
- Added support for specifying package requirements by passing a Conda environment yaml file to `Session.add_requirements`.
- Added support for asynchronous execution of multi-query dataframes that contain binding variables.
- Added support for renaming multiple columns in `DataFrame.rename`.
- Added support for Geometry datatypes.
- Added support for `params` in `session.sql()` in stored procedures.
- Added support for user-defined aggregate functions (UDAFs). This feature is currently in private preview.
- Added support for vectorized UDTFs (user-defined table functions). This feature is currently in public preview.
- Added support for Snowflake Timestamp variants (i.e., `TIMESTAMP_NTZ`, `TIMESTAMP_LTZ`, `TIMESTAMP_TZ`)
- Added `TimestampTimezone` as an argument in `TimestampType` constructor.
- Added type hints `NTZ`, `LTZ`, `TZ` and `Timestamp` to annotate functions when registering UDFs.
### Improvements
- Removed redundant dependency `typing-extensions`.
- `DataFrame.cache_result` now creates temp tables with fully qualified names under the current database and current schema.
### Bug Fixes
- Fixed a bug where a type check happened on pandas before it was imported.
- Fixed a bug when creating a UDF from `numpy.ufunc`.
- Fixed a bug where `DataFrame.union` was not generating the correct `Selectable.schema_query` when SQL simplifier is enabled.
### Behavior Changes
- `DataFrameWriter.save_as_table` now respects the `nullable` field of the schema provided by the user or the inferred schema based on data from user input.
### Dependency updates
- Updated ``snowflake-connector-python`` to 3.0.4.
## 1.5.1 (2023-06-20)
### New Features
- Added support for the Python 3.10 runtime environment.
## 1.5.0 (2023-06-09)
### Behavior Changes
- Aggregation results, from functions such as `DataFrame.agg` and `DataFrame.describe`, no longer strip away non-printing characters from column names.
### New Features
- Added support for the Python 3.9 runtime environment.
- Added support for new functions in `snowflake.snowpark.functions`:
- `array_generate_range`
- `array_unique_agg`
- `collect_set`
- `sequence`
- Added support for registering and calling stored procedures with `TABLE` return type.
- Added support for parameter `length` in `StringType()` to specify the maximum number of characters that can be stored by the column.
- Added the alias `functions.element_at()` for `functions.get()`.
- Added the alias `Column.contains` for `functions.contains`.
- Added experimental feature `DataFrame.alias`.
- Added support for querying metadata columns from stage when creating `DataFrame` using `DataFrameReader`.
- Added support for `StructType.add` to append more fields to existing `StructType` objects.
- Added support for parameter `execute_as` in `StoredProcedureRegistration.register_from_file()` to specify stored procedure caller rights.
### Bug Fixes
- Fixed a bug where `DataFrame.join_table_function` did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.
- Fixed type hint declaration for custom types - `ColumnOrName`, `ColumnOrLiteralStr`, `ColumnOrSqlExpr`, `LiteralType` and `ColumnOrLiteral` that were breaking `mypy` checks.
- Fixed a bug where `DataFrameWriter.save_as_table` and `DataFrame.copy_into_table` failed to parse fully qualified table names.
## 1.4.0 (2023-04-24)
### New Features
- Added support for `session.getOrCreate`.
- Added support for alias `Column.getField`.
- Added support for new functions in `snowflake.snowpark.functions`:
- `date_add` and `date_sub` to make add and subtract operations easier.
- `daydiff`
- `explode`
- `array_distinct`.
- `regexp_extract`.
- `struct`.
- `format_number`.
- `bround`.
- `substring_index`
- Added parameter `skip_upload_on_content_match` when creating UDFs, UDTFs and stored procedures using `register_from_file` to skip uploading files to a stage if the same version of the files are already on the stage.
- Added support for `DataFrameWriter.save_as_table` method to take table names that contain dots.
- Flattened generated SQL when `DataFrame.filter()` or `DataFrame.order_by()` is followed by a projection statement (e.g. `DataFrame.select()`, `DataFrame.with_column()`).
- Added support for creating dynamic tables _(in private preview)_ using `Dataframe.create_or_replace_dynamic_table`.
- Added an optional argument `params` in `session.sql()` to support binding variables (see the sketch after this list). Note that this is not supported in stored procedures yet.
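A minimal sketch of binding variables, assuming an existing `session` and a table `MY_TABLE` with an `ID` column:

```python
# `?` placeholders in the SQL text are bound from `params`, in order.
df = session.sql("select * from my_table where id = ?", params=[2])
df.show()
```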
### Bug Fixes
- Fixed a bug in `strtok_to_array` where an exception was thrown when a delimiter was passed in.
- Fixed a bug in `session.add_import` where the module had the same namespace as other dependencies.
## 1.3.0 (2023-03-28)
### New Features
- Added support for `delimiters` parameter in `functions.initcap()`.
- Added support for `functions.hash()` to accept a variable number of input expressions.
- Added the API `Session.conf`, a `Session.RuntimeConfig` instance, for getting, setting, or checking the mutability of any runtime configuration.
- Added support for managing case sensitivity in `Row` results from `DataFrame.collect` using the `case_sensitive` parameter.
- Added indexer support for `snowflake.snowpark.types.StructType`.
- Added a keyword argument `log_on_exception` to `DataFrame.collect` and `DataFrame.collect_nowait` to optionally disable error logging for SQL exceptions.
### Bug Fixes
- Fixed a bug where a DataFrame set operation (`DataFrame.subtract`, `DataFrame.union`, etc.) called after another DataFrame set operation and `DataFrame.select` or `DataFrame.with_column` threw an exception.
- Fixed a bug where chained sort statements are overwritten by the SQL simplifier.
### Improvements
- Simplified JOIN queries to use constant subquery aliases (`SNOWPARK_LEFT`, `SNOWPARK_RIGHT`) by default. Users can disable this at runtime with `session.conf.set('use_constant_subquery_alias', False)` to use randomly generated alias names instead (see the sketch after this list).
- Allowed specifying statement parameters in `session.call()`.
- Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
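A minimal sketch of reverting to randomly generated subquery aliases, assuming an existing `session`:

```python
session.conf.set("use_constant_subquery_alias", False)

left = session.create_dataframe([[1, "x"]], schema=["id", "l"])
right = session.create_dataframe([[1, "y"]], schema=["id", "r"])
print(left.join(right, "id").queries)  # inspect the generated SQL and aliases
```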
## 1.2.0 (2023-03-02)
### New Features
- Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; turn it off by specifying `source_code_display=False` at registration.
- Added a parameter `if_not_exists` when creating a UDF, UDTF or Stored Procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.
- Accept integers when calling `snowflake.snowpark.functions.get` to extract value from array.
- Added `functions.reverse` in functions to open access to Snowflake built-in function
[reverse](https://docs.snowflake.com/en/sql-reference/functions/reverse).
- Added the parameter `require_scoped_url` in `snowflake.snowpark.files.SnowflakeFile.open()` _(in private preview)_ to replace `is_owner_file`, which is marked for deprecation.
### Bug Fixes
- Fixed a bug that overwrote `paramstyle` to `qmark` when creating a Snowpark session.
- Fixed a bug where `df.join(..., how="cross")` fails with `SnowparkJoinException: (1112): Unsupported using join type 'Cross'`.
- Fixed a bug where querying a `DataFrame` column created from chained function calls used a wrong column name.
## 1.1.0 (2023-01-26)
### New Features:
- Added `asc`, `asc_nulls_first`, `asc_nulls_last`, `desc`, `desc_nulls_first`, `desc_nulls_last`, `date_part` and `unix_timestamp` in functions.
- Added the property `DataFrame.dtypes` to return a list of column name and data type pairs.
- Added the following aliases:
- `functions.expr()` for `functions.sql_expr()`.
- `functions.date_format()` for `functions.to_date()`.
- `functions.monotonically_increasing_id()` for `functions.seq8()`
- `functions.from_unixtime()` for `functions.to_timestamp()`
### Bug Fixes:
- Fixed a bug in SQL simplifier that didn’t handle Column alias and join well in some cases. See https://github.com/snowflakedb/snowpark-python/issues/658 for details.
- Fixed a bug in SQL simplifier that generated wrong column names for function calls, NaN and INF.
### Improvements
- The session parameter `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` defaults to `True` since the Snowflake 7.3 release. In snowpark-python, `session.sql_simplifier_enabled` reads the value of `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` in Snowflake to `False` or run `session.sql_simplifier_enabled = False` from Snowpark. It is recommended to use the SQL simplifier because it helps generate more concise SQL.
## 1.0.0 (2022-11-01)
### New Features
- Added `Session.generator()` to create a new `DataFrame` using the Generator table function.
- Added a parameter `secure` to the functions that create a secure UDF or UDTF.
## 0.12.0 (2022-10-14)
### New Features
- Added new APIs for async job:
- `Session.create_async_job()` to create an `AsyncJob` instance from a query id.
- `AsyncJob.result()` now accepts argument `result_type` to return the results in different formats.
- `AsyncJob.to_df()` returns a `DataFrame` built from the result of this asynchronous job.
- `AsyncJob.query()` returns the SQL text of the executed query.
- `DataFrame.agg()` and `RelationalGroupedDataFrame.agg()` now accept variable-length arguments.
- Added parameters `lsuffix` and `rsuffix` to `DataFrame.join()` and `DataFrame.cross_join()` to conveniently rename overlapping columns.
- Added `Table.drop_table()` so you can drop the temp table after `DataFrame.cache_result()`. `Table` is also a context manager, so you can use the `with` statement to drop the cache temp table after use (see the sketch after this list).
- Added `Session.use_secondary_roles()`.
- Added functions `first_value()` and `last_value()`. (contributed by @chasleslr)
- Added `on` as an alias for `using_columns` and `how` as an alias for `join_type` in `DataFrame.join()`.
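A minimal sketch of `Table` as a context manager, assuming an existing `session`: the cached temp table is dropped automatically when the `with` block exits.

```python
df = session.create_dataframe([[1], [2]], schema=["a"])
with df.cache_result() as cached:
    cached.filter(cached["a"] > 1).show()
# cached.drop_table() is the explicit equivalent of leaving the block.
```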
### Bug Fixes
- Fixed a bug in `Session.create_dataframe()` that raised an error when `schema` names had special characters.
- Fixed a bug in which options set in `Session.read.option()` were not passed to `DataFrame.copy_into_table()` as default values.
- Fixed a bug in which `DataFrame.copy_into_table()` raises an error when a copy option has single quotes in the value.
## 0.11.0 (2022-09-28)
### Behavior Changes
- `Session.add_packages()` now raises `ValueError` when the version of a package cannot be found in Snowflake Anaconda channel. Previously, `Session.add_packages()` succeeded, and a `SnowparkSQLException` exception was raised later in the UDF/SP registration step.
### New Features:
- Added method `FileOperation.get_stream()` to support downloading stage files as stream.
- Added support in `functions.ntiles()` to accept an int argument.
- Added the following aliases:
- `functions.call_function()` for `functions.call_builtin()`.
- `functions.function()` for `functions.builtin()`.
- `DataFrame.order_by()` for `DataFrame.sort()`
- `DataFrame.orderBy()` for `DataFrame.sort()`
- Improved `DataFrame.cache_result()` to return a more accurate `Table` class instead of a `DataFrame` class.
- Added support to allow `session` as the first argument when calling `StoredProcedure`.
### Improvements
- Improved nested query generation by flattening queries when applicable.
- This improvement can be enabled by setting `Session.sql_simplifier_enabled = True`.
- `DataFrame.select()`, `DataFrame.with_column()`, `DataFrame.drop()` and other select-related APIs have more flattened SQLs.
- `DataFrame.union()`, `DataFrame.union_all()`, `DataFrame.except_()`, `DataFrame.intersect()`, `DataFrame.union_by_name()` have flattened SQLs generated when multiple set operators are chained.
- Improved type annotations for async job APIs.
### Bug Fixes
- Fixed a bug in which `Table.update()`, `Table.delete()`, `Table.merge()` try to reference a temp table that does not exist.
## 0.10.0 (2022-09-16)
### New Features:
- Added experimental APIs for evaluating Snowpark dataframes with asynchronous queries (see the sketch after this list):
- Added keyword argument `block` to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
- `DataFrame.collect()`, `DataFrame.to_local_iterator()`, `DataFrame.to_pandas()`, `DataFrame.to_pandas_batches()`, `DataFrame.count()`, `DataFrame.first()`.
- `DataFrameWriter.save_as_table()`, `DataFrameWriter.copy_into_location()`.
- `Table.delete()`, `Table.update()`, `Table.merge()`.
- Added method `DataFrame.collect_nowait()` to allow asynchronous evaluations.
- Added class `AsyncJob` to retrieve results from asynchronously executed queries and check their status.
- Added support for `table_type` in `Session.write_pandas()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
- Added support for using Python structured data (`list`, `tuple` and `dict`) as literal values in Snowpark.
- Added keyword argument `execute_as` to `functions.sproc()` and `session.sproc.register()` to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
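A minimal sketch of asynchronous evaluation, assuming an existing `session`:

```python
df = session.create_dataframe([[1], [2]], schema=["a"])

job = df.collect_nowait()  # returns an AsyncJob immediately
print(job.is_done())       # poll without blocking
print(job.result())        # block until the rows are available

# Alternatively, pass block=False to an action API:
print(df.count(block=False).result())
```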
### Improvements:
- Added support for displaying details of a Snowpark session.
### Bug Fixes:
- Fixed a bug in which `DataFrame.copy_into_table()` and `DataFrameWriter.save_as_table()` mistakenly created a new table if the table name is fully qualified, and the table already exists.
### Deprecations:
- Deprecated keyword argument `create_temp_table` in `Session.write_pandas()`.
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.
### Dependency updates
- Updated ``snowflake-connector-python`` to 2.7.12.
## 0.9.0 (2022-08-30)
### New Features:
- Added support for displaying source code as comments in the generated scripts when registering UDFs.
This feature is turned on by default. To turn it off, pass the new keyword argument `source_code_display` as `False` when calling `register()` or `@udf()`.
- Added support for calling table functions from `DataFrame.select()`, `DataFrame.with_column()` and `DataFrame.with_columns()` which now take parameters of type `table_function.TableFunctionCall` for columns.
- Added keyword argument `overwrite` to `session.write_pandas()` to allow overwriting contents of a Snowflake table with that of a pandas DataFrame.
- Added keyword argument `column_order` to `df.write.save_as_table()` to specify the matching rules when inserting data into table in append mode.
- Added method `FileOperation.put_stream()` to upload local files to a stage via file stream.
- Added methods `TableFunctionCall.alias()` and `TableFunctionCall.as_()` to allow aliasing the names of columns that come from the output of table function joins.
- Added function `get_active_session()` in module `snowflake.snowpark.context` to get the current active Snowpark session.
### Bug Fixes:
- Fixed a bug in which batch insert should not raise an error when `statement_params` is not passed to the function.
- Fixed a bug in which column names should be quoted when `session.create_dataframe()` is called with dicts and a given schema.
- Fixed a bug in which creation of table should be skipped if the table already exists and is in append mode when calling `df.write.save_as_table()`.
- Fixed a bug in which third-party packages with underscores cannot be added when registering UDFs.
### Improvements:
- Improved function `functions.uniform()` to infer the types of inputs `max_` and `min_` and cast the limits to `IntegerType` or `FloatType` correspondingly.
## 0.8.0 (2022-07-22)
### New Features:
- Added the keyword-only argument `statement_params` to the following methods to allow for specifying statement-level parameters (see the sketch after this list):
- `collect`, `to_local_iterator`, `to_pandas`, `to_pandas_batches`,
`count`, `copy_into_table`, `show`, `create_or_replace_view`, `create_or_replace_temp_view`, `first`, `cache_result`
and `random_split` on class `snowflake.snowpark.DataFrame`.
- `update`, `delete` and `merge` on class `snowflake.snowpark.Table`.
- `save_as_table` and `copy_into_location` on class `snowflake.snowpark.DataFrameWriter`.
- `approx_quantile`, `corr`, `cov` and `crosstab` on class `snowflake.snowpark.DataFrameStatFunctions`.
- `register` and `register_from_file` on class `snowflake.snowpark.udf.UDFRegistration`.
- `register` and `register_from_file` on class `snowflake.snowpark.udtf.UDTFRegistration`.
- `register` and `register_from_file` on class `snowflake.snowpark.stored_procedure.StoredProcedureRegistration`.
- `udf`, `udtf` and `sproc` in `snowflake.snowpark.functions`.
- Added support for `Column` as an input argument to `session.call()`.
- Added support for `table_type` in `df.write.save_as_table()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
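A minimal sketch of statement-level parameters, assuming an existing `session`: the tag below applies to a single action rather than the whole session.

```python
df = session.create_dataframe([[1], [2]], schema=["a"])
df.collect(statement_params={"QUERY_TAG": "nightly_load"})
```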
### Improvements:
- Added validation of object name in `session.use_*` methods.
- Updated the query tag in SQL to escape it when it has special characters.
- Added a check to see if Anaconda terms are acknowledged when adding missing packages.
### Bug Fixes:
- Fixed the limited length of the string column in `session.create_dataframe()`.
- Fixed a bug in which `session.create_dataframe()` mistakenly converted 0 and `False` to `None` when the input data was only a list.
- Fixed a bug in which calling `session.create_dataframe()` using a large local dataset sometimes created a temp table twice.
- Aligned the definition of `function.trim()` with the SQL function definition.
- Fixed an issue where snowpark-python would hang when the Python built-in `sum` was used instead of the Snowpark `functions.sum()`.
### Deprecations:
- Deprecated keyword argument `create_temp_table` in `df.write.save_as_table()`.
## 0.7.0 (2022-05-25)
### New Features:
- Added support for user-defined table functions (UDTFs); see the sketch after this list.
- Use function `snowflake.snowpark.functions.udtf()` to register a UDTF, or use it as a decorator to register the UDTF.
- You can also use `Session.udtf.register()` to register a UDTF.
- Use `Session.udtf.register_from_file()` to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
- Use function `snowflake.snowpark.functions.table_function()` to create a callable representing a table function and use it to call the table function in a query.
- Alternatively, use function `snowflake.snowpark.functions.call_table_function()` to call a table function.
- Added support for `over` clause that specifies `partition by` and `order by` when lateral joining a table function.
- Updated `Session.table_function()` and `DataFrame.join_table_function()` to accept `TableFunctionCall` instances.
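A minimal sketch of registering and querying a UDTF, assuming an existing `session`; the class and its output schema are illustrative:

```python
from snowflake.snowpark.functions import lit, udtf
from snowflake.snowpark.types import IntegerType, StructField, StructType

@udtf(
    output_schema=StructType([StructField("n", IntegerType())]),
    input_types=[IntegerType()],
    session=session,
)
class CountTo:
    def process(self, limit: int):
        for n in range(limit):
            yield (n,)  # each yielded tuple becomes an output row

session.table_function(CountTo(lit(3))).show()
```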
### Breaking Changes:
- When creating a function with `functions.udf()` and `functions.sproc()`, you can now specify an empty list for the `imports` or `packages` argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the `__repr__` implementation of data types in `types.py`. The unused `type_name` property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous `ProgrammingError` from the Python connector.
### Improvements:
- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.
### Bug Fixes:
- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling `DataFrame.to_pandas()`.
- Fixed a bug in which `DataFrameReader.parquet()` failed to read a parquet file when its column contained spaces.
- Fixed a bug in which `DataFrame.copy_into_table()` failed when the dataframe is created by reading a file with inferred schemas.
### Deprecations:
- Deprecated `Session.flatten()` and `DataFrame.flatten()`.
### Dependency Updates:
- Restricted the version of `cloudpickle` <= `2.0.0`.
## 0.6.0 (2022-04-27)
### New Features:
- Added support for vectorized UDFs with the input as a pandas DataFrame or pandas Series and the output as a pandas Series. This improves the performance of UDFs in Snowpark.
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added functions `current_session()`, `current_statement()`, `current_user()`, `current_version()`, `current_warehouse()`, `date_from_parts()`, `date_trunc()`, `dayname()`, `dayofmonth()`, `dayofweek()`, `dayofyear()`, `grouping()`, `grouping_id()`, `hour()`, `last_day()`, `minute()`, `next_day()`, `previous_day()`, `second()`, `month()`, `monthname()`, `quarter()`, `year()`, `current_database()`, `current_role()`, `current_schema()`, `current_schemas()`, `current_region()`, `current_available_roles()`, `add_months()`, `any_value()`, `bitnot()`, `bitshiftleft()`, `bitshiftright()`, `convert_timezone()`, `uniform()`, `strtok_to_array()`, `sysdate()`, `time_from_parts()`, `timestamp_from_parts()`, `timestamp_ltz_from_parts()`, `timestamp_ntz_from_parts()`, `timestamp_tz_from_parts()`, `weekofyear()`, `percentile_cont()` to `snowflake.snowpark.functions`.
### Breaking Changes:
- Expired deprecations:
- Removed the following APIs that were deprecated in 0.4.0: `DataFrame.groupByGroupingSets()`, `DataFrame.naturalJoin()`, `DataFrame.joinTableFunction`, `DataFrame.withColumns()`, `Session.getImports()`, `Session.addImport()`, `Session.removeImport()`, `Session.clearImports()`, `Session.getSessionStage()`, `Session.getDefaultDatabase()`, `Session.getDefaultSchema()`, `Session.getCurrentDatabase()`, `Session.getCurrentSchema()`, `Session.getFullyQualifiedCurrentSchema()`.
### Improvements:
- Added support for creating an empty `DataFrame` with a specific schema using the `Session.create_dataframe()` method.
- Changed the logging level from `INFO` to `DEBUG` for several logs (e.g., the executed query) when evaluating a dataframe.
- Improved the error message when failing to create a UDF due to pickle errors.
### Bug Fixes:
- Removed pandas hard dependencies in the `Session.create_dataframe()` method.
### Dependency Updates:
- Added `typing-extensions` as a new dependency with the version >= `4.1.0`.
## 0.5.0 (2022-03-22)
### New Features
- Added stored procedures API.
- Added `Session.sproc` property and `sproc()` to `snowflake.snowpark.functions`, so you can register stored procedures.
- Added `Session.call` to call stored procedures by name.
- Added `UDFRegistration.register_from_file()` to allow registering UDFs from Python source files or zip files directly.
- Added `UDFRegistration.describe()` to describe a UDF.
- Added `DataFrame.random_split()` to provide a way to randomly split a dataframe.
- Added functions `md5()`, `sha1()`, `sha2()`, `ascii()`, `initcap()`, `length()`, `lower()`, `lpad()`, `ltrim()`, `rpad()`, `rtrim()`, `repeat()`, `soundex()`, `regexp_count()`, `replace()`, `charindex()`, `collate()`, `collation()`, `insert()`, `left()`, `right()`, `endswith()` to `snowflake.snowpark.functions`.
- Allowed `call_udf()` to accept literal values.
- Provided a `distinct` keyword in `array_agg()`.
### Bug Fixes:
- Fixed an issue that caused `DataFrame.to_pandas()` to have a string column if `Column.cast(IntegerType())` was used.
- Fixed a bug in `DataFrame.describe()` when there is more than one string column.
## 0.4.0 (2022-02-15)
### New Features
- You can now specify which Anaconda packages to use when defining UDFs.
- Added `add_packages()`, `get_packages()`, `clear_packages()`, and `remove_package()` to class `Session`.
- Added `add_requirements()` to `Session` so you can use a requirements file to specify which packages this session will use.
- Added parameter `packages` to function `snowflake.snowpark.functions.udf()` and method `UserDefinedFunction.register()` to indicate UDF-level Anaconda package dependencies when creating a UDF.
- Added parameter `imports` to `snowflake.snowpark.functions.udf()` and `UserDefinedFunction.register()` to specify UDF-level code imports.
- Added a parameter `session` to function `udf()` and `UserDefinedFunction.register()` so you can specify which session to use to create a UDF if you have multiple sessions.
- Added types `Geography` and `Variant` to `snowflake.snowpark.types` to be used as type hints for Geography and Variant data when defining a UDF.
- Added support for Geography geoJSON data.
- Added `Table`, a subclass of `DataFrame` for table operations:
- Methods `update` and `delete` update and delete rows of a table in Snowflake.
- Method `merge` merges data from a `DataFrame` to a `Table`.
- Overrode method `DataFrame.sample()` with an additional parameter `seed`, which works on tables but not on views and sub-queries.
- Added `DataFrame.to_local_iterator()` and `DataFrame.to_pandas_batches()` to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
- Added `DataFrame.cache_result()` for caching the operations performed on a `DataFrame` in a temporary table.
Subsequent operations on the original `DataFrame` have no effect on the cached result `DataFrame`.
- Added property `DataFrame.queries` to get SQL queries that will be executed to evaluate the `DataFrame`.
- Added `Session.query_history()` as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate `DataFrame`s created from a session. Both query ID and query text are recorded (see the sketch after this list).
- You can now create a `Session` instance from an existing established `snowflake.connector.SnowflakeConnection`. Use parameter `connection` in `Session.builder.configs()`.
- Added `use_database()`, `use_schema()`, `use_warehouse()`, and `use_role()` to class `Session` to switch database/schema/warehouse/role after a session is created.
- Added `DataFrameWriter.copy_into_table()` to unload a `DataFrame` to stage files.
- Added `DataFrame.unpivot()`.
- Added `Column.within_group()` for sorting the rows by columns with some aggregation functions.
- Added functions `listagg()`, `mode()`, `div0()`, `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cosh()`, `sin()`, `sinh()`, `tan()`, `tanh()`, `degrees()`, `radians()`, `round()`, `trunc()`, and `factorial()` to `snowflake.snowpark.functions`.
- Added an optional argument `ignore_nulls` in function `lead()` and `lag()`.
- The `condition` parameter of function `when()` and `iff()` now accepts SQL expressions.
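A minimal sketch of `Session.query_history()`, assuming an existing `session`:

```python
with session.query_history() as history:
    session.create_dataframe([[1]], schema=["a"]).collect()

for record in history.queries:  # each record carries the query ID and SQL text
    print(record.query_id, record.sql_text)
```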
### Improvements
- All function and method names have been renamed to use the snake case naming style, which is more Pythonic. For convenience, some camel case names are kept as aliases to the snake case APIs. It is recommended to use the snake case APIs.
- Deprecated these methods on class `Session` and replaced them with their snake case equivalents: `getImports()`, `addImport()`, `removeImport()`, `clearImports()`, `getSessionStage()`, `getDefaultDatabase()`, `getDefaultSchema()`, `getCurrentDatabase()`, `getFullyQualifiedCurrentSchema()`.
- Deprecated these methods on class `DataFrame` and replaced them with their snake case equivalents: `groupingByGroupingSets()`, `naturalJoin()`, `withColumns()`, `joinTableFunction()`.
- Property `DataFrame.columns` is now consistent with `DataFrame.schema.names` and the Snowflake database `Identifier Requirements`.
- `Column.__bool__()` now raises a `TypeError`. This bans the use of the logical operators `and`, `or`, and `not` on `Column` objects; for instance, `col("a") > 1 and col("b") > 2` will raise a `TypeError`. Use `(col("a") > 1) & (col("b") > 2)` instead.
- Changed `PutResult` and `GetResult` to subclass `NamedTuple`.
- Fixed a bug which raised an error when the local path or stage location had a space or other special characters.
- Changed `DataFrame.describe()` so that non-numeric and non-string columns are ignored instead of raising an exception.
### Dependency updates
- Updated ``snowflake-connector-python`` to 2.7.4.
## 0.3.0 (2022-01-09)
### New Features
- Added `Column.isin()`, with an alias `Column.in_()`.
- Added `Column.try_cast()`, which is a special version of `cast()`. It tries to cast a string expression to other types and returns `null` if the cast is not possible.
- Added `Column.startswith()` and `Column.substr()` to process string columns.
- `Column.cast()` now also accepts a `str` value to indicate the cast type in addition to a `DataType` instance.
- Added `DataFrame.describe()` to summarize stats of a `DataFrame`.
- Added `DataFrame.explain()` to print the query plan of a `DataFrame`.
- `DataFrame.filter()` and `DataFrame.select_expr()` now accept a SQL expression.
- Added a new `bool` parameter `create_temp_table` to methods `DataFrame.saveAsTable()` and `Session.write_pandas()` to optionally create a temp table.
- Added `DataFrame.minus()` and `DataFrame.subtract()` as aliases to `DataFrame.except_()`.
- Added `regexp_replace()`, `concat()`, `concat_ws()`, `to_char()`, `current_timestamp()`, `current_date()`, `current_time()`, `months_between()`, `cast()`, `try_cast()`, `greatest()`, `least()`, and `hash()` to module `snowflake.snowpark.functions`.
### Bug Fixes
- Fixed an issue where `Session.createDataFrame(pandas_df)` and `Session.write_pandas(pandas_df)` raise an exception when the `pandas DataFrame` has spaces in the column name.
- Fixed an issue where `DataFrame.copy_into_table()` sometimes printed an `error`-level log entry even though it actually worked.
- Fixed an API docs issue where some `DataFrame` APIs are missing from the docs.
### Dependency updates
- Updated ``snowflake-connector-python`` to 2.7.2, which upgrades the ``pyarrow`` dependency to 6.0.x. Refer to the [python connector 2.7.2 release notes](https://pypi.org/project/snowflake-connector-python/2.7.2/) for more details.
## 0.2.0 (2021-12-02)
### New Features
- Updated the `Session.createDataFrame()` method for creating a `DataFrame` from a pandas DataFrame.
- Added the `Session.write_pandas()` method for writing a `pandas DataFrame` to a table in Snowflake and getting a `Snowpark DataFrame` object back.
- Added new classes and methods for calling window functions.
- Added the new functions `cume_dist()`, which finds the cumulative distribution of a value with regard to other values within a window partition, and `row_number()`, which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the `DataFrameStatFunctions` class.
- Added functions for handling missing values in a DataFrame in the `DataFrameNaFunctions` class.
- Added new methods `rollup()`, `cube()`, and `pivot()` to the `DataFrame` class.
- Added the `GroupingSets` class, which you can use with the `DataFrame.groupByGroupingSets()` method to perform a SQL GROUP BY GROUPING SETS.
- Added the new `FileOperation(session)` class that you can use to upload and download files to and from a stage.
- Added the `DataFrame.copy_into_table()` method for loading data from files in a stage into a table.
- In CASE expressions, the functions `when()` and `otherwise()` now accept Python types in addition to `Column` objects.
- When you register a UDF you can now optionally set the `replace` parameter to `True` to overwrite an existing UDF with the same name.
### Improvements
- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help
when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it will be uploaded as in-line code instead of being uploaded to a stage.
### Bug Fixes
- Fixed an issue where the statement `df.select(when(col("a") == 1, 4).otherwise(col("a")))` raised an exception.
- Fixed an issue where `df.toPandas()` raised an exception when a DataFrame was created from large local data.
## 0.1.0 (2021-10-26)
Start of Private Preview
Raw data
{
"_id": null,
"home_page": "https://www.snowflake.com/",
"name": "snowflake-snowpark-python",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.8",
"maintainer_email": null,
"keywords": "Snowflake db database cloud analytics warehouse",
"author": "Snowflake, Inc",
"author_email": "snowflake-python-libraries-dl@snowflake.com",
"download_url": "https://files.pythonhosted.org/packages/63/dc/b6823c0621236dc7077a74c0da8369f6d4804aafaf4135bce0de8b13359c/snowflake_snowpark_python-1.26.0.tar.gz",
"platform": null,
"description": "# Snowflake Snowpark Python and Snowpark pandas APIs\n\n[![Build and Test](https://github.com/snowflakedb/snowpark-python/actions/workflows/precommit.yml/badge.svg)](https://github.com/snowflakedb/snowpark-python/actions/workflows/precommit.yml)\n[![codecov](https://codecov.io/gh/snowflakedb/snowpark-python/branch/main/graph/badge.svg)](https://codecov.io/gh/snowflakedb/snowpark-python)\n[![PyPi](https://img.shields.io/pypi/v/snowflake-snowpark-python.svg)](https://pypi.org/project/snowflake-snowpark-python/)\n[![License Apache-2.0](https://img.shields.io/:license-Apache%202-brightgreen.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt)\n[![Codestyle Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\nThe Snowpark library provides intuitive APIs for querying and processing data in a data pipeline.\nUsing this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.\n\n[Source code][source code] | [Snowpark Python developer guide][Snowpark Python developer guide] | [Snowpark Python API reference][Snowpark Python api references] | [Snowpark pandas developer guide][Snowpark pandas developer guide] | [Snowpark pandas API reference][Snowpark pandas api references] | [Product documentation][snowpark] | [Samples][samples]\n\n## Getting started\n\n### Have your Snowflake account ready\nIf you don't have a Snowflake account yet, you can [sign up for a 30-day free trial account][sign up trial].\n\n### Create a Python virtual environment\nYou can use [miniconda][miniconda], [anaconda][anaconda], or [virtualenv][virtualenv]\nto create a Python 3.8, 3.9, 3.10 or 3.11 virtual environment.\n\nFor Snowpark pandas, only Python 3.9, 3.10, or 3.11 is supported.\n\nTo have the best experience when using it with UDFs, [creating a local conda environment with the Snowflake channel][use snowflake channel] is recommended.\n\n### Install the library to the Python virtual environment\n```bash\npip install snowflake-snowpark-python\n```\nTo use the [Snowpark pandas API][Snowpark pandas developer guide], you can optionally install the following, which installs [modin][modin] in the same environment. 
The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake.\n```bash\npip install \"snowflake-snowpark-python[modin]\"\n```\n\n### Create a session and use the Snowpark Python API\n```python\nfrom snowflake.snowpark import Session\n\nconnection_parameters = {\n \"account\": \"<your snowflake account>\",\n \"user\": \"<your snowflake user>\",\n \"password\": \"<your snowflake password>\",\n \"role\": \"<snowflake user role>\",\n \"warehouse\": \"<snowflake warehouse>\",\n \"database\": \"<snowflake database>\",\n \"schema\": \"<snowflake schema>\"\n}\n\nsession = Session.builder.configs(connection_parameters).create()\n# Create a Snowpark dataframe from input data\ndf = session.create_dataframe([[1, 2], [3, 4]], schema=[\"a\", \"b\"]) \ndf = df.filter(df.a > 1)\nresult = df.collect()\ndf.show()\n\n# -------------\n# |\"A\" |\"B\" |\n# -------------\n# |3 |4 |\n# -------------\n```\n\n### Create a session and use the Snowpark pandas API\n```python\nimport modin.pandas as pd\nimport snowflake.snowpark.modin.plugin\nfrom snowflake.snowpark import Session\n\nCONNECTION_PARAMETERS = {\n 'account': '<myaccount>',\n 'user': '<myuser>',\n 'password': '<mypassword>',\n 'role': '<myrole>',\n 'database': '<mydatabase>',\n 'schema': '<myschema>',\n 'warehouse': '<mywarehouse>',\n}\nsession = Session.builder.configs(CONNECTION_PARAMETERS).create()\n\n# Create a Snowpark pandas dataframe from input data\ndf = pd.DataFrame([['a', 2.0, 1],['b', 4.0, 2],['c', 6.0, None]], columns=[\"COL_STR\", \"COL_FLOAT\", \"COL_INT\"])\ndf\n# COL_STR COL_FLOAT COL_INT\n# 0 a 2.0 1.0\n# 1 b 4.0 2.0\n# 2 c 6.0 NaN\n\ndf.shape\n# (3, 3)\n\ndf.head(2)\n# COL_STR COL_FLOAT COL_INT\n# 0 a 2.0 1\n# 1 b 4.0 2\n\ndf.dropna(subset=[\"COL_INT\"], inplace=True)\n\ndf\n# COL_STR COL_FLOAT COL_INT\n# 0 a 2.0 1\n# 1 b 4.0 2\n\ndf.shape\n# (2, 3)\n\ndf.head(2)\n# COL_STR COL_FLOAT COL_INT\n# 0 a 2.0 1\n# 1 b 4.0 2\n\n# Save the result back to Snowflake with a row_pos column.\ndf.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])\n```\n\n## Samples\nThe [Snowpark Python developer guide][Snowpark Python developer guide], [Snowpark Python API references][Snowpark Python api references], [Snowpark pandas developer guide][Snowpark pandas developer guide], and [Snowpark pandas api references][Snowpark pandas api references] have basic sample code.\n[Snowflake-Labs][snowflake lab sample code] has more curated demos.\n\n## Logging\nConfigure logging level for `snowflake.snowpark` for Snowpark Python API logs.\nSnowpark uses the [Snowflake Python Connector][python connector].\nSo you may also want to configure the logging level for `snowflake.connector` when the error is in the Python Connector.\nFor instance,\n```python\nimport logging\nfor logger_name in ('snowflake.snowpark', 'snowflake.connector'):\n logger = logging.getLogger(logger_name)\n logger.setLevel(logging.DEBUG)\n ch = logging.StreamHandler()\n ch.setLevel(logging.DEBUG)\n ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))\n logger.addHandler(ch)\n```\n\n## Reading and writing to pandas DataFrame\n\nSnowpark Python API supports reading from and writing to a pandas DataFrame via the [to_pandas][to_pandas] and [write_pandas][write_pandas] commands. \n\nTo use these operations, ensure that pandas is installed in the same environment. 
You can install pandas alongside Snowpark Python by executing the following command:\n```bash\npip install \"snowflake-snowpark-python[pandas]\"\n```\nOnce pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows: \n```python\ndf = session.create_dataframe([[1, 2], [3, 4]], schema=[\"a\", \"b\"])\n# Convert Snowpark DataFrame to pandas DataFrame\npandas_df = df.to_pandas() \n# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame\nsnowpark_df = session.write_pandas(pandas_df, \"new_table\", auto_create_table=True)\n```\n\nSnowpark pandas API also supports writing to pandas: \n```python\nimport modin.pandas as pd\ndf = pd.DataFrame([[1, 2], [3, 4]], columns=[\"a\", \"b\"])\n# Convert Snowpark pandas DataFrame to pandas DataFrame\npandas_df = df.to_pandas() \n```\n\nNote that the above Snowpark pandas commands will work if Snowpark is installed with the `[modin]` option, the additional `[pandas]` installation is not required.\n\n## Contributing\nPlease refer to [CONTRIBUTING.md][contributing].\n\n[add other sample code repo links]: # (Developer advocacy is open-sourcing a repo that has excellent sample code. The link will be added here.)\n\n[Snowpark Python developer guide]: https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html\n[Snowpark Python api references]: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/index.html\n[Snowpark pandas developer guide]: https://docs.snowflake.com/developer-guide/snowpark/python/snowpark-pandas\n[Snowpark pandas api references]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/modin/index\n[snowpark]: https://www.snowflake.com/snowpark\n[sign up trial]: https://signup.snowflake.com\n[source code]: https://github.com/snowflakedb/snowpark-python\n[miniconda]: https://docs.conda.io/en/latest/miniconda.html\n[anaconda]: https://www.anaconda.com/\n[virtualenv]: https://docs.python.org/3/tutorial/venv.html\n[config pycharm interpreter]: https://www.jetbrains.com/help/pycharm/configuring-python-interpreter.html\n[python connector]: https://pypi.org/project/snowflake-connector-python/\n[use snowflake channel]: https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#local-development-and-testing\n[snowflake lab sample code]: https://github.com/Snowflake-Labs/snowpark-python-demos\n[samples]: https://github.com/snowflakedb/snowpark-python/blob/main/README.md#samples\n[contributing]: https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md\n[to_pandas]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.DataFrame.to_pandas\n[write_pandas]: https://docs.snowflake.com/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Session.write_pandas\n[modin]: https://github.com/modin-project/modin\n\n\n# Release History\n\n## 1.26.0 (2024-12-05)\n\n### Snowpark Python API Updates\n\n#### New Features\n\n- Added support for property `version` and class method `get_active_session` for `Session` class.\n- Added new methods and variables to enhance data type handling and JSON serialization/deserialization:\n - To `DataType`, its derived classes, and `StructField`:\n - `type_name`: Returns the type name of the data.\n - `simple_string`: Provides a simple string representation of the data.\n - `json_value`: Returns the data as a JSON-compatible value.\n - `json`: Converts the data to a JSON string.\n - To `ArrayType`, `MapType`, 
#### Improvements

- Added support for specifying the following to `DataFrame.create_or_replace_dynamic_table`:
  - `iceberg_config`: A dictionary that can hold the following Iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for nested data types to `DataFrame.print_schema`.
- Added support for the `level` parameter to `DataFrame.print_schema`.
- Improved the flexibility of the `DataFrameReader` and `DataFrameWriter` APIs by adding support for the following (see the sketch after this list):
  - Added the `format` method to `DataFrameReader` and `DataFrameWriter` to specify the file format when loading or unloading results.
  - Added the `load` method to `DataFrameReader` to work in conjunction with `format`.
  - Added the `save` method to `DataFrameWriter` to work in conjunction with `format`.
  - Added support for reading keyword arguments in the `options` method of `DataFrameReader` and `DataFrameWriter`.
- Relaxed the cloudpickle dependency for Python 3.11 to simplify build requirements. However, for Python 3.11, `cloudpickle==2.2.1` remains the only supported version.
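
A sketch of the new `format`/`load`/`save` chaining (the stage path and the `skip_header` option are illustrative, not part of the release notes):

```python
# Read a staged CSV file by declaring the format first, then loading the path.
df = session.read.format("csv").options(skip_header=1).load("@my_stage/data.csv")

# Unload the result as JSON files with the matching format/save pairing.
df.write.format("json").save("@my_stage/unloaded/")
```
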
#### Bug Fixes

- Removed warnings that dynamic pivot features were in private preview, because dynamic pivot is now generally available.
- Fixed a bug in `session.read.options` where `False` Boolean values were incorrectly parsed as `True` in the generated file format.

#### Dependency Updates

- Added a runtime dependency on `python-dateutil`.

### Snowpark pandas API Updates

#### New Features

- Added partial support for `Series.map` when `arg` is a pandas `Series` or a `collections.abc.Mapping`. There is no support for instances of `dict` that implement `__missing__` but are not instances of `collections.defaultdict`.
- Added support for `DataFrame.align` and `Series.align` for `axis=1` and `axis=None`.
- Added support for `pd.json_normalize`.
- Added support for `GroupBy.pct_change` with `axis=0`, `freq=None`, and `limit=None`.
- Added support for `DataFrameGroupBy.__iter__` and `SeriesGroupBy.__iter__`.
- Added support for `np.sqrt`, `np.trunc`, `np.floor`, numpy trig functions, `np.exp`, `np.abs`, `np.positive` and `np.negative`.
- Added partial support for the dataframe interchange protocol method `DataFrame.__dataframe__()`.

#### Bug Fixes

- Fixed a bug in `df.loc` where setting a single column from a series resulted in unexpected `None` values.

#### Improvements

- Use UNPIVOT INCLUDE NULLS for unpivot operations in pandas instead of sentinel values.
- Improved documentation for `pd.read_excel`.

## 1.25.0 (2024-11-14)

### Snowpark Python API Updates

#### New Features

- Added the following new function in `snowflake.snowpark.dataframe`:
  - `map`
- Added support for passing the parameter `include_error` to `Session.query_history` to record queries that error during execution, as sketched below.
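
A minimal sketch of the new flag (the failing query is purely illustrative):

```python
# Record every query issued through this session, including ones that fail.
with session.query_history(include_error=True) as history:
    try:
        session.sql("select * from table_that_does_not_exist").collect()
    except Exception:
        pass
print(history.queries)  # failed queries now appear alongside successful ones
```
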
#### Improvements

- When the target stage is not set in the profiler, a default stage from `Session.get_session_stage` is used instead of raising `SnowparkSQLException`.
- Allowed lower case or mixed case input when calling `Session.stored_procedure_profiler.set_active_profiler`.
- Added distributed tracing using open telemetry APIs for the following action function in `DataFrame`:
  - `cache_result`
- Removed the opentelemetry warning from logging.

#### Bug Fixes

- Fixed the pre-action and post-action query propagation when `In` expressions were used in selects.
- Fixed a bug that raised `AttributeError` when calling `Session.stored_procedure_profiler.get_output` while `Session.stored_procedure_profiler` is disabled.

#### Dependency Updates

- Added a dependency on `protobuf>=5.28` and `tzlocal` at runtime.
- Added a dependency on `protoc-wheel-0` for the development profile.
- Require `snowflake-connector-python>=3.12.0, <4.0.0` (was `>=3.10.0`).

### Snowpark pandas API Updates

#### Dependency Updates

- Updated `modin` from 0.28.1 to 0.30.1.
- Added support for all `pandas` 2.2.x versions.

#### New Features

- Added support for `Index.to_numpy`.
- Added support for `DataFrame.align` and `Series.align` for `axis=0`.
- Added support for `size` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for `snowflake.snowpark.functions.window`.
- Added support for `pd.read_pickle` (uses native pandas for processing).
- Added support for `pd.read_html` (uses native pandas for processing).
- Added support for `pd.read_xml` (uses native pandas for processing).
- Added support for the aggregation functions `"size"` and `len` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for list values in `Series.str.len`.

#### Bug Fixes

- Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. `pd.DataFrame([0]).agg(np.mean)`) would fail to transpose the result.
- Fixed bugs where `DataFrame.dropna()` would:
  - Treat an empty `subset` (e.g. `[]`) as if it specified all columns instead of no columns.
  - Raise a `TypeError` for a scalar `subset` instead of filtering on just that column.
  - Raise a `ValueError` for a `subset` of type `pandas.Index` instead of filtering on the columns in the index.
- Disabled creation of scoped read-only tables to mitigate `TableNotFoundError` when using dynamic pivot in notebook environments.
- Fixed a bug when concatenating dataframe or series objects that come from the same dataframe when `axis=1`.

#### Improvements

- Improved `np.where` with a scalar `x` value by eliminating unnecessary join and temp table creation.
- Improved `get_dummies` performance by flattening the pivot with join.
- Improved `align` performance when aligning on the row position column by removing unnecessary window functions.

### Snowpark Local Testing Updates

#### New Features

- Added support for patching functions that are unavailable in the `snowflake.snowpark.functions` module.
- Added support for `snowflake.snowpark.functions.any_value`.

#### Bug Fixes

- Fixed a bug where `Table.update` could not handle `VariantType`, `MapType`, and `ArrayType` data types.
- Fixed a bug where column aliases were incorrectly resolved in `DataFrame.join`, causing errors when selecting columns from a joined DataFrame.
- Fixed a bug where `Table.update` and `Table.merge` could fail if the target table's index was not the default `RangeIndex`.

## 1.24.0 (2024-10-28)

### Snowpark Python API Updates

#### New Features

- Updated the `Session` class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same `Session` object.
  - The feature is disabled by default and can be enabled by setting `FEATURE_THREAD_SAFE_PYTHON_SESSION` to `True` for your account.
  - Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
  - When enabled, some internally created temporary table names returned from the `DataFrame.queries` API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
- Added support for the 'Service' domain to the `session.lineage.trace` API.
- Added support for the `copy_grants` parameter when registering UDxF and stored procedures.
- Added support for the following methods in `DataFrameWriter` to support daisy-chaining:
  - `option`
  - `options`
  - `partition_by`
- Added support for `snowflake_cortex_summarize`.

#### Improvements

- The function `snowflake.snowpark.functions.array_remove` can now be used in Python.
- Disabled SQL simplification when a sort is performed after a limit (see the sketch after this list).
  - Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query, with the sort in front of the limit. Now, `df.limit().sort()` generates a query that reads `df.limit().sort()`.
  - This improves the performance of the generated query for `df.limit().sort()`, because the limit stops table scanning as soon as the number of records is satisfied.
- Added a client side error message for when an invalid stage location is passed to DataFrame read functions.
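
A sketch of the resulting difference (`df` and column `"a"` are placeholders):

```python
# ORDER BY runs first, then LIMIT: the overall top 10 rows by "a".
top_rows = df.sort("a").limit(10)

# LIMIT now runs first: only 10 rows are fetched, then those 10 are sorted.
sampled_sorted = df.limit(10).sort("a")
```
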
#### Bug Fixes

- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in the `DataFrame.analytics.time_series_agg` function to handle multiple data points in the same sliding interval.
- Fixed a bug that created inconsistent casing in field names of structured objects in Iceberg schemas.

#### Deprecations

- Deprecation warnings will be triggered when using snowpark-python with Python 3.8. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.

### Snowpark pandas API Updates

#### New Features

- Added support for `np.subtract`, `np.multiply`, `np.divide`, and `np.true_divide`.
- Added support for tracking usages of `__array_ufunc__`.
- Added numpy compatibility support for `np.float_power`, `np.mod`, `np.remainder`, `np.greater`, `np.greater_equal`, `np.less`, `np.less_equal`, `np.not_equal`, and `np.equal`.
- Added numpy compatibility support for `np.log`, `np.log2`, and `np.log10`.
- Added support for `DataFrameGroupBy.bfill`, `SeriesGroupBy.bfill`, `DataFrameGroupBy.ffill`, and `SeriesGroupBy.ffill`.
- Added support for the `on` parameter with `Resampler`.
- Added support for timedelta inputs in `value_counts()`.
- Added support for applying the Snowpark Python function `snowflake_cortex_summarize`.
- Added support for `DataFrame.attrs` and `Series.attrs`.
- Added support for `DataFrame.style`.
- Added numpy compatibility support for `np.full_like`.

#### Improvements

- Improved the generated SQL query for `head` and `iloc` when the row key is a slice.
- Improved the error message when passing an unknown timezone to `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex`.
- Improved documentation for `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex` to specify the supported timezone formats.
- Added additional kwargs support for `df.apply` and `series.apply` (as well as `map` and `applymap`) when using Snowpark functions. This allows for some position-independent compatibility between apply and functions where the first argument is not a pandas object.
- Improved the generated SQL query for `iloc` and `iat` when the row key is a scalar.
- Removed all joins in `iterrows`.
- Improved documentation for `Series.map` to reflect the unsupported features.
- Added support for `np.may_share_memory`, which is used internally by many scikit-learn functions. This method will always return `False` when called with a Snowpark pandas object.

#### Bug Fixes

- Fixed a bug where `DataFrame` and `Series` `pct_change()` would raise `TypeError` when the input contained timedelta columns.
- Fixed a bug where `replace()` would sometimes propagate `Timedelta` types incorrectly through `replace()`. `replace()` now raises `NotImplementedError` for `Timedelta` instead.
- Fixed a bug where `DataFrame` and `Series` `round()` would raise `AssertionError` for `Timedelta` columns. `round()` now raises `NotImplementedError` for `Timedelta` instead.
- Fixed a bug where `reindex` failed when the new index was a Series with non-overlapping types from the original index.
- Fixed a bug where calling `__getitem__` on a DataFrameGroupBy object always returned a DataFrameGroupBy object if `as_index=False`.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising `NotImplementedError`.
- Fixed a bug where `DataFrame.shift()` on `axis=0` and `axis=1` would fail to propagate timedelta types.
- `DataFrame.abs()`, `DataFrame.__neg__()`, `DataFrame.stack()`, and `DataFrame.unstack()` now raise `NotImplementedError` for timedelta inputs instead of failing to propagate timedelta types.

### Snowpark Local Testing Updates

#### Bug Fixes

- Fixed a bug where `DataFrame.alias` raised `KeyError` for the input column name.
- Fixed a bug where `to_csv` on a Snowflake stage failed when the data contained empty strings.

## 1.23.0 (2024-10-09)

### Snowpark Python API Updates

#### New Features

- Added the following new function in `snowflake.snowpark.functions`:
  - `make_interval`
- Added support for using Snowflake Interval constants with `Window.range_between()` when the order by column is of TIMESTAMP or DATE type.
- Added support for file writes. This feature is currently in private preview.
- Added `thread_id` to `QueryRecord` to track the thread id submitting the query history.
- Added support for `Session.stored_procedure_profiler`.

#### Bug Fixes

- Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning `'NoneType' has no len() when trying to read default values from function`.

### Snowpark pandas API Updates

#### New Features

- Added support for the `TimedeltaIndex.mean` method.
- Added support for some cases of aggregating `Timedelta` columns on `axis=0` with `agg` or `aggregate`.
- Added support for `by`, `left_by`, `right_by`, `left_index`, and `right_index` for `pd.merge_asof`.
- Added support for passing the parameter `include_describe` to `Session.query_history`.
- Added support for the `DatetimeIndex.mean` and `DatetimeIndex.std` methods.
- Added support for `Resampler.asfreq`, `Resampler.indices`, `Resampler.nunique`, and `Resampler.quantile`.
- Added support for `resample` frequencies `W`, `ME`, `YE` with `closed = "left"`.
- Added support for `DataFrame.rolling.corr` and `Series.rolling.corr` for `pairwise = False` and int `window`.
- Added support for string time-based `window` and `min_periods = None` for `Rolling`.
- Added support for `DataFrameGroupBy.fillna` and `SeriesGroupBy.fillna`.
- Added support for constructing `Series` and `DataFrame` objects with the lazy `Index` object as `data`, `index`, and `columns` arguments.
- Added support for constructing `Series` and `DataFrame` objects with `index` and `column` values not present in `DataFrame`/`Series` `data`.
- Added support for `pd.read_sas` (uses native pandas for processing).
- Added support for applying `rolling().count()` and `expanding().count()` to `Timedelta` series and columns.
- Added support for `tz` in both `pd.date_range` and `pd.bdate_range`.
- Added support for `Series.items`.
- Added support for `errors="ignore"` in `pd.to_datetime`.
- Added support for `DataFrame.tz_localize` and `Series.tz_localize`.
- Added support for `DataFrame.tz_convert` and `Series.tz_convert`.
- Added support for applying Snowpark Python functions (e.g., `sin`) in `Series.map`, `Series.apply`, `DataFrame.apply` and `DataFrame.applymap`.
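
For instance, a sketch of applying a Snowpark function element-wise (assuming the Snowpark pandas plugin is installed and a session is configured as in the Getting Started section):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401
from snowflake.snowpark.functions import sin

s = pd.Series([0.0, 1.5708])
print(s.apply(sin))  # the function is evaluated in Snowflake, not locally
```
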
#### Improvements

- Improved `to_pandas` to persist the original timezone offset for the TIMESTAMP_TZ type.
- Improved `dtype` results for the TIMESTAMP_TZ type to show the correct timezone offset.
- Improved `dtype` results for the TIMESTAMP_LTZ type to show the correct timezone.
- Improved the error message when passing a non-bool value to `numeric_only` for groupby aggregations.
- Removed an unnecessary warning about the sort algorithm in `sort_values`.
- Use SCOPED objects for internal temporary table creation. SCOPED objects are stored-procedure scoped when created within a stored procedure, and session scoped otherwise; they are automatically cleaned up at the end of the scope.
- Improved warning messages for operations that lead to materialization and may be inadvertently slow.
- Removed an unnecessary warning message about `convert_dtype` in `Series.apply`.

#### Bug Fixes

- Fixed a bug where an `Index` object created from a `Series`/`DataFrame` incorrectly updated the `Series`/`DataFrame`'s index name after an inplace update was applied to the original `Series`/`DataFrame`.
- Suppressed an unhelpful `SettingWithCopyWarning` that sometimes appeared when printing `Timedelta` columns.
- Fixed the `inplace` argument for `Series` objects derived from other `Series` objects.
- Fixed a bug where `Series.sort_values` failed if the series name overlapped with the index column name.
- Fixed a bug where transposing a dataframe would map `Timedelta` index levels to integer column levels.
- Fixed a bug where `Resampler` methods on timedelta columns would produce integer results.
- Fixed a bug where `pd.to_numeric()` would leave `Timedelta` inputs as `Timedelta` instead of converting them to integers.
- Fixed `loc` set when setting a single row, or multiple rows, of a DataFrame with a Series value.

### Snowpark Local Testing Updates

#### Bug Fixes

- Fixed a bug where nullable columns were annotated wrongly.
- Fixed a bug where the `date_add` and `date_sub` functions failed for `NULL` values.
- Fixed a bug where `equal_null` could fail inside a merge statement.
- Fixed a bug where `row_number` could fail inside a Window function.
- Fixed a bug where updates could fail when the source is the result of a join.

## 1.22.1 (2024-09-11)

This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
## 1.22.0 (2024-09-10)

### Snowpark Python API Updates

#### New Features

- Added the following new functions in `snowflake.snowpark.functions`:
  - `array_remove`
  - `ln`

#### Improvements

- Improved documentation for `Session.write_pandas` by making the `use_logical_type` option more explicit.
- Added support for specifying the following to `DataFrameWriter.save_as_table` (see the sketch after this list):
  - `enable_schema_evolution`
  - `data_retention_time`
  - `max_data_extension_time`
  - `change_tracking`
  - `copy_grants`
  - `iceberg_config`: A dictionary that can hold the following Iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following to `DataFrameWriter.copy_into_table`:
  - `iceberg_config`: A dictionary that can hold the following Iceberg configuration options:
    - `external_volume`
    - `catalog`
    - `base_location`
    - `catalog_sync`
    - `storage_serialization_policy`
- Added support for specifying the following parameters to `DataFrame.create_or_replace_dynamic_table`:
  - `mode`
  - `refresh_mode`
  - `initialize`
  - `clustering_keys`
  - `is_transient`
  - `data_retention_time`
  - `max_data_extension_time`
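
A sketch of `iceberg_config` with `DataFrameWriter.save_as_table` (the volume, catalog, and location names are placeholders):

```python
# Write a DataFrame as a Snowflake-managed Iceberg table.
df.write.save_as_table(
    "my_iceberg_table",
    iceberg_config={
        "external_volume": "my_external_volume",
        "catalog": "SNOWFLAKE",
        "base_location": "my/base/location",
    },
)
```
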
#### Bug Fixes

- Fixed a bug in `session.read.csv` that caused an error when setting `PARSE_HEADER = True` in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in `session.get_session_stage` that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling `DataFrame.to_snowpark_pandas` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.

### Snowpark Local Testing Updates

#### New Features

- Added support for type coercion when passing columns as input to UDF calls.
- Added support for `Index.identical`.

#### Bug Fixes

- Fixed a bug where the truncate mode in `DataFrameWriter.save_as_table` incorrectly handled DataFrames containing only a subset of columns from the existing table.
- Fixed a bug where the function `to_timestamp` did not set the default timezone of the column datatype.

### Snowpark pandas API Updates

#### New Features

- Added limited support for the `Timedelta` type, including the following features (see the sketch after this list). Snowpark pandas will raise `NotImplementedError` for unsupported `Timedelta` use cases.
  - Support for tracking the Timedelta type through `copy`, `cache_result`, `shift`, `sort_index`, `assign`, `bfill`, `ffill`, `fillna`, `compare`, `diff`, `drop`, `dropna`, `duplicated`, `empty`, `equals`, `insert`, `isin`, `isna`, `items`, `iterrows`, `join`, `len`, `mask`, `melt`, `merge`, `nlargest`, `nsmallest`, `to_pandas`.
  - Converting non-timedelta to timedelta via `astype`.
  - `NotImplementedError` will be raised for the remaining methods that do not support `Timedelta`.
  - Support for subtracting two timestamps to get a Timedelta.
  - Support for indexing with Timedelta data columns.
  - Support for adding or subtracting timestamps and `Timedelta`.
  - Support for binary arithmetic between two `Timedelta` values.
  - Support for binary arithmetic and comparisons between `Timedelta` values and numeric values.
  - Support for lazy `TimedeltaIndex`.
  - Support for `pd.to_timedelta`.
  - Support for `GroupBy` aggregations `min`, `max`, `mean`, `idxmax`, `idxmin`, `std`, `sum`, `median`, `count`, `any`, `all`, `size`, `nunique`, `head`, `tail`, `aggregate`.
  - Support for `GroupBy` filtrations `first` and `last`.
  - Support for `TimedeltaIndex` attributes: `days`, `seconds`, `microseconds` and `nanoseconds`.
  - Support for `diff` with timestamp columns on `axis=0` and `axis=1`.
  - Support for `TimedeltaIndex` methods: `ceil`, `floor` and `round`.
  - Support for the `TimedeltaIndex.total_seconds` method.
- Added support for `Index` arithmetic and comparison operators.
- Added support for `Series.dt.round`.
- Added documentation pages for `DatetimeIndex`.
- Added support for `Index.name`, `Index.names`, `Index.rename`, and `Index.set_names`.
- Added support for `Index.__repr__`.
- Added support for `DatetimeIndex.month_name` and `DatetimeIndex.day_name`.
- Added support for `Series.dt.weekday`, `Series.dt.time`, and `DatetimeIndex.time`.
- Added support for `Index.min` and `Index.max`.
- Added support for `pd.merge_asof`.
- Added support for `Series.dt.normalize` and `DatetimeIndex.normalize`.
- Added support for `Index.is_boolean`, `Index.is_integer`, `Index.is_floating`, `Index.is_numeric`, and `Index.is_object`.
- Added support for `DatetimeIndex.round`, `DatetimeIndex.floor` and `DatetimeIndex.ceil`.
- Added support for `Series.dt.days_in_month` and `Series.dt.daysinmonth`.
- Added support for `DataFrameGroupBy.value_counts` and `SeriesGroupBy.value_counts`.
- Added support for `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`.
- Added support for `Index.is_monotonic_increasing` and `Index.is_monotonic_decreasing`.
- Added support for `pd.crosstab`.
- Added support for `pd.bdate_range` and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both `pd.date_range` and `pd.bdate_range`.
- Added support for lazy `Index` objects as `labels` in `DataFrame.reindex` and `Series.reindex`.
- Added support for `Series.dt.days`, `Series.dt.seconds`, `Series.dt.microseconds`, and `Series.dt.nanoseconds`.
- Added support for creating a `DatetimeIndex` from an `Index` of numeric or string type.
- Added support for string indexing with `Timedelta` objects.
- Added support for the `Series.dt.total_seconds` method.
- Added support for `DataFrame.apply(axis=0)`.
- Added support for `Series.dt.tz_convert` and `Series.dt.tz_localize`.
- Added support for `DatetimeIndex.tz_convert` and `DatetimeIndex.tz_localize`.
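
A small sketch of the new `Timedelta` support (values are illustrative; assumes the Snowpark pandas setup from the Getting Started section):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401

checkin = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-03"]))
checkout = pd.to_datetime(pd.Series(["2024-01-02", "2024-01-05"]))
stay = checkout - checkin  # subtracting two timestamp columns yields a Timedelta
print(stay.dt.days)        # Timedelta accessors such as Series.dt.days are supported
```
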
#### Improvements

- Improved `concat` and `join` performance when operations are performed on series coming from the same dataframe, by avoiding unnecessary joins.
- Refactored `quoted_identifier_to_snowflake_type` to avoid making metadata queries if the types have been cached locally.
- Improved `pd.to_datetime` to handle all local input cases.
- Create a lazy index from another lazy index without pulling data to the client.
- Raised `NotImplementedError` for `Index` bitwise operators.
- Display a clearer error message when `Index.names` is set to a non-list-like object.
- Raise a warning whenever MultiIndex values are pulled in locally.
- Improved the warning message for `pd.read_snowflake` to include the creation reason when temp table creation is triggered.
- Improved performance for `DataFrame.set_index`, and for setting `DataFrame.index` or `Series.index`, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current `Series`/`DataFrame` object length, a `ValueError` is no longer raised. Instead, when the `Series`/`DataFrame` object is longer than the provided index, the `Series`/`DataFrame`'s new index is filled with `NaN` values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
- Properly raise `NotImplementedError` when `ambiguous`/`nonexistent` are non-string in `ceil`/`floor`/`round`.

#### Bug Fixes

- Stopped ignoring nanoseconds in `pd.Timedelta` scalars.
- Fixed an `AssertionError` in trees of binary operations.
- Fixed a bug in `Series.dt.isocalendar` using a named Series.
- Fixed the `inplace` argument for Series objects derived from DataFrame columns.
- Fixed a bug where `Series.reindex` and `DataFrame.reindex` did not update the result index's name correctly.
- Fixed a bug where `Series.take` did not error when `axis=1` was specified.

## 1.21.1 (2024-09-05)

### Snowpark Python API Updates

#### Bug Fixes

- Fixed a bug where using `to_pandas_batches` with async jobs caused an error due to improper handling of waiting for asynchronous query completion.

## 1.21.0 (2024-08-19)

### Snowpark Python API Updates

#### New Features

- Added support for `snowflake.snowpark.testing.assert_dataframe_equal`, a utility function to check the equality of two Snowpark DataFrames.

#### Improvements

- Added support for server-side string size limitations.
- Added support to create and invoke stored procedures, UDFs and UDTFs with optional arguments.
- Added support for column lineage in the `DataFrame.lineage.trace` API.
- Added support for passing `INFER_SCHEMA` options to `DataFrameReader` via `INFER_SCHEMA_OPTIONS`.
- Added support for passing the `parameters` parameter to `Column.rlike` and `Column.regexp`.
- Added support for automatically cleaning up temporary tables created by `df.cache_result()` in the current session when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting `session.auto_clean_up_temp_table_enabled` to `True`, as sketched below.
- Added support for string literals in the `fmt` parameter of `snowflake.snowpark.functions.to_date`.
- Added support for the `system$reference` function.
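
A sketch of opting in (the table name is a placeholder):

```python
# Opt in to the experimental cleanup of cache_result temporary tables.
session.auto_clean_up_temp_table_enabled = True

df = session.table("my_table").cache_result()
del df  # once the DataFrame is garbage collected, its temp table is dropped
```
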
#### Bug Fixes

- Fixed a bug where the SQL generated for selecting the `*` column had an incorrect subquery.
- Fixed a bug in `DataFrame.to_pandas_batches` where the iterator could throw an error if certain transformations were made to the pandas dataframe, due to a wrong isolation level.
- Fixed a bug in `DataFrame.lineage.trace` to split the quoted feature view's name and version correctly.
- Fixed a bug in `Column.isin` that caused invalid SQL generation when passed an empty list.
- Fixed a bug that failed to raise `NotImplementedError` when setting a cell with a list-like item.

### Snowpark Local Testing Updates

#### New Features

- Added support for the following APIs:
  - snowflake.snowpark.functions
    - `rank`
    - `dense_rank`
    - `percent_rank`
    - `cume_dist`
    - `ntile`
    - `datediff`
    - `array_agg`
  - snowflake.snowpark.column.Column.within_group
- Added support for parsing flags in regex statements for mocked plans. This maintains parity with the `rlike` and `regexp` changes above.

#### Bug Fixes

- Fixed a bug where the window functions LEAD and LAG did not handle the option `ignore_nulls` properly.
- Fixed a bug where values were not populated into the result DataFrame during the insertion step of a table merge operation.

#### Improvements

- Fixed a pandas `FutureWarning` about integer indexing.

### Snowpark pandas API Updates

#### New Features

- Added support for `DataFrame.backfill`, `DataFrame.bfill`, `Series.backfill`, and `Series.bfill`.
- Added support for `DataFrame.compare` and `Series.compare` with default parameters.
- Added support for `Series.dt.microsecond` and `Series.dt.nanosecond`.
- Added support for `Index.is_unique` and `Index.has_duplicates`.
- Added support for `Index.equals`.
- Added support for `Index.value_counts`.
- Added support for `Series.dt.day_name` and `Series.dt.month_name`.
- Added support for indexing on Index, e.g., `df.index[:10]`.
- Added support for `DataFrame.unstack` and `Series.unstack`.
- Added support for `DataFrame.asfreq` and `Series.asfreq`.
- Added support for `Series.dt.is_month_start` and `Series.dt.is_month_end`.
- Added support for `Index.all` and `Index.any`.
- Added support for `Series.dt.is_year_start` and `Series.dt.is_year_end`.
- Added support for `Series.dt.is_quarter_start` and `Series.dt.is_quarter_end`.
- Added support for lazy `DatetimeIndex`.
- Added support for `Series.argmax` and `Series.argmin`.
- Added support for `Series.dt.is_leap_year`.
- Added support for `DataFrame.items`.
- Added support for `Series.dt.floor` and `Series.dt.ceil`.
- Added support for `Index.reindex`.
- Added support for `DatetimeIndex` properties: `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`,
  `nanosecond`, `date`, `dayofyear`, `day_of_year`, `dayofweek`, `day_of_week`, `weekday`, `quarter`,
  `is_month_start`, `is_month_end`, `is_quarter_start`, `is_quarter_end`, `is_year_start`, `is_year_end`
  and `is_leap_year`.
- Added support for `Resampler.fillna` and `Resampler.bfill`.
- Added limited support for the `Timedelta` type, including creating `Timedelta` columns and `to_pandas`.
- Added support for `Index.argmax` and `Index.argmin`.

#### Improvements

- Removed the public preview warning message when importing Snowpark pandas.
- Removed an unnecessary count query from the `SnowflakeQueryCompiler.is_series_like` method.
- `DataFrame.columns` now returns a native pandas Index object instead of a Snowpark Index object.
- Refactored and introduced a `query_compiler` argument in the `Index` constructor to create an `Index` from a query compiler.
- `pd.to_datetime` now returns a DatetimeIndex object instead of a Series object.
- `pd.date_range` now returns a DatetimeIndex object instead of a Series object.
#### Bug Fixes

- Made passing an unsupported aggregation function to `pivot_table` raise `NotImplementedError` instead of `KeyError`.
- Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
- Fixed an `AssertionError` in `Series.drop_duplicates` and `DataFrame.drop_duplicates` when called after `sort_values`.
- Fixed a bug in `Index.to_frame` where the result frame's column name could be wrong when the name is unspecified.
- Fixed a bug where some Index docstrings were ignored.
- Fixed a bug in `Series.reset_index(drop=True)` where the result name could be wrong.
- Fixed a bug in `GroupBy.first`/`last` to order by the correct columns in the underlying window expression.

## 1.20.0 (2024-07-17)

### Snowpark Python API Updates

#### Improvements

- Added distributed tracing using open telemetry APIs for the table stored procedure function in `DataFrame`:
  - `_execute_and_get_query_id`
- Added support for the `arrays_zip` function.
- Improved performance for binary column expressions and `df._in` by avoiding unnecessary casts for numeric values. You can enable this optimization by setting `session.eliminate_numeric_sql_value_cast_enabled = True`.
- Improved the error message for `write_pandas` when the target table does not exist and `auto_create_table=False`.
- Added open telemetry tracing on UDxF functions in Snowpark.
- Added open telemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called `format_json` to the `Session.SessionBuilder.app_name` function that sets the app name in the `Session.query_tag` in JSON format. By default, this parameter is set to `False`.
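
For example, a sketch of the JSON-formatted query tag (connection parameters elided; the exact JSON layout of the tag is not specified in these notes):

```python
from snowflake.snowpark import Session

session = (
    Session.builder
    .app_name("my_app", format_json=True)  # Session.query_tag becomes a JSON document carrying the app name
    .configs(connection_parameters)
    .create()
)
```
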
#### Bug Fixes

- Fixed a bug where the SQL generated for `lag(x, 0)` was incorrect and failed with the error message `argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'`.

### Snowpark Local Testing Updates

#### New Features

- Added support for the following APIs:
  - snowflake.snowpark.functions
    - random
- Added new parameters to the `patch` function when registering a mocked function:
  - `distinct` allows an alternate function to be specified for when a SQL function should be distinct.
  - `pass_column_index` passes a named parameter `column_index` to the mocked function that contains the pandas.Index for the input data.
  - `pass_row_index` passes a named parameter `row_index` to the mocked function that is the 0-indexed row number the function is currently operating on.
  - `pass_input_data` passes a named parameter `input_data` to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the `column_order` parameter to the method `DataFrameWriter.save_as_table`.

#### Bug Fixes

- Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.

### Snowpark pandas API Updates

#### New Features

- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (uses local pandas for processing).
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when the condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added partial support for `Series.str.translate` where the values in the `table` are single-codepoint strings.
- Added support for `DataFrame.corr`.
- Allow `df.plot()` and `series.plot()` to be called, materializing the data into the local client.
- Added support for `DataFrameGroupBy` and `SeriesGroupBy` aggregations `first` and `last`.
- Added support for `DataFrameGroupBy.get_group`.
- Added support for the `limit` parameter when the `method` parameter is used in `fillna`.
- Added support for `DataFrame.equals` and `Series.equals`.
- Added support for `DataFrame.reindex` and `Series.reindex`.
- Added support for `Index.astype`.
- Added support for `Index.unique` and `Index.nunique`.
- Added support for `Index.sort_values`.

#### Bug Fixes

- Fixed an issue when using `np.where` and `df.where` when the scalar `other` is the literal 0.
- Fixed a bug regarding precision loss when converting to a Snowpark pandas `DataFrame` or `Series` with `dtype=np.uint64`.
- Fixed a bug where `values` was set to `index` when `index` and `columns` contain all columns in the DataFrame during `pivot_table`.

#### Improvements

- Added support for `Index.copy()`.
- Added support for Index APIs: `dtype`, `values`, `item()`, `tolist()`, `to_series()` and `to_frame()`.
- Expanded support for DataFrames with no rows in `pd.pivot_table` and `DataFrame.pivot_table`.
- Added support for the `inplace` parameter in `DataFrame.sort_index` and `Series.sort_index`.
## 1.19.0 (2024-06-25)

### Snowpark Python API Updates

#### New Features

- Added support for the `to_boolean` function.
- Added documentation pages for Index and its APIs.

#### Bug Fixes

- Fixed a bug where a Python stored procedure with a table return type failed when run in a task.
- Fixed a bug where `df.dropna` failed due to `RecursionError: maximum recursion depth exceeded` when the DataFrame had more than 500 columns.
- Fixed a bug where `AsyncJob.result("no_result")` didn't wait for the query to finish execution.

### Snowpark Local Testing Updates

#### New Features

- Added support for the `strict` parameter when registering UDFs and Stored Procedures.

#### Bug Fixes

- Fixed a bug in `convert_timezone` where setting the `source_timezone` parameter returned an error.
- Fixed a bug where creating a DataFrame with empty data of type `DateType` raised `AttributeError`.
- Fixed a bug where table merge failed when an update clause existed but no update took place.
- Fixed a bug in the mock implementation of `to_char` that raised `IndexError` when the incoming column had a nonconsecutive row index.
- Fixed a bug in the handling of `CaseExpr` expressions that raised `IndexError` when the incoming column had a nonconsecutive row index.
- Fixed a bug in the implementation of `Column.like` that raised `IndexError` when the incoming column had a nonconsecutive row index.

#### Improvements

- Added support for type coercion in the implementation of `DataFrame.replace`, `DataFrame.dropna` and the mock function `iff`.

### Snowpark pandas API Updates

#### New Features

- Added partial support for `DataFrame.pct_change` and `Series.pct_change` without the `freq` and `limit` parameters.
- Added support for `Series.str.get`.
- Added support for `Series.dt.dayofweek`, `Series.dt.day_of_week`, `Series.dt.dayofyear`, and `Series.dt.day_of_year`.
- Added support for `Series.str.__getitem__` (`Series.str[...]`).
- Added support for `Series.str.lstrip` and `Series.str.rstrip`.
- Added support for `DataFrameGroupBy.size` and `SeriesGroupBy.size`.
- Added support for `DataFrame.expanding` and `Series.expanding` for aggregations `count`, `sum`, `min`, `max`, `mean`, `std`, `var`, and `sem` with `axis=0`.
- Added support for `DataFrame.rolling` and `Series.rolling` for aggregation `count` with `axis=0`.
- Added support for `Series.str.match`.
- Added support for `DataFrame.resample` and `Series.resample` for aggregations `size`, `first`, and `last`.
- Added support for `DataFrameGroupBy.all`, `SeriesGroupBy.all`, `DataFrameGroupBy.any`, and `SeriesGroupBy.any`.
- Added support for `DataFrame.nlargest`, `DataFrame.nsmallest`, `Series.nlargest` and `Series.nsmallest`.
- Added support for `replace` and `frac > 1` in `DataFrame.sample` and `Series.sample`.
- Added support for `read_excel` (uses local pandas for processing).
- Added support for `Series.at`, `Series.iat`, `DataFrame.at`, and `DataFrame.iat`.
- Added support for `Series.dt.isocalendar`.
- Added support for `Series.case_when` except when the condition or replacement is callable.
- Added documentation pages for `Index` and its APIs.
- Added support for `DataFrame.assign`.
- Added support for `DataFrame.stack`.
- Added support for `DataFrame.pivot` and `pd.pivot`.
- Added support for `DataFrame.to_csv` and `Series.to_csv`.
- Added support for `Index.T`.
#### Bug Fixes

- Fixed a bug that caused the output columns of `GroupBy.aggregate` to be ordered incorrectly.
- Fixed a bug where `DataFrame.describe` on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
- Fixed a bug in `DataFrame.rolling` and `Series.rolling` so that `window=0` now throws `NotImplementedError` instead of `ValueError`.

#### Improvements

- Added support for named aggregations in `DataFrame.aggregate` and `Series.aggregate` with `axis=0`.
- `pd.read_csv` reads using the native pandas CSV parser, then uploads data to Snowflake using Parquet. This enables most of the parameters supported by `read_csv`, including date parsing and numeric conversions. Uploading via Parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a `pd.Index` directly in Snowpark pandas. Support for `pd.Index` as a first-class component of Snowpark pandas is coming soon.
- Added a lazy index constructor and support for `len`, `shape`, `size`, `empty`, `to_pandas()` and `names`. For `df.index`, Snowpark pandas creates a lazy index object.
- For `df.columns`, Snowpark pandas supports a non-lazy version of an `Index` since the data is already stored locally.

## 1.18.0 (2024-05-28)

### Snowpark Python API Updates

#### Improvements

- Improved the error message to remind users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.
- Improved error handling for `Session.create_dataframe` when called with more than 512 rows and using `format` or `pyformat` `paramstyle`.

### Snowpark pandas API Updates

#### New Features

- Added `DataFrame.cache_result` and `Series.cache_result` methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.

#### Improvements

- Added partial support for `DataFrame.pivot_table` with no `index` parameter, as well as for the `margins` parameter.
- Updated the signature of `DataFrame.shift`/`Series.shift`/`DataFrameGroupBy.shift`/`SeriesGroupBy.shift` to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added `suffix` argument, or sequence values of `periods`.
- Re-added support for `Series.str.split`.
#### Bug Fixes

- Fixed how we support mixed columns for string methods (`Series.str.*`).

### Snowpark Local Testing Updates

#### New Features

- Added support for the following DataFrameReader read options for the file formats `csv` and `json`:
  - PURGE
  - PATTERN
  - INFER_SCHEMA with value being `False`
  - ENCODING with value being `UTF8`
- Added support for `DataFrame.analytics.moving_agg` and `DataFrame.analytics.cumulative_agg`.
- Added support for the `if_not_exists` parameter during UDF and stored procedure registration.

#### Bug Fixes

- Fixed a bug where the fractional second part was not handled properly when processing time formats.
- Fixed a bug that caused function calls on `*` to fail.
- Fixed a bug that prevented creation of map and struct type objects.
- Fixed a bug where the function `date_add` was unable to handle some numeric types.
- Fixed a bug where `TimestampType` casting resulted in incorrect data.
- Fixed a bug that caused `DecimalType` data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raised a confusing `IndexError`.
- Fixed a bug where the mocked function `to_timestamp_ntz` could not handle `None` data.
- Fixed a bug where mocked UDFs handled `None` output data improperly.
- Fixed a bug where `DataFrame.with_column_renamed` ignored attributes from parent DataFrames after join operations.
- Fixed a bug where the integer precision of large values was lost when converting to a pandas DataFrame.
- Fixed a bug where the schema of datetime objects was wrong when creating a DataFrame from a pandas DataFrame.
- Fixed a bug in the implementation of `Column.equal_nan` where null data was handled incorrectly.
- Fixed a bug where `DataFrame.drop` ignored attributes from parent DataFrames after join operations.
- Fixed a bug in the mocked function `date_part` where the Column type was set wrong.
- Fixed a bug where `DataFrameWriter.save_as_table` did not raise exceptions when inserting null data into non-nullable columns.
- Fixed bugs in the implementation of `DataFrameWriter.save_as_table` where:
  - Append or Truncate failed when incoming data had a different schema than the existing table.
  - Truncate failed when incoming data did not specify columns that are nullable.

#### Improvements

- Removed the dependency check for `pyarrow` as it is not used.
- Improved target type coverage of `Column.cast`, adding support for casting to boolean and all integral types.
- Aligned the error experience when calling UDFs and stored procedures.
- Added appropriate error messages for the `is_permanent` and `anonymous` options in UDF and stored procedure registration to make it clearer that those features are not yet supported.
- File read operations with unsupported options and values now raise `NotImplementedError` instead of warnings and unclear error information.

## 1.17.0 (2024-05-21)

### Snowpark Python API Updates

#### New Features

- Added support for adding a comment on tables and views using the functions listed below:
  - `DataFrameWriter.save_as_table`
  - `DataFrame.create_or_replace_view`
  - `DataFrame.create_or_replace_temp_view`
  - `DataFrame.create_or_replace_dynamic_table`

#### Improvements

- Improved the error message to remind users to set `{"infer_schema": True}` when reading a CSV file without specifying its schema.

### Snowpark pandas API Updates

#### New Features

- Start of Public Preview of the Snowpark pandas API. Refer to the [Snowpark pandas API Docs](https://docs.snowflake.com/developer-guide/snowpark/python/snowpark-pandas) for more details.
### Snowpark Local Testing Updates

#### New Features

- Added support for NumericType and VariantType data conversion in the mocked functions `to_timestamp_ltz`, `to_timestamp_ntz`, `to_timestamp_tz` and `to_timestamp`.
- Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function `to_char`.
- Added support for the following APIs:
  - snowflake.snowpark.functions:
    - to_varchar
  - snowflake.snowpark.DataFrame:
    - pivot
  - snowflake.snowpark.Session:
    - cancel_all
- Introduced a new exception class `snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException`.
- Added support for casting to FloatType.

#### Bug Fixes

- Fixed a bug where stored procedures and UDFs removed imports that were already in `sys.path` during the clean-up step.
- Fixed a bug where the fractional second part was not handled properly when processing datetime formats.
- Fixed a bug on the Windows platform where file operations were unable to properly handle file separators in directory names.
- Fixed a bug on the Windows platform where, when reading a pandas dataframe, an IntervalType column with integer data could not be processed.
- Fixed a bug that prevented users from being able to select multiple columns with the same alias.
- Fixed a bug where `Session.get_current_[schema|database|role|user|account|warehouse]` returned upper-cased identifiers when identifiers were quoted.
- Fixed a bug where the functions `substr` and `substring` could not handle a 0-based `start_expr`.

#### Improvements

- Standardized the error experience by raising `SnowparkLocalTestingException` in error cases, which is on par with the `SnowparkSQLException` raised in non-local execution.
- Improved the error experience of the `Session.write_pandas` method, which now raises `NotImplementedError` when called.
- Aligned the error experience with non-local execution when reusing a closed session.

## 1.16.0 (2024-05-07)

### New Features

- Added support for registering stored procedures with packages given as Python modules.
- Added snowflake.snowpark.Session.lineage.trace to explore data lineage of Snowflake objects.
- Added support for structured type schema parsing.

### Bug Fixes

- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.

### Local Testing Updates

#### New Features

- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_date`.
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - get
    - concat
    - concat_ws

#### Bug Fixes

- Fixed a bug that caused `NaT` and `NaN` values to not be recognized.
- Fixed a bug where, when inferring a schema, single quotes were added to stage files that already had single quotes.
- Fixed a bug where `DataFrameReader.csv` was unable to handle quoted values containing a delimiter.
- Fixed a bug where, when there was a `None` value in an arithmetic calculation, the output did not remain `None` and became `math.nan` instead.
- Fixed a bug in the functions `sum` and `covar_pop` where, when there was `math.nan` in the data, the output was not `math.nan` as it should be.
- Fixed a bug where stage operations could not handle directories.
- Fixed a bug where `DataFrame.to_pandas` did not take Snowflake numeric types with precision 38 as `int64`.
## 1.15.0 (2024-04-24)

### New Features

- Added `truncate` save mode in `DataFrameWriter` to overwrite existing tables by truncating the underlying table instead of dropping it.
- Added telemetry to calculate query plan height and number of duplicate nodes during collect operations.
- Added the functions below to unload data from a `DataFrame` into one or more files in a stage:
  - `DataFrame.write.json`
  - `DataFrame.write.csv`
  - `DataFrame.write.parquet`
- Added distributed tracing using open telemetry APIs for action functions in `DataFrame` and `DataFrameWriter`:
  - snowflake.snowpark.DataFrame:
    - collect
    - collect_nowait
    - to_pandas
    - count
    - show
  - snowflake.snowpark.DataFrameWriter:
    - save_as_table
- Added support for snow:// URLs to `snowflake.snowpark.Session.file.get` and `snowflake.snowpark.Session.file.get_stream`.
- Added support to register stored procedures and UDxFs with a `comment`.
- UDAF client support is ready for public preview. Please stay tuned for the Snowflake announcement of UDAF public preview.
- Added support for dynamic pivot. This feature is currently in private preview.

### Improvements

- Improved the generated query performance for both compilation and execution by converting duplicate subqueries to Common Table Expressions (CTEs). It is still an experimental feature not enabled by default, and can be enabled by setting `session.cte_optimization_enabled` to `True`.

### Bug Fixes

- Fixed a bug where `statement_params` was not passed to query executions that register stored procedures and user defined functions.
- Fixed a bug causing `snowflake.snowpark.Session.file.get_stream` to fail for quoted stage locations.
- Fixed a bug where an internal type hint in `utils.py` might raise `AttributeError` when the underlying module cannot be found.

### Local Testing Updates

#### New Features

- Added support for registering UDFs and stored procedures.
- Added support for the following APIs:
  - snowflake.snowpark.Session:
    - file.put
    - file.put_stream
    - file.get
    - file.get_stream
    - read.json
    - add_import
    - remove_import
    - get_imports
    - clear_imports
    - add_packages
    - add_requirements
    - clear_packages
    - remove_package
    - udf.register
    - udf.register_from_file
    - sproc.register
    - sproc.register_from_file
  - snowflake.snowpark.functions
    - current_database
    - current_session
    - date_trunc
    - object_construct
    - object_construct_keep_null
    - pow
    - sqrt
    - udf
    - sproc
- Added support for StringType, TimestampType and VariantType data conversion in the mocked function `to_time`.

#### Bug Fixes

- Fixed a bug where columns were null-filled for constant functions.
- Fixed the implementation of `to_object`, `to_array` and `to_binary` to better handle null inputs.
- Fixed a bug where timestamp data comparison could not handle years beyond 2262.
- Fixed a bug where `Session.builder.getOrCreate` did not return the created mock session.

## 1.14.0 (2024-03-20)

### New Features

- Added support for creating vectorized UDTFs with the `process` method.
- Added support for dataframe functions:
  - to_timestamp_ltz
  - to_timestamp_ntz
  - to_timestamp_tz
  - locate
- Added support for the ASOF JOIN type, as sketched below.
- Added support for the following local testing APIs:
  - snowflake.snowpark.functions:
    - to_double
    - to_timestamp
    - to_timestamp_ltz
    - to_timestamp_ntz
    - to_timestamp_tz
    - greatest
    - least
    - convert_timezone
    - dateadd
    - date_part
  - snowflake.snowpark.Session:
    - get_current_account
    - get_current_warehouse
    - get_current_role
    - use_schema
    - use_warehouse
    - use_database
    - use_role
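
A sketch of an ASOF join (table and column names are illustrative, and the `match_condition` form shown is an assumption based on the current `DataFrame.join` API rather than these notes):

```python
trades = session.table("trades")
quotes = session.table("quotes")

# For each trade, pick the most recent quote at or before the trade time.
joined = trades.join(
    quotes,
    on=trades["symbol"] == quotes["symbol"],
    how="asof",
    match_condition=trades["trade_time"] >= quotes["quote_time"],
)
```
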
### Bug Fixes

- Fixed a bug in `SnowflakePlanBuilder` where `save_as_table` did not correctly filter columns whose names start with '$' and are followed by a number.
- Fixed a bug where statement parameters might have no effect when resolving imports and packages.
- Fixed bugs in local testing:
  - LEFT ANTI and LEFT SEMI joins drop rows with null values.
  - DataFrameReader.csv incorrectly parses data when the optional parameter `field_optionally_enclosed_by` is specified.
  - Column.regexp only considers the first entry when `pattern` is a `Column`.
  - Table.update raises `KeyError` when updating null values in the rows.
  - VARIANT columns raise errors at `DataFrame.collect`.
  - `count_distinct` does not work correctly when counting.
  - Null values in integer columns raise `TypeError`.

### Improvements

- Added telemetry to local testing.
- Improved the error message of `DataFrameReader` to raise a `FileNotFound` error when reading a path that does not exist or when there are no files under the path.

## 1.13.0 (2024-02-26)

### New Features

- Added support for an optional `date_part` argument in the function `last_day`.
- `SessionBuilder.app_name` will set the `query_tag` after the session is created.
- Added support for the following local testing functions:
  - current_timestamp
  - current_date
  - current_time
  - strip_null_value
  - upper
  - lower
  - length
  - initcap

### Improvements

- Added cleanup logic at interpreter shutdown to close all active sessions.
- Closing sessions within stored procedures is now a no-op that logs a warning instead of raising an error.

### Bug Fixes

- Fixed a bug in `DataFrame.to_local_iterator` where the iterator could yield wrong results if another query is executed before the iterator finishes, due to a wrong isolation level. For details, please see #945.
- Fixed a bug that truncated table names in error messages while running a plan with local testing enabled.
- Fixed a bug where `Session.range` returned an empty result when the range was large.

## 1.12.1 (2024-02-08)

### Improvements

- Use `split_blocks=True` by default during `to_pandas` conversion, for optimal memory allocation. This parameter is passed to `pyarrow.Table.to_pandas`, which enables `PyArrow` to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.

### Bug Fixes

- Fixed a bug in `DataFrame.to_pandas` that caused an error when evaluating a DataFrame with an `IntegerType` column with null values.

## 1.12.0 (2024-01-30)

### New Features

- Exposed `statement_params` in `StoredProcedure.__call__`.
- Added two optional arguments to `Session.add_import`:
  - `chunk_size`: The number of bytes to hash per chunk of the uploaded files.
  - `whole_file_hash`: By default, only the first chunk of the uploaded import is hashed to save time. When this is set to `True`, each uploaded file is fully hashed instead.
- Added parameters `external_access_integrations` and `secrets` when creating a UDAF from Snowpark Python to allow integration with external access.
- Added a new method `Session.append_query_tag`. Allows an additional tag to be added to the current query tag by appending it as a comma separated value.
- Added a new method `Session.update_query_tag`. Allows updates to a JSON encoded dictionary query tag.
- `SessionBuilder.getOrCreate` will now attempt to replace the singleton it returns when token expiration has been detected.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_except`
  - `create_map`
  - `sign`/`signum`
- Added the following functions to `DataFrame.analytics`:
  - Added the `moving_agg` function in `DataFrame.analytics` to enable moving aggregations like sums and averages with multiple window sizes.
  - Added the `cumulative_agg` function in `DataFrame.analytics` to enable cumulative aggregations like sums and averages on multiple columns.
  - Added the `compute_lag` and `compute_lead` functions in `DataFrame.analytics` for enabling lead and lag calculations on multiple columns.
  - Added the `time_series_agg` function in `DataFrame.analytics` to enable time series aggregations like sums and averages with multiple time windows.

### Bug Fixes

- Fixed a bug in `DataFrame.na.fill` that caused Boolean values to erroneously override integer values.
- Fixed a bug in `Session.create_dataframe` where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
  - Earlier, timestamp columns without a timezone would be converted to nanosecond epochs and inferred as `LongType()`, but they will now be correctly maintained as timestamp values and be inferred as `TimestampType(TimestampTimeZone.NTZ)`.
  - Earlier, timestamp columns with a timezone would be inferred as `TimestampType(TimestampTimeZone.NTZ)` and lose timezone information, but they will now be correctly inferred as `TimestampType(TimestampTimeZone.LTZ)`, and timezone information is retained correctly.
  - Set the session parameter `PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME` to revert back to the old behavior. It is recommended that you update your code to align with the correct behavior because the parameter will be removed in the future.
- Fixed a bug where `DataFrame.to_pandas` got a decimal type when the scale was not 0, and created an `object` dtype in `pandas`. Instead, we cast the value to a float64 type.
- Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
  - `DataFrame.filter()` is called after `DataFrame.sort().limit()`.
  - `DataFrame.sort()` or `filter()` is called on a DataFrame that already has a window function or sequence-dependent data generator column.
    For instance, `df.select("a", seq1().alias("b")).select("a", "b").sort("a")` won't flatten the sort clause anymore.
  - A window or sequence-dependent data generator column is used after `DataFrame.limit()`. For instance, `df.limit(10).select(row_number().over())` won't flatten the limit and select in the generated SQL.
- Fixed a bug where aliasing a DataFrame column raised an error when the DataFrame was copied from another DataFrame with an aliased column. For instance,

  ```python
  df = df.select(col("a").alias("b"))
  df = copy(df)
  df.select(col("b").alias("c"))  # threw an error. Now it's fixed.
  ```

- Fixed a bug in `Session.create_dataframe` where a non-nullable field in the schema was not respected for the boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
### Behavior Changes (API Compatible)

- When parsing data types during a `to_pandas` operation, we rely on the GS precision value to fix precision issues for large integer values. This may affect users for whom a column that was earlier returned as `int8` gets returned as `int64`. Users can fix this by explicitly specifying precision values for their return column.
- Aligned behavior for `Session.call` with table stored procedures: previously, running `Session.call` would not trigger the stored procedure unless a `collect()` operation was performed.
- `StoredProcedureRegistration` will now automatically add `snowflake-snowpark-python` as a package dependency. The added dependency will be on the client's local version of the library, and an error is thrown if the server cannot support that version.

## 1.11.1 (2023-12-07)

### Bug Fixes

- Fixed a bug where `numpy` was imported at the top level of the mock module.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `from_utc_timestamp`
  - `to_utc_timestamp`

## 1.11.0 (2023-12-05)

### New Features

- Added the `conn_error` attribute to `SnowflakeSQLException`, which stores the whole underlying exception from `snowflake-connector-python`.
- Added support for `RelationalGroupedDataFrame.pivot()` to access `pivot` in the following pattern: `Dataframe.group_by(...).pivot(...)`.
- Added an experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account (see the sketch below).
- Added support for the new function `arrays_to_object` in `snowflake.snowpark.functions`.
- Added support for the vector data type.

### Dependency Updates

- Bumped the cloudpickle dependency to work with `cloudpickle==2.2.1`.
- Updated ``snowflake-connector-python`` to `3.4.0`.

### Bug Fixes

- The DataFrame column name quoting check now supports newline characters.
- Fixed a bug where a DataFrame generated by `session.read.with_metadata` created an inconsistent table when calling `df.write.save_as_table`.
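
A minimal sketch of the Local Testing Mode mentioned above, assuming the `local_testing` builder option is the opt-in switch:

```python
from snowflake.snowpark import Session

# Sketch: create a local testing session; no Snowflake account or network needed.
session = Session.builder.config("local_testing", True).create()

df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
assert df.filter(df.a > 1).collect()[0][0] == 3
```
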
## 1.10.0 (2023-11-03)

### New Features

- Added support for managing case sensitivity in `DataFrame.to_local_iterator()`.
- Added support for specifying a vectorized UDTF's input column names by using the optional parameter `input_names` in `UDTFRegistration.register/register_from_file` and `functions.pandas_udtf`. By default, `RelationalGroupedDataFrame.applyInPandas` will infer the column names from the current DataFrame schema.
- Added `sql_error_code` and `raw_message` attributes to `SnowflakeSQLException` when it is caused by a SQL exception.

### Bug Fixes

- Fixed a bug in `DataFrame.to_pandas()` where converting Snowpark DataFrames to pandas DataFrames lost precision on integers with more than 19 digits.
- Fixed a bug where `session.add_packages` could not handle a requirement specifier containing a project name with an underscore and a version.
- Fixed a bug in `DataFrame.limit()` when `offset` is used and the parent `DataFrame` uses `limit`. Now the `offset` won't impact the parent DataFrame's `limit`.
- Fixed a bug in `DataFrame.write.save_as_table` where DataFrames created from the read API could not save data into Snowflake because of the invalid column name `$1`.

### Behavior Changes

- Changed the behavior of `date_format`:
  - The `format` argument changed from optional to required.
  - The returned result changed from a date object to a date-formatted string.
- When a window function, or a sequence-dependent data generator function (`normal`, `zipf`, `uniform`, `seq1`, `seq2`, `seq4`, `seq8`), is used, the sort and filter operations will no longer be flattened when generating the query.

## 1.9.0 (2023-10-13)

### New Features

- Added support for the Python 3.11 runtime environment.
- Added support for `PythonObjJSONEncoder` JSON-serializable objects for `ARRAY` and `OBJECT` literals.

### Dependency Updates

- Added back the dependency on `typing-extensions`.

### Bug Fixes

- Fixed a bug where imports from permanent stage locations were ignored for temporary stored procedures, UDTFs, UDFs, and UDAFs.
- Reverted to using a CTAS (create table as select) statement for `DataFrame.write.save_as_table`, which does not need insert permission for writing tables.

## 1.8.0 (2023-09-14)

### New Features

- Added support for the VOLATILE/IMMUTABLE keyword when registering UDFs.
- Added support for specifying clustering keys when saving DataFrames using `DataFrame.save_as_table` (see the sketch below).
- Accept `Iterable` objects as input for `schema` when creating DataFrames using `Session.create_dataframe`.
- Added the property `DataFrame.session` to return a `Session` object.
- Added the property `Session.session_id` to return an integer that represents the session ID.
- Added the property `Session.connection` to return a `SnowflakeConnection` object.
- Added support for creating a Snowpark session from a configuration file or environment variables.

### Dependency Updates

- Updated ``snowflake-connector-python`` to 3.2.0.

### Bug Fixes

- Fixed a bug where automatic package upload would raise `ValueError` even when compatible package versions were added in `session.add_packages`.
- Fixed a bug where table stored procedures were not registered correctly when using `register_from_file`.
- Fixed a bug where DataFrame joins failed with an `invalid_identifier` error.
- Fixed a bug where `DataFrame.copy` disabled the SQL simplifier for the returned copy.
- Fixed a bug where `session.sql().select()` would fail if any parameters were specified to `session.sql()`.
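
A minimal sketch of the clustering-key support added in 1.8.0, assuming an open `session` and the `clustering_keys` parameter of `save_as_table`; the table name is hypothetical:

```python
from snowflake.snowpark.functions import col

# Sketch: save a DataFrame as a table clustered on column "a" (table name hypothetical).
df = session.create_dataframe([[1, "x"], [2, "y"]], schema=["a", "b"])
df.write.save_as_table("MY_CLUSTERED_TABLE", mode="overwrite", clustering_keys=[col("a")])
```
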
## 1.7.0 (2023-08-28)

### New Features

- Added parameters `external_access_integrations` and `secrets` when creating a UDF, UDTF or stored procedure from Snowpark Python to allow integration with external access.
- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_flatten`
  - `flatten`
- Added support for `apply_in_pandas` in `snowflake.snowpark.relational_grouped_dataframe`.
- Added support for replicating your local Python environment on Snowflake via `Session.replicate_local_environment`.

### Bug Fixes

- Fixed a bug where `session.create_dataframe` failed to properly set nullable columns when nullability was affected by the order in which data was given.
- Fixed a bug where `DataFrame.select` could not identify and alias columns in the presence of table functions when the output columns of the table function overlapped with columns in the DataFrame.

### Behavior Changes

- Creating stored procedures, UDFs, UDTFs, and UDAFs with parameter `is_permanent=False` now creates temporary objects even when `stage_name` is provided. The default value of `is_permanent` is `False`, so users who do not explicitly set this value to `True` for permanent objects will notice a change in behavior.
- `types.StructField` now enquotes column identifiers by default.

## 1.6.1 (2023-08-02)

### New Features

- Added support for these new functions in `snowflake.snowpark.functions`:
  - `array_sort`
  - `sort_array`
  - `array_min`
  - `array_max`
  - `explode_outer`
- Added support for pure Python packages specified via `Session.add_requirements` or `Session.add_packages`. They are now usable in stored procedures and UDFs even if the packages are not present on the Snowflake Anaconda channel.
  - Added the Session parameters `custom_packages_upload_enabled` and `custom_packages_force_upload_enabled` to enable the pure Python packages feature mentioned above. Both parameters default to `False`.
- Added support for specifying package requirements by passing a Conda environment yaml file to `Session.add_requirements`.
- Added support for asynchronous execution of multi-query DataFrames that contain binding variables.
- Added support for renaming multiple columns in `DataFrame.rename`.
- Added support for Geometry datatypes.
- Added support for `params` in `session.sql()` in stored procedures.
- Added support for user-defined aggregate functions (UDAFs), currently in private preview (see the sketch after this list).
- Added support for vectorized UDTFs (user-defined table functions), currently in public preview.
- Added support for Snowflake Timestamp variants (i.e., `TIMESTAMP_NTZ`, `TIMESTAMP_LTZ`, `TIMESTAMP_TZ`):
  - Added `TimestampTimezone` as an argument in the `TimestampType` constructor.
  - Added type hints `NTZ`, `LTZ`, `TZ` and `Timestamp` to annotate functions when registering UDFs.
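
A minimal sketch of the UDAF registration interface mentioned above, assuming an open session with `snowflake-snowpark-python` available as a package; the names are illustrative:

```python
from snowflake.snowpark.functions import udaf
from snowflake.snowpark.types import IntegerType

# Sketch of a UDAF that sums an integer column.
@udaf(return_type=IntegerType(), input_types=[IntegerType()])
class IntSum:
    def __init__(self) -> None:
        self._sum = 0

    @property
    def aggregate_state(self):
        # Intermediate state exchanged between partial aggregations.
        return self._sum

    def accumulate(self, value: int) -> None:
        self._sum += value

    def merge(self, other_state: int) -> None:
        self._sum += other_state

    def finish(self) -> int:
        return self._sum

df = session.create_dataframe([[1], [2], [3]], schema=["a"])
df.agg(IntSum("a")).show()
```
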
### Improvements

- Removed the redundant dependency `typing-extensions`.
- `DataFrame.cache_result` now creates temporary tables with fully qualified names under the current database and schema.

### Bug Fixes

- Fixed a bug where a type check happened on pandas before it was imported.
- Fixed a bug when creating a UDF from `numpy.ufunc`.
- Fixed a bug where `DataFrame.union` was not generating the correct `Selectable.schema_query` when the SQL simplifier is enabled.

### Behavior Changes

- `DataFrameWriter.save_as_table` now respects the `nullable` field of the schema provided by the user, or the inferred schema based on data from user input.

### Dependency Updates

- Updated ``snowflake-connector-python`` to 3.0.4.

## 1.5.1 (2023-06-20)

### New Features

- Added support for the Python 3.10 runtime environment.

## 1.5.0 (2023-06-09)

### Behavior Changes

- Aggregation results, from functions such as `DataFrame.agg` and `DataFrame.describe`, no longer strip away non-printing characters from column names.

### New Features

- Added support for the Python 3.9 runtime environment.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `array_generate_range`
  - `array_unique_agg`
  - `collect_set`
  - `sequence`
- Added support for registering and calling stored procedures with a `TABLE` return type.
- Added support for the parameter `length` in `StringType()` to specify the maximum number of characters that can be stored by the column.
- Added the alias `functions.element_at()` for `functions.get()`.
- Added the alias `Column.contains` for `functions.contains`.
- Added the experimental feature `DataFrame.alias`.
- Added support for querying metadata columns from a stage when creating a `DataFrame` using `DataFrameReader`.
- Added support for `StructType.add` to append more fields to existing `StructType` objects (see the sketch below).
- Added support for the parameter `execute_as` in `StoredProcedureRegistration.register_from_file()` to specify stored procedure caller rights.

### Bug Fixes

- Fixed a bug where `DataFrame.join_table_function` did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.
- Fixed type hint declarations for the custom types `ColumnOrName`, `ColumnOrLiteralStr`, `ColumnOrSqlExpr`, `LiteralType` and `ColumnOrLiteral` that were breaking `mypy` checks.
- Fixed a bug where `DataFrameWriter.save_as_table` and `DataFrame.copy_into_table` failed to parse fully qualified table names.
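
A small sketch of `StructType.add` as described above, assuming `add` accepts either a name/datatype pair or a `StructField`:

```python
from snowflake.snowpark.types import IntegerType, StringType, StructField, StructType

# Sketch: grow an existing StructType field by field.
schema = StructType([StructField("id", IntegerType())])
schema = schema.add("name", StringType())                # append by name and type
schema = schema.add(StructField("tag", StringType()))    # or append a StructField
print(schema)
```
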
## 1.4.0 (2023-04-24)

### New Features

- Added support for `session.getOrCreate`.
- Added support for the alias `Column.getField`.
- Added support for new functions in `snowflake.snowpark.functions`:
  - `date_add` and `date_sub` to make add and subtract operations easier.
  - `daydiff`
  - `explode`
  - `array_distinct`
  - `regexp_extract`
  - `struct`
  - `format_number`
  - `bround`
  - `substring_index`
- Added the parameter `skip_upload_on_content_match` when creating UDFs, UDTFs and stored procedures using `register_from_file` to skip uploading files to a stage if the same version of the files is already on the stage.
- Added support for the `DataFrameWriter.save_as_table` method to take table names that contain dots.
- Flattened generated SQL when `DataFrame.filter()` or `DataFrame.order_by()` is followed by a projection statement (e.g. `DataFrame.select()`, `DataFrame.with_column()`).
- Added support for creating dynamic tables _(in private preview)_ using `DataFrame.create_or_replace_dynamic_table`.
- Added an optional argument `params` in `session.sql()` to support binding variables. Note that this is not supported in stored procedures yet.

### Bug Fixes

- Fixed a bug in `strtok_to_array` where an exception was thrown when a delimiter was passed in.
- Fixed a bug in `session.add_import` where the module had the same namespace as other dependencies.

## 1.3.0 (2023-03-28)

### New Features

- Added support for the `delimiters` parameter in `functions.initcap()`.
- Added support for `functions.hash()` to accept a variable number of input expressions.
- Added the API `Session.conf`, a `RuntimeConfig` object, for getting, setting, or checking the mutability of any runtime configuration (see the sketch below).
- Added support for managing case sensitivity in `Row` results from `DataFrame.collect` using the `case_sensitive` parameter.
- Added indexer support for `snowflake.snowpark.types.StructType`.
- Added a keyword argument `log_on_exception` to `DataFrame.collect` and `DataFrame.collect_nowait` to optionally disable error logging for SQL exceptions.

### Bug Fixes

- Fixed a bug where a DataFrame set operation (`DataFrame.subtract`, `DataFrame.union`, etc.) called after another DataFrame set operation and `DataFrame.select` or `DataFrame.with_column` threw an exception.
- Fixed a bug where chained sort statements were overwritten by the SQL simplifier.

### Improvements

- Simplified JOIN queries to use constant subquery aliases (`SNOWPARK_LEFT`, `SNOWPARK_RIGHT`) by default. Users can disable this at runtime with `session.conf.set('use_constant_subquery_alias', False)` to use randomly generated alias names instead.
- Allowed specifying statement parameters in `session.call()`.
- Enabled the uploading of large pandas DataFrames in stored procedures by defaulting to a chunk size of 100,000 rows.
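
A quick sketch of the `Session.conf` runtime-config API mentioned above, assuming an open `session`:

```python
# Sketch: read and toggle a runtime configuration through Session.conf.
print(session.conf.get("use_constant_subquery_alias"))   # inspect the current value
session.conf.set("use_constant_subquery_alias", False)   # switch to random aliases
session.conf.set("use_constant_subquery_alias", True)    # restore the default
```
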
## 1.2.0 (2023-03-02)

### New Features

- Added support for displaying source code as comments in the generated scripts when registering stored procedures. This is enabled by default; turn it off by specifying `source_code_display=False` at registration.
- Added a parameter `if_not_exists` when creating a UDF, UDTF or stored procedure from Snowpark Python to skip creating the specified function or procedure if it already exists.
- Accept integers when calling `snowflake.snowpark.functions.get` to extract a value from an array.
- Added `functions.reverse` to open access to the Snowflake built-in function [reverse](https://docs.snowflake.com/en/sql-reference/functions/reverse).
- Added the parameter `require_scoped_url` in `snowflake.snowpark.files.SnowflakeFile.open()` (in Private Preview) to replace `is_owner_file`, which is marked for deprecation.

### Bug Fixes

- Fixed a bug that overwrote `paramstyle` to `qmark` when creating a Snowpark session.
- Fixed a bug where `df.join(..., how="cross")` failed with `SnowparkJoinException: (1112): Unsupported using join type 'Cross'`.
- Fixed a bug where querying a `DataFrame` column created from chained function calls used a wrong column name.

## 1.1.0 (2023-01-26)

### New Features

- Added `asc`, `asc_nulls_first`, `asc_nulls_last`, `desc`, `desc_nulls_first`, `desc_nulls_last`, `date_part` and `unix_timestamp` in functions.
- Added the property `DataFrame.dtypes` to return a list of column name and data type pairs.
- Added the following aliases:
  - `functions.expr()` for `functions.sql_expr()`.
  - `functions.date_format()` for `functions.to_date()`.
  - `functions.monotonically_increasing_id()` for `functions.seq8()`.
  - `functions.from_unixtime()` for `functions.to_timestamp()`.

### Bug Fixes

- Fixed a bug in the SQL simplifier that didn't handle Column alias and join well in some cases. See https://github.com/snowflakedb/snowpark-python/issues/658 for details.
- Fixed a bug in the SQL simplifier that generated wrong column names for function calls, NaN and INF.

### Improvements

- The session parameter `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` is `True` after the Snowflake 7.3 release. In snowpark-python, `session.sql_simplifier_enabled` reads the value of `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set `PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER` in Snowflake to `False` or run `session.sql_simplifier_enabled = False` from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.

## 1.0.0 (2022-11-01)

### New Features

- Added `Session.generator()` to create a new `DataFrame` using the Generator table function (see the sketch below).
- Added a parameter `secure` to the functions that create a secure UDF or UDTF.
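
A minimal sketch of `Session.generator`, assuming an open `session`:

```python
from snowflake.snowpark.functions import random, seq4, uniform

# Sketch: synthesize a 3-row DataFrame from the GENERATOR table function.
df = session.generator(seq4().alias("n"), uniform(1, 10, random()).alias("r"), rowcount=3)
df.show()
```
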
## 0.12.0 (2022-10-14)

### New Features

- Added new APIs for async jobs:
  - `Session.create_async_job()` to create an `AsyncJob` instance from a query ID.
  - `AsyncJob.result()` now accepts the argument `result_type` to return the results in different formats.
  - `AsyncJob.to_df()` returns a `DataFrame` built from the result of this asynchronous job.
  - `AsyncJob.query()` returns the SQL text of the executed query.
- `DataFrame.agg()` and `RelationalGroupedDataFrame.agg()` now accept variable-length arguments.
- Added the parameters `lsuffix` and `rsuffix` to `DataFrame.join()` and `DataFrame.cross_join()` to conveniently rename overlapping columns.
- Added `Table.drop_table()` so you can drop the temp table after `DataFrame.cache_result()`. `Table` is also a context manager, so you can use the `with` statement to drop the cached temp table after use.
- Added `Session.use_secondary_roles()`.
- Added the functions `first_value()` and `last_value()`. (contributed by @chasleslr)
- Added `on` as an alias for `using_columns` and `how` as an alias for `join_type` in `DataFrame.join()`.

### Bug Fixes

- Fixed a bug in `Session.create_dataframe()` that raised an error when `schema` names had special characters.
- Fixed a bug in which options set in `Session.read.option()` were not passed to `DataFrame.copy_into_table()` as default values.
- Fixed a bug in which `DataFrame.copy_into_table()` raised an error when a copy option had single quotes in the value.

## 0.11.0 (2022-09-28)

### Behavior Changes

- `Session.add_packages()` now raises `ValueError` when the version of a package cannot be found in the Snowflake Anaconda channel. Previously, `Session.add_packages()` succeeded, and a `SnowparkSQLException` exception was raised later in the UDF/SP registration step.

### New Features

- Added the method `FileOperation.get_stream()` to support downloading stage files as a stream.
- Added support for `functions.ntile()` to accept an int argument.
- Added the following aliases:
  - `functions.call_function()` for `functions.call_builtin()`.
  - `functions.function()` for `functions.builtin()`.
  - `DataFrame.order_by()` for `DataFrame.sort()`.
  - `DataFrame.orderBy()` for `DataFrame.sort()`.
- Improved `DataFrame.cache_result()` to return a more accurate `Table` class instead of a `DataFrame` class.
- Added support for passing `session` as the first argument when calling a `StoredProcedure`.

### Improvements

- Improved nested query generation by flattening queries when applicable.
  - This improvement can be enabled by setting `Session.sql_simplifier_enabled = True`.
  - `DataFrame.select()`, `DataFrame.with_column()`, `DataFrame.drop()` and other select-related APIs have more flattened SQLs.
  - `DataFrame.union()`, `DataFrame.union_all()`, `DataFrame.except_()`, `DataFrame.intersect()`, `DataFrame.union_by_name()` have flattened SQLs generated when multiple set operators are chained.
- Improved type annotations for the async job APIs.

### Bug Fixes

- Fixed a bug in which `Table.update()`, `Table.delete()`, and `Table.merge()` tried to reference a temp table that did not exist.

## 0.10.0 (2022-09-16)

### New Features

- Added experimental APIs for evaluating Snowpark DataFrames with asynchronous queries (see the sketch after this list):
  - Added the keyword argument `block` to the following action APIs on Snowpark DataFrames (which execute queries) to allow asynchronous evaluations:
    - `DataFrame.collect()`, `DataFrame.to_local_iterator()`, `DataFrame.to_pandas()`, `DataFrame.to_pandas_batches()`, `DataFrame.count()`, `DataFrame.first()`.
    - `DataFrameWriter.save_as_table()`, `DataFrameWriter.copy_into_location()`.
    - `Table.delete()`, `Table.update()`, `Table.merge()`.
  - Added the method `DataFrame.collect_nowait()` to allow asynchronous evaluations.
  - Added the class `AsyncJob` to retrieve results from asynchronously executed queries and check their status.
- Added support for `table_type` in `Session.write_pandas()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.
- Added support for using Python structured data (`list`, `tuple` and `dict`) as literal values in Snowpark.
- Added the keyword argument `execute_as` to `functions.sproc()` and `session.sproc.register()` to allow registering a stored procedure as a caller or owner.
- Added support for specifying a pre-configured file format when reading files from a stage in Snowflake.
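
A minimal sketch of the asynchronous evaluation APIs above, assuming an existing DataFrame `df`:

```python
# Sketch: evaluate a DataFrame asynchronously with the 0.10.0 APIs.
async_job = df.collect_nowait()   # returns immediately with an AsyncJob
# ... do other work while the query runs on the server ...
rows = async_job.result()         # block until the query finishes
```
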
### Improvements

- Added support for displaying details of a Snowpark session.

### Bug Fixes

- Fixed a bug in which `DataFrame.copy_into_table()` and `DataFrameWriter.save_as_table()` mistakenly created a new table if the table name was fully qualified and the table already existed.

### Deprecations

- Deprecated the keyword argument `create_temp_table` in `Session.write_pandas()`.
- Deprecated invoking UDFs using arguments wrapped in a Python list or tuple. You can use variable-length arguments without a list or tuple.

### Dependency Updates

- Updated ``snowflake-connector-python`` to 2.7.12.

## 0.9.0 (2022-08-30)

### New Features

- Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass the new keyword argument `source_code_display` as `False` when calling `register()` or `@udf()`.
- Added support for calling table functions from `DataFrame.select()`, `DataFrame.with_column()` and `DataFrame.with_columns()`, which now take parameters of type `table_function.TableFunctionCall` for columns (see the sketch after this list).
- Added the keyword argument `overwrite` to `session.write_pandas()` to allow overwriting the contents of a Snowflake table with those of a pandas DataFrame.
- Added the keyword argument `column_order` to `df.write.save_as_table()` to specify the matching rules when inserting data into a table in append mode.
- Added the method `FileOperation.put_stream()` to upload local files to a stage via a file stream.
- Added the methods `TableFunctionCall.alias()` and `TableFunctionCall.as_()` to allow aliasing the names of columns that come from the output of table function joins.
- Added the function `get_active_session()` in module `snowflake.snowpark.context` to get the current active Snowpark session.

### Bug Fixes

- Fixed a bug in which batch insert raised an error when `statement_params` was not passed to the function.
- Fixed a bug in which column names were not quoted when `session.create_dataframe()` was called with dicts and a given schema.
- Fixed a bug in which table creation was not skipped when the table already exists and `df.write.save_as_table()` is called in append mode.
- Fixed a bug in which third-party packages with underscores could not be added when registering UDFs.

### Improvements

- Improved the function `functions.uniform()` to infer the types of the inputs `max_` and `min_` and cast the limits to `IntegerType` or `FloatType` accordingly.
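
A minimal sketch of calling a built-in table function from `DataFrame.select()` and aliasing its output, as described above; the DataFrame columns are hypothetical:

```python
from snowflake.snowpark.functions import lit, table_function

# Sketch: call SPLIT_TO_TABLE from DataFrame.select and alias its output columns.
split_to_table = table_function("split_to_table")
df2 = df.select(df.id, split_to_table(df.tags, lit(",")).alias("seq", "idx", "val"))
```
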
## 0.8.0 (2022-07-22)

### New Features

- Added the keyword-only argument `statement_params` to the following methods to allow for specifying statement-level parameters:
  - `collect`, `to_local_iterator`, `to_pandas`, `to_pandas_batches`, `count`, `copy_into_table`, `show`, `create_or_replace_view`, `create_or_replace_temp_view`, `first`, `cache_result` and `random_split` on class `snowflake.snowpark.DataFrame`.
  - `update`, `delete` and `merge` on class `snowflake.snowpark.Table`.
  - `save_as_table` and `copy_into_location` on class `snowflake.snowpark.DataFrameWriter`.
  - `approx_quantile`, `corr`, `cov` and `crosstab` on class `snowflake.snowpark.DataFrameStatFunctions`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udf.UDFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.udtf.UDTFRegistration`.
  - `register` and `register_from_file` on class `snowflake.snowpark.stored_procedure.StoredProcedureRegistration`.
  - `udf`, `udtf` and `sproc` in `snowflake.snowpark.functions`.
- Added support for `Column` as an input argument to `session.call()`.
- Added support for `table_type` in `df.write.save_as_table()`. You can now choose from these `table_type` options: `"temporary"`, `"temp"`, and `"transient"`.

### Improvements

- Added validation of object names in the `session.use_*` methods.
- Updated the query tag in SQL to escape it when it has special characters.
- Added a check to see if the Anaconda terms are acknowledged when adding missing packages.

### Bug Fixes

- Fixed the limited length of the string column in `session.create_dataframe()`.
- Fixed a bug in which `session.create_dataframe()` mistakenly converted 0 and `False` to `None` when the input data was only a list.
- Fixed a bug in which calling `session.create_dataframe()` using a large local dataset sometimes created a temp table twice.
- Aligned the definition of `functions.trim()` with the SQL function definition.
- Fixed an issue where snowpark-python would hang when using the Python system-defined (built-in) `sum` instead of the Snowpark `functions.sum()`.

### Deprecations

- Deprecated the keyword argument `create_temp_table` in `df.write.save_as_table()`.

## 0.7.0 (2022-05-25)

### New Features

- Added support for user-defined table functions (UDTFs); a sketch follows this section.
  - Use the function `snowflake.snowpark.functions.udtf()` to register a UDTF, or use it as a decorator to register the UDTF.
  - You can also use `Session.udtf.register()` to register a UDTF.
  - Use `Session.udtf.register_from_file()` to register a UDTF from a Python file.
- Updated APIs to query a table function, including both Snowflake built-in table functions and UDTFs.
  - Use the function `snowflake.snowpark.functions.table_function()` to create a callable representing a table function, and use it to call the table function in a query.
  - Alternatively, use the function `snowflake.snowpark.functions.call_table_function()` to call a table function.
  - Added support for an `over` clause that specifies `partition by` and `order by` when lateral joining a table function.
  - Updated `Session.table_function()` and `DataFrame.join_table_function()` to accept `TableFunctionCall` instances.

### Breaking Changes

- When creating a function with `functions.udf()` and `functions.sproc()`, you can now specify an empty list for the `imports` or `packages` argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
- Improved the `__repr__` implementation of data types in `types.py`. The unused `type_name` property has been removed.
- Added a Snowpark-specific exception class for SQL errors. This replaces the previous `ProgrammingError` from the Python connector.
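
A minimal sketch of the UDTF decorator mentioned above, assuming an open `session`; the class name is illustrative:

```python
from snowflake.snowpark.functions import lit, udtf
from snowflake.snowpark.types import IntegerType, StructField, StructType

# Sketch of a UDTF that emits the numbers 0..n-1 as rows.
@udtf(output_schema=StructType([StructField("number", IntegerType())]))
class generate_numbers:
    def process(self, n: int):
        for i in range(n):
            yield (i,)

session.table_function(generate_numbers(lit(3))).show()
```
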
### Improvements

- Added a lock to a UDF or UDTF when it is called for the first time per thread.
- Improved the error message for pickling errors that occurred during UDF creation.
- Included the query ID when logging the failed query.

### Bug Fixes

- Fixed a bug in which non-integral data (such as timestamps) was occasionally converted to integer when calling `DataFrame.to_pandas()`.
- Fixed a bug in which `DataFrameReader.parquet()` failed to read a parquet file when its column contained spaces.
- Fixed a bug in which `DataFrame.copy_into_table()` failed when the DataFrame was created by reading a file with inferred schemas.

### Deprecations

- Deprecated `Session.flatten()` and `DataFrame.flatten()`.

### Dependency Updates

- Restricted the version of `cloudpickle` to <= `2.0.0`.

## 0.6.0 (2022-04-27)

### New Features

- Added support for vectorized UDFs with the input as a pandas DataFrame or pandas Series and the output as a pandas Series. This improves the performance of UDFs in Snowpark (see the sketch below).
- Added support for inferring the schema of a DataFrame by default when it is created by reading a Parquet, Avro, or ORC file in the stage.
- Added the functions `current_session()`, `current_statement()`, `current_user()`, `current_version()`, `current_warehouse()`, `date_from_parts()`, `date_trunc()`, `dayname()`, `dayofmonth()`, `dayofweek()`, `dayofyear()`, `grouping()`, `grouping_id()`, `hour()`, `last_day()`, `minute()`, `next_day()`, `previous_day()`, `second()`, `month()`, `monthname()`, `quarter()`, `year()`, `current_database()`, `current_role()`, `current_schema()`, `current_schemas()`, `current_region()`, `current_available_roles()`, `add_months()`, `any_value()`, `bitnot()`, `bitshiftleft()`, `bitshiftright()`, `convert_timezone()`, `uniform()`, `strtok_to_array()`, `sysdate()`, `time_from_parts()`, `timestamp_from_parts()`, `timestamp_ltz_from_parts()`, `timestamp_ntz_from_parts()`, `timestamp_tz_from_parts()`, `weekofyear()`, `percentile_cont()` to `snowflake.snowpark.functions`.

### Breaking Changes

- Expired deprecations:
  - Removed the following APIs that were deprecated in 0.4.0: `DataFrame.groupByGroupingSets()`, `DataFrame.naturalJoin()`, `DataFrame.joinTableFunction`, `DataFrame.withColumns()`, `Session.getImports()`, `Session.addImport()`, `Session.removeImport()`, `Session.clearImports()`, `Session.getSessionStage()`, `Session.getDefaultDatabase()`, `Session.getDefaultSchema()`, `Session.getCurrentDatabase()`, `Session.getCurrentSchema()`, `Session.getFullyQualifiedCurrentSchema()`.

### Improvements

- Added support for creating an empty `DataFrame` with a specific schema using the `Session.create_dataframe()` method.
- Changed the logging level from `INFO` to `DEBUG` for several logs (e.g., the executed query) when evaluating a DataFrame.
- Improved the error message when failing to create a UDF due to pickle errors.

### Bug Fixes

- Removed hard pandas dependencies in the `Session.create_dataframe()` method.

### Dependency Updates

- Added `typing-extensions` as a new dependency with version >= `4.1.0`.
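
A minimal sketch of a vectorized UDF using pandas type hints, assuming an active default session; the function name is illustrative:

```python
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import PandasSeries

# Sketch: with PandasSeries type hints, the UDF receives whole batches
# as pandas Series instead of one row at a time.
@udf
def add_columns(x: PandasSeries[int], y: PandasSeries[int]) -> PandasSeries[int]:
    return x + y
```
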
## 0.5.0 (2022-03-22)

### New Features

- Added the stored procedures API (see the sketch below).
  - Added the `Session.sproc` property and `sproc()` to `snowflake.snowpark.functions`, so you can register stored procedures.
  - Added `Session.call` to call stored procedures by name.
- Added `UDFRegistration.register_from_file()` to allow registering UDFs from Python source files or zip files directly.
- Added `UDFRegistration.describe()` to describe a UDF.
- Added `DataFrame.random_split()` to provide a way to randomly split a DataFrame.
- Added the functions `md5()`, `sha1()`, `sha2()`, `ascii()`, `initcap()`, `length()`, `lower()`, `lpad()`, `ltrim()`, `rpad()`, `rtrim()`, `repeat()`, `soundex()`, `regexp_count()`, `replace()`, `charindex()`, `collate()`, `collation()`, `insert()`, `left()`, `right()`, `endswith()` to `snowflake.snowpark.functions`.
- Allowed `call_udf()` to accept literal values.
- Provided a `distinct` keyword in `array_agg()`.

### Bug Fixes

- Fixed an issue that caused `DataFrame.to_pandas()` to have a string column if `Column.cast(IntegerType())` was used.
- Fixed a bug in `DataFrame.describe()` when there is more than one string column.
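
A minimal sketch of the stored procedures API above; the procedure name is hypothetical, and registration assumes an open session with the `snowflake-snowpark-python` package available:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

# Sketch: register a stored procedure and call it by name.
@sproc(name="add_one_sp", replace=True, packages=["snowflake-snowpark-python"])
def add_one(session: Session, x: int) -> int:
    return session.sql(f"select {x} + 1").collect()[0][0]

print(session.call("add_one_sp", 1))  # -> 2
```
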
## 0.4.0 (2022-02-15)

### New Features

- You can now specify which Anaconda packages to use when defining UDFs.
  - Added `add_packages()`, `get_packages()`, `clear_packages()`, and `remove_package()` to class `Session`.
  - Added `add_requirements()` to `Session` so you can use a requirements file to specify which packages this session will use.
  - Added the parameter `packages` to function `snowflake.snowpark.functions.udf()` and method `UserDefinedFunction.register()` to indicate UDF-level Anaconda package dependencies when creating a UDF.
  - Added the parameter `imports` to `snowflake.snowpark.functions.udf()` and `UserDefinedFunction.register()` to specify UDF-level code imports.
- Added a parameter `session` to function `udf()` and `UserDefinedFunction.register()` so you can specify which session to use to create a UDF if you have multiple sessions.
- Added the types `Geography` and `Variant` to `snowflake.snowpark.types` to be used as type hints for Geography and Variant data when defining a UDF.
- Added support for Geography geoJSON data.
- Added `Table`, a subclass of `DataFrame`, for table operations:
  - The methods `update` and `delete` update and delete rows of a table in Snowflake.
  - The method `merge` merges data from a `DataFrame` into a `Table`.
  - Overrode method `DataFrame.sample()` with an additional parameter `seed`, which works on tables but not on views and sub-queries.
- Added `DataFrame.to_local_iterator()` and `DataFrame.to_pandas_batches()` to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
- Added `DataFrame.cache_result()` for caching the operations performed on a `DataFrame` in a temporary table. Subsequent operations on the original `DataFrame` have no effect on the cached result `DataFrame`.
- Added the property `DataFrame.queries` to get the SQL queries that will be executed to evaluate the `DataFrame`.
- Added `Session.query_history()` as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate `DataFrame`s created from a session. Both query ID and query text are recorded (see the sketch below).
- You can now create a `Session` instance from an existing established `snowflake.connector.SnowflakeConnection`. Use the parameter `connection` in `Session.builder.configs()`.
- Added `use_database()`, `use_schema()`, `use_warehouse()`, and `use_role()` to class `Session` to switch database/schema/warehouse/role after a session is created.
- Added `DataFrameWriter.copy_into_location()` to unload a `DataFrame` to stage files.
- Added `DataFrame.unpivot()`.
- Added `Column.within_group()` for sorting the rows by columns with some aggregation functions.
- Added the functions `listagg()`, `mode()`, `div0()`, `acos()`, `asin()`, `atan()`, `atan2()`, `cos()`, `cosh()`, `sin()`, `sinh()`, `tan()`, `tanh()`, `degrees()`, `radians()`, `round()`, `trunc()`, and `factorial()` to `snowflake.snowpark.functions`.
- Added an optional argument `ignore_nulls` in the functions `lead()` and `lag()`.
- The `condition` parameter of the functions `when()` and `iff()` now accepts SQL expressions.

### Improvements

- All function and method names have been renamed to use the snake case naming style, which is more Pythonic. For convenience, some camel case names are kept as aliases to the snake case APIs. It is recommended to use the snake case APIs.
  - Deprecated these methods on class `Session` and replaced them with their snake case equivalents: `getImports()`, `addImport()`, `removeImport()`, `clearImports()`, `getSessionStage()`, `getDefaultDatabase()`, `getDefaultSchema()`, `getCurrentDatabase()`, `getFullyQualifiedCurrentSchema()`.
  - Deprecated these methods on class `DataFrame` and replaced them with their snake case equivalents: `groupByGroupingSets()`, `naturalJoin()`, `withColumns()`, `joinTableFunction()`.
- The property `DataFrame.columns` is now consistent with `DataFrame.schema.names` and the Snowflake database `Identifier Requirements`.
- `Column.__bool__()` now raises a `TypeError`. This bans the use of the logical operators `and`, `or`, `not` on `Column` objects; for instance, `col("a") > 1 and col("b") > 2` will raise a `TypeError`. Use `(col("a") > 1) & (col("b") > 2)` instead.
- Changed `PutResult` and `GetResult` to subclass `NamedTuple`.
- Fixed a bug which raised an error when the local path or stage location has a space or other special characters.
- Changed `DataFrame.describe()` so that non-numeric and non-string columns are ignored instead of raising an exception.

### Dependency Updates

- Updated ``snowflake-connector-python`` to 2.7.4.
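
A minimal sketch of the `Session.query_history` context manager mentioned above, assuming an open `session`:

```python
# Sketch: track the queries issued by this session.
with session.query_history() as history:
    session.create_dataframe([[1], [2]], schema=["a"]).collect()

for record in history.queries:  # each record carries query_id and sql_text
    print(record.query_id, record.sql_text)
```
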
## 0.3.0 (2022-01-09)

### New Features

- Added `Column.isin()`, with an alias `Column.in_()`.
- Added `Column.try_cast()`, which is a special version of `cast()`. It tries to cast a string expression to other types and returns `null` if the cast is not possible.
- Added `Column.startswith()` and `Column.substr()` to process string columns.
- `Column.cast()` now also accepts a `str` value to indicate the cast type in addition to a `DataType` instance.
- Added `DataFrame.describe()` to summarize stats of a `DataFrame`.
- Added `DataFrame.explain()` to print the query plan of a `DataFrame`.
- `DataFrame.filter()` and `DataFrame.select_expr()` now accept a SQL expression.
- Added a new `bool` parameter `create_temp_table` to the methods `DataFrame.saveAsTable()` and `Session.write_pandas()` to optionally create a temp table.
- Added `DataFrame.minus()` and `DataFrame.subtract()` as aliases to `DataFrame.except_()`.
- Added `regexp_replace()`, `concat()`, `concat_ws()`, `to_char()`, `current_timestamp()`, `current_date()`, `current_time()`, `months_between()`, `cast()`, `try_cast()`, `greatest()`, `least()`, and `hash()` to module `snowflake.snowpark.functions`.

### Bug Fixes

- Fixed an issue where `Session.createDataFrame(pandas_df)` and `Session.write_pandas(pandas_df)` raised an exception when the pandas DataFrame has spaces in a column name.
- Fixed an issue where `DataFrame.copy_into_table()` sometimes printed an `error`-level log entry even though it actually worked.
- Fixed an API docs issue where some `DataFrame` APIs were missing from the docs.

### Dependency Updates

- Updated ``snowflake-connector-python`` to 2.7.2, which upgrades the ``pyarrow`` dependency to 6.0.x. Refer to the [python connector 2.7.2 release notes](https://pypi.org/project/snowflake-connector-python/2.7.2/) for more details.

## 0.2.0 (2021-12-02)

### New Features

- Updated the `Session.createDataFrame()` method for creating a `DataFrame` from a pandas DataFrame.
- Added the `Session.write_pandas()` method for writing a pandas DataFrame to a table in Snowflake and getting a Snowpark `DataFrame` object back.
- Added new classes and methods for calling window functions (see the sketch after this list).
- Added the new functions `cume_dist()`, to find the cumulative distribution of a value with regard to other values within a window partition, and `row_number()`, which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the `DataFrameStatFunctions` class.
- Added functions for handling missing values in a DataFrame in the `DataFrameNaFunctions` class.
- Added the new methods `rollup()`, `cube()`, and `pivot()` to the `DataFrame` class.
- Added the `GroupingSets` class, which you can use with the `DataFrame.groupByGroupingSets()` method to perform a SQL GROUP BY GROUPING SETS.
- Added the new `FileOperation(session)` class that you can use to upload and download files to and from a stage.
- Added the `DataFrame.copy_into_table()` method for loading data from files in a stage into a table.
- In CASE expressions, the functions `when()` and `otherwise()` now accept Python types in addition to `Column` objects.
- When you register a UDF, you can now optionally set the `replace` parameter to `True` to overwrite an existing UDF with the same name.
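
A minimal sketch of the window-function classes added in this release; the column names are hypothetical:

```python
from snowflake.snowpark import Window
from snowflake.snowpark.functions import col, row_number

# Sketch: number rows within each department ordered by salary, highest first.
window = Window.partition_by(col("dept")).order_by(col("salary").desc())
df.select(col("dept"), col("salary"), row_number().over(window).alias("rn"))
```
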
### Improvements

- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it will be uploaded as in-line code instead of being uploaded to a stage.

### Bug Fixes

- Fixed an issue where the statement `df.select(when(col("a") == 1, 4).otherwise(col("a")))` raised an exception.
- Fixed an issue where `df.toPandas()` raised an exception when a DataFrame was created from large local data.

## 0.1.0 (2021-10-26)

Start of Private Preview
"bugtrack_url": null,
"license": "Apache License, Version 2.0",
"summary": "Snowflake Snowpark for Python",
"version": "1.26.0",
"project_urls": {
"Changelog": "https://github.com/snowflakedb/snowpark-python/blob/main/CHANGELOG.md",
"Documentation": "https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html",
"Homepage": "https://www.snowflake.com/",
"Issues": "https://github.com/snowflakedb/snowpark-python/issues",
"Source": "https://github.com/snowflakedb/snowpark-python"
},
"split_keywords": [
"snowflake",
"db",
"database",
"cloud",
"analytics",
"warehouse"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7dd2c7c0c999c35626463c9bcab883ef739ddacb7ff014a5f25a9bdde1dcb9e9",
"md5": "ebb38372934f40ea7c657577b8275cbe",
"sha256": "cef7314ec9742ffdf506e5a3c6fee58bd475afc4d1383dbdf063871bc0df21dc"
},
"downloads": -1,
"filename": "snowflake_snowpark_python-1.26.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ebb38372934f40ea7c657577b8275cbe",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.8",
"size": 1497102,
"upload_time": "2024-12-05T22:12:00",
"upload_time_iso_8601": "2024-12-05T22:12:00.035800Z",
"url": "https://files.pythonhosted.org/packages/7d/d2/c7c0c999c35626463c9bcab883ef739ddacb7ff014a5f25a9bdde1dcb9e9/snowflake_snowpark_python-1.26.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "63dcb6823c0621236dc7077a74c0da8369f6d4804aafaf4135bce0de8b13359c",
"md5": "209b26bf91b260b4155d309846d9e51d",
"sha256": "da982696597057fae66a76ebea8b7a9bc8d05a5f402eba0cbd6c3a8267457208"
},
"downloads": -1,
"filename": "snowflake_snowpark_python-1.26.0.tar.gz",
"has_sig": false,
"md5_digest": "209b26bf91b260b4155d309846d9e51d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.8",
"size": 1447421,
"upload_time": "2024-12-05T22:12:02",
"upload_time_iso_8601": "2024-12-05T22:12:02.383885Z",
"url": "https://files.pythonhosted.org/packages/63/dc/b6823c0621236dc7077a74c0da8369f6d4804aafaf4135bce0de8b13359c/snowflake_snowpark_python-1.26.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-05 22:12:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "snowflakedb",
"github_project": "snowpark-python",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "snowflake-snowpark-python"
}