# pandas-appender
[![Build Status](https://dev.azure.com/lindahl0577/pandas-appender/_apis/build/status/wumpus.pandas-appender?branchName=main)](https://dev.azure.com/lindahl0577/pandas-appender/_build/latest?definitionId=2&branchName=main) [![Coverage](https://coveralls.io/repos/github/wumpus/pandas-appender/badge.svg?branch=main)](https://coveralls.io/github/wumpus/pandas-appender?branch=main) [![Apache License 2.0](https://img.shields.io/github/license/wumpus/pandas-appender.svg)](LICENSE)
Have you ever wanted to append a bunch of rows to a Pandas DataFrame?
Turns out that it's extremely inefficient to do! For a large
dataframe, you're supposed to make multiple dataframes and `pd.concat()`
them instead.

Also, Pandas deprecated `DataFrame.append()` in version 1.4 and
removed it in 2.0.

So... helper function? Pandas doesn't have one. Roll your own?
Ugh. OK then: here's that helper function. It can append around 1
million very small rows per CPU-second. It has a modest additional
memory usage of around 5 megabytes, which grows dynamically with the
number of rows appended.
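For context, the "roll your own" approach mentioned above might look roughly like the sketch below: buffer rows in a plain list, turn each full buffer into a small DataFrame, and call `pd.concat()` once at the end. This is only an illustration using plain pandas; `rows_to_dataframe` and `chunksize` are hypothetical names, and this is not claimed to be how pandas-appender works internally.

```python
import pandas as pd


def rows_to_dataframe(rows, chunksize=10_000):
    """Build a DataFrame from an iterable of dicts, one chunk at a time.

    Hypothetical helper for illustration; not part of pandas-appender.
    """
    chunks = []  # finished per-chunk DataFrames
    buffer = []  # rows not yet converted to a DataFrame
    for row in rows:
        buffer.append(row)
        if len(buffer) >= chunksize:
            chunks.append(pd.DataFrame(buffer))
            buffer = []
    if buffer:
        # convert whatever is left over after the last full chunk
        chunks.append(pd.DataFrame(buffer))
    if not chunks:
        return pd.DataFrame()
    # one concat at the end, instead of one append per row
    return pd.concat(chunks, ignore_index=True)


df = rows_to_dataframe({'i': i} for i in range(25_000))
```

The point of the pattern is that appending to a Python list is cheap, while growing a DataFrame one row at a time copies the existing data on every call.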
## Install
`pip install pandas-appender`
## Usage
```
from pandas_appender import DF_Appender
dfa = DF_Appender(ignore_index=True) # note that ignore_index moves to the init
for i in range(1_000_000):
    dfa = dfa.append({'i': i})
df = dfa.finalize() # must call .finalize() before you can use the results
```
## Type hints and category detection
Using narrower types and categories can often dramatically reduce the size of a
DataFrame. There are two ways to do this in pandas-appender. One is to
append to an existing dataframe:
```
dfa = DF_Appender(df, ignore_index=True)
```
and the second is to pass in a `dtypes=` argument:
```
dfa = DF_Appender(ignore_index=True, dtypes=another_dataframe.dtypes)
```
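To see why narrower types matter, here is a small plain-pandas sketch (it does not use pandas-appender at all) comparing the same integer column stored as `int64` and as `int16`. The column name `i` and the row count are arbitrary choices for the illustration.

```python
import numpy as np
import pandas as pd

# 1000 small integers, stored with the default 8-byte integer type
df_wide = pd.DataFrame({'i': np.arange(1000, dtype='int64')})

# the same values fit in 2 bytes each
df_narrow = df_wide.astype({'i': 'int16'})

wide_bytes = df_wide['i'].memory_usage(index=False)
narrow_bytes = df_narrow['i'].memory_usage(index=False)

# df_narrow.dtypes is a Series of column name -> dtype, the same kind of
# object as another_dataframe.dtypes in the example above
print(wide_bytes, narrow_bytes)
```

The narrow column takes a quarter of the space, which is only safe because every value here fits in the `int16` range.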
pandas-appender also offers a way to infer which columns would be smaller
if they were categories. This code will either analyze an existing dataframe
that you're appending to:
```
dfa = DF_Appender(df, ignore_index=True, infer_categories=True)
```
or it will analyze the first chunk of appended lines:
```
dfa = DF_Appender(ignore_index=True, infer_categories=True)
```
These inferred categories will override existing types or a `dtypes=` argument.
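As a rough illustration of when a category column is smaller, the plain-pandas sketch below stores a column with only three distinct strings both ways and compares the memory use. It says nothing about the exact rule `infer_categories=True` applies; the `colors` data is made up for the example.

```python
import pandas as pd

# 30,000 values but only 3 distinct strings: a good category candidate
colors = pd.Series(['red', 'green', 'blue'] * 10_000)
as_category = colors.astype('category')

# deep=True counts the string contents, not just the pointers
plain_bytes = colors.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)

print(plain_bytes, category_bytes)
```

A column where nearly every value is unique would not benefit in the same way, since the category table would be about as large as the column itself.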
## Incompatibilities with pandas.DataFrame.append()
### DF_Appender must be finalized before use
* Pandas: `df_new = df.append() # df_new is a dataframe`
* DF_Appender: `dfa_new = dfa.append() # must do df = dfa.finalize() to get a DataFrame`
### pandas.DataFrame.append is idempotent, DF_Appender is not
* Pandas: `df_new = df.append() # df is not changed`
* DF_Appender: `dfa_new = dfa.append() # modifies dfa, and dfa_new == dfa`
### pandas.DataFrame.append will promote types, while DF_Appender is strict
* Pandas: append `0.1` to an integer column, and the column will be promoted to float
* DF_Appender: when initialized with `dtypes=` or an existing DataFrame, appending
  `0.1` to an integer column causes `0.1` to be cast to an integer, i.e. `0`.
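The difference between promoting and casting can be reproduced with plain pandas. Since `DataFrame.append()` no longer exists in pandas 2.0, this sketch uses `pd.concat()` as a stand-in for the promoting behavior, and an explicit `astype()` to imitate the strict behavior described above. It is an illustration of the two outcomes, not DF_Appender's actual code path.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({'x': pd.Series([1, 2, 3], dtype='int64')})
new_row = pd.DataFrame({'x': [0.1]})  # a float64 value

# Promotion: combining int64 and float64 yields a float64 column
promoted = pd.concat([ints, new_row], ignore_index=True)

# Strict: cast the new row to the existing dtype first, so 0.1 becomes 0
strict = pd.concat([ints, new_row.astype({'x': 'int64'})], ignore_index=True)

print(promoted['x'].dtype, strict['x'].dtype)
```

In the strict case the column keeps its integer type at the cost of silently truncating the fractional part, so it is worth checking input types before appending.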
Raw data
{
"_id": null,
"home_page": "https://github.com/wumpus/pandas-appender",
"name": "pandas-appender",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Greg Lindahl and others",
"author_email": "lindahl@pbm.com",
"download_url": "https://files.pythonhosted.org/packages/71/37/ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564/pandas_appender-0.9.8.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "A helper class that makes appending to a Pandas DataFrame efficient",
"version": "0.9.8.4",
"project_urls": {
"Homepage": "https://github.com/wumpus/pandas-appender"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7137ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564",
"md5": "d9a72ae889f9fb99e24334057b5cc938",
"sha256": "18fe83760ea2f2d109c5f2db648fde8eb7b09888e2586dbb682efacb5d82d6cd"
},
"downloads": -1,
"filename": "pandas_appender-0.9.8.4.tar.gz",
"has_sig": false,
"md5_digest": "d9a72ae889f9fb99e24334057b5cc938",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 12483,
"upload_time": "2023-05-22T16:49:26",
"upload_time_iso_8601": "2023-05-22T16:49:26.889140Z",
"url": "https://files.pythonhosted.org/packages/71/37/ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564/pandas_appender-0.9.8.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-22 16:49:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wumpus",
"github_project": "pandas-appender",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pandas-appender"
}