pandas-appender

Name	pandas-appender
Version	0.9.8.4
home_page	https://github.com/wumpus/pandas-appender
Summary	A helper class that makes appending to a Pandas DataFrame efficient
upload_time	2023-05-22 16:49:26
docs_url	None
author	Greg Lindahl and others
requires_python	>=3.6
license	Apache 2.0
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# pandas-appender

[![Build Status](https://dev.azure.com/lindahl0577/pandas-appender/_apis/build/status/wumpus.pandas-appender?branchName=main)](https://dev.azure.com/lindahl0577/pandas-appender/_build/latest?definitionId=2&branchName=main) [![Coverage](https://coveralls.io/repos/github/wumpus/pandas-appender/badge.svg?branch=main)](https://coveralls.io/github/wumpus/pandas-appender?branch=main) [![Apache License 2.0](https://img.shields.io/github/license/wumpus/pandas-appender.svg)](LICENSE)

Have you ever wanted to append a bunch of rows to a Pandas DataFrame?
Turns out that it's extremely inefficient to do, because every append
copies the entire DataFrame! For a large dataframe, you're supposed to
make multiple dataframes and `pd.concat()` them instead.
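
A minimal plain-pandas sketch of that chunk-and-concat pattern follows. It is not pandas-appender's implementation; the function name `build_dataframe` and the `chunk_size` default are hypothetical choices for illustration.

```python
import pandas as pd


def build_dataframe(rows, chunk_size=10_000):
    """Hypothetical helper: build a DataFrame from an iterable of dicts by
    buffering rows, turning each full buffer into a small DataFrame, and
    concatenating all the small DataFrames once at the end."""
    chunks = []   # small DataFrames, one per full buffer
    buffer = []   # pending rows (dicts) not yet converted
    for row in rows:
        buffer.append(row)
        if len(buffer) >= chunk_size:
            chunks.append(pd.DataFrame(buffer))
            buffer = []
    if buffer:  # leftover rows that did not fill a whole chunk
        chunks.append(pd.DataFrame(buffer))
    if not chunks:
        return pd.DataFrame()
    # One concat at the end avoids re-copying a growing DataFrame on every row
    return pd.concat(chunks, ignore_index=True)


df = build_dataframe(({'i': i} for i in range(25_000)), chunk_size=10_000)
print(len(df))  # 25000
```

The chunk size trades memory for speed: larger chunks mean fewer small DataFrames to concatenate, at the cost of holding more pending dicts in memory.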

Also, Pandas deprecated `DataFrame.append()` in version 1.4 and
removed it in 2.0.

So... helper function? Pandas doesn't have one. Roll your own?
Ugh. OK then: here's that helper function. It can append around 1
million very small rows per CPU-second. Its additional memory
overhead is modest, around 5 megabytes, and grows dynamically with
the number of rows appended.

## Install

`pip install pandas-appender`

## Usage

```
from pandas_appender import DF_Appender

dfa = DF_Appender(ignore_index=True)  # note that ignore_index moves to the init
for i in range(1_000_000):
    dfa = dfa.append({'i': i})

df = dfa.finalize()  # must call .finalize() before you can use the results
```

## Type hints and category detection

Using narrower types and categories can often dramatically reduce the size of a
DataFrame. There are two ways to do this in pandas-appender. One is to
append to an existing dataframe:

```
dfa = DF_Appender(df, ignore_index=True)
```

and the second is to pass in a `dtypes=` argument:

```
dfa = DF_Appender(ignore_index=True, dtypes=another_dataframe.dtypes)
```
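
To make the size claim concrete, here is a small plain-pandas sketch (it does not use pandas-appender) showing how a narrower integer dtype shrinks a column, and what `another_dataframe.dtypes` actually is: a Series mapping column names to dtypes. The column name `n` and the value range are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# 100,000 small integers (0..99) stored as int64, pandas' usual integer dtype
df = pd.DataFrame({'n': np.arange(100_000, dtype='int64') % 100})
print(df.dtypes['n'])  # int64

# The same values fit in int8 (range -128..127): 1 byte per value instead of 8
narrow = df.astype({'n': 'int8'})
print(df['n'].memory_usage(index=False))      # 800000
print(narrow['n'].memory_usage(index=False))  # 100000

# DataFrame.dtypes is a Series of column name -> dtype,
# the kind of object the README passes as dtypes=
print(narrow.dtypes)
```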

pandas-appender also offers a way to infer which columns would be smaller
if they were categories. This code will either analyze an existing dataframe
that you're appending to:
```
dfa = DF_Appender(df, ignore_index=True, infer_categories=True)
```
or it will analyze the first chunk of appended rows:
```
dfa = DF_Appender(ignore_index=True, infer_categories=True)
```
These inferred categories will override existing types or a `dtypes=` argument.
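
The README doesn't describe how pandas-appender decides which columns to convert, so the following is only a hypothetical sketch of the kind of check such inference could make: compare a column's deep memory usage as-is against its usage as a `category`. The function name `smaller_as_category` and the sample data are invented for illustration.

```python
import pandas as pd


def smaller_as_category(series: pd.Series) -> bool:
    """Hypothetical heuristic: True if converting this column to the
    'category' dtype would reduce its deep memory usage."""
    as_is = series.memory_usage(index=False, deep=True)
    as_cat = series.astype('category').memory_usage(index=False, deep=True)
    return bool(as_cat < as_is)


# Low cardinality: 3 distinct strings repeated many times
colors = pd.Series(['red', 'green', 'blue'] * 10_000)
# High cardinality: every string is distinct
ids = pd.Series([f'id-{i}' for i in range(30_000)])

print(smaller_as_category(colors))  # True
print(smaller_as_category(ids))     # False
```

Categories pay off when a column has few distinct values: each row then stores only a small integer code, while the distinct values are stored once.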

## Incompatibilities with pandas.DataFrame.append()

### DF_Appender must be finalized before use

* Pandas: `df_new = df.append()  # df_new is a dataframe`
* DF_Appender: `dfa_new = dfa.append()  # must do df = dfa.finalize() to get a DataFrame`

### pandas.DataFrame.append does not modify its input, DF_Appender does

* Pandas: `df_new = df.append()  # df is not changed`
* DF_Appender: `dfa_new = dfa.append()  # modifies dfa, and dfa_new == dfa`
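
For comparison, this sketch shows the non-mutating behavior on the pandas side. Because `DataFrame.append()` no longer exists in pandas 2.0 and later, it uses `pd.concat()`, the documented replacement; the column name `i` simply mirrors the usage example.

```python
import pandas as pd

df = pd.DataFrame({'i': [0, 1]})

# pd.concat() replaces the removed DataFrame.append(). Like the old append(),
# it returns a new DataFrame and leaves df unchanged.
df_new = pd.concat([df, pd.DataFrame([{'i': 2}])], ignore_index=True)

print(len(df))      # 2: the original is untouched
print(len(df_new))  # 3
```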

### pandas.DataFrame.append will promote types, while DF_Appender keeps the declared type

* Pandas: append `0.1` to an integer column, and the column will be promoted to float
* DF_Appender: when initialized with `dtypes=` or an existing DataFrame, appending
`0.1` to an integer column causes `0.1` to be cast to an integer, i.e. `0`.
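
Both behaviors can be reproduced in plain pandas. This sketch uses `pd.concat()` for the promotion and `astype()` for the cast; it shows the effect described above, not necessarily the mechanism DF_Appender uses internally. The column name `x` is arbitrary.

```python
import pandas as pd

ints = pd.DataFrame({'x': [1, 2]}, dtype='int64')  # integer column
floats = pd.DataFrame({'x': [0.1]})                # float64 column

# pandas promotes: combining int64 and float64 columns yields float64
promoted = pd.concat([ints, floats], ignore_index=True)
print(promoted['x'].dtype)     # float64
print(promoted['x'].tolist())  # [1.0, 2.0, 0.1]

# Forcing 0.1 into an integer dtype truncates toward zero, giving 0
cast = floats.astype({'x': 'int64'})
print(cast['x'].tolist())      # [0]
```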

Raw data

{
    "_id": null,
    "home_page": "https://github.com/wumpus/pandas-appender",
    "name": "pandas-appender",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Greg Lindahl and others",
    "author_email": "lindahl@pbm.com",
    "download_url": "https://files.pythonhosted.org/packages/71/37/ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564/pandas_appender-0.9.8.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "A helper class that makes appending to a Pandas DataFrame efficient",
    "version": "0.9.8.4",
    "project_urls": {
        "Homepage": "https://github.com/wumpus/pandas-appender"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7137ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564",
                "md5": "d9a72ae889f9fb99e24334057b5cc938",
                "sha256": "18fe83760ea2f2d109c5f2db648fde8eb7b09888e2586dbb682efacb5d82d6cd"
            },
            "downloads": -1,
            "filename": "pandas_appender-0.9.8.4.tar.gz",
            "has_sig": false,
            "md5_digest": "d9a72ae889f9fb99e24334057b5cc938",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 12483,
            "upload_time": "2023-05-22T16:49:26",
            "upload_time_iso_8601": "2023-05-22T16:49:26.889140Z",
            "url": "https://files.pythonhosted.org/packages/71/37/ef6fc8ebc2b9a82adb9d6f9f14d642ddff27e80947a28ab5a29744ef2564/pandas_appender-0.9.8.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-22 16:49:26",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wumpus",
    "github_project": "pandas-appender",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pandas-appender"
}
