# Speedup up to 40 percent when sorting Pandas index/Series
### MSVC C++ x64/x86 build tools must be installed.
### This module uses [https://pypi.org/project/npfastsortcpp/](https://pypi.org/project/npfastsortcpp/)
### There you can get all instructions
## Important: Only for float/int
### Tested against Windows 10 / Python 3.9.13
```python
import pandas as pd
from a_pandas_ex_fastsort import pd_add_fastsort
pd_add_fastsort()
dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"
df5 = pd.read_csv(dafra)
```
```python
# Speed gain even for small DataFrames
df = pd.concat([df5.copy() for x in range(10)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Values must be unique
846 µs ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit df.sort_index()
933 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
```python
# The bigger, the better
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Values must be unique
11.1 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.sort_index()
15 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
```python
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.Pclass.sort_values()
2.08 ms ± 66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.Pclass.s_fastsort_copy() # Be careful: original index will be dropped!
583 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
```python
# Be careful:
df.Pclass.s_fastsort_inplace()
# sorts only one Series in place,
# values in other columns are not being sorted!
df # starting with:
Out[19]:
PassengerId Survived Pclass ... Fare Cabin Embarked
34102 245 0 3 ... 7.2250 NaN C
28329 709 1 1 ... 151.5500 NaN S
50018 123 0 2 ... 30.0708 NaN C
51258 472 0 3 ... 8.6625 NaN S
51813 136 0 2 ... 15.0458 NaN C
... ... ... ... ... ... ...
36357 718 1 2 ... 10.5000 E101 S
78608 201 0 3 ... 9.5000 NaN S
64989 838 0 3 ... 8.0500 NaN S
20824 332 0 1 ... 28.5000 C124 S
21108 616 1 2 ... 65.0000 NaN S
[89100 rows x 12 columns]
df.Pclass.s_fastsort_inplace()
df # Result - Only Pclass has been sorted
Out[21]:
PassengerId Survived Pclass ... Fare Cabin Embarked
34102 245 0 1 ... 7.2250 NaN C
28329 709 1 1 ... 151.5500 NaN S
50018 123 0 1 ... 30.0708 NaN C
51258 472 0 1 ... 8.6625 NaN S
51813 136 0 1 ... 15.0458 NaN C
... ... ... ... ... ... ...
36357 718 1 3 ... 10.5000 E101 S
78608 201 0 3 ... 9.5000 NaN S
64989 838 0 3 ... 8.0500 NaN S
20824 332 0 3 ... 28.5000 C124 S
21108 616 1 3 ... 65.0000 NaN S
[89100 rows x 12 columns]
```
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/a_pandas_ex_fastsort",
"name": "a-pandas-ex-fastsort",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "c++,numpy,sort,pandas,reindex",
"author": "Johannes Fischer",
"author_email": "<aulasparticularesdealemaosp@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a0/92/eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789/a_pandas_ex_fastsort-0.10.tar.gz",
"platform": null,
"description": "\n# Speedup up to 40 percent when sorting Pandas index/Series \n\n\n\n\n\n### MSVC C++ x64/x86 build tools must be installed. \n\n\n\n\n\n### This module uses [https://pypi.org/project/npfastsortcpp/](https://pypi.org/project/npfastsortcpp/)\n\n\n\n\n\n### There you can get all instructions\n\n\n\n## Important: Only for float/int\n\n\n\n### Tested against Windows 10 / Python 3.9.13\n\n\n\n```python\n\nimport pandas as pd\n\nfrom a_pandas_ex_fastsort import pd_add_fastsort\n\npd_add_fastsort()\n\ndafra = \"https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv\"\n\ndf5 = pd.read_csv(dafra)\n\n```\n\n\n\n\n\n\n\n```python\n\n# Speed gain even for small DataFrames\n\ndf = pd.concat([df5.copy() for x in range(10)], ignore_index=True)\n\ndf = df.sample(len(df))\n\n%timeit df.d_fast_reindex() # Values must be unique\n\n846 \u00b5s \u00b1 37.4 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n%timeit df.sort_index()\n\n933 \u00b5s \u00b1 25.7 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n\n\n\n\n```python\n\n# The bigger, the better\n\ndf = pd.concat([df5.copy() for x in range(100)], ignore_index=True)\n\ndf = df.sample(len(df))\n\n%timeit df.d_fast_reindex() # Values must be unique\n\n11.1 ms \u00b1 131 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\n\n%timeit df.sort_index()\n\n15 ms \u00b1 220 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\n\n```\n\n\n\n\n\n\n\n```python\n\ndf = pd.concat([df5.copy() for x in range(100)], ignore_index=True)\n\ndf = df.sample(len(df))\n\n%timeit df.Pclass.sort_values()\n\n2.08 ms \u00b1 66 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\n\n%timeit df.Pclass.s_fastsort_copy() # Be careful: original index will be dropped!\n\n583 \u00b5s \u00b1 5.85 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n\n\n\n\n```python\n\n# Be careful: \n\ndf.Pclass.s_fastsort_inplace()\n\n# sorts only one Series in place, \n\n# values in other columns are not being sorted! \n\n\n\ndf # starting with:\n\nOut[19]: \n\n PassengerId Survived Pclass ... Fare Cabin Embarked\n\n34102 245 0 3 ... 7.2250 NaN C\n\n28329 709 1 1 ... 151.5500 NaN S\n\n50018 123 0 2 ... 30.0708 NaN C\n\n51258 472 0 3 ... 8.6625 NaN S\n\n51813 136 0 2 ... 15.0458 NaN C\n\n ... ... ... ... ... ... ...\n\n36357 718 1 2 ... 10.5000 E101 S\n\n78608 201 0 3 ... 9.5000 NaN S\n\n64989 838 0 3 ... 8.0500 NaN S\n\n20824 332 0 1 ... 28.5000 C124 S\n\n21108 616 1 2 ... 65.0000 NaN S\n\n[89100 rows x 12 columns]\n\ndf.Pclass.s_fastsort_inplace()\n\n\n\ndf # Result - Only Pclass has been sorted\n\nOut[21]: \n\n PassengerId Survived Pclass ... Fare Cabin Embarked\n\n34102 245 0 1 ... 7.2250 NaN C\n\n28329 709 1 1 ... 151.5500 NaN S\n\n50018 123 0 1 ... 30.0708 NaN C\n\n51258 472 0 1 ... 8.6625 NaN S\n\n51813 136 0 1 ... 15.0458 NaN C\n\n ... ... ... ... ... ... ...\n\n36357 718 1 3 ... 10.5000 E101 S\n\n78608 201 0 3 ... 9.5000 NaN S\n\n64989 838 0 3 ... 8.0500 NaN S\n\n20824 332 0 3 ... 28.5000 C124 S\n\n21108 616 1 3 ... 65.0000 NaN S\n\n[89100 rows x 12 columns]\n\n```\n\n\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Speedup up to 40 percent when sorting Pandas index/Series",
"version": "0.10",
"split_keywords": [
"c++",
"numpy",
"sort",
"pandas",
"reindex"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e0cd014f9b64a4d7464a37206e3a3f47ed09c40e549b0df68aa29d5b6b61f77b",
"md5": "7734dad9cdb33a190524d4c7081b9e2d",
"sha256": "a6eb6b07b70c0e1187e7812acafa7a56421ea08e9a52d57e5b017ec69814d104"
},
"downloads": -1,
"filename": "a_pandas_ex_fastsort-0.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7734dad9cdb33a190524d4c7081b9e2d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6277,
"upload_time": "2023-02-02T00:11:25",
"upload_time_iso_8601": "2023-02-02T00:11:25.107490Z",
"url": "https://files.pythonhosted.org/packages/e0/cd/014f9b64a4d7464a37206e3a3f47ed09c40e549b0df68aa29d5b6b61f77b/a_pandas_ex_fastsort-0.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a092eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789",
"md5": "cf039c9304204e75e93ab4f451ce9615",
"sha256": "6bf7a6a4a318c57c45088ee0c7482de7df3cc1788233eba6f2f20e84c389453b"
},
"downloads": -1,
"filename": "a_pandas_ex_fastsort-0.10.tar.gz",
"has_sig": false,
"md5_digest": "cf039c9304204e75e93ab4f451ce9615",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 4324,
"upload_time": "2023-02-02T00:11:26",
"upload_time_iso_8601": "2023-02-02T00:11:26.712122Z",
"url": "https://files.pythonhosted.org/packages/a0/92/eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789/a_pandas_ex_fastsort-0.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-02 00:11:26",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "hansalemaos",
"github_project": "a_pandas_ex_fastsort",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "a-pandas-ex-fastsort"
}