a-pandas-ex-fastsort


- Name: a-pandas-ex-fastsort
- Version: 0.10
- Home page: https://github.com/hansalemaos/a_pandas_ex_fastsort
- Summary: Speedup up to 40 percent when sorting Pandas index/Series
- Upload time: 2023-02-02 00:11:26
- Author: Johannes Fischer
- License: MIT
- Keywords: c++ numpy sort pandas reindex
- Requirements: no requirements were recorded
            
# Up to 40 percent faster sorting of pandas indexes and Series





### The MSVC C++ build tools (x64/x86) must be installed

### This module uses [https://pypi.org/project/npfastsortcpp/](https://pypi.org/project/npfastsortcpp/); see that page for the full installation instructions



## Important: works only with float/int data
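Because the fast path only handles float/int data, it can help to check a column's dtype before calling it. The sketch below uses pandas' public dtype helpers; the function name `is_fastsort_compatible` is hypothetical and not part of this package, and it assumes that boolean, object, datetime and other dtypes should simply fall back to the regular pandas sort.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def is_fastsort_compatible(s: pd.Series) -> bool:
    """Hypothetical helper: True only for integer or float Series (bool excluded)."""
    if is_bool_dtype(s.dtype):
        return False
    return is_integer_dtype(s.dtype) or is_float_dtype(s.dtype)


print(is_fastsort_compatible(pd.Series([3, 1, 2])))    # True  (int64)
print(is_fastsort_compatible(pd.Series([2.5, 0.1])))   # True  (float64)
print(is_fastsort_compatible(pd.Series(["b", "a"])))   # False (strings)
```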



### Tested on Windows 10 / Python 3.9.13



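```python
import pandas as pd

from a_pandas_ex_fastsort import pd_add_fastsort

# Add the fast-sort methods used below (d_fast_reindex, s_fastsort_copy, s_fastsort_inplace) to pandas
pd_add_fastsort()

# Titanic sample data from the pandas repository
dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"
df5 = pd.read_csv(dafra)
```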







```python
# Speed gain even for small DataFrames
df = pd.concat([df5.copy() for _ in range(10)], ignore_index=True)
df = df.sample(len(df))  # shuffle the rows so that the index is unsorted

%timeit df.d_fast_reindex()  # index values must be unique
# 846 µs ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.sort_index()
# 933 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
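`d_fast_reindex()` is timed against `df.sort_index()`, so both are expected to return the rows ordered by index label. For readers without the MSVC build tools, the following plain pandas/NumPy sketch produces that ordering through an argsort of the index. `sort_by_unique_index` is a hypothetical name, and its uniqueness check mirrors the "must be unique" note above rather than anything confirmed about the package's internals.

```python
import numpy as np
import pandas as pd


def sort_by_unique_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pure-pandas sketch: return df with rows ordered by index label."""
    if not df.index.is_unique:
        raise ValueError("index values must be unique")
    # argsort the index labels, then take the rows in that positional order
    order = np.argsort(df.index.to_numpy(), kind="stable")
    return df.iloc[order]


df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[3, 0, 2, 1])
print(sort_by_unique_index(df).equals(df.sort_index()))  # True
```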







```python
# The larger the DataFrame, the larger the speed gain
df = pd.concat([df5.copy() for _ in range(100)], ignore_index=True)
df = df.sample(len(df))  # shuffle the rows so that the index is unsorted

%timeit df.d_fast_reindex()  # index values must be unique
# 11.1 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.sort_index()
# 15 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
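`%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent with the standard-library `timeit` module is sketched below, reporting mean and standard deviation per call like `%timeit` does. The helper name `time_per_call_ms` is hypothetical, and the synthetic shuffled integer index is an assumption standing in for the Titanic data; after `pd_add_fastsort()`, `df.d_fast_reindex` could be passed in the same way as `df.sort_index`.

```python
import statistics
import timeit

import numpy as np
import pandas as pd


def time_per_call_ms(func, number: int = 100, repeat: int = 7) -> tuple[float, float]:
    """Hypothetical helper: (mean, std. dev.) of the per-call time in milliseconds."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [t / number * 1000.0 for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


rng = np.random.default_rng(0)
# Shuffled unique integer index, similar to df.sample(len(df)) above
df = pd.DataFrame({"value": rng.random(89_100)}, index=rng.permutation(89_100))

mean_ms, std_ms = time_per_call_ms(df.sort_index, number=10)
print(f"sort_index: {mean_ms:.2f} ms ± {std_ms:.2f} ms per call")
```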







```python
df = pd.concat([df5.copy() for _ in range(100)], ignore_index=True)
df = df.sample(len(df))  # shuffle the rows

%timeit df.Pclass.sort_values()
# 2.08 ms ± 66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.Pclass.s_fastsort_copy()  # Be careful: the original index will be dropped!
# 583 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
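The warning about the dropped index is the key difference from `sort_values()`, which keeps each value's original label. A NumPy-only sketch of the behaviour described above (sorted values under a fresh 0..n-1 index) is shown below. `sorted_copy_dropping_index` is a hypothetical name, and the assumption that the result gets a default `RangeIndex` is ours, not confirmed by the package.

```python
import numpy as np
import pandas as pd


def sorted_copy_dropping_index(s: pd.Series) -> pd.Series:
    """Hypothetical sketch: sorted copy of a numeric Series with a new default index."""
    return pd.Series(np.sort(s.to_numpy()), name=s.name)


s = pd.Series([3, 1, 2], index=[34102, 28329, 50018], name="Pclass")
print(s.sort_values().index.tolist())                # [28329, 50018, 34102]
print(sorted_copy_dropping_index(s).index.tolist())  # [0, 1, 2]
```

In plain pandas, `s.sort_values(ignore_index=True)` gives the same values and default index.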







```python
# Be careful: s_fastsort_inplace() sorts only this one Series in place;
# the values in the other columns are NOT reordered with it!

df  # starting with:
# Out[19]:
#        PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked
# 34102          245         0       3  ...    7.2250   NaN         C
# 28329          709         1       1  ...  151.5500   NaN         S
# 50018          123         0       2  ...   30.0708   NaN         C
# 51258          472         0       3  ...    8.6625   NaN         S
# 51813          136         0       2  ...   15.0458   NaN         C
#             ...       ...     ...  ...       ...   ...       ...
# 36357          718         1       2  ...   10.5000  E101         S
# 78608          201         0       3  ...    9.5000   NaN         S
# 64989          838         0       3  ...    8.0500   NaN         S
# 20824          332         0       1  ...   28.5000  C124         S
# 21108          616         1       2  ...   65.0000   NaN         S
# [89100 rows x 12 columns]

df.Pclass.s_fastsort_inplace()

df  # result: only Pclass has been sorted
# Out[21]:
#        PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked
# 34102          245         0       1  ...    7.2250   NaN         C
# 28329          709         1       1  ...  151.5500   NaN         S
# 50018          123         0       1  ...   30.0708   NaN         C
# 51258          472         0       1  ...    8.6625   NaN         S
# 51813          136         0       1  ...   15.0458   NaN         C
#             ...       ...     ...  ...       ...   ...       ...
# 36357          718         1       3  ...   10.5000  E101         S
# 78608          201         0       3  ...    9.5000   NaN         S
# 64989          838         0       3  ...    8.0500   NaN         S
# 20824          332         0       3  ...   28.5000  C124         S
# 21108          616         1       3  ...   65.0000   NaN         S
# [89100 rows x 12 columns]
```
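The output shows the pitfall: after `s_fastsort_inplace()`, row 34102 still has Fare 7.2250 but its Pclass changed from 3 to 1, so the row no longer describes a single passenger. The pure-pandas sketch below reproduces the effect on a tiny hypothetical frame and contrasts it with `DataFrame.sort_values`, which moves whole rows. Overwriting a column with `np.sort` only mimics the result shown above; it is not the package's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"PassengerId": [245, 709, 123], "Pclass": [3, 1, 2]},
    index=[34102, 28329, 50018],
)

# Row-preserving sort: each PassengerId stays paired with its own Pclass
by_class = df.sort_values("Pclass")

# Column-only sort (mimics the in-place result above): the pairs are broken
column_only = df.copy()
column_only["Pclass"] = np.sort(column_only["Pclass"].to_numpy())

print(by_class["PassengerId"].tolist())     # [709, 123, 245]
print(column_only["PassengerId"].tolist())  # [245, 709, 123]
print(column_only["Pclass"].tolist())       # [1, 2, 3]
```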




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/a_pandas_ex_fastsort",
    "name": "a-pandas-ex-fastsort",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "c++,numpy,sort,pandas,reindex",
    "author": "Johannes Fischer",
    "author_email": "<aulasparticularesdealemaosp@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a0/92/eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789/a_pandas_ex_fastsort-0.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Speedup up to 40 percent when sorting Pandas index/Series",
    "version": "0.10",
    "split_keywords": [
        "c++",
        "numpy",
        "sort",
        "pandas",
        "reindex"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e0cd014f9b64a4d7464a37206e3a3f47ed09c40e549b0df68aa29d5b6b61f77b",
                "md5": "7734dad9cdb33a190524d4c7081b9e2d",
                "sha256": "a6eb6b07b70c0e1187e7812acafa7a56421ea08e9a52d57e5b017ec69814d104"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_fastsort-0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7734dad9cdb33a190524d4c7081b9e2d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6277,
            "upload_time": "2023-02-02T00:11:25",
            "upload_time_iso_8601": "2023-02-02T00:11:25.107490Z",
            "url": "https://files.pythonhosted.org/packages/e0/cd/014f9b64a4d7464a37206e3a3f47ed09c40e549b0df68aa29d5b6b61f77b/a_pandas_ex_fastsort-0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a092eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789",
                "md5": "cf039c9304204e75e93ab4f451ce9615",
                "sha256": "6bf7a6a4a318c57c45088ee0c7482de7df3cc1788233eba6f2f20e84c389453b"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_fastsort-0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "cf039c9304204e75e93ab4f451ce9615",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 4324,
            "upload_time": "2023-02-02T00:11:26",
            "upload_time_iso_8601": "2023-02-02T00:11:26.712122Z",
            "url": "https://files.pythonhosted.org/packages/a0/92/eb952023690a258ee09606360a95a02b48cf2f204c27a4d6aa54a801f789/a_pandas_ex_fastsort-0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-02 00:11:26",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "hansalemaos",
    "github_project": "a_pandas_ex_fastsort",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "a-pandas-ex-fastsort"
}
        