a-pandas-ex-numexpr


Name: a-pandas-ex-numexpr
Version: 0.10
Home page: https://github.com/hansalemaos/a_pandas_ex_numexpr
Summary: Pandas DataFrame/Series operations 8 times faster (or even more)
Upload time: 2023-02-03 00:27:21
Author: Johannes Fischer
License: MIT
Keywords: numexpr, numpy, sort, pandas, series

# Pandas DataFrame Operations 8 times faster (or even more)



DataFrame.query has never performed well for me. On my PC, it has been extremely slow with small DataFrames, and only slightly faster, if at all, with huge DataFrames.



DataFrame.query uses pd.eval, and pd.eval uses numexpr. The weird thing is that numexpr is insanely fast when it is used directly against a DataFrame, but neither pd.eval nor DataFrame.query is. At first I thought there was a problem with my Pandas/environment configuration, but then I read on the [Pandas page](https://pandas.pydata.org/docs/user_guide/indexing.html#performance-of-query):



_You will only see the performance benefits of using the numexpr engine with DataFrame.query() if your frame has more than approximately 200,000 rows._



Well, **a_pandas_ex_numexpr** adds several methods to the DataFrame/Series classes and achieves tremendous speed-ups **(up to 8 times faster in my tests)** even for small DataFrames. All tests were done using: [https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv](https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv)
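To make the idea of "numexpr used directly against a DataFrame" concrete, here is a minimal sketch that hands the underlying NumPy arrays to `numexpr.evaluate` and checks the result against ordinary pandas arithmetic. The small in-memory frame and the variable names are assumptions chosen for illustration; this is not the package's own code.

```python
import numexpr as ne
import numpy as np
import pandas as pd

# Small in-memory stand-in for two Titanic columns (illustrative data).
df = pd.DataFrame({"Survived": [0, 1, 1, 0], "Pclass": [3, 1, 3, 2]})

# Hand plain NumPy arrays to numexpr; pandas' expression parsing is bypassed.
b = df["Survived"].to_numpy()
c = df["Pclass"].to_numpy()
result_ne = ne.evaluate("b * 99.5 * c", local_dict={"b": b, "c": c})

# The same calculation with ordinary pandas arithmetic, for comparison.
result_pd = (df["Survived"] * 99.5 * df["Pclass"]).to_numpy()

print(np.allclose(result_ne, result_pd))
```

Both paths compute the same values; the difference the timings in this README point at is the overhead around the calculation, not the calculation itself.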



**Let the numbers speak for themselves**



## How to import / use a_pandas_ex_numexpr



```python
from a_pandas_ex_numexpr import pd_add_numexpr
pd_add_numexpr()
import pandas as pd
dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"
df = pd.read_csv(dafra)

df
Out[3]: 
     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0              1         0       3  ...   7.2500   NaN         S
1              2         1       1  ...  71.2833   C85         C
2              3         1       3  ...   7.9250   NaN         S
3              4         1       1  ...  53.1000  C123         S
4              5         0       3  ...   8.0500   NaN         S
..           ...       ...     ...  ...      ...   ...       ...
886          887         0       2  ...  13.0000   NaN         S
887          888         1       1  ...  30.0000   B42         S
888          889         0       3  ...  23.4500   NaN         S
889          890         1       1  ...  30.0000  C148         C
890          891         0       3  ...   7.7500   NaN         Q
[891 rows x 12 columns]
```
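Adding methods to the DataFrame/Series classes can be pictured with ordinary Python attribute assignment on the class. The sketch below only illustrates that general mechanism: the method name `lt_sketch` is hypothetical and not part of a_pandas_ex_numexpr, and how `pd_add_numexpr()` actually registers its methods is not shown in this README.

```python
import numpy as np
import pandas as pd


def lt_sketch(self, value):
    """Hypothetical helper: compare the underlying NumPy array with `value`."""
    return self.to_numpy() < value


# Assigning a function to the class makes it callable on every Series.
pd.Series.lt_sketch = lt_sketch

s = pd.Series([1, 5, 10])
mask = s.lt_sketch(5)
print(mask)
```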



## Speed test - a_pandas_ex_numexpr



```python

# Code explanation at the end of the page

wholedict = {'c': df.Pclass}

%timeit df['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=True, local_dict=wholedict)

30.8 µs ± 229 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=False, local_dict=wholedict)

70.1 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'] * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df['Pclass']

262 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit pd.eval("df.Survived * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df.Pclass") #used by df.query

1.37 ms ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)





df['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=False, local_dict=wholedict)

Out[33]: 

0        0.094622

1      995.031541

2      995.094622

3      995.031541

4        0.094622

          ...    

886      0.063081

887    995.031541

888      0.094622

889    995.031541

890      0.094622

Length: 891, dtype: float64





df['Survived'] * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df['Pclass']

Out[34]: 

0        0.094622

1      995.031541

2      995.094622

3      995.031541

4        0.094622

          ...    

886      0.063081

887    995.031541

888      0.094622

889    995.031541

890      0.094622

Length: 891, dtype: float64

```
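In the calls above, `b` stands for the Series itself and `local_dict` supplies the other operands. The following is a rough sketch of how such a call could be wired up with `numexpr.evaluate`. The function `ne_query_sketch` is a hypothetical stand-in written for this illustration and makes no claim about the package's real implementation.

```python
import numexpr as ne
import numpy as np
import pandas as pd


def ne_query_sketch(series, expr, local_dict=None, return_np=True):
    """Hypothetical stand-in for Series.ne_query, not the package's code."""
    # Bind the Series' own values to the name 'b', as in the README examples.
    variables = {"b": series.to_numpy()}
    # Extra operands (e.g. 'c') come from local_dict; convert them to arrays.
    for name, value in (local_dict or {}).items():
        variables[name] = np.asarray(value)
    result = ne.evaluate(expr, local_dict=variables)
    if return_np:
        return result
    # Wrapping the array in a Series restores the index at some extra cost.
    return pd.Series(result, index=series.index)


df = pd.DataFrame({"Survived": [0, 1, 1], "Pclass": [3, 1, 3]})
as_array = ne_query_sketch(df["Survived"], "b * 99.5 * c", local_dict={"c": df["Pclass"]})
as_series = ne_query_sketch(
    df["Survived"], "b * 99.5 * c", local_dict={"c": df["Pclass"]}, return_np=False
)
print(as_series)
```

Under this reading, the gap between `return_np=True` and `return_np=False` in the timings would come from building the Series around the result, though that is an inference from the numbers rather than something the README states.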



```python

wholedict = {'c': df.Pclass}

%timeit df['Survived'].ne_query('b * 99.5 * c', return_np=True, local_dict=wholedict)

27 µs ± 245 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'].ne_query('b * 99.5 * c', return_np=False, local_dict=wholedict)

65.7 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'] * 99.5 * df['Pclass']

140 µs ± 5.46 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit pd.eval("df.Survived * 99.5 * df.Pclass")

916 µs ± 7.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

wholedict = {'c': df.Pclass}

%timeit df['Survived'].ne_query('b / c', return_np=True, local_dict=wholedict)

26.5 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'].ne_query('b / c', return_np=False, local_dict=wholedict) # returns a Series

60.3 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'] / df['Pclass']

68.2 µs ± 599 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit pd.eval("df.Survived / df.Pclass")

929 µs ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```
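The `%timeit` lines only work inside IPython/Jupyter. To take comparable measurements in a plain script, the standard library's `timeit` module can be used, as in the sketch below. The synthetic frame and the two function names are assumptions for illustration, and the numbers will differ from machine to machine.

```python
import statistics
import timeit

import pandas as pd

# Small synthetic frame standing in for the Titanic data (an assumption).
df = pd.DataFrame({"Survived": [0, 1, 1, 0] * 200, "Pclass": [3, 1, 3, 2] * 200})


def pandas_division():
    return df["Survived"] / df["Pclass"]


def numpy_division():
    return df["Survived"].to_numpy() / df["Pclass"].to_numpy()


timings = {}
for func in (pandas_division, numpy_division):
    # 7 runs of 200 loops each; report mean and standard deviation per loop.
    runs = timeit.repeat(func, number=200, repeat=7)
    per_loop_us = [run / 200 * 1e6 for run in runs]
    timings[func.__name__] = (statistics.mean(per_loop_us), statistics.stdev(per_loop_us))

for name, (mean_us, std_us) in timings.items():
    print(f"{name}: {mean_us:.1f} us +/- {std_us:.1f} us per loop")
```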



## All functions/methods



## Speed of some “ready to use methods” for Series



```python

%timeit df.loc[df.PassengerId.ne_less_than(100)]

142 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId <100]

212 µs ± 897 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.Survived.ne_not_equal(0)]

157 µs ± 390 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.Survived!=0]

229 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.PassengerId.ne_greater_than(100)]

174 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId>100]

248 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.PassengerId.ne_equal(1)]

138 µs ± 626 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId == 1]

209 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.Cabin.ne_search_for_string_contains('C1')]

329 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.loc[df.Cabin.str.contains('C1',na=False)]

403 µs ± 924 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.PassengerId.ne_greater_than_or_equal_to(100)]

175 µs ± 832 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId>=100]

251 µs ± 2.77 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

%timeit df.loc[df.PassengerId.ne_less_than_or_equal_to(100)]

145 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId <=100]

212 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

##############################################################

```
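All comparisons above pass the result of an `ne_*` method straight into `df.loc`. That works because `df.loc` accepts a plain NumPy boolean array just as it accepts a boolean Series. A minimal check with plain pandas/NumPy (the tiny frame is illustrative data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"PassengerId": [1, 2, 3, 4], "Survived": [0, 1, 1, 0]})

mask_array = df["PassengerId"].to_numpy() < 3   # NumPy bool array
mask_series = df["PassengerId"] < 3             # pandas bool Series

# Both masks select the same rows.
same = df.loc[mask_array].equals(df.loc[mask_series])
print(same)
```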



## Overview - all methods for DataFrames/Series



```python

# Always use 'b' as the variable for the Series/DataFrame

df.ne_search_in_all_columns('b == 1')

array([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,

        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,

        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,

        88,  97,  98, 106, 107, 109, 123, 125, 127, 128, 133, 136, 141,

       142, 146, 151, 156, 161, 165, 166, 172, 183, 184, 186, 187, 190,

       192, 193, 194, 195, 198 ...]

```



```python

# Returns duplicated indices if the value is found in
# several columns. Exceptions are ignored.
# The dtype argument is useful when searching for
# strings -> dtype='S' (ASCII only)
df.ne_search_in_all_columns('b == "1"', dtype='S')



array([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,

        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,

        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,

        88,  97,  98, 106, 107, 109, ...]

```



```python

# Converts all columns to dtype='S' before searching.
# Might not work with special characters, e.g.:
# UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 0:
df.ne_search_string_allhits_contains('C1')

Out[6]: 

     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

3              4         1       1  ...   53.1000  C123         S

11            12         1       1  ...   26.5500  C103         S

110          111         0       1  ...   52.0000  C110         S

137          138         0       1  ...   53.1000  C123         S

268          269         1       1  ...  153.4625  C125         S

273          274         0       1  ...   29.7000  C118         C

298          299         1       1  ...   30.5000  C106         S

331          332         0       1  ...   28.5000  C124         S

351          352         0       1  ...   35.0000  C128         S

449          450         1       1  ...   30.5000  C104         S

452          453         0       1  ...   27.7500  C111         C

571          572         1       1  ...   51.4792  C101         S

609          610         1       1  ...  153.4625  C125         S

669          670         1       1  ...   52.0000  C126         S

711          712         0       1  ...   26.5500  C124         S

712          713         1       1  ...   52.0000  C126         S

889          890         1       1  ...   30.0000  C148         C

[17 rows x 12 columns]

```
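The ASCII limitation mentioned in the comment can be reproduced with NumPy alone: casting text to the byte-string dtype `'S'` works for ASCII values and raises `UnicodeEncodeError` for a character such as `ä`. This sketch shows only the NumPy behaviour and says nothing about how the package performs its conversion internally.

```python
import numpy as np

# Pure ASCII text converts to byte strings without problems.
ascii_values = np.array(["C123", "C85"]).astype("S")
print(ascii_values)

# A non-ASCII character cannot be encoded and raises an error.
try:
    np.array(["ä"]).astype("S")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True
print(conversion_failed)
```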





```python

# Series doesn't return duplicated results

df.Cabin.ne_search_string_allhits_contains('C1')

Out[9]: 

3      C123

11     C103

110    C110

137    C123

268    C125

273    C118

298    C106

331    C124

351    C128

449    C104

452    C111

571    C101

609    C125

669    C126

711    C124

712    C126

889    C148

Name: Cabin, dtype: object



%timeit df.Cabin.ne_search_string_allhits_contains('C1')

274 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)



%timeit df.Cabin.loc[df.Cabin.str.contains('C1', na=False)]

351 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# All rows where the string/substring C1 is found.

# Numbers are converted to string (ascii)

df.ne_search_string_dataframe_contains('C1')

Out[13]: 

     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

3              4         1       1  ...   53.1000  C123         S

11            12         1       1  ...   26.5500  C103         S

110          111         0       1  ...   52.0000  C110         S

137          138         0       1  ...   53.1000  C123         S

268          269         1       1  ...  153.4625  C125         S

273          274         0       1  ...   29.7000  C118         C

298          299         1       1  ...   30.5000  C106         S

331          332         0       1  ...   28.5000  C124         S

351          352         0       1  ...   35.0000  C128         S

449          450         1       1  ...   30.5000  C104         S

452          453         0       1  ...   27.7500  C111         C

571          572         1       1  ...   51.4792  C101         S

609          610         1       1  ...  153.4625  C125         S

669          670         1       1  ...   52.0000  C126         S

711          712         0       1  ...   26.5500  C124         S

712          713         1       1  ...   52.0000  C126         S

889          890         1       1  ...   30.0000  C148         C

[17 rows x 12 columns]





df.ne_search_string_dataframe_contains('610')

Out[14]: 

     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

194          195         1       1  ...   27.7208    B4         C

609          610         1       1  ...  153.4625  C125         S

[2 rows x 12 columns]

```
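For comparison, a plain-pandas way to find every row in which any column contains a substring is to convert the frame to text and combine per-column `str.contains` results. The sketch below uses a made-up three-row frame; the `Ticket` values in particular are invented for illustration. It also shows how a numeric column and a text column can both produce a hit for `'610'`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "PassengerId": [195, 610, 7],
        "Ticket": ["PC 17610", "113781", "A/5"],  # invented values
        "Cabin": ["B4", "C125", "E46"],
    }
)

# Convert every column to text, then test each column for the substring.
as_text = df.astype(str)
mask = as_text.apply(lambda col: col.str.contains("610", regex=False)).any(axis=1)
hits = df.loc[mask]
print(hits)
```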



```python

# Converts all columns to ASCII and searches in each column.
# For each column in which the value is found, the row index is repeated once.
df.ne_search_string_dataframe_allhits_equal('1')

Out[15]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

1              2         1       1  ...  71.2833   C85         C

1              2         1       1  ...  71.2833   C85         C

..           ...       ...     ...  ...      ...   ...       ...

887          888         1       1  ...  30.0000   B42         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

889          890         1       1  ...  30.0000  C148         C

[886 rows x 12 columns]

```



```python

# All equal strings in a Series

df.Embarked.ne_search_string_dataframe_allhits_equal('S')

 Out[16]: 

0      S

2      S

3      S

4      S

6      S

      ..

883    S

884    S

886    S

887    S

888    S

Name: Embarked, Length: 644, dtype: object



%timeit df.Embarked.ne_search_string_dataframe_allhits_equal('S')

160 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.Embarked.loc[df.Embarked=='S']

178 µs ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

```



```python

# Converts the whole df to ASCII and checks where
# the value is present. Exceptions are ignored.

df.ne_search_string_dataframe_equal('C123')

Out[20]: 

     PassengerId  Survived  Pclass  ...  Fare Cabin  Embarked

3              4         1       1  ...  53.1  C123         S

137          138         0       1  ...  53.1  C123         S

[2 rows x 12 columns]

```



```python

# Might not be efficient (The only method that was slower during testing)!  

%timeit df.Cabin.loc[df.Cabin.ne_search_for_string_series_equal('C123')]

252 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.Cabin.loc[df.Cabin=='C123']

158 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)



Out[21]: 

     PassengerId  Survived  Pclass  ...  Fare Cabin  Embarked

3              4         1       1  ...  53.1  C123         S

137          138         0       1  ...  53.1  C123         S

[2 rows x 12 columns]

```



```python

# Returns bool values
df.ne_search_for_string_contains('C1')

Out[7]: 

array([[False, False, False, ..., False, False, False],

       [False, False, False, ..., False, False, False],

       [False, False, False, ..., False, False, False],

       ...,

       [False, False, False, ..., False, False, False],

       [False, False, False, ..., False,  True, False],

       [False, False, False, ..., False, False, False]])

```



```python

# The Series method returns a bool array, which can be passed to df.loc

df.loc[df.Cabin.ne_search_for_string_contains('C1')]



Out[14]: 

     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

3              4         1       1  ...   53.1000  C123         S

11            12         1       1  ...   26.5500  C103         S

110          111         0       1  ...   52.0000  C110         S

137          138         0       1  ...   53.1000  C123         S

268          269         1       1  ...  153.4625  C125         S

273          274         0       1  ...   29.7000  C118         C

298          299         1       1  ...   30.5000  C106         S

331          332         0       1  ...   28.5000  C124         S

351          352         0       1  ...   35.0000  C128         S

449          450         1       1  ...   30.5000  C104         S

452          453         0       1  ...   27.7500  C111         C

571          572         1       1  ...   51.4792  C101         S

609          610         1       1  ...  153.4625  C125         S

669          670         1       1  ...   52.0000  C126         S

711          712         0       1  ...   26.5500  C124         S

712          713         1       1  ...   52.0000  C126         S

889          890         1       1  ...   30.0000  C148         C

[17 rows x 12 columns]





%timeit df.loc[df.Cabin.ne_search_for_string_contains('C1')]

329 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.loc[df.Cabin.str.contains('C1',na=False)]

403 µs ± 924 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# Returns the index of all rows where the value was found.

# Exceptions (e.g. wrong datatype etc.) are ignored.

# duplicates (more positive results in one row) are not deleted

df.ne_equal_df_ind(1)

array([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,

        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,

        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,

        88,  97,  98, 106, 107, 109, 123...]

```



```python

# You can pass dtype='S' to convert the values to strings
# (or other formats) before performing the search.
# If you use 'S', you have to pass a bytes value, e.g. b'1'.
df.ne_equal_df_ind(b'1', 'S')

Out[16]: 

array([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,

        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,

        56,  58,  61,  65,  66,  68,  74,...]

```
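Why a bytes value is needed with `dtype='S'` becomes clearer when looking at what NumPy produces for that dtype: the converted elements are byte strings, so the search value has to be a byte string as well. A small NumPy-only illustration with made-up numbers:

```python
import numpy as np

# Converting numbers to dtype 'S' yields byte strings such as b'1'.
values = np.array([1, 2, 11]).astype("S")
print(values)

# Comparing against a bytes literal gives an exact, element-wise match.
mask = values == b"1"
print(mask)
```

Note that the match is exact: `b'11'` is not reported as a hit for `b'1'`.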



```python

# same as DataFrame.ne_equal_df_ind

# but deletes all duplicates

df.ne_equal_df_ind_no_dup(b'1', 'S')

array([  0,   1,   2,   3,   6,   7,   8,   9,  10,  11,  13,  15,  16,

        17,  18,  19,  21,  22,  23,  24,  25,  27,  28,  30,  31,  32,

        34,  35,  36,  39,  40,  41,  43,  44

```
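The difference between the `_ind` and `_ind_no_dup` variants can be pictured with plain NumPy: collect the matching row positions column by column, then optionally deduplicate them. The column-by-column order is an assumption suggested by the output shown above, not something confirmed by the package's source, and the two-column frame is illustrative data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": [1, 1, 3]})

# Row positions where each column equals 1, gathered one column at a time.
per_column = [np.flatnonzero(df[col].to_numpy() == 1) for col in df.columns]

with_dups = np.concatenate(per_column)   # row 0 matches in both columns
no_dups = np.unique(with_dups)           # each row at most once

print(with_dups, no_dups)
print(len(df.loc[with_dups]), len(df.loc[no_dups]))
```

Selecting with the duplicated index returns one row per hit, which is why the `_dup` results in this README can contain more rows than the DataFrame itself.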



```python

# Same as DataFrame.ne_equal_df_ind,

# but returns the DataFrame (df.loc[])

df.ne_equal_df_dup(b'1', 'S')

Out[18]: 

     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

0              1         0       3  ...    7.2500   NaN         S

1              2         1       1  ...   71.2833   C85         C

2              3         1       3  ...    7.9250   NaN         S

3              4         1       1  ...   53.1000  C123         S

8              9         1       3  ...   11.1333   NaN         S

..           ...       ...     ...  ...       ...   ...       ...

856          857         1       1  ...  164.8667   NaN         S

869          870         1       3  ...   11.1333   NaN         S

871          872         1       1  ...   52.5542   D35         S

879          880         1       1  ...   83.1583   C50         C

880          881         1       2  ...   26.0000   NaN         S

[886 rows x 12 columns]

```



```python

# Same as DataFrame.ne_equal_df_ind_no_dup

# but returns the DataFrame (df.loc)

df.ne_equal_df_no_dup(b'1', 'S')

Out[19]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

6              7         0       1  ...  51.8625   E46         S

..           ...       ...     ...  ...      ...   ...       ...

879          880         1       1  ...  83.1583   C50         C

880          881         1       2  ...  26.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

[524 rows x 12 columns]

```



```python

# Returns bool
df.PassengerId.ne_equal(1)
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, ...]

df.loc[df.PassengerId.ne_equal(1)]

%timeit df.loc[df.PassengerId.ne_equal(1)]

138 µs ± 626 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId == 1]

209 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# Every time a value in a row is not equal to the given value,
# the index of that row is added to the return value.
# Example:
# A row has 6 columns. 2 of them have the value 1.
# That means the index of the row will be added 4 times
# to the final result.
df.ne_not_equal_df_ind(1)
array([  1,   2,   3, ..., 888, 889, 890], dtype=int64)



df.loc[df.ne_not_equal_df_ind(1)]

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

5              6         0       3  ...   8.4583   NaN         Q

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[5344 rows x 12 columns]

```
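The counting rule from the comment (6 columns, 2 of them equal to 1, so the row index appears 4 times) can be checked with a one-row NumPy example. The row values are made up for illustration.

```python
import numpy as np

# One row with 6 values, two of which are equal to 1.
row = np.array([1, 0, 3, 1, 7, 9])

# The row index is added once for every value that is not equal to 1.
times_added = int(np.count_nonzero(row != 1))
print(times_added)
```

Because almost every cell of a real DataFrame differs from the search value, the duplicated result can be several times longer than the DataFrame, as the row count above shows.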



```python

# Same as DataFrame.ne_not_equal_df_ind

# but drops all duplicates



df.ne_not_equal_df_ind_no_dup(0)

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,

        13,  14,  15,  16,...]



df.loc[df.ne_not_equal_df_ind_no_dup(0)]

Out[26]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[891 rows x 12 columns]

```



```python

# same as DataFrame.ne_not_equal_df_ind

# but returns the DataFrame (df.loc)

df.ne_not_equal_df_dup(0)

Out[28]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[4387 rows x 12 columns]

```



```python

# Same as DataFrame.ne_not_equal_df_ind_no_dup

# but returns the DataFrame (df.loc)

df.ne_not_equal_df_no_dup(0)

Out[29]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[891 rows x 12 columns]

```



```python

# Returns bool

df.Survived.ne_not_equal(0)

array([False,  True,  True,  True, False, False, False, False,  True,

        True,  True,  True, False, False, False,  True, False,  True,

       False,  True ...]



df.loc[df.Survived.ne_not_equal(0)]

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

8              9         1       3  ...  11.1333   NaN         S

9             10         1       2  ...  30.0708   NaN         C

..           ...       ...     ...  ...      ...   ...       ...

875          876         1       3  ...   7.2250   NaN         C

879          880         1       1  ...  83.1583   C50         C

880          881         1       2  ...  26.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

889          890         1       1  ...  30.0000  C148         C

[342 rows x 12 columns]



%timeit df.loc[df.Survived.ne_not_equal(0)]

157 µs ± 390 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.Survived!=0]

229 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# returns index, duplicates are possible

# if the condition is valid for more than one

# column. Exceptions (e.g. wrong dtype) are ignored

df.ne_greater_than_df_ind(100)

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,

       113, 114, 115...]

```
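"Exceptions (e.g. wrong dtype) are ignored" can be read as: columns on which the comparison fails are simply skipped. The sketch below shows one way such behaviour could look with plain NumPy and a `try`/`except` per column. It is an assumption about the general idea, not the package's implementation, and the three-column frame is illustrative data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "PassengerId": [99, 100, 101],
        "Name": ["a", "b", "c"],       # comparing text with a number fails
        "Fare": [7.25, 150.0, 8.05],
    }
)

hits = []
skipped = []
for col in df.columns:
    try:
        hits.append(np.flatnonzero(df[col].to_numpy() > 100))
    except TypeError:
        # Columns whose dtype does not support the comparison are ignored.
        skipped.append(col)

result = np.concatenate(hits)
print(result, skipped)
```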



```python

# Same as DataFrame.ne_greater_than_df_ind

# but gets rid of all duplicates

df.ne_greater_than_df_ind_no_dup(0)

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,

        13,  14,  15,  16...]

```



```python

# Same as DataFrame.ne_greater_than_df_ind

# but returns the DataFrame (df.loc)

df.ne_greater_than_df_dup(0)

Out[22]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[4210 rows x 12 columns]

```



```python

# same as DataFrame.ne_greater_than_df_ind_no_dup

# but returns the DataFrame (df.loc)

df.ne_greater_than_df_no_dup(600)

Out[24]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

600          601         1       2  ...  27.0000   NaN         S

601          602         0       3  ...   7.8958   NaN         S

602          603         0       1  ...  42.4000   NaN         S

603          604         0       3  ...   8.0500   NaN         S

604          605         1       1  ...  26.5500   NaN         C

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[291 rows x 12 columns]

```



```python

# Returns bool

df.PassengerId.ne_greater_than(5)

Out[26]: 

array([False, False, False, False, False,  True,  True,  True,  True,

        True,  True,  True,  True,  True,  True...]



df.loc[df.PassengerId.ne_greater_than(100)]

%timeit df.loc[df.PassengerId.ne_greater_than(100)]

174 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId>100]

248 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# returns index, duplicates are possible

# if the condition is valid for more than one

# column. Exceptions (e.g. wrong dtype) are ignored



df.ne_less_than_df_ind(10)

array([  0,   1,   2, ..., 881, 884, 890], dtype=int64)

```



```python

# Same as DataFrame.ne_less_than_df_ind

# but without duplicates

df.ne_less_than_df_ind_no_dup(100)

Out[28]: 

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,

        13,  14,  15,  16,  17,  18,  19,  20,  21,...]

```



```python

# Same as DataFrame.ne_less_than_df_ind,

# but returns DataFrame (df.loc)

df.ne_less_than_df_dup(1)

Out[29]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

4              5         0       3  ...   8.0500   NaN         S

5              6         0       3  ...   8.4583   NaN         Q

6              7         0       1  ...  51.8625   E46         S

7              8         0       3  ...  21.0750   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

674          675         0       2  ...   0.0000   NaN         S

732          733         0       2  ...   0.0000   NaN         S

806          807         0       1  ...   0.0000   A36         S

815          816         0       1  ...   0.0000  B102         S

822          823         0       1  ...   0.0000   NaN         S

[1857 rows x 12 columns]

```



```python

# Same as DataFrame.ne_less_than_df_ind_no_dup

# but returns DataFrame (df.loc)

df.ne_less_than_df_no_dup(1)

Out[30]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[834 rows x 12 columns]

```



```python

# Returns bool
df.PassengerId.ne_less_than(100)

Out[31]: 

array([ True,  True,  True,  True,  True,  True...]

%timeit df.loc[df.PassengerId.ne_less_than(100)]

142 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId <100]

212 µs ± 897 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```
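The `%timeit` lines above only work inside IPython/Jupyter. As a rough sketch, a similar comparison can be run in a plain script with the standard-library `timeit` module. The demo frame and the two helper functions below are made up for illustration, and the NumPy comparison merely stands in for a faster mask; it is not the package's `ne_less_than`.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: 891 consecutive passenger ids
demo = pd.DataFrame({"PassengerId": np.arange(1, 892)})


def pandas_mask():
    # Mask built with the regular pandas comparison
    return demo.loc[demo.PassengerId < 100]


def numpy_mask():
    # Mask built on the underlying NumPy array
    return demo.loc[demo.PassengerId.to_numpy() < 100]


# Timings depend on the machine, so they are only printed, not asserted
t_pandas = timeit.timeit(pandas_mask, number=200)
t_numpy = timeit.timeit(numpy_mask, number=200)
print(f"pandas mask: {t_pandas / 200 * 1e6:.1f} µs per call")
print(f"numpy mask:  {t_numpy / 200 * 1e6:.1f} µs per call")
```

Both functions select the same rows, so only the time spent building the mask differs.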



```python

# Returns the index of every row that meets the condition.

# Duplicates are possible if the condition is met in more than one column.

# Exceptions (e.g. wrong dtype) are ignored.

df.ne_greater_than_or_equal_to_df_ind(100)

Out[35]: 

array([ 99, 100, 101, 102, 103, 104, ...])

```



```python

# Same as DataFrame.ne_greater_than_or_equal_to_df_ind,

# but without duplicates

df.ne_greater_than_or_equal_to_df_ind_no_dup(100)

Out[36]: 

array([ 27,  31,  88,  99, 100, 101, 102, ...])

```
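To make the difference between the two index variants concrete, here is a minimal sketch in plain pandas/NumPy. It is not the package's implementation: the helper name `ge_indices` and the demo frame are made up for illustration, and skipping a column on `TypeError` is only an assumption about what "exceptions are ignored" means.

```python
import numpy as np
import pandas as pd


def ge_indices(frame, value):
    """Return (indices with duplicates, unique indices) of rows with any column >= value."""
    labels = frame.index.to_numpy()
    hits = []
    for name in frame.columns:
        try:
            mask = frame[name].to_numpy() >= value
        except TypeError:
            # e.g. a string column compared with a number: skip this column
            continue
        hits.append(labels[mask])
    with_dup = np.concatenate(hits) if hits else np.array([], dtype=labels.dtype)
    return with_dup, np.unique(with_dup)


demo = pd.DataFrame(
    {"id": [1, 2, 3, 4], "fare": [5.0, 300.0, 2.5, 150.0], "cabin": ["A1", "B2", "C3", "D4"]}
)
with_dup, no_dup = ge_indices(demo, 3)
print(with_dup)  # row 3 appears twice: it matches in "id" and in "fare"
print(no_dup)    # sorted, every matching row only once
```

Collecting the hits column by column would also explain why the duplicate-free result is sorted differently from the first one: `np.unique` sorts the indices after removing repeats.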



```python

# Same as DataFrame.ne_greater_than_or_equal_to_df_ind,

# but returns DataFrame (df.loc)

df.ne_greater_than_or_equal_to_df_dup(100)

Out[37]: 

     PassengerId  Survived  Pclass  ...      Fare            Cabin  Embarked

99           100         0       2  ...   26.0000              NaN         S

100          101         0       3  ...    7.8958              NaN         S

101          102         0       3  ...    7.8958              NaN         S

102          103         0       1  ...   77.2875              D26         S

103          104         0       3  ...    8.6542              NaN         S

..           ...       ...     ...  ...       ...              ...       ...

742          743         1       1  ...  262.3750  B57 B59 B63 B66         C

763          764         1       1  ...  120.0000          B96 B98         S

779          780         1       1  ...  211.3375               B3         S

802          803         1       1  ...  120.0000          B96 B98         S

856          857         1       1  ...  164.8667              NaN         S

[845 rows x 12 columns]

```



```python

# Same as DataFrame.ne_greater_than_or_equal_to_df_ind_no_dup,

# but returns DataFrame (df.loc)

df.ne_greater_than_or_equal_to_df_no_dup(100)

Out[38]: 

     PassengerId  Survived  Pclass  ...      Fare        Cabin  Embarked

27            28         0       1  ...  263.0000  C23 C25 C27         S

31            32         1       1  ...  146.5208          B78         C

88            89         1       1  ...  263.0000  C23 C25 C27         S

99           100         0       2  ...   26.0000          NaN         S

100          101         0       3  ...    7.8958          NaN         S

..           ...       ...     ...  ...       ...          ...       ...

886          887         0       2  ...   13.0000          NaN         S

887          888         1       1  ...   30.0000          B42         S

888          889         0       3  ...   23.4500          NaN         S

889          890         1       1  ...   30.0000         C148         C

890          891         0       3  ...    7.7500          NaN         Q

[795 rows x 12 columns]

```



```python

# Returns a boolean NumPy array

df.PassengerId.ne_greater_than_or_equal_to(100)

Out[39]: 

array([False, False, False, False, False, ...])

%timeit df.loc[df.PassengerId.ne_greater_than_or_equal_to(100)]

175 µs ± 832 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId>=100]

251 µs ± 2.77 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# Returns the index of every row that meets the condition.

# Duplicates are possible if the condition is met in more than one column.

# Exceptions (e.g. wrong dtype) are ignored.

df.ne_less_than_or_equal_to_df_ind(100)

Out[40]: array([  0,   1,   2, ..., 888, 889, 890], dtype=int64)

```



```python

# Same as DataFrame.ne_less_than_or_equal_to_df_ind,

# but without duplicates

df.ne_less_than_or_equal_to_df_ind_no_dup(100)

Out[41]: 

array([  0,   1,   2,   3,   4,   5,   6,   7,   8, ...])

```



```python

# Same as DataFrame.ne_less_than_or_equal_to_df_ind,

# but returns DataFrame (df.loc)

df.ne_less_than_or_equal_to_df_dup(100)

Out[42]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[5216 rows x 12 columns]

```
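The row count above (5216 rows from an 891-row frame) follows from selecting with an index array that contains repeats. The following small sketch uses plain pandas to show the effect; the demo frame and the index array `idx_dup` are invented for illustration and do not come from the package.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [1, 200, 3], "b": [150, 250, 5]})

# Hypothetical result of a "_df_ind" style search for values >= 100:
# column "a" matches row 1, column "b" matches rows 0 and 1
idx_dup = np.array([1, 0, 1])

rows_dup = demo.loc[idx_dup]                # row 1 is returned twice
rows_unique = demo.loc[np.unique(idx_dup)]  # each matching row once

# How many columns matched in each row
hits_per_row = pd.Series(idx_dup).value_counts().sort_index()

print(len(rows_dup), len(rows_unique))
print(hits_per_row)
```

Keeping the duplicates is useful when the number of matching columns per row matters; otherwise the duplicate-free variant gives a frame that is never larger than the original.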



```python

# Same as DataFrame.ne_less_than_or_equal_to_df_ind_no_dup,

# but returns DataFrame (df.loc)

df.ne_less_than_or_equal_to_df_no_dup(0)

Out[53]: 

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked

0              1         0       3  ...   7.2500   NaN         S

1              2         1       1  ...  71.2833   C85         C

2              3         1       3  ...   7.9250   NaN         S

3              4         1       1  ...  53.1000  C123         S

4              5         0       3  ...   8.0500   NaN         S

..           ...       ...     ...  ...      ...   ...       ...

886          887         0       2  ...  13.0000   NaN         S

887          888         1       1  ...  30.0000   B42         S

888          889         0       3  ...  23.4500   NaN         S

889          890         1       1  ...  30.0000  C148         C

890          891         0       3  ...   7.7500   NaN         Q

[829 rows x 12 columns]

```



```python

# Returns a boolean NumPy array

df.PassengerId.ne_less_than_or_equal_to(100)

Out[55]: 

array([ True,  True,  True,  True, ...])



%timeit df.loc[df.PassengerId.ne_less_than_or_equal_to(100)]

145 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df.PassengerId <=100]

212 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```



```python

# Combining conditions

%timeit df.loc[df.PassengerId.ne_greater_than(100) & df.Cabin.ne_search_for_string_series_contains('C1')]

360 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.loc[(df.PassengerId>100) & df.Cabin.str.contains('C1',na=False)]

552 µs ± 3.49 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```
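Because the Series methods return plain boolean arrays, conditions can be combined with `&` (and), `|` (or) and `~` (not). The sketch below shows the same pattern with boolean NumPy arrays built in plain pandas; the demo frame is made up, and the two masks only stand in for the results of `ne_greater_than` and the string search.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "PassengerId": [50, 150, 250, 350],
        "Cabin": ["C123", None, "C101", "B42"],
    }
)

# Two boolean NumPy arrays of the same length
mask_id = demo.PassengerId.to_numpy() > 100
mask_cabin = demo.Cabin.str.contains("C1", na=False).to_numpy(dtype=bool)

# Element-wise AND: a row is kept only if both conditions hold
combined = mask_id & mask_cabin
result = demo.loc[combined]
print(result)
```

The `na=False` argument matters here: without it, missing cabins produce missing values instead of `False`, and the mask can no longer be used for row selection.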



```python

# You can pass your own queries.

# In the expression, the variable 'b' always represents the DataFrame/Series,

# so don't use 'b' as a name for anything else.

# To compare the DataFrame/Series with another array, pass that array in local_dict.

import numpy as np

wholedict = {'c': np.array([1])}

df[['Survived','Pclass']].ne_query('b == c',local_dict=wholedict)

Out[14]: 

array([[False, False],

       [ True,  True],

       [ True, False],

       ...,

       [False, False],

       [ True,  True],

       [False, False]])





# You can use any NumExpr operator/function

# https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html

# And get a tremendous speedup (even with small DataFrames)

%timeit df['Survived'] + df.Pclass

68.6 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'] * df.Pclass

69 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'] == df.Pclass

72.3 µs ± 817 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)



# You have to pass the Series/Arrays that you are using in the expression as a dict (local_dict)

wholedict = {'c': df.Pclass}

%timeit df['Survived'].ne_query('b + c',local_dict=wholedict)

25.2 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'].ne_query('b * c',local_dict=wholedict)

25.3 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df['Survived'].ne_query('b == c',local_dict=wholedict)

25.2 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)



# ne_query does not ignore exceptions.

# To compare the DataFrame with a scalar, write the scalar directly into the expression:

df[['Survived','Pclass']].ne_query('b == 1')



# This also works for Series

wholedict = {'c': np.array([1])}

df['Survived'].ne_query('b == c',local_dict=wholedict)



# Comparison with a scalar

df['Pclass'].ne_query('b == 1')



%timeit df.loc[df['Pclass'].ne_query('b == 1')]

155 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit df.loc[df['Pclass'] == 1]

220 µs ± 3.96 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

```
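For readers who want to see what such a query looks like without the package, here is a hedged sketch that calls `numexpr.evaluate` directly with a `local_dict`. The helper `evaluate_on_series` is hypothetical, and binding the Series values to the name `b` is an assumption modeled on the examples above, not the package's actual code. If numexpr is not installed, the sketch falls back to Python's `eval` on NumPy arrays, which only covers simple arithmetic and comparison expressions.

```python
import numpy as np
import pandas as pd

try:
    import numexpr as ne
except ImportError:  # fall back to plain NumPy if numexpr is missing
    ne = None


def evaluate_on_series(series, expression, local_dict=None):
    """Evaluate `expression` with 'b' bound to the Series values (hypothetical helper)."""
    names = {key: np.asarray(val) for key, val in (local_dict or {}).items()}
    names["b"] = series.to_numpy()
    if ne is not None:
        return ne.evaluate(expression, local_dict=names)
    # Fallback for simple arithmetic/comparison expressions only
    return eval(expression, {"__builtins__": {}}, names)


demo = pd.DataFrame({"Survived": [0, 1, 1], "Pclass": [3, 1, 3]})
total = evaluate_on_series(demo["Survived"], "b + c", {"c": demo["Pclass"]})
equal = evaluate_on_series(demo["Survived"], "b == c", {"c": demo["Pclass"]})
print(total)
print(equal)
```

Working on the raw NumPy arrays skips index alignment and Series construction, which is a plausible reason for the lower timings shown above on small frames.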


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/a_pandas_ex_numexpr",
    "name": "a-pandas-ex-numexpr",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "numexpr,numpy,sort,pandas,series",
    "author": "Johannes Fischer",
    "author_email": "<aulasparticularesdealemaosp@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/7c/fa/67301d80ba78883296c68552984ecfd440d241831289440d87c895ac0c3c/a_pandas_ex_numexpr-0.10.tar.gz",
    "platform": null,
    "description": "\n# Pandas DataFrame Operations 8 times faster (or even more)\n\n\n\nDataFrame.query has never worked for me. On my PC, it has been extremely slow when using small DataFrames, and only a little bit, if at all, faster when using huge DataFrames. \n\n\n\nDataFrame.query uses pd.eval, and pd.eval uses numexpr. The weird thing is that numexpr is insanely fast when it is used against a DataFrame, but nor pd.eval neither DataFrame.query aren\u2019t. First I thought there was a problem with my Pandas/environment configuration, but then I read on the[ Pandas page](https://pandas.pydata.org/docs/user_guide/indexing.html#performance-of-query):\n\n\n\n_You will only see the performance benefits of using the numexpr engine with DataFrame.query() if your frame has more than approximately 200,000 rows._\n\n\n\nWell, **a_pandas_ex_numexpr** adds different methods to the DataFrame/Series classes, and will get tremendous speed-ups **(up to 8 times faster in my tests)** even for small DataFrames. All tests were done using: [https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv](https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv)\n\n\n\n**Let the numbers speak for themselves**\n\n\n\n## How to import / use a_pandas_ex_numexpr\n\n\n\n```python\n\nfrom a_pandas_ex_numexpr import pd_add_numexpr\n\npd_add_numexpr()\n\nimport pandas as pd\n\ndafra = \"https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv\"\n\ndf = pd.read_csv(dafra)\n\n\n\n\n\n\n\ndf\n\nOut[3]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       
...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[891 rows x 12 columns]\n\n```\n\n\n\n## Speed test - a_pandas_ex_numexpr\n\n\n\n```python\n\n# Code explanation at the end of the page\n\nwholedict = {'c': df.Pclass}\n\n%timeit df['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=True, local_dict=wholedict)\n\n30.8 \u00b5s \u00b1 229 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=False, local_dict=wholedict)\n\n70.1 \u00b5s \u00b1 2.44 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'] * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df['Pclass']\n\n262 \u00b5s \u00b1 4.25 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n%timeit pd.eval(\"df.Survived * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df.Pclass\") #used by df.query\n\n1.37 ms \u00b1 45.4 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n\n\n\n\ndf['Survived'].ne_query('b * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * c', return_np=False, local_dict=wholedict)\n\nOut[33]: \n\n0        0.094622\n\n1      995.031541\n\n2      995.094622\n\n3      995.031541\n\n4        0.094622\n\n          ...    \n\n886      0.063081\n\n887    995.031541\n\n888      0.094622\n\n889    995.031541\n\n890      0.094622\n\nLength: 891, dtype: float64\n\n\n\n\n\ndf['Survived'] * 99.5 / 000.1 + 42123.323211 / 1335523.42232 * df['Pclass']\n\nOut[34]: \n\n0        0.094622\n\n1      995.031541\n\n2      995.094622\n\n3      995.031541\n\n4        0.094622\n\n          ...    
\n\n886      0.063081\n\n887    995.031541\n\n888      0.094622\n\n889    995.031541\n\n890      0.094622\n\nLength: 891, dtype: float64\n\n```\n\n\n\n```python\n\nwholedict = {'c': df.Pclass}\n\n%timeit df['Survived'].ne_query('b * 99.5 * c', return_np=True, local_dict=wholedict)\n\n27 \u00b5s \u00b1 245 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'].ne_query('b * 99.5 * c', return_np=False, local_dict=wholedict)\n\n65.7 \u00b5s \u00b1 1.65 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'] * 99.5 * df['Pclass']\n\n140 \u00b5s \u00b1 5.46 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit pd.eval(\"df.Survived * 99.5 * df.Pclass\")\n\n916 \u00b5s \u00b1 7.1 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\nwholedict = {'c': df.Pclass}\n\n%timeit df['Survived'].ne_query('b / c', return_np=True, local_dict=wholedict)\n\n26.5 \u00b5s \u00b1 200 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'].ne_query('b / c', return_np=False, local_dict=wholedict) # returns a Series\n\n60.3 \u00b5s \u00b1 336 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'] / df['Pclass']\n\n68.2 \u00b5s \u00b1 599 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit pd.eval(\"df.Survived / df.Pclass\")\n\n929 \u00b5s \u00b1 31.7 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n## All functions/methods\n\n\n\n## Speed of some \u201cready to use methods\u201d for Series\n\n\n\n```python\n\n%timeit df.loc[df.PassengerId.ne_less_than(100)]\n\n142 \u00b5s \u00b1 412 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId <100]\n\n212 \u00b5s \u00b1 897 ns per loop (mean \u00b1 std. dev. 
of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.Survived.ne_not_equal(0)]\n\n157 \u00b5s \u00b1 390 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.Survived!=0]\n\n229 \u00b5s \u00b1 1.46 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.PassengerId.ne_greater_than(100)]\n\n174 \u00b5s \u00b1 375 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId>100]\n\n248 \u00b5s \u00b1 2.26 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.PassengerId.ne_equal(1)]\n\n138 \u00b5s \u00b1 626 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId == 1]\n\n209 \u00b5s \u00b1 1.04 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.Cabin.ne_search_for_string_contains('C1')]\n\n329 \u00b5s \u00b1 1.18 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n%timeit df.loc[df.Cabin.str.contains('C1',na=False)]\n\n403 \u00b5s \u00b1 924 ns per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.PassengerId.ne_greater_than_or_equal_to(100)]\n\n175 \u00b5s \u00b1 832 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId>=100]\n\n251 \u00b5s \u00b1 2.77 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n%timeit df.loc[df.PassengerId.ne_less_than_or_equal_to(100)]\n\n145 \u00b5s \u00b1 1.82 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId <=100]\n\n212 \u00b5s \u00b1 1.63 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n##############################################################\n\n```\n\n\n\n## Overview - all methods for DataFrames/Series\n\n\n\n```python\n\n# Always use 'b' as the variable for the Series/DataFrame\n\ndf.ne_search_in_all_columns('b == 1')\n\narray([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,\n\n        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,\n\n        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,\n\n        88,  97,  98, 106, 107, 109, 123, 125, 127, 128, 133, 136, 141,\n\n       142, 146, 151, 156, 161, 165, 166, 172, 183, 184, 186, 187, 190,\n\n       192, 193, 194, 195, 198 ...]\n\n```\n\n\n\n```python\n\n    # Returns duplicated index if the value is found in\n\n    # several columns. Exceptions will be ignored\n\n    # the dtype argument is useful when searching for\n\n    # strings -> dtype='S' (ascii only)\n\n    df.ne_search_in_all_columns('b == \"1\"', dtype='S')\n\n\n\narray([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,\n\n        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,\n\n        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,\n\n        88,  97,  98, 106, 107, 109, ...]\n\n```\n\n\n\n```python\n\n    # Converts all columns to  dtype='S' before searching\n\n    # Might not work with special characters\n\n    # UnicodeEncodeError: 'ascii' codec can't encode character '\\xe4' in position 0:\n\n    df.ne_search_string_allhits_contains('C1')\n\nOut[6]: \n\n     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked\n\n3              4         1       1  ...   53.1000  C123         S\n\n11            12         1       1  ...   26.5500  C103         S\n\n110          111         0       1  ...   52.0000  C110         S\n\n137          138         0       1  ... 
  53.1000  C123         S\n\n268          269         1       1  ...  153.4625  C125         S\n\n273          274         0       1  ...   29.7000  C118         C\n\n298          299         1       1  ...   30.5000  C106         S\n\n331          332         0       1  ...   28.5000  C124         S\n\n351          352         0       1  ...   35.0000  C128         S\n\n449          450         1       1  ...   30.5000  C104         S\n\n452          453         0       1  ...   27.7500  C111         C\n\n571          572         1       1  ...   51.4792  C101         S\n\n609          610         1       1  ...  153.4625  C125         S\n\n669          670         1       1  ...   52.0000  C126         S\n\n711          712         0       1  ...   26.5500  C124         S\n\n712          713         1       1  ...   52.0000  C126         S\n\n889          890         1       1  ...   30.0000  C148         C\n\n[17 rows x 12 columns]\n\n```\n\n\n\n\n\n```python\n\n# Series doesn't return duplicated results\n\ndf.Cabin.ne_search_string_allhits_contains('C1')\n\nOut[9]: \n\n3      C123\n\n11     C103\n\n110    C110\n\n137    C123\n\n268    C125\n\n273    C118\n\n298    C106\n\n331    C124\n\n351    C128\n\n449    C104\n\n452    C111\n\n571    C101\n\n609    C125\n\n669    C126\n\n711    C124\n\n712    C126\n\n889    C148\n\nName: Cabin, dtype: object\n\n\n\n%timeit df.Cabin.ne_search_string_allhits_contains('C1')\n\n274 \u00b5s \u00b1 2.74 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n\n\n%timeit df.Cabin.loc[df.Cabin.str.contains('C1', na=False)]\n\n351 \u00b5s \u00b1 1.16 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# All rows where the string/substring C1 is found.\n\n# Numbers are converted to string (ascii)\n\ndf.ne_search_string_dataframe_contains('C1')\n\nOut[13]: \n\n     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked\n\n3              4         1       1  ...   
53.1000  C123         S\n\n11            12         1       1  ...   26.5500  C103         S\n\n110          111         0       1  ...   52.0000  C110         S\n\n137          138         0       1  ...   53.1000  C123         S\n\n268          269         1       1  ...  153.4625  C125         S\n\n273          274         0       1  ...   29.7000  C118         C\n\n298          299         1       1  ...   30.5000  C106         S\n\n331          332         0       1  ...   28.5000  C124         S\n\n351          352         0       1  ...   35.0000  C128         S\n\n449          450         1       1  ...   30.5000  C104         S\n\n452          453         0       1  ...   27.7500  C111         C\n\n571          572         1       1  ...   51.4792  C101         S\n\n609          610         1       1  ...  153.4625  C125         S\n\n669          670         1       1  ...   52.0000  C126         S\n\n711          712         0       1  ...   26.5500  C124         S\n\n712          713         1       1  ...   52.0000  C126         S\n\n889          890         1       1  ...   30.0000  C148         C\n\n[17 rows x 12 columns]\n\n\n\n\n\ndf.ne_search_string_dataframe_contains('610')\n\nOut[14]: \n\n     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked\n\n194          195         1       1  ...   27.7208    B4         C\n\n609          610         1       1  ...  153.4625  C125         S\n\n[2 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Converts all columns to ascii and searches in each column\n\n# For each presence in a column, you  get a duplicate of the index\n\ndf.ne_search_string_dataframe_allhits_equal('1')\n\ndf.ne_search_string_dataframe_allhits_equal('1')\n\nOut[15]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  
71.2833   C85         C\n\n1              2         1       1  ...  71.2833   C85         C\n\n1              2         1       1  ...  71.2833   C85         C\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n887          888         1       1  ...  30.0000   B42         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n889          890         1       1  ...  30.0000  C148         C\n\n[886 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# All equal strings in a Series\n\ndf.Embarked.ne_search_string_dataframe_allhits_equal('S')\n\n Out[16]: \n\n0      S\n\n2      S\n\n3      S\n\n4      S\n\n6      S\n\n      ..\n\n883    S\n\n884    S\n\n886    S\n\n887    S\n\n888    S\n\nName: Embarked, Length: 644, dtype: object\n\n\n\n%timeit df.Embarked.ne_search_string_dataframe_allhits_equal('S')\n\n160 \u00b5s \u00b1 2.14 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.Embarked.loc[df.Embarked=='S']\n\n178 \u00b5s \u00b1 3.04 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n```\n\n\n\n```python\n\n# Converts the whole df to ascii and checks where the\n\n# the value is present. Exceptions are ignored\n\ndf.ne_search_string_dataframe_equal('C123')\n\nOut[20]: \n\n     PassengerId  Survived  Pclass  ...  Fare Cabin  Embarked\n\n3              4         1       1  ...  53.1  C123         S\n\n137          138         0       1  ...  53.1  C123         S\n\n[2 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Might not be efficient (The only method that was slower during testing)!  \n\n%timeit df.Cabin.loc[df.Cabin.ne_search_for_string_series_equal('C123')]\n\n252 \u00b5s \u00b1 1.02 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 1,000 loops each)\n\n%timeit df.Cabin.loc[df.Cabin=='C123']\n\n158 \u00b5s \u00b1 1.28 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n\n\nOut[21]: \n\n     PassengerId  Survived  Pclass  ...  Fare Cabin  Embarked\n\n3              4         1       1  ...  53.1  C123         S\n\n137          138         0       1  ...  53.1  C123         S\n\n[2 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Returns bool values\n\ndf.loc[df.ne_search_for_string_contains('C1')]\n\nOut[7]: \n\narray([[False, False, False, ..., False, False, False],\n\n       [False, False, False, ..., False, False, False],\n\n       [False, False, False, ..., False, False, False],\n\n       ...,\n\n       [False, False, False, ..., False, False, False],\n\n       [False, False, False, ..., False,  True, False],\n\n       [False, False, False, ..., False, False, False]])\n\n```\n\n\n\n```python\n\n# returns Bool\n\ndf.loc[df.Cabin.ne_search_for_string_contains('C1')]\n\n\n\nOut[14]: \n\n     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked\n\n3              4         1       1  ...   53.1000  C123         S\n\n11            12         1       1  ...   26.5500  C103         S\n\n110          111         0       1  ...   52.0000  C110         S\n\n137          138         0       1  ...   53.1000  C123         S\n\n268          269         1       1  ...  153.4625  C125         S\n\n273          274         0       1  ...   29.7000  C118         C\n\n298          299         1       1  ...   30.5000  C106         S\n\n331          332         0       1  ...   28.5000  C124         S\n\n351          352         0       1  ...   35.0000  C128         S\n\n449          450         1       1  ...   30.5000  C104         S\n\n452          453         0       1  ...   27.7500  C111         C\n\n571          572         1       1  ...   51.4792  C101         S\n\n609          610         1       1  ...  
153.4625  C125         S\n\n669          670         1       1  ...   52.0000  C126         S\n\n711          712         0       1  ...   26.5500  C124         S\n\n712          713         1       1  ...   52.0000  C126         S\n\n889          890         1       1  ...   30.0000  C148         C\n\n[17 rows x 12 columns]\n\n\n\n\n\n%timeit df.loc[df.Cabin.ne_search_for_string_contains('C1')]\n\n329 \u00b5s \u00b1 1.18 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n%timeit df.loc[df.Cabin.str.contains('C1',na=False)]\n\n403 \u00b5s \u00b1 924 ns per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# Returns the index of all rows where the value was found.\n\n# Exceptions (e.g. wrong datatype etc.) are ignored.\n\n# duplicates (more positive results in one row) are not deleted\n\ndf.ne_equal_df_ind(1)\n\narray([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,\n\n        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,\n\n        56,  58,  61,  65,  66,  68,  74,  78,  79,  81,  82,  84,  85,\n\n        88,  97,  98, 106, 107, 109, 123...]\n\n```\n\n\n\n```python\n\n# You can pass dtype='S' to convert the values to string \n\n# (or other formats) before performing the search.\n\n# df.ne_equal_df_ind(b'1', 'S')\n\n# If you use 'S', you have to pass a binary value\n\ndf.ne_equal_df_ind(b'1', 'S')\n\nOut[16]: \n\narray([  0,   1,   2,   3,   8,   9,  10,  11,  15,  17,  19,  21,  22,\n\n        23,  25,  28,  31,  32,  36,  39,  43,  44,  47,  52,  53,  55,\n\n        56,  58,  61,  65,  66,  68,  74,...]\n\n```\n\n\n\n```python\n\n# same as DataFrame.ne_equal_df_ind\n\n# but deletes all duplicates\n\ndf.ne_equal_df_ind_no_dup(b'1', 'S')\n\narray([  0,   1,   2,   3,   6,   7,   8,   9,  10,  11,  13,  15,  16,\n\n        17,  18,  19,  21,  22,  23,  24,  25,  27,  28,  30,  31,  32,\n\n        34,  35,  36,  39,  40,  41,  43,  44\n\n```\n\n\n\n```python\n\n# Same as 
DataFrame.ne_equal_df_ind,\n\n# but returns the DataFrame (df.loc[])\n\ndf.ne_equal_df_dup(b'1', 'S')\n\nOut[18]: \n\n     PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked\n\n0              1         0       3  ...    7.2500   NaN         S\n\n1              2         1       1  ...   71.2833   C85         C\n\n2              3         1       3  ...    7.9250   NaN         S\n\n3              4         1       1  ...   53.1000  C123         S\n\n8              9         1       3  ...   11.1333   NaN         S\n\n..           ...       ...     ...  ...       ...   ...       ...\n\n856          857         1       1  ...  164.8667   NaN         S\n\n869          870         1       3  ...   11.1333   NaN         S\n\n871          872         1       1  ...   52.5542   D35         S\n\n879          880         1       1  ...   83.1583   C50         C\n\n880          881         1       2  ...   26.0000   NaN         S\n\n[886 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_equal_df_ind_no_dup\n\n# but returns the DataFrame (df.loc)\n\ndf.ne_equal_df_no_dup(b'1', 'S')\n\nOut[19]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n6              7         0       1  ...  51.8625   E46         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n879          880         1       1  ...  83.1583   C50         C\n\n880          881         1       2  ...  26.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  
30.0000  C148         C\n\n[524 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Returns bool\n\narray([ True, False, False, False, False, False, False, False, False,\n\n       False, False, False, ...]\n\ndf.loc[df.PassengerId.ne_equal(1)]\n\n%timeit df.loc[df.PassengerId.ne_equal(1)]\n\n138 \u00b5s \u00b1 626 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId == 1]\n\n209 \u00b5s \u00b1 1.04 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# Every time the condition is False, the index is\n\n# added to the return value.\n\n# Example:\n\n# A row has 6 columns. 2 of them have the value 1.\n\n# That means the index of the row will be added 4 times\n\n# to the final result\n\ndf.loc[df.ne_not_equal_df_ind(1)]\n\narray([  1,   2,   3, ..., 888, 889, 890], dtype=int64)\n\n\n\ndf.loc[df.ne_not_equal_df_ind(1)]\n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n5              6         0       3  ...   8.4583   NaN         Q\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   
7.7500   NaN         Q\n\n[5344 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_not_equal_df_ind\n\n# but drops all duplicates\n\n\n\ndf.ne_not_equal_df_ind_no_dup(0)\n\narray([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,\n\n        13,  14,  15,  16,...]\n\n\n\ndf.loc[df.ne_not_equal_df_ind_no_dup(0)]\n\nOut[26]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[891 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# same as DataFrame.ne_not_equal_df_ind\n\n# but returns the DataFrame (df.loc)\n\ndf.ne_not_equal_df_dup(0)\n\nOut[28]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  
30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[4387 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_not_equal_df_ind_no_dup\n\n# but returns the DataFrame (df.loc)\n\ndf.ne_not_equal_df_no_dup(0)\n\nOut[29]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[891 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Returns bool\n\ndf.Survived.ne_not_equal(0)\n\narray([False,  True,  True,  True, False, False, False, False,  True,\n\n        True,  True,  True, False, False, False,  True, False,  True,\n\n       False,  True ...]\n\n\n\ndf.loc[df.Survived.ne_not_equal(0)]\n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n8              9         1       3  ...  11.1333   NaN         S\n\n9             10         1       2  ...  30.0708   NaN         C\n\n..           ...       ...     ...  ...      
...   ...       ...\n\n875          876         1       3  ...   7.2250   NaN         C\n\n879          880         1       1  ...  83.1583   C50         C\n\n880          881         1       2  ...  26.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n[342 rows x 12 columns]\n\n\n\n%timeit df.loc[df.Survived.ne_not_equal(0)]\n\n157 \u00b5s \u00b1 390 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.Survived!=0]\n\n229 \u00b5s \u00b1 1.46 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# Returns the index; duplicates are possible\n\n# if the condition is valid for more than one\n\n# column. Exceptions (e.g. wrong dtype) are ignored\n\ndf.ne_greater_than_df_ind(100)\n\narray([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,\n\n       113, 114, 115...]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_greater_than_df_ind\n\n# but gets rid of all duplicates\n\ndf.ne_greater_than_df_ind_no_dup(0)\n\narray([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,\n\n        13,  14,  15,  16...]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_greater_than_df_ind\n\n# but returns the DataFrame (df.loc)\n\ndf.ne_greater_than_df_dup(0)\n\nOut[22]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  
30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[4210 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# same as DataFrame.ne_greater_than_df_ind_no_dup\n\n# but returns the DataFrame (df.loc)\n\ndf.ne_greater_than_df_no_dup(600)\n\nOut[24]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n600          601         1       2  ...  27.0000   NaN         S\n\n601          602         0       3  ...   7.8958   NaN         S\n\n602          603         0       1  ...  42.4000   NaN         S\n\n603          604         0       3  ...   8.0500   NaN         S\n\n604          605         1       1  ...  26.5500   NaN         C\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[291 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Returns bool\n\ndf.PassengerId.ne_greater_than(5)\n\nOut[26]: \n\narray([False, False, False, False, False,  True,  True,  True,  True,\n\n        True,  True,  True,  True,  True,  True...]\n\n\n\ndf.loc[df.PassengerId.ne_greater_than(100)]\n\n%timeit df.loc[df.PassengerId.ne_greater_than(100)]\n\n174 \u00b5s \u00b1 375 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId>100]\n\n248 \u00b5s \u00b1 2.26 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# returns index, duplicates are possible\n\n# if the condition is valid for more than one\n\n# column. Exceptions (e.g. 
wrong dtype) are ignored\n\n\n\ndf.ne_less_than_df_ind(10)\n\narray([  0,   1,   2, ..., 881, 884, 890], dtype=int64)\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_df_ind\n\n# but without duplicates\n\ndf.ne_less_than_df_ind_no_dup(100)\n\nOut[28]: \n\narray([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,\n\n        13,  14,  15,  16,  17,  18,  19,  20,  21,...]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_df_ind,\n\n# but returns DataFrame (df.loc)\n\ndf.ne_less_than_df_dup(1)\n\nOut[29]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n5              6         0       3  ...   8.4583   NaN         Q\n\n6              7         0       1  ...  51.8625   E46         S\n\n7              8         0       3  ...  21.0750   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n674          675         0       2  ...   0.0000   NaN         S\n\n732          733         0       2  ...   0.0000   NaN         S\n\n806          807         0       1  ...   0.0000   A36         S\n\n815          816         0       1  ...   0.0000  B102         S\n\n822          823         0       1  ...   0.0000   NaN         S\n\n[1857 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_df_ind_no_dup\n\n# but returns DataFrame (df.loc)\n\ndf.ne_less_than_df_no_dup(1)\n\nOut[30]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      
...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[834 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# returns bool\n\n# df.PassengerId.ne_less_than(100)\n\ndf.PassengerId.ne_less_than(100)\n\nOut[31]: \n\narray([ True,  True,  True,  True,  True,  True...]\n\n%timeit df.loc[df.PassengerId.ne_less_than(100)]\n\n142 \u00b5s \u00b1 412 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId <100]\n\n212 \u00b5s \u00b1 897 ns per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# returns index, duplicates are possible\n\n# if the condition is valid for more than one\n\n# column. Exceptions (e.g. wrong dtype) are ignored\n\ndf.ne_greater_than_or_equal_to_df_ind(100)\n\nOut[35]: \n\narray([ 99, 100, 101, 102, 103, 104, ...]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_greater_than_or_equal_to_df_ind ,\n\n# but without duplicates\n\n# df.ne_greater_than_or_equal_to_df_ind_no_dup(100)\n\ndf.ne_greater_than_or_equal_to_df_ind_no_dup(100)\n\nOut[36]: \n\narray([ 27,  31,  88,  99, 100, 101, 102,...] \n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_greater_than_or_equal_to_df_ind,\n\n# but returns DataFrame (df.loc)\n\ndf.ne_greater_than_or_equal_to_df_dup(100)\n\nOut[37]: \n\n     PassengerId  Survived  Pclass  ...      Fare            Cabin  Embarked\n\n99           100         0       2  ...   26.0000              NaN         S\n\n100          101         0       3  ...    7.8958              NaN         S\n\n101          102         0       3  ...    7.8958              NaN         S\n\n102          103         0       1  ...   
77.2875              D26         S\n\n103          104         0       3  ...    8.6542              NaN         S\n\n..           ...       ...     ...  ...       ...              ...       ...\n\n742          743         1       1  ...  262.3750  B57 B59 B63 B66         C\n\n763          764         1       1  ...  120.0000          B96 B98         S\n\n779          780         1       1  ...  211.3375               B3         S\n\n802          803         1       1  ...  120.0000          B96 B98         S\n\n856          857         1       1  ...  164.8667              NaN         S\n\n[845 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_greater_than_or_equal_to_df_ind_no_dup,\n\n# but returns DataFrame (df.loc)\n\ndf.ne_greater_than_or_equal_to_df_no_dup(100)\n\nOut[38]: \n\n     PassengerId  Survived  Pclass  ...      Fare        Cabin  Embarked\n\n27            28         0       1  ...  263.0000  C23 C25 C27         S\n\n31            32         1       1  ...  146.5208          B78         C\n\n88            89         1       1  ...  263.0000  C23 C25 C27         S\n\n99           100         0       2  ...   26.0000          NaN         S\n\n100          101         0       3  ...    7.8958          NaN         S\n\n..           ...       ...     ...  ...       ...          ...       ...\n\n886          887         0       2  ...   13.0000          NaN         S\n\n887          888         1       1  ...   30.0000          B42         S\n\n888          889         0       3  ...   23.4500          NaN         S\n\n889          890         1       1  ...   30.0000         C148         C\n\n890          891         0       3  ...    
7.7500          NaN         Q\n\n[795 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# returns bool\n\ndf.PassengerId.ne_greater_than_or_equal_to(100)\n\nOut[39]: \n\narray([False, False, False, False, False...])\n\ndf.PassengerId.ne_greater_than_or_equal_to(100)\n\n%timeit df.loc[df.PassengerId.ne_greater_than_or_equal_to(100)]\n\n175 \u00b5s \u00b1 832 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId>=100]\n\n251 \u00b5s \u00b1 2.77 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# returns index, duplicates are possible\n\n# if the condition is valid for more than one\n\n# column. Exceptions (e.g. wrong dtype) are ignored\n\ndf.ne_less_than_or_equal_to_df_ind(100)\n\nOut[40]: array([  0,   1,   2, ..., 888, 889, 890], dtype=int64)\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_or_equal_to_df_ind ,\n\n# but without duplicates\n\ndf.ne_less_than_or_equal_to_df_ind_no_dup(100)\n\nOut[41]: \n\narray([  0,   1,   2,   3,   4,   5,   6,   7,   8, ...])\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_or_equal_to_df_ind,\n\n# but returns DataFrame (df.loc)\n\ndf.ne_less_than_or_equal_to_df_dup(100)\n\nOut[42]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  
30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[5216 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Same as DataFrame.ne_less_than_or_equal_to_df_ind_no_dup,\n\n# but returns DataFrame (df.loc)\n\ndf.ne_less_than_or_equal_to_df_no_dup(0)\n\nOut[53]: \n\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\n\n0              1         0       3  ...   7.2500   NaN         S\n\n1              2         1       1  ...  71.2833   C85         C\n\n2              3         1       3  ...   7.9250   NaN         S\n\n3              4         1       1  ...  53.1000  C123         S\n\n4              5         0       3  ...   8.0500   NaN         S\n\n..           ...       ...     ...  ...      ...   ...       ...\n\n886          887         0       2  ...  13.0000   NaN         S\n\n887          888         1       1  ...  30.0000   B42         S\n\n888          889         0       3  ...  23.4500   NaN         S\n\n889          890         1       1  ...  30.0000  C148         C\n\n890          891         0       3  ...   7.7500   NaN         Q\n\n[829 rows x 12 columns]\n\n```\n\n\n\n```python\n\n# Returns bool\n\ndf.PassengerId.ne_less_than_or_equal_to(100)\n\nOut[55]: \n\narray([ True,  True,  True,  True, ....]\n\n\n\n%timeit df.loc[df.PassengerId.ne_less_than_or_equal_to(100)]\n\n145 \u00b5s \u00b1 1.82 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df.PassengerId <=100]\n\n212 \u00b5s \u00b1 1.63 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# Combining conditions\n\n%timeit df.loc[df.PassengerId.ne_greater_than(100) & df.Cabin.ne_search_for_string_series_contains('C1')]\n\n360 \u00b5s \u00b1 2.56 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n%timeit df.loc[(df.PassengerId>100) & df.Cabin.str.contains('C1',na=False)]\n\n552 \u00b5s \u00b1 3.49 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 1,000 loops each)\n\n```\n\n\n\n```python\n\n# You can pass your own queries.\n\n# If you want to compare the DataFrame/Series to another array,\n\n# note that the variable 'b' represents the DataFrame/Series itself,\n\n# so don't use the name 'b' for anything else.\n\nimport numpy as np\n\nwholedict = {'c': np.array([1])}\n\ndf[['Survived','Pclass']].ne_query('b == c',local_dict=wholedict)\n\nOut[14]: \n\narray([[False, False],\n\n       [ True,  True],\n\n       [ True, False],\n\n       ...,\n\n       [False, False],\n\n       [ True,  True],\n\n       [False, False]])\n\n\n\n\n\n# You can use any NumExpr operator/function\n\n# https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html\n\n# and get a speedup even with small DataFrames\n\n# (about 2.7 times faster in the timings below)\n\n%timeit df['Survived'] + df.Pclass\n\n68.6 \u00b5s \u00b1 167 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'] * df.Pclass\n\n69 \u00b5s \u00b1 260 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'] == df.Pclass\n\n72.3 \u00b5s \u00b1 817 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n\n\n# You have to pass the Series/arrays used in the expression as a dict (local_dict)\n\nwholedict = {'c': df.Pclass}\n\n%timeit df['Survived'].ne_query('b + c',local_dict=wholedict)\n\n25.2 \u00b5s \u00b1 130 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'].ne_query('b * c',local_dict=wholedict)\n\n25.3 \u00b5s \u00b1 177 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df['Survived'].ne_query('b == c',local_dict=wholedict)\n\n25.2 \u00b5s \u00b1 197 ns per loop (mean \u00b1 std. dev. 
of 7 runs, 10,000 loops each)\n\n\n\n# With ne_query, exceptions are not ignored\n\n# If you want to compare the DataFrame with a scalar:\n\ndf[['Survived','Pclass']].ne_query('b == 1')\n\n\n\n# Also works for Series\n\nwholedict = {'c': np.array([1])}\n\ndf['Survived'].ne_query('b == c',local_dict=wholedict)\n\n\n\n# Series compared with a scalar\n\ndf['Pclass'].ne_query('b == 1')\n\n\n\n%timeit df.loc[df['Pclass'].ne_query('b == 1')]\n\n155 \u00b5s \u00b1 530 ns per loop (mean \u00b1 std. dev. of 7 runs, 10,000 loops each)\n\n%timeit df.loc[df['Pclass'] == 1]\n\n220 \u00b5s \u00b1 3.96 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n\n```\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Pandas DataFrame/Series operations 8 times faster (or even more)",
    "version": "0.10",
    "split_keywords": [
        "numexpr",
        "numpy",
        "sort",
        "pandas",
        "series"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5580bd0efe141a087946f4e5ac1fea3ff25f47da8f8b0f4d322af6c9a981473a",
                "md5": "fefeea7af88af6ff0ad3274ff24937da",
                "sha256": "ef3fcbb36a2bbd67558da0076277c5d46f3364a7117a3311c581a4cc1702f825"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_numexpr-0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fefeea7af88af6ff0ad3274ff24937da",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 16788,
            "upload_time": "2023-02-03T00:27:19",
            "upload_time_iso_8601": "2023-02-03T00:27:19.195393Z",
            "url": "https://files.pythonhosted.org/packages/55/80/bd0efe141a087946f4e5ac1fea3ff25f47da8f8b0f4d322af6c9a981473a/a_pandas_ex_numexpr-0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7cfa67301d80ba78883296c68552984ecfd440d241831289440d87c895ac0c3c",
                "md5": "66f4bf63290e128eed656743e2a2fd86",
                "sha256": "8b31c3907ae8e5117cf73615338ffdb9f549cbaf3904fdb015ad21d63dff045c"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_numexpr-0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "66f4bf63290e128eed656743e2a2fd86",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 26161,
            "upload_time": "2023-02-03T00:27:21",
            "upload_time_iso_8601": "2023-02-03T00:27:21.767462Z",
            "url": "https://files.pythonhosted.org/packages/7c/fa/67301d80ba78883296c68552984ecfd440d241831289440d87c895ac0c3c/a_pandas_ex_numexpr-0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-03 00:27:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "hansalemaos",
    "github_project": "a_pandas_ex_numexpr",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "a-pandas-ex-numexpr"
}
        