a-pandas-ex-less-memory-more-speed

Name	a-pandas-ex-less-memory-more-speed
Version	0.38
home_page	https://github.com/hansalemaos/a_pandas_ex_less_memory_more_speed
Summary	A Python package to reduce the memory usage of pandas DataFrames. It speeds up your workflow and reduces the risk of running out of memory.
upload_time	2023-05-04 14:28:41
maintainer
docs_url	None
author	Johannes Fischer
requires_python
license	MIT
keywords	flatten pandas dict list numpy tuple tagsiter nested iterable listsoflists flattenjson iter explode squeeze nan pd.na np.nan
VCS
bugtrack_url
requirements	check_if_nan deepcopyall isiter numpy pandas tolerant_isinstance

## Less memory usage - more speed

A Python package to reduce the memory usage of pandas DataFrames without changing the underlying data. It speeds up your workflow and reduces the risk of running out of memory.

## Installation

```shell
pip install a-pandas-ex-less-memory-more-speed
```

```python
import pandas as pd
from a_pandas_ex_less_memory_more_speed import pd_add_less_memory_more_speed

# Register the ds_* methods on pandas.DataFrame / pandas.Series
pd_add_less_memory_more_speed()

df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")
df.ds_reduce_memory_size()
```
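To see what a dtype change buys, memory can be measured with plain pandas before and after a conversion. The sketch below does not use this package: the toy frame, its column names and the helper `size_in_bytes` are made up for illustration, and the conversions are done by hand with `astype`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Titanic file
df = pd.DataFrame(
    {
        "passenger_id": np.arange(1, 1001, dtype="int64"),
        "sex": ["male", "female"] * 500,
    }
)


def size_in_bytes(frame: pd.DataFrame) -> int:
    # deep=True also counts the Python string objects in object columns
    return int(frame.memory_usage(deep=True).sum())


before = size_in_bytes(df)

# Manual equivalents of two conversions that appear in the package's log output
smaller = df.astype({"passenger_id": np.uint16, "sex": "category"})
after = size_in_bytes(smaller)

print(f"before: {before} bytes, after: {after} bytes")
```

The values themselves are unchanged; only their storage type differs.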

## Update 2023/05/04
```python
# To carefully handle callables, iterables and other objects in cells
df.ds_reduce_memory_size_carefully()

# Its docstring:
"""
Optimizes the memory usage of a pandas DataFrame or Series by converting data types and reducing memory size.

Args:
    df_ (pd.Series | pd.DataFrame): The DataFrame or Series to be optimized.
    ignore_columns (tuple | list, optional): A tuple or list of column names to ignore during optimization. Defaults to ().
    not_allowed_to_convert (tuple | list, optional): A tuple or list of modules that should not be converted during optimization. Defaults to ("shapely",).
    allowed_to_convert (tuple | list, optional): A tuple or list of modules that are allowed to be converted during optimization. Defaults to ("pandas", "numpy").
    include_empty_iters_in_pd_na (bool, optional): If True, empty iterables will be converted to pd.NA during optimization. Defaults to False.
    include_0_len_string_in_pd_na (bool, optional): If True, zero-length strings will be converted to pd.NA during optimization. Defaults to False.
    verbose (bool, optional): If True, print information about the memory usage before and after optimization. Defaults to True.

Returns:
    pd.DataFrame | pd.Series: The optimized DataFrame or Series.

Raises:
    None.
"""
```
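What the two `include_*_in_pd_na` switches mean can be illustrated with plain pandas. This is only a sketch of the behavior described above, not the package's code; the helper name `to_pd_na` and its keyword names are hypothetical.

```python
import pandas as pd


def to_pd_na(series, empty_iters=False, zero_len_strings=False):
    """Return a copy in which selected 'empty' cells are replaced by pd.NA."""

    def convert(value):
        if zero_len_strings and isinstance(value, str) and value == "":
            return pd.NA
        if empty_iters and isinstance(value, (list, tuple, dict, set)) and len(value) == 0:
            return pd.NA
        return value

    return series.map(convert)


# A mixed object column, similar to the 'mixedstuff' column used under Usage
s = pd.Series(["right", "", [], [442, 553, 44], 1], dtype="object")
cleaned = to_pd_na(s, empty_iters=True, zero_len_strings=True)
print(cleaned.isna().tolist())
```

With both switches left at `False`, the sketch returns the column unchanged, which mirrors the documented defaults.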

## Update 2022/10/08

```python
# Added pandas.Series.ds_optimize_int / pandas.DataFrame.ds_optimize_int
# to optimize only integer columns

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0              1         0       3  ...   7.2500   NaN         S
1              2         1       1  ...  71.2833   C85         C
2              3         1       3  ...   7.9250   NaN         S
3              4         1       1  ...  53.1000  C123         S
4              5         0       3  ...   8.0500   NaN         S
..           ...       ...     ...  ...      ...   ...       ...
886          887         0       2  ...  13.0000   NaN         S
887          888         1       1  ...  30.0000   B42         S
888          889         0       3  ...  23.4500   NaN         S
889          890         1       1  ...  30.0000  C148         C
890          891         0       3  ...   7.7500   NaN         Q
[891 rows x 12 columns]    


df.ds_optimize_int()
df.PassengerId: Using dtype: np.uint16
df.Survived: Using dtype: np.uint8
df.Pclass: Using dtype: np.uint8
df.SibSp: Using dtype: np.uint8
df.Parch: Using dtype: np.uint8
Out[7]: 
     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0              1         0       3  ...   7.2500   NaN         S
1              2         1       1  ...  71.2833   C85         C
2              3         1       3  ...   7.9250   NaN         S
3              4         1       1  ...  53.1000  C123         S
4              5         0       3  ...   8.0500   NaN         S
..           ...       ...     ...  ...      ...   ...       ...
886          887         0       2  ...  13.0000   NaN         S
887          888         1       1  ...  30.0000   B42         S
888          889         0       3  ...  23.4500   NaN         S
889          890         1       1  ...  30.0000  C148         C
890          891         0       3  ...   7.7500   NaN         Q
```
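The log lines above (`np.uint16` for `PassengerId`, `np.uint8` for `Survived`) are consistent with picking the narrowest integer type that covers a column's minimum and maximum. The README does not show how the package does this, so the following is a minimal sketch under that assumption; `smallest_int_dtype` is a hypothetical helper. pandas also offers `pd.to_numeric(..., downcast="unsigned")` for a similar purpose.

```python
import numpy as np
import pandas as pd


def smallest_int_dtype(series):
    """Pick the narrowest NumPy integer dtype that can hold every value."""
    lo, hi = int(series.min()), int(series.max())
    # Unsigned types are tried when there are no negative values
    candidates = (
        (np.uint8, np.uint16, np.uint32, np.uint64)
        if lo >= 0
        else (np.int8, np.int16, np.int32, np.int64)
    )
    for dtype in candidates:
        info = np.iinfo(dtype)
        if info.min <= lo and hi <= info.max:
            return dtype
    raise OverflowError("values do not fit into a 64-bit integer")


passenger_id = pd.Series(range(1, 892))  # 1 .. 891, the range shown in the log
survived = pd.Series([0, 1, 1, 0])

print(smallest_int_dtype(passenger_id).__name__)  # uint16
print(smallest_int_dtype(survived).__name__)      # uint8
```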

## Usage

```python
from random import choice

import pandas as pd

df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")

# Let's add some more data types
df["truefalse"] = [choice([True, False]) for _ in range(len(df))]

df["onlynan"] = pd.NA

df["nestedlists"] = [[[1] * 10]] * len(df)

df["mixedstuff"] = [
    choice([True, False, "right", "wrong", 1, 2, 23, 343.555, 23.444, [442, 553, 44], [], ""])
    for _ in range(len(df))
]

df["floatnumbers"] = [choice([33.44, 344.42424265, 15.0, 3222.33]) for _ in range(len(df))]

df["floatnumbers0"] = [choice([33.0, 344.0, 15.0, 3222.0]) for _ in range(len(df))]

df["intwithnan"] = [choice([1, 2, 3, 4, 5, pd.NA]) for _ in range(len(df))]

# Assumes optimize_dtypes has already been imported from the package
df2 = optimize_dtypes(
    dframe=df,
    point_zero_to_int=True,
    categorylimit=15,
    verbose=True,
    include_na_strings_in_pd_na=True,
    include_empty_iters_in_pd_na=True,
    include_0_len_string_in_pd_na=True,
    convert_float=True,
    check_float_difference=True,
    float_tolerance_negative=-0.1,
    float_tolerance_positive=0.1,
)
print(df)
print(df2)
print(df.dtypes)
print(df2.dtypes)


Memory usage of dataframe is: 0.12333202362060547 MB
█████████████████████████████
Analyzing df.PassengerId
----------------
df.PassengerId Is numeric!
df.PassengerId Max: 891
df.PassengerId Min: 1
df.PassengerId: Only .000 in columns -> Using int - Checking which size fits best ...
df.PassengerId: Using dtype: np.uint16
█████████████████████████████
Analyzing df.Survived
----------------
df.Survived Is numeric!
df.Survived Max: 1
df.Survived Min: 0
df.Survived: Only .000 in columns -> Using int - Checking which size fits best ...
df.Survived: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Pclass
----------------
df.Pclass Is numeric!
df.Pclass Max: 3
df.Pclass Min: 1
df.Pclass: Only .000 in columns -> Using int - Checking which size fits best ...
df.Pclass: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Name
----------------
df.Name: Using dtype: string
█████████████████████████████
Analyzing df.Sex
----------------
df.Sex: Using dtype: category
█████████████████████████████
Analyzing df.Age
----------------
df.Age Is numeric!
df.Age Max: 80.0
df.Age Min: 0.42
df.Age: Using dtype: Float64
█████████████████████████████
Analyzing df.SibSp
----------------
df.SibSp Is numeric!
df.SibSp Max: 8
df.SibSp Min: 0
df.SibSp: Only .000 in columns -> Using int - Checking which size fits best ...
df.SibSp: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Parch
----------------
df.Parch Is numeric!
df.Parch Max: 6
df.Parch Min: 0
df.Parch: Only .000 in columns -> Using int - Checking which size fits best ...
df.Parch: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Ticket
----------------
df.Ticket: Using dtype: string
█████████████████████████████
Analyzing df.Fare
----------------
df.Fare Is numeric!
df.Fare Max: 512.3292
df.Fare Min: 0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
498   -0.05
305   -0.05
708   -0.05
Max. negative difference - limit -0.1
679    0.1708
258    0.1708
737    0.1708
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
------------- <class 'numpy.float16'> ------------- not right for df.Fare
Checking next dtype...
True -> within the desired range: 0.1 / -0.1
False      5
True     886
-------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
0      0.0
587    0.0
588    0.0
Max. negative difference - limit -0.1
0      0.0
598    0.0
587    0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.Fare
True -> within the desired range: 0.1 / -0.1
True    891
-------------------
df.Fare: Using dtype: np.float32
█████████████████████████████
Analyzing df.Cabin
----------------
df.Cabin: Using dtype: string
█████████████████████████████
Analyzing df.Embarked
----------------
df.Embarked: Using dtype: category
█████████████████████████████
Analyzing df.truefalse
----------------
df.truefalse: Using dtype: np.bool_
█████████████████████████████
Analyzing df.onlynan
----------------
df.onlynan Is numeric!
df.onlynan Max: nan
df.onlynan Min: nan
df.onlynan: Only nan in column, continue ...
█████████████████████████████
Analyzing df.nestedlists
----------------
█████████████████████████████
Analyzing df.mixedstuff
----------------
█████████████████████████████
Analyzing df.floatnumbers
----------------
df.floatnumbers Is numeric!
df.floatnumbers Max: 3222.33
df.floatnumbers Min: 15.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
890   -0.33
597   -0.33
592   -0.33
Max. negative difference - limit -0.1
527    0.075757
190    0.075757
171    0.075757
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
------------- <class 'numpy.float16'> ------------- not right for df.floatnumbers
Checking next dtype...
True -> within the desired range: 0.1 / -0.1
False    219
True     672
-------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
0      0.0
587    0.0
588    0.0
Max. negative difference - limit -0.1
0      0.0
598    0.0
587    0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.floatnumbers
True -> within the desired range: 0.1 / -0.1
True    891
-------------------
df.floatnumbers: Using dtype: np.float32
█████████████████████████████
Analyzing df.floatnumbers0
----------------
df.floatnumbers0 Is numeric!
df.floatnumbers0 Max: 3222.0
df.floatnumbers0 Min: 15.0
df.floatnumbers0: Only .000 in columns -> Using int - Checking which size fits best ...
df.floatnumbers0: Using dtype: np.uint16
█████████████████████████████
Analyzing df.intwithnan
----------------
df.intwithnan Is numeric!
df.intwithnan Max: 5
df.intwithnan Min: 1
df.intwithnan: Only .000 in columns -> Using int - Checking which size fits best ...
df.intwithnan: Using dtype: Int64
█████████████████████████████
Memory usage of dataframe was: 0.12333202362060547 MB
Memory usage of dataframe is now: 0.07259273529052734 MB
This is  58.85959960718511 % of the initial size
█████████████████████████████
█████████████████████████████
     PassengerId  Survived  Pclass  ... floatnumbers floatnumbers0  intwithnan
0              1         0       3  ...    33.440000          33.0           4
1              2         1       1  ...  3222.330000          15.0           5
2              3         1       3  ...    33.440000          33.0           3
3              4         1       1  ...    15.000000          33.0           1
4              5         0       3  ...    15.000000         344.0           2
..           ...       ...     ...  ...          ...           ...         ...
886          887         0       2  ...   344.424243         344.0           5
887          888         1       1  ...    15.000000          15.0           4
888          889         0       3  ...   344.424243        3222.0           2
889          890         1       1  ...   344.424243        3222.0           4
890          891         0       3  ...  3222.330000        3222.0        <NA>
[891 rows x 19 columns]
     PassengerId  Survived  Pclass  ... floatnumbers floatnumbers0  intwithnan
0              1         0       3  ...    33.439999            33           4
1              2         1       1  ...  3222.330078            15           5
2              3         1       3  ...    33.439999            33           3
3              4         1       1  ...    15.000000            33           1
4              5         0       3  ...    15.000000           344           2
..           ...       ...     ...  ...          ...           ...         ...
886          887         0       2  ...   344.424255           344           5
887          888         1       1  ...    15.000000            15           4
888          889         0       3  ...   344.424255          3222           2
889          890         1       1  ...   344.424255          3222           4
890          891         0       3  ...  3222.330078          3222        <NA>
[891 rows x 19 columns]
PassengerId        int64
Survived           int64
Pclass             int64
Name              object
Sex               object
Age              float64
SibSp              int64
Parch              int64
Ticket            object
Fare             float64
Cabin             object
Embarked          object
truefalse           bool
onlynan           object
nestedlists       object
mixedstuff        object
floatnumbers     float64
floatnumbers0    float64
intwithnan        object
dtype: object
PassengerId        uint16
Survived            uint8
Pclass              uint8
Name               string
Sex              category
Age               Float64
SibSp               uint8
Parch               uint8
Ticket             string
Fare              float32
Cabin              string
Embarked         category
truefalse            bool
onlynan            object
nestedlists        object
mixedstuff         object
floatnumbers      float32
floatnumbers0      uint16
intwithnan          Int64
dtype: object

    Parameters:
        dframe: Union[pd.Series, pd.DataFrame]
            pd.Series, pd.DataFrame
        point_zero_to_int: bool
            Convert float to int if all float numbers in the column end with .0+
            (default = True)
        categorylimit: int
            Convert strings to category when the ratio len(df) / len(df.value_counts()) >= categorylimit
            (default = 4)
        verbose: bool
            Keep track of what is happening
            (default = True)
        include_na_strings_in_pd_na: bool
            When True, the following strings are treated as nan:

            [
            "<NA>",
            "<NAN>",
            "<nan>",
            "np.nan",
            "NoneType",
            "None",
            "-1.#IND",
            "1.#QNAN",
            "1.#IND",
            "-1.#QNAN",
            "#N/A N/A",
            "#N/A",
            "N/A",
            "n/a",
            "NA",
            "#NA",
            "NULL",
            "null",
            "NaN",
            "-NaN",
            "nan",
            "-nan",
            ]

            (default = True)
        include_empty_iters_in_pd_na: bool
            When True -> [], {} are treated as nan (default = False )

        include_0_len_string_in_pd_na: bool
            When True -> '' is treated as nan (default = False )
        convert_float: bool
            Convert columns containing float numbers to a smaller float dtype.
            Comparing the two dataframes from the example, one can see that the converted
            float numbers frequently don't have exactly the same value as the original ones.
            If decimal digits are important for your work, set this to False!
            (default = True)
        check_float_difference: bool
            When True, the difference between the original and the converted float values
            is checked against the tolerances below. Use it if a small difference is fine for you.
            Ignored if convert_float=False
            (default = True)
        float_tolerance_negative: float

            The negative tolerance you can live with, e.g.
            3222.330078 - 3222.330000 = 0.000078 is fine for you

            Ignored if convert_float=False
            (default= 0)

        float_tolerance_positive: float = 0,
            The positive tolerance you can live with
            3222.340078 - 3222.330000 = 0.010078 is fine for you
             Ignored if convert_float=False
            (default= 0.05)

    Returns:
        Union[pd.DataFrame, pd.Series]
```
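Two of the rules above involve a little arithmetic: the `categorylimit` ratio and the float tolerance check. The sketch below reproduces both with plain NumPy and pandas. It is an illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, and the sign convention (converted value minus original value) is assumed.

```python
import numpy as np
import pandas as pd


def should_be_category(series, categorylimit=4):
    """True when len(series) / number of distinct values >= categorylimit."""
    n_distinct = len(series.value_counts())  # NaN is not counted
    return n_distinct > 0 and len(series) / n_distinct >= categorylimit


def downcast_float(values, tol_negative=-0.1, tol_positive=0.1):
    """Return values in the narrowest float dtype whose error stays within tolerance."""
    original = np.asarray(values, dtype=np.float64)
    for dtype in (np.float16, np.float32):
        candidate = original.astype(dtype)
        # Error introduced by the narrower type, measured in float64
        diff = candidate.astype(np.float64) - original
        if np.all((diff >= tol_negative) & (diff <= tol_positive)):
            return candidate
    return original  # nothing narrower is accurate enough


sex = pd.Series(["male", "female"] * 50)                  # ratio 100 / 2 = 50
names = pd.Series([f"passenger {i}" for i in range(100)])  # ratio 100 / 100 = 1
print(should_be_category(sex, categorylimit=15))    # True
print(should_be_category(names, categorylimit=15))  # False

fares = np.array([7.25, 71.2833, 512.3292])
print(downcast_float(fares).dtype)                          # float32
print(downcast_float(np.array([15.0, 33.0, 344.0])).dtype)  # float16
```

In the fare example, `float16` stores 512.3292 as 512.5, an error of 0.1708 that exceeds the 0.1 limit, so the sketch moves on to `float32`. The same value appears in the log output above.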

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/a_pandas_ex_less_memory_more_speed",
    "name": "a-pandas-ex-less-memory-more-speed",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "flatten,pandas,dict,list,numpy,tuple,Tagsiter,nested,iterable,listsoflists,flattenjson,iter,explode,squeeze,nan,pd.NA,np.nan",
    "author": "Johannes Fischer",
    "author_email": "aulasparticularesdealemaosp@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/13/45/53fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d/a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
    "platform": null,
    "description": "## Less memory usage - more speed\r\n\r\nA Python package to reduce the memory usage of pandas DataFrames without changing the underlying data. It speeds up your workflow and reduces the risk of running out of memory.\r\n\r\n## Installation\r\n\r\n```python\r\npip install a-pandas-ex-less-memory-more-speed\r\n```\r\n\r\n```python\r\nfrom a_pandas_ex_less_memory_more_speed import pd_add_less_memory_more_speed\r\npd_add_less_memory_more_speed()\r\nimport pandas as pd\r\ndf = pd.read_csv(    \"https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv\",)\r\ndf.ds_reduce_memory_size()\r\n\r\n```\r\n\r\n## Update 2023/05/04\r\n```python\r\n\r\n# to carefully handle callables, iterables and other objects in cells \r\n\r\ndf.ds_reduce_memory_size_carefully()\r\n\r\n\r\n    Optimizes the memory usage of a pandas DataFrame or Series by converting data types and reducing memory size.\r\n\r\n    Args:\r\n    df_ (pd.Series | pd.DataFrame): The DataFrame or Series to be optimized.\r\n    ignore_columns (tuple | list, optional): A tuple or list of column names to ignore during optimization. Defaults to ().\r\n    not_allowed_to_convert (tuple | list, optional): A tuple or list of modules that should not be converted during optimization. Defaults to (\"shapely\",).\r\n    allowed_to_convert (tuple | list, optional): A tuple or list of modules that are allowed to be converted during optimization. Defaults to (\"pandas\", \"numpy\").\r\n    include_empty_iters_in_pd_na (bool, optional): If True, empty iterators will be converted to pd.NA during optimization. Defaults to False.\r\n    include_0_len_string_in_pd_na (bool, optional): If True, zero-length strings will be converted to pd.NA during optimization. Defaults to False.\r\n    verbose (bool, optional): If True, print information about the memory usage before and after optimization. 
Defaults to True.\r\n\r\n    Returns:\r\n    pd.DataFrame | pd.Series: The optimized DataFrame or Series.\r\n\r\n    Raises:\r\n    None.\r\n    \r\n    \r\n```\r\n\r\n## Update 2022/10/08\r\n\r\n```python\r\n#added pandas.Series.ds_optimize_int / pandas.DataFrame.ds_optimize_int\r\n#to optimize only ints\r\n\r\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\r\n0              1         0       3  ...   7.2500   NaN         S\r\n1              2         1       1  ...  71.2833   C85         C\r\n2              3         1       3  ...   7.9250   NaN         S\r\n3              4         1       1  ...  53.1000  C123         S\r\n4              5         0       3  ...   8.0500   NaN         S\r\n..           ...       ...     ...  ...      ...   ...       ...\r\n886          887         0       2  ...  13.0000   NaN         S\r\n887          888         1       1  ...  30.0000   B42         S\r\n888          889         0       3  ...  23.4500   NaN         S\r\n889          890         1       1  ...  30.0000  C148         C\r\n890          891         0       3  ...   7.7500   NaN         Q\r\n[891 rows x 12 columns]    \r\n\r\n\r\ndf.ds_optimize_int()\r\ndf.PassengerId: Using dtype: np.uint16\r\ndf.Survived: Using dtype: np.uint8\r\ndf.Pclass: Using dtype: np.uint8\r\ndf.SibSp: Using dtype: np.uint8\r\ndf.Parch: Using dtype: np.uint8\r\nOut[7]: \r\n     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked\r\n0              1         0       3  ...   7.2500   NaN         S\r\n1              2         1       1  ...  71.2833   C85         C\r\n2              3         1       3  ...   7.9250   NaN         S\r\n3              4         1       1  ...  53.1000  C123         S\r\n4              5         0       3  ...   8.0500   NaN         S\r\n..           ...       ...     ...  ...      ...   ...       ...\r\n886          887         0       2  ...  13.0000   NaN         S\r\n887          888         1       1  ...  
30.0000   B42         S\r\n888          889         0       3  ...  23.4500   NaN         S\r\n889          890         1       1  ...  30.0000  C148         C\r\n890          891         0       3  ...   7.7500   NaN         Q\r\n```\r\n\r\n## Usage\r\n\r\n```python\r\ndf = pd.read_csv(    \"https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv\",)\r\nfrom random import choice\r\n\r\n#Let's add some more data types\r\ntruefalse = lambda: choice([True, False])\r\ndf['truefalse'] = [truefalse() for x in range(len(df))]\r\n\r\ndf['onlynan'] = pd.NA\r\n\r\ndf['nestedlists'] = [[[1]*10]] * len(df)\r\n\r\nmixedstuff = lambda: choice([True, False, 'right', 'wrong', 1,2,23,343.555,23.444, [442,553,44], [],''])\r\ndf['mixedstuff'] =[mixedstuff() for x in range(len(df))]\r\n\r\nfloatnumbers = lambda: choice([33.44,344.42424265,15.0,3222.33])\r\ndf['floatnumbers']=[floatnumbers() for x in range(len(df))]\r\n\r\nfloatnumbers0 = lambda: choice([33.0,344.0,15.0,3222.0])\r\ndf['floatnumbers0']=[floatnumbers0() for x in range(len(df))]\r\n\r\nintwithnan = lambda: choice([1,2,3,4,5,pd.NA])\r\ndf['intwithnan']=[intwithnan() for x in range(len(df))]\r\n\r\n\r\ndf2 = optimize_dtypes(\r\n    dframe=df,\r\n    point_zero_to_int=True,\r\n    categorylimit=15,\r\n    verbose=True,\r\n    include_na_strings_in_pd_na=True,\r\n    include_empty_iters_in_pd_na=True,\r\n    include_0_len_string_in_pd_na=True,\r\n    convert_float=True,\r\n    check_float_difference=True,\r\n    float_tolerance_negative=-0.1,\r\n    float_tolerance_positive=0.1,\r\n)\r\nprint(df)\r\nprint(df2)\r\nprint(df.dtypes)\r\nprint(df2.dtypes)\r\n\r\n\r\nMemory usage of dataframe is: 0.12333202362060547 
MB\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.PassengerId\r\n----------------\r\ndf.PassengerId Is numeric!\r\ndf.PassengerId Max: 891\r\ndf.PassengerId Min: 1\r\ndf.PassengerId: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.PassengerId: Using dtype: np.uint16\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Survived\r\n----------------\r\ndf.Survived Is numeric!\r\ndf.Survived Max: 1\r\ndf.Survived Min: 0\r\ndf.Survived: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.Survived: Using dtype: 
np.uint8\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Pclass\r\n----------------\r\ndf.Pclass Is numeric!\r\ndf.Pclass Max: 3\r\ndf.Pclass Min: 1\r\ndf.Pclass: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.Pclass: Using dtype: np.uint8\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Name\r\n----------------\r\ndf.Name: Using dtype: string\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Sex\r\n----------------\r\ndf.Sex: Using dtype: 
category\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Age\r\n----------------\r\ndf.Age Is numeric!\r\ndf.Age Max: 80.0\r\ndf.Age Min: 0.42\r\ndf.Age: Using dtype: Float64\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.SibSp\r\n----------------\r\ndf.SibSp Is numeric!\r\ndf.SibSp Max: 8\r\ndf.SibSp Min: 0\r\ndf.SibSp: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.SibSp: Using dtype: np.uint8\r\n\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\u00e2\u2013\u02c6\r\nAnalyzing df.Parch\r\n----------------\r\ndf.Parch Is 
numeric!\r\ndf.Parch Max: 6\r\ndf.Parch Min: 0\r\ndf.Parch: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.Parch: Using dtype: np.uint8\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.Ticket\r\n----------------\r\ndf.Ticket: Using dtype: string\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.Fare\r\n----------------\r\ndf.Fare Is numeric!\r\ndf.Fare Max: 512.3292\r\ndf.Fare Min: 0.0\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\nMax. positive difference - limit 0.1\r\n498   -0.05\r\n305   -0.05\r\n708   -0.05\r\nMax. negative difference - limit -0.1\r\n679    0.1708\r\n258    0.1708\r\n737    0.1708\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n------------- <class 'numpy.float16'> ------------- not right for df.Fare\r\nChecking next dtype...\r\nTrue -> within the desired range: 0.1 / -0.1\r\nFalse      5\r\nTrue     886\r\n-------------------\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\nMax. positive difference - limit 0.1\r\n0      0.0\r\n587    0.0\r\n588    0.0\r\nMax. 
negative difference - limit -0.1\r\n0      0.0\r\n598    0.0\r\n587    0.0\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.Fare\r\nTrue -> within the desired range: 0.1 / -0.1\r\nTrue    891\r\n-------------------\r\ndf.Fare: Using dtype: np.float32\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.Cabin\r\n----------------\r\ndf.Cabin: Using dtype: string\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.Embarked\r\n----------------\r\ndf.Embarked: Using dtype: 
category\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.truefalse\r\n----------------\r\ndf.truefalse: Using dtype: np.bool_\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.onlynan\r\n----------------\r\ndf.onlynan Is numeric!\r\ndf.onlynan Max: nan\r\ndf.onlynan Min: nan\r\ndf.onlynan: Only nan in column, continue ...\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing 
df.nestedlists\r\n----------------\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.mixedstuff\r\n----------------\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.floatnumbers\r\n----------------\r\ndf.floatnumbers Is numeric!\r\ndf.floatnumbers Max: 3222.33\r\ndf.floatnumbers Min: 15.0\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\nMax. positive difference - limit 0.1\r\n890   -0.33\r\n597   -0.33\r\n592   -0.33\r\nMax. negative difference - limit -0.1\r\n527    0.075757\r\n190    0.075757\r\n171    0.075757\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n------------- <class 'numpy.float16'> ------------- not right for df.floatnumbers\r\nChecking next dtype...\r\nTrue -> within the desired range: 0.1 / -0.1\r\nFalse    219\r\nTrue     672\r\n-------------------\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\nMax. positive difference - limit 0.1\r\n0      0.0\r\n587    0.0\r\n588    0.0\r\nMax. 
negative difference - limit -0.1\r\n0      0.0\r\n598    0.0\r\n587    0.0\r\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\r\n+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.floatnumbers\r\nTrue -> within the desired range: 0.1 / -0.1\r\nTrue    891\r\n-------------------\r\ndf.floatnumbers: Using dtype: np.float32\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.floatnumbers0\r\n----------------\r\ndf.floatnumbers0 Is numeric!\r\ndf.floatnumbers0 Max: 3222.0\r\ndf.floatnumbers0 Min: 15.0\r\ndf.floatnumbers0: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.floatnumbers0: Using dtype: np.uint16\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nAnalyzing df.intwithnan\r\n----------------\r\ndf.intwithnan Is numeric!\r\ndf.intwithnan Max: 5\r\ndf.intwithnan Min: 1\r\ndf.intwithnan: Only .000 in columns -> Using int - Checking which size fits best ...\r\ndf.intwithnan: Using dtype: 
Int64\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\nMemory usage of dataframe was: 0.12333202362060547 MB\r\nMemory usage of dataframe is now: 0.07259273529052734 MB\r\nThis is  58.85959960718511 % of the initial size\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n     PassengerId  Survived  Pclass  ... floatnumbers floatnumbers0  intwithnan\r\n0              1         0       3  ...    33.440000          33.0           4\r\n1              2         1       1  ...  
3222.330000          15.0           5\r\n2              3         1       3  ...    33.440000          33.0           3\r\n3              4         1       1  ...    15.000000          33.0           1\r\n4              5         0       3  ...    15.000000         344.0           2\r\n..           ...       ...     ...  ...          ...           ...         ...\r\n886          887         0       2  ...   344.424243         344.0           5\r\n887          888         1       1  ...    15.000000          15.0           4\r\n888          889         0       3  ...   344.424243        3222.0           2\r\n889          890         1       1  ...   344.424243        3222.0           4\r\n890          891         0       3  ...  3222.330000        3222.0        <NA>\r\n[891 rows x 19 columns]\r\n     PassengerId  Survived  Pclass  ... floatnumbers floatnumbers0  intwithnan\r\n0              1         0       3  ...    33.439999            33           4\r\n1              2         1       1  ...  3222.330078            15           5\r\n2              3         1       3  ...    33.439999            33           3\r\n3              4         1       1  ...    15.000000            33           1\r\n4              5         0       3  ...    15.000000           344           2\r\n..           ...       ...     ...  ...          ...           ...         ...\r\n886          887         0       2  ...   344.424255           344           5\r\n887          888         1       1  ...    15.000000            15           4\r\n888          889         0       3  ...   344.424255          3222           2\r\n889          890         1       1  ...   344.424255          3222           4\r\n890          891         0       3  ...  
3222.330078          3222        <NA>\r\n[891 rows x 19 columns]\r\nPassengerId        int64\r\nSurvived           int64\r\nPclass             int64\r\nName              object\r\nSex               object\r\nAge              float64\r\nSibSp              int64\r\nParch              int64\r\nTicket            object\r\nFare             float64\r\nCabin             object\r\nEmbarked          object\r\ntruefalse           bool\r\nonlynan           object\r\nnestedlists       object\r\nmixedstuff        object\r\nfloatnumbers     float64\r\nfloatnumbers0    float64\r\nintwithnan        object\r\ndtype: object\r\nPassengerId        uint16\r\nSurvived            uint8\r\nPclass              uint8\r\nName               string\r\nSex              category\r\nAge               Float64\r\nSibSp               uint8\r\nParch               uint8\r\nTicket             string\r\nFare              float32\r\nCabin              string\r\nEmbarked         category\r\ntruefalse            bool\r\nonlynan            object\r\nnestedlists        object\r\nmixedstuff         object\r\nfloatnumbers      float32\r\nfloatnumbers0      uint16\r\nintwithnan          Int64\r\ndtype: object\r\n\r\n    Parameters:\r\n        dframe: Union[pd.Series, pd.DataFrame]\r\n            pd.Series, pd.DataFrame\r\n        point_zero_to_int: bool\r\n            Convert float to int if all float numbers in the column end with .0+\r\n            (default = True)\r\n        categorylimit: int\r\n            Convert strings to category when the ratio len(df) / len(df.value_counts()) >= categorylimit\r\n            (default = 4)\r\n        verbose: bool\r\n            Keep track of what is happening\r\n            (default = True)\r\n        include_na_strings_in_pd_na: bool\r\n            When True, the following strings are treated as nan:\r\n\r\n            [\r\n            \"<NA>\",\r\n            \"<NAN>\",\r\n            \"<nan>\",\r\n            \"np.nan\",\r\n            \"NoneType\",\r\n            \"None\",\r\n            
\"-1.#IND\",\r\n            \"1.#QNAN\",\r\n            \"1.#IND\",\r\n            \"-1.#QNAN\",\r\n            \"#N/A N/A\",\r\n            \"#N/A\",\r\n            \"N/A\",\r\n            \"n/a\",\r\n            \"NA\",\r\n            \"#NA\",\r\n            \"NULL\",\r\n            \"null\",\r\n            \"NaN\",\r\n            \"-NaN\",\r\n            \"nan\",\r\n            \"-nan\",\r\n            ]\r\n\r\n            (default = True)\r\n        include_empty_iters_in_pd_na: bool\r\n            When True, [] and {} are treated as nan (default = False)\r\n\r\n        include_0_len_string_in_pd_na: bool\r\n            When True, '' is treated as nan (default = False)\r\n        convert_float: bool\r\n            Convert columns containing float numbers (for example float64 to float32).\r\n            Comparing the 2 dataframes from the example, one can see that the converted float numbers\r\n            frequently don't have exactly the same value as the original float numbers.\r\n            If decimal digits are important for your work, set it to False!\r\n            (default = True)\r\n        check_float_difference: bool\r\n            Use True if a small difference between the original and the converted float values is fine for you.\r\n            Ignored if convert_float=False\r\n            (default = True)\r\n        float_tolerance_negative: float\r\n            The negative tolerance you can live with, e.g.\r\n            3222.330078 - 3222.330000 = 0.000078 is fine for you\r\n            Ignored if convert_float=False\r\n            (default = 0)\r\n\r\n        float_tolerance_positive: float\r\n            The positive tolerance you can live with, e.g.\r\n            3222.340078 - 3222.330000 = 0.010078 is fine for you\r\n            Ignored if convert_float=False\r\n            (default = 0.05)\r\n\r\n    Returns:\r\n        Union[pd.DataFrame, pd.Series]\r\n```\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Python package to reduce the memory usage of pandas DataFrames. It speeds up your workflow and reduces the risk of running out of memory.",
    "version": "0.38",
    "project_urls": {
        "Homepage": "https://github.com/hansalemaos/a_pandas_ex_less_memory_more_speed"
    },
    "split_keywords": [
        "flatten",
        "pandas",
        "dict",
        "list",
        "numpy",
        "tuple",
        "tagsiter",
        "nested",
        "iterable",
        "listsoflists",
        "flattenjson",
        "iter",
        "explode",
        "squeeze",
        "nan",
        "pd.na",
        "np.nan"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8b9cd2d50ea8cb63fa47ba76a0baebae55ab71cd82dad58befeb1df8651768d8",
                "md5": "e49f60004c712034d592e93bbca11dd5",
                "sha256": "f8b35ebad5d2154bef95293543fec15c4864d038d65246824b53daed78f1ddbb"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_less_memory_more_speed-0.38-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e49f60004c712034d592e93bbca11dd5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 35832,
            "upload_time": "2023-05-04T14:28:37",
            "upload_time_iso_8601": "2023-05-04T14:28:37.058211Z",
            "url": "https://files.pythonhosted.org/packages/8b/9c/d2d50ea8cb63fa47ba76a0baebae55ab71cd82dad58befeb1df8651768d8/a_pandas_ex_less_memory_more_speed-0.38-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "134553fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d",
                "md5": "f682f5861937d4995ab7f23a73eec1c1",
                "sha256": "1e0da6f74a125b8a7c9f73196d053cb4b17ce3c153cd7da382cc18b5506d33cd"
            },
            "downloads": -1,
            "filename": "a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
            "has_sig": false,
            "md5_digest": "f682f5861937d4995ab7f23a73eec1c1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 34820,
            "upload_time": "2023-05-04T14:28:41",
            "upload_time_iso_8601": "2023-05-04T14:28:41.859974Z",
            "url": "https://files.pythonhosted.org/packages/13/45/53fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d/a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-04 14:28:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hansalemaos",
    "github_project": "a_pandas_ex_less_memory_more_speed",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "check_if_nan",
            "specs": []
        },
        {
            "name": "deepcopyall",
            "specs": []
        },
        {
            "name": "isiter",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "tolerant_isinstance",
            "specs": []
        }
    ],
    "lcname": "a-pandas-ex-less-memory-more-speed"
}

Johannes Fischer