## Less memory usage - more speed
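A Python package that reduces the memory usage of pandas DataFrames and Series by converting columns to smaller dtypes. Smaller dtypes speed up your workflow and reduce the risk of running out of memory. Float columns can lose a little precision when they are downcast; see `convert_float` and the tolerance parameters under Usage.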
## Installation
```shell
pip install a-pandas-ex-less-memory-more-speed
```
```python
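import pandas as pd
from a_pandas_ex_less_memory_more_speed import pd_add_less_memory_more_speed

# Add the ds_* methods to pandas DataFrame and Series
pd_add_less_memory_more_speed()

df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")
# The call returns the optimized DataFrame, so keep the result
df_small = df.ds_reduce_memory_size()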
```
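To see what a dtype conversion buys you, it can help to measure memory before and after with plain pandas. The sketch below does not call this package at all: it builds a small hypothetical frame (not the real Titanic data) and downcasts the integer columns by hand with `pd.to_numeric`, which is roughly the kind of conversion reported in the logs further down. The helper name `memory_mb` is made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic columns (not the real dataset).
df = pd.DataFrame(
    {
        "PassengerId": range(1, 892),
        "Survived": [0, 1, 1] * 297,
        "Pclass": [3, 1, 3] * 297,
    }
)


def memory_mb(frame: pd.DataFrame) -> float:
    """Total memory of the frame in MB, counting object contents too."""
    return frame.memory_usage(deep=True).sum() / 1024**2


before = memory_mb(df)

# Downcast every column to the smallest unsigned integer dtype that fits.
small = df.apply(pd.to_numeric, downcast="unsigned")

after = memory_mb(small)
print(small.dtypes)
print(f"{after / before * 100:.1f} % of the initial size")
```

Because the frame is built in memory, the snippet runs without network access; swap in your own DataFrame to get real numbers.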
## Update 2023/05/04
```python
# Carefully handles callables, iterables, and other objects in cells
df.ds_reduce_memory_size_carefully()

    Optimizes the memory usage of a pandas DataFrame or Series by converting data types and reducing memory size.

    Args:
        df_ (pd.Series | pd.DataFrame): The DataFrame or Series to be optimized.
        ignore_columns (tuple | list, optional): A tuple or list of column names to ignore during optimization. Defaults to ().
        not_allowed_to_convert (tuple | list, optional): A tuple or list of modules whose objects should not be converted during optimization. Defaults to ("shapely",).
        allowed_to_convert (tuple | list, optional): A tuple or list of modules whose objects are allowed to be converted during optimization. Defaults to ("pandas", "numpy").
        include_empty_iters_in_pd_na (bool, optional): If True, empty iterables will be converted to pd.NA during optimization. Defaults to False.
        include_0_len_string_in_pd_na (bool, optional): If True, zero-length strings will be converted to pd.NA during optimization. Defaults to False.
        verbose (bool, optional): If True, print information about the memory usage before and after optimization. Defaults to True.

    Returns:
        pd.DataFrame | pd.Series: The optimized DataFrame or Series.

    Raises:
        None.
```
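The `allowed_to_convert` and `not_allowed_to_convert` arguments are described in terms of modules. One plausible reading, and it is only an assumption about how such a check could work, is that the top-level module of each cell's type is compared against those tuples. The helpers `top_level_module` and `may_convert` below are hypothetical and not part of the package:

```python
from fractions import Fraction


def top_level_module(obj) -> str:
    """Return the first component of the module that defines obj's type."""
    return type(obj).__module__.split(".")[0]


def may_convert(obj, allowed=("pandas", "numpy"), not_allowed=("shapely",)) -> bool:
    """Hypothetical check: the deny list wins, then builtins or allowed modules pass."""
    module = top_level_module(obj)
    if module in not_allowed:
        return False
    return module == "builtins" or module in allowed


print(may_convert(3.5))                                     # plain Python float
print(may_convert(Fraction(1, 2)))                          # 'fractions' is not in the allow list
print(may_convert(Fraction(1, 2), allowed=("fractions",)))  # explicitly allowed
```

`Fraction` is used only because it ships with Python and lives outside `builtins`; with real data the same lookup would report `shapely` for geometry objects.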
## Update 2022/10/08
```python
# Added pandas.Series.ds_optimize_int / pandas.DataFrame.ds_optimize_int
# to optimize only integer columns

     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500    NaN         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250    NaN         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500    NaN         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000    NaN         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500    NaN         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500    NaN         Q
[891 rows x 12 columns]

df.ds_optimize_int()
df.PassengerId: Using dtype: np.uint16
df.Survived: Using dtype: np.uint8
df.Pclass: Using dtype: np.uint8
df.SibSp: Using dtype: np.uint8
df.Parch: Using dtype: np.uint8

     PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0              1         0       3  ...   7.2500    NaN         S
1              2         1       1  ...  71.2833    C85         C
2              3         1       3  ...   7.9250    NaN         S
3              4         1       1  ...  53.1000   C123         S
4              5         0       3  ...   8.0500    NaN         S
..           ...       ...     ...  ...      ...    ...       ...
886          887         0       2  ...  13.0000    NaN         S
887          888         1       1  ...  30.0000    B42         S
888          889         0       3  ...  23.4500    NaN         S
889          890         1       1  ...  30.0000   C148         C
890          891         0       3  ...   7.7500    NaN         Q
```
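The log lines above ("Using dtype: np.uint16", "np.uint8") suggest that the integer dtype is picked from the column's minimum and maximum. A minimal pure-Python sketch of that idea follows; `INT_RANGES` and `smallest_int_dtype` are illustrative names, and the package's actual selection logic may differ.

```python
# (name, min, max) for fixed-width integer types.
# Unsigned types come first so non-negative columns never get a signed dtype.
INT_RANGES = [
    ("uint8", 0, 2**8 - 1),
    ("uint16", 0, 2**16 - 1),
    ("uint32", 0, 2**32 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int8", -(2**7), 2**7 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def smallest_int_dtype(lo: int, hi: int) -> str:
    """Return the first dtype name whose range contains both lo and hi."""
    for name, dmin, dmax in INT_RANGES:
        if dmin <= lo and hi <= dmax:
            return name
    raise OverflowError(f"no fixed-width integer type holds [{lo}, {hi}]")


print(smallest_int_dtype(1, 891))  # PassengerId: min 1, max 891
print(smallest_int_dtype(0, 1))    # Survived: min 0, max 1
print(smallest_int_dtype(-1, 5))   # a column with a negative value
```

For the Titanic ranges this yields `uint16` for `PassengerId` and `uint8` for `Survived`, matching the output above.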
## Usage
```python
import pandas as pd
from random import choice

# Assumption: optimize_dtypes can be imported from the package's top level
from a_pandas_ex_less_memory_more_speed import optimize_dtypes

df = pd.read_csv("https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv")

# Let's add some more data types
df["truefalse"] = [choice([True, False]) for _ in range(len(df))]
df["onlynan"] = pd.NA
df["nestedlists"] = [[[1] * 10]] * len(df)
df["mixedstuff"] = [
    choice([True, False, "right", "wrong", 1, 2, 23, 343.555, 23.444, [442, 553, 44], [], ""])
    for _ in range(len(df))
]
# Floats with real decimal places
df["floatnumbers"] = [choice([33.44, 344.42424265, 15.0, 3222.33]) for _ in range(len(df))]
# Floats that are all whole numbers (end with .0)
df["floatnumbers0"] = [choice([33.0, 344.0, 15.0, 3222.0]) for _ in range(len(df))]
# Integers mixed with missing values
df["intwithnan"] = [choice([1, 2, 3, 4, 5, pd.NA]) for _ in range(len(df))]

df2 = optimize_dtypes(
    dframe=df,
    point_zero_to_int=True,
    categorylimit=15,
    verbose=True,
    include_na_strings_in_pd_na=True,
    include_empty_iters_in_pd_na=True,
    include_0_len_string_in_pd_na=True,
    convert_float=True,
    check_float_difference=True,
    float_tolerance_negative=-0.1,
    float_tolerance_positive=0.1,
)
print(df)
print(df2)
print(df.dtypes)
print(df2.dtypes)

Memory usage of dataframe is: 0.12333202362060547 MB
█████████████████████████████
Analyzing df.PassengerId
----------------
df.PassengerId Is numeric!
df.PassengerId Max: 891
df.PassengerId Min: 1
df.PassengerId: Only .000 in columns -> Using int - Checking which size fits best ...
df.PassengerId: Using dtype: np.uint16
█████████████████████████████
Analyzing df.Survived
----------------
df.Survived Is numeric!
df.Survived Max: 1
df.Survived Min: 0
df.Survived: Only .000 in columns -> Using int - Checking which size fits best ...
df.Survived: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Pclass
----------------
df.Pclass Is numeric!
df.Pclass Max: 3
df.Pclass Min: 1
df.Pclass: Only .000 in columns -> Using int - Checking which size fits best ...
df.Pclass: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Name
----------------
df.Name: Using dtype: string
█████████████████████████████
Analyzing df.Sex
----------------
df.Sex: Using dtype: category
█████████████████████████████
Analyzing df.Age
----------------
df.Age Is numeric!
df.Age Max: 80.0
df.Age Min: 0.42
df.Age: Using dtype: Float64
█████████████████████████████
Analyzing df.SibSp
----------------
df.SibSp Is numeric!
df.SibSp Max: 8
df.SibSp Min: 0
df.SibSp: Only .000 in columns -> Using int - Checking which size fits best ...
df.SibSp: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Parch
----------------
df.Parch Is numeric!
df.Parch Max: 6
df.Parch Min: 0
df.Parch: Only .000 in columns -> Using int - Checking which size fits best ...
df.Parch: Using dtype: np.uint8
█████████████████████████████
Analyzing df.Ticket
----------------
df.Ticket: Using dtype: string
█████████████████████████████
Analyzing df.Fare
----------------
df.Fare Is numeric!
df.Fare Max: 512.3292
df.Fare Min: 0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
498 -0.05
305 -0.05
708 -0.05
Max. negative difference - limit -0.1
679 0.1708
258 0.1708
737 0.1708
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
------------- <class 'numpy.float16'> ------------- not right for df.Fare
Checking next dtype...
True -> within the desired range: 0.1 / -0.1
False 5
True 886
-------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
0 0.0
587 0.0
588 0.0
Max. negative difference - limit -0.1
0 0.0
598 0.0
587 0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.Fare
True -> within the desired range: 0.1 / -0.1
True 891
-------------------
df.Fare: Using dtype: np.float32
█████████████████████████████
Analyzing df.Cabin
----------------
df.Cabin: Using dtype: string
█████████████████████████████
Analyzing df.Embarked
----------------
df.Embarked: Using dtype: category
█████████████████████████████
Analyzing df.truefalse
----------------
df.truefalse: Using dtype: np.bool_
█████████████████████████████
Analyzing df.onlynan
----------------
df.onlynan Is numeric!
df.onlynan Max: nan
df.onlynan Min: nan
df.onlynan: Only nan in column, continue ...
█████████████████████████████
Analyzing df.nestedlists
----------------
█████████████████████████████
Analyzing df.mixedstuff
----------------
█████████████████████████████
Analyzing df.floatnumbers
----------------
df.floatnumbers Is numeric!
df.floatnumbers Max: 3222.33
df.floatnumbers Min: 15.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
890 -0.33
597 -0.33
592 -0.33
Max. negative difference - limit -0.1
527 0.075757
190 0.075757
171 0.075757
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
------------- <class 'numpy.float16'> ------------- not right for df.floatnumbers
Checking next dtype...
True -> within the desired range: 0.1 / -0.1
False 219
True 672
-------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Max. positive difference - limit 0.1
0 0.0
587 0.0
588 0.0
Max. negative difference - limit -0.1
0 0.0
598 0.0
587 0.0
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+++++++++++++ <class 'numpy.float32'> +++++++++++++ right for df.floatnumbers
True -> within the desired range: 0.1 / -0.1
True 891
-------------------
df.floatnumbers: Using dtype: np.float32
█████████████████████████████
Analyzing df.floatnumbers0
----------------
df.floatnumbers0 Is numeric!
df.floatnumbers0 Max: 3222.0
df.floatnumbers0 Min: 15.0
df.floatnumbers0: Only .000 in columns -> Using int - Checking which size fits best ...
df.floatnumbers0: Using dtype: np.uint16
█████████████████████████████
Analyzing df.intwithnan
----------------
df.intwithnan Is numeric!
df.intwithnan Max: 5
df.intwithnan Min: 1
df.intwithnan: Only .000 in columns -> Using int - Checking which size fits best ...
df.intwithnan: Using dtype: Int64
█████████████████████████████
Memory usage of dataframe was: 0.12333202362060547 MB
Memory usage of dataframe is now: 0.07259273529052734 MB
This is 58.85959960718511 % of the initial size
█████████████████████████████
█████████████████████████████
     PassengerId  Survived  Pclass  ...  floatnumbers  floatnumbers0  intwithnan
0              1         0       3  ...     33.440000           33.0           4
1              2         1       1  ...   3222.330000           15.0           5
2              3         1       3  ...     33.440000           33.0           3
3              4         1       1  ...     15.000000           33.0           1
4              5         0       3  ...     15.000000          344.0           2
..           ...       ...     ...  ...           ...            ...         ...
886          887         0       2  ...    344.424243          344.0           5
887          888         1       1  ...     15.000000           15.0           4
888          889         0       3  ...    344.424243         3222.0           2
889          890         1       1  ...    344.424243         3222.0           4
890          891         0       3  ...   3222.330000         3222.0        <NA>
[891 rows x 19 columns]
     PassengerId  Survived  Pclass  ...  floatnumbers  floatnumbers0  intwithnan
0              1         0       3  ...     33.439999             33           4
1              2         1       1  ...   3222.330078             15           5
2              3         1       3  ...     33.439999             33           3
3              4         1       1  ...     15.000000             33           1
4              5         0       3  ...     15.000000            344           2
..           ...       ...     ...  ...           ...            ...         ...
886          887         0       2  ...    344.424255            344           5
887          888         1       1  ...     15.000000             15           4
888          889         0       3  ...    344.424255           3222           2
889          890         1       1  ...    344.424255           3222           4
890          891         0       3  ...   3222.330078           3222        <NA>
[891 rows x 19 columns]
PassengerId        int64
Survived           int64
Pclass             int64
Name              object
Sex               object
Age              float64
SibSp              int64
Parch              int64
Ticket            object
Fare             float64
Cabin             object
Embarked          object
truefalse           bool
onlynan           object
nestedlists       object
mixedstuff        object
floatnumbers     float64
floatnumbers0    float64
intwithnan        object
dtype: object
PassengerId        uint16
Survived            uint8
Pclass              uint8
Name               string
Sex              category
Age               Float64
SibSp               uint8
Parch               uint8
Ticket             string
Fare              float32
Cabin              string
Embarked         category
truefalse            bool
onlynan            object
nestedlists        object
mixedstuff         object
floatnumbers      float32
floatnumbers0      uint16
intwithnan          Int64
dtype: object

    Parameters:
        dframe: Union[pd.Series, pd.DataFrame]
            The Series or DataFrame to optimize
        point_zero_to_int: bool
            Convert float to int if all float numbers in the column end with .0+
            (default = True)
        categorylimit: int
            Convert strings to category when the ratio
            len(df) / len(df.value_counts()) >= categorylimit
            (default = 4)
        verbose: bool
            Print what is happening for each column
            (default = True)
        include_na_strings_in_pd_na: bool
            When True, the following strings are treated as nan:

            [
            "<NA>",
            "<NAN>",
            "<nan>",
            "np.nan",
            "NoneType",
            "None",
            "-1.#IND",
            "1.#QNAN",
            "1.#IND",
            "-1.#QNAN",
            "#N/A N/A",
            "#N/A",
            "N/A",
            "n/a",
            "NA",
            "#NA",
            "NULL",
            "null",
            "NaN",
            "-NaN",
            "nan",
            "-nan",
            ]

            (default = True)
        include_empty_iters_in_pd_na: bool
            When True, [] and {} are treated as nan
            (default = False)
        include_0_len_string_in_pd_na: bool
            When True, '' is treated as nan
            (default = False)
        convert_float: bool
            Convert columns containing float numbers to a smaller float dtype.
            Comparing the 2 dataframes from the example, one can see that the converted
            float numbers frequently don't have the exact same value as the originals.
            If decimal digits are important for your work, set it to False!
            (default = True)
        check_float_difference: bool
            If a little difference between float dtypes is fine for you, use True
            Ignored if convert_float=False
            (default = True)
        float_tolerance_negative: float
            The negative tolerance you can live with, e.g.
            3222.330078 - 3222.330000 = 0.000078 is fine for you
            Ignored if convert_float=False
            (default = 0)
        float_tolerance_positive: float
            The positive tolerance you can live with, e.g.
            3222.340078 - 3222.330000 = 0.010078 is fine for you
            Ignored if convert_float=False
            (default = 0.05)

    Returns:
        Union[pd.DataFrame, pd.Series]
```
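The float16/float32 log entries above show a tolerance check: a narrower float type is accepted only if every value survives the conversion within `float_tolerance_negative` and `float_tolerance_positive`. The sketch below reproduces that idea with the standard library only, using `struct` to round-trip values through half and single precision. The function names, and the sign convention (converted minus original), are assumptions made for illustration and not the package's code.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store value in a narrower IEEE 754 format and read it back.

    fmt is a struct format character: "e" = float16, "f" = float32.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def smallest_float_format(values, tol_neg=-0.1, tol_pos=0.1) -> str:
    """Return the first format whose round-trip error stays within tolerance."""
    for name, fmt in (("float16", "e"), ("float32", "f")):
        try:
            diffs = [round_trip(v, fmt) - v for v in values]
        except (OverflowError, struct.error):
            continue  # at least one value does not fit into this format
        if all(tol_neg <= d <= tol_pos for d in diffs):
            return name
    return "float64"


sample = [33.44, 344.42424265, 15.0, 3222.33]  # values from the floatnumbers column
print(round_trip(3222.33, "e") - 3222.33)      # error introduced by float16
print(smallest_float_format(sample))
print(smallest_float_format([15.0, 33.0]))
```

With the sample values from this README, float16 stores 3222.33 as 3222.0, a difference of about -0.33, so the sketch rejects float16 and settles on float32, which mirrors the log above. Setting both tolerances to 0 only accepts a narrower type when every value is represented exactly.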
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/a_pandas_ex_less_memory_more_speed",
"name": "a-pandas-ex-less-memory-more-speed",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "flatten,pandas,dict,list,numpy,tuple,Tagsiter,nested,iterable,listsoflists,flattenjson,iter,explode,squeeze,nan,pd.NA,np.nan",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/13/45/53fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d/a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python package to reduce the memory usage of pandas DataFrames. It speeds up your workflow and reduces the risk of running out of memory.",
"version": "0.38",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/a_pandas_ex_less_memory_more_speed"
},
"split_keywords": [
"flatten",
"pandas",
"dict",
"list",
"numpy",
"tuple",
"tagsiter",
"nested",
"iterable",
"listsoflists",
"flattenjson",
"iter",
"explode",
"squeeze",
"nan",
"pd.na",
"np.nan"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8b9cd2d50ea8cb63fa47ba76a0baebae55ab71cd82dad58befeb1df8651768d8",
"md5": "e49f60004c712034d592e93bbca11dd5",
"sha256": "f8b35ebad5d2154bef95293543fec15c4864d038d65246824b53daed78f1ddbb"
},
"downloads": -1,
"filename": "a_pandas_ex_less_memory_more_speed-0.38-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e49f60004c712034d592e93bbca11dd5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 35832,
"upload_time": "2023-05-04T14:28:37",
"upload_time_iso_8601": "2023-05-04T14:28:37.058211Z",
"url": "https://files.pythonhosted.org/packages/8b/9c/d2d50ea8cb63fa47ba76a0baebae55ab71cd82dad58befeb1df8651768d8/a_pandas_ex_less_memory_more_speed-0.38-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "134553fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d",
"md5": "f682f5861937d4995ab7f23a73eec1c1",
"sha256": "1e0da6f74a125b8a7c9f73196d053cb4b17ce3c153cd7da382cc18b5506d33cd"
},
"downloads": -1,
"filename": "a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
"has_sig": false,
"md5_digest": "f682f5861937d4995ab7f23a73eec1c1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 34820,
"upload_time": "2023-05-04T14:28:41",
"upload_time_iso_8601": "2023-05-04T14:28:41.859974Z",
"url": "https://files.pythonhosted.org/packages/13/45/53fc5be3213eea03a60d41b3149fd2e74d72fcc2662eb231385cb717860d/a_pandas_ex_less_memory_more_speed-0.38.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-04 14:28:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "a_pandas_ex_less_memory_more_speed",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "check_if_nan",
"specs": []
},
{
"name": "deepcopyall",
"specs": []
},
{
"name": "isiter",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "tolerant_isinstance",
"specs": []
}
],
"lcname": "a-pandas-ex-less-memory-more-speed"
}