# spinesUtils
*Dedicated to helping users do more in less time.*
<big><i><b>spinesUtils</b></i></big>
is a user-friendly toolkit for the machine learning ecosystem, offering ready-to-use features such as
- [x] Logging functionality
- [x] Type checking and parameter generation
- [x] CSV file reading acceleration
- [x] Classifiers for imbalanced data
- [x] Pandas DataFrame data compression
- [x] Pandas DataFrame insight tools
- [x] Large data training and testing set splitting functions
- [x] An intuitive timer
It is currently under rapid iteration. If you encounter any issues, feel free to raise an issue.
# Installation
You can install spinesUtils from PyPI:
```bash
pip install spinesUtils
```
# Logger
You can use the Logger class to print your logs without worrying about handler conflicts with the native Python logging module.
This class provides log/debug/info/warning/error/critical methods, where debug/info/warning/error/critical are partial versions of the log method, available for use as needed.
```python
# load spinesUtils module
from spinesUtils.logging import Logger
# create a logger instance named "MyLogger" (the default level is "INFO"; here we lower it to "DEBUG")
# You can specify a file path `fp` during instantiation; if not specified, logs will not be written to a file.
logger = Logger(name="MyLogger", fp=None, level="DEBUG")

logger.log("This is an info log emitted by the log function.", level='INFO')
logger.debug("This is a debug message")
logger.info("This is an info message.")
logger.warning("This is a warning message.")
logger.error("This is an error message.")
logger.critical("This is a critical message.")
```
2024-01-19 15:02:51 - MyLogger - INFO - This is an info log emitted by the log function.
2024-01-19 15:02:51 - MyLogger - DEBUG - This is a debug message
2024-01-19 15:02:51 - MyLogger - INFO - This is an info message.
2024-01-19 15:02:51 - MyLogger - WARNING - This is a warning message.
2024-01-19 15:02:51 - MyLogger - ERROR - This is an error message.
2024-01-19 15:02:51 - MyLogger - CRITICAL - This is a critical message.
## Type checking and parameter generation
```python
from spinesUtils.asserts import *

# check parameter types
@ParameterTypeAssert({
    'a': (int, float),
    'b': (int, float)
})
def add(a, b):
    pass

# try to pass a string to the function, and it will raise a ParametersTypeError error
add(a=1, b='2')
```
---------------------------------------------------------------------------
ParametersTypeError Traceback (most recent call last)
Cell In[2], line 12
9 pass
11 # try to pass a string to the function, and it will raise a ParametersTypeError error
---> 12 add(a=1, b='2')
File ~/projects/spinesUtils/spinesUtils/asserts/_inspect.py:196, in ParameterTypeAssert.__call__.<locals>.wrapper(*args, **kwargs)
194 if mismatched_params:
195 error_msg = self.build_type_error_msg(mismatched_params)
--> 196 raise ParametersTypeError(error_msg)
198 return func(**kwargs)
ParametersTypeError: Function 'add' parameter(s) type mismatch: b only accept '['int', 'float']' type.
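For intuition, the effect of such a decorator can be sketched in plain Python. This is a hypothetical `type_assert_sketch`, not the spinesUtils implementation (which, per the traceback, also builds a detailed error message):

```python
import functools

def type_assert_sketch(expected):
    """Hypothetical stand-in for ParameterTypeAssert: validates keyword
    argument types against a {name: accepted_types} mapping before calling."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # collect every parameter whose value fails the isinstance check
            bad = [k for k, t in expected.items()
                   if k in kwargs and not isinstance(kwargs[k], t)]
            if bad:
                raise TypeError(f"parameter(s) type mismatch: {', '.join(bad)}")
            return func(**kwargs)
        return wrapper
    return decorator

@type_assert_sketch({'a': (int, float), 'b': (int, float)})
def add(a, b):
    return a + b

add(a=1, b=2)    # fine
# add(a=1, b='2')  # would raise TypeError
```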
```python
# check parameter values
@ParameterValuesAssert({
    'a': lambda x: x > 0,
    'b': lambda x: x > 0
})
def add(a, b):
    pass

# try to pass a negative number to the function, and it will raise a ParametersValueError error
add(a=1, b=-2)
```
---------------------------------------------------------------------------
ParametersValueError Traceback (most recent call last)
Cell In[3], line 10
7 pass
9 # try to pass a negative number to the function, and it will raise a ParametersValueError error
---> 10 add(a=1, b=-2)
File ~/projects/spinesUtils/spinesUtils/asserts/_inspect.py:258, in ParameterValuesAssert.__call__.<locals>.wrapper(*args, **kwargs)
256 if mismatched_params:
257 error_msg = self.build_values_error_msg(mismatched_params)
--> 258 raise ParametersValueError(error_msg)
260 return func(**kwargs)
ParametersValueError: Function 'add' parameter(s) values mismatch: `b` must in or satisfy ''b': lambda x: x > 0' condition(s).
```python
# generate a dictionary of keyword arguments for a given function using provided arguments
generate_function_kwargs(add, a=1, b=2)
```
{'a': 1, 'b': 2}
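A behavior like this can be reproduced with the standard library's `inspect` module; the sketch below is an illustration of the idea, not the library's code (`generate_kwargs_sketch` is a hypothetical name):

```python
import inspect

def generate_kwargs_sketch(func, *args, **kwargs):
    # bind the provided arguments to the function's signature,
    # fill in any declared defaults, and return a name -> value mapping
    bound = inspect.signature(func).bind_partial(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)

def add(a, b=2):
    return a + b

generate_kwargs_sketch(add, a=1)  # {'a': 1, 'b': 2}
```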
```python
# isinstance function with support for None
augmented_isinstance(1, (int, float, None))
```
True
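The None-aware behavior amounts to mapping a literal `None` in the type tuple to `NoneType`; a minimal sketch (hypothetical helper, not the packaged function):

```python
def augmented_isinstance_sketch(obj, types):
    # normalize a single type to a tuple, then replace a literal None
    # with NoneType so (int, float, None) also accepts None values
    if not isinstance(types, tuple):
        types = (types,)
    types = tuple(type(None) if t is None else t for t in types)
    return isinstance(obj, types)

augmented_isinstance_sketch(None, (int, float, None))  # True
```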
```python
# raise_if and raise_if_not functions
raise_if(ValueError, 1 == 1, "test raise_if")
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 2
1 # raise_if and raise_if_not functions
----> 2 raise_if(ValueError, 1 == 1, "test raise_if")
File ~/projects/spinesUtils/spinesUtils/asserts/_type_and_exceptions.py:115, in raise_if(exception, condition, error_msg)
112 assert issubclass(exception, BaseException), "Exception must be a subclass of BaseException."
114 if condition:
--> 115 raise exception(error_msg)
ValueError: test raise_if
```python
raise_if_not(ZeroDivisionError, 1 != 1, "test raise_if_not")
```
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[7], line 1
----> 1 raise_if_not(ZeroDivisionError, 1 != 1, "test raise_if_not")
File ~/projects/spinesUtils/spinesUtils/asserts/_type_and_exceptions.py:144, in raise_if_not(exception, condition, error_msg)
141 assert issubclass(exception, BaseException), "Exception must be a subclass of BaseException."
143 if not condition:
--> 144 raise exception(error_msg)
ZeroDivisionError: test raise_if_not
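Judging from the tracebacks above, the two helpers boil down to something like this plain-Python equivalent (a sketch based on the quoted source lines, not the packaged module):

```python
def raise_if(exception, condition, error_msg):
    # raise exception(error_msg) when the condition is truthy
    assert issubclass(exception, BaseException), "Exception must be a subclass of BaseException."
    if condition:
        raise exception(error_msg)

def raise_if_not(exception, condition, error_msg):
    # the mirror image: raise when the condition is falsy
    raise_if(exception, not condition, error_msg)
```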
## Faster CSV reader
```python
from spinesUtils import read_csv

your_df = read_csv(
    fp='/path/to/your/file.csv',
    sep=',',                 # equivalent to pandas read_csv's sep
    turbo_method='polars',   # backend used to speed up load time
    chunk_size=None,         # can be an integer if you want to use the pandas backend
    transform2low_mem=True,  # compress dtypes to save memory
    verbose=False
)
```
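For context, the `chunk_size` option suggests a chunked pandas fallback; the idea behind chunked reading can be illustrated in plain pandas (illustrative only, using an in-memory CSV in place of a file on disk):

```python
import io
import pandas as pd

# a small in-memory CSV standing in for a file on disk
csv_text = "a,b\n1,2\n3,4\n5,6\n"

# read the file in fixed-size chunks and concatenate the pieces,
# so the whole file never has to be parsed in one pass
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
df = pd.concat(chunks, ignore_index=True)
df.shape  # (3, 2)
```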
## Classifiers for imbalanced data
```python
from spinesUtils.models import MultiClassBalanceClassifier
```
```python
# make a toy dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

dataset = make_classification(
    n_samples=10000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_repeated=0,
    n_classes=3,
    n_clusters_per_class=1,
    weights=[0.01, 0.05, 0.94],
    class_sep=0.8,
    random_state=0
)

X, y = dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
```python
from sklearn.ensemble import RandomForestClassifier

classifier = MultiClassBalanceClassifier(
    base_estimator=RandomForestClassifier(n_estimators=100),
    n_classes=3,
    random_state=0,
    verbose=0
)

# fit the classifier
classifier.fit(X_train, y_train)

# predict
y_pred = classifier.predict(X_test)

# print the classification report
print(classification_report(y_test, y_pred))
```
                  precision    recall  f1-score   support

               0       0.74      0.72      0.73        32
               1       0.91      0.71      0.80       111
               2       0.98      1.00      0.99      1857

        accuracy                           0.98      2000
       macro avg       0.88      0.81      0.84      2000
    weighted avg       0.98      0.98      0.98      2000
## Pandas DataFrame data compression
```python
# make a toy dataset: 26 integer columns 'a' to 'z', 100,000 rows each
import pandas as pd
import numpy as np

df = pd.DataFrame({
    col: np.random.randint(0, 100, 100000)
    for col in 'abcdefghijklmnopqrstuvwxyz'
})

# compress the dataframe in place
from spinesUtils import transform_dtypes_low_mem

transform_dtypes_low_mem(df, verbose=True, inplace=True)
```
Converting ...: 0%| | 0/26 [00:00<?, ?it/s]
[log] INFO - Memory usage before conversion is: 19.84 MB
[log] INFO - Memory usage after conversion is: 2.48 MB
[log] INFO - After conversion, the percentage of memory fluctuation is 87.5 %
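The saving comes from downcasting each column to the smallest dtype that can hold its values; the core idea can be shown with plain pandas (an illustration of the principle, not the library's code):

```python
import numpy as np
import pandas as pd

# default integer dtype is 8 bytes per value on most platforms
s = pd.Series(np.random.randint(0, 100, 100000))

# values in [0, 100) fit in an unsigned 8-bit integer (1 byte per value)
compressed = pd.to_numeric(s, downcast='unsigned')

compressed.dtype  # uint8
```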
```python
# batch compress dataframes
from spinesUtils import transform_batch_dtypes_low_mem

# make some toy datasets: 26 integer columns 'a' to 'z', 100,000 rows each
df1 = pd.DataFrame({
    col: np.random.randint(0, 100, 100000)
    for col in 'abcdefghijklmnopqrstuvwxyz'
})
df2 = df1.copy()
df3 = df1.copy()
df4 = df1.copy()

# batch compress all four dataframes in place
transform_batch_dtypes_low_mem([df1, df2, df3, df4], verbose=True, inplace=True)
```
Batch converting ...: 0%| | 0/4 [00:00<?, ?it/s]
[log] INFO - Memory usage before conversion is: 79.35 MB
[log] INFO - Memory usage after conversion is: 9.92 MB
[log] INFO - After conversion, the percentage of memory fluctuation is 87.5 %
## Pandas DataFrame insight tools
```python
from spinesUtils import df_preview, classify_samples_dist

# make a toy dataset: 26 integer columns 'a' to 'z', 100,000 rows each
import pandas as pd
import numpy as np

df = pd.DataFrame({
    col: np.random.randint(0, 100, 100000)
    for col in 'abcdefghijklmnopqrstuvwxyz'
})

df_insight = df_preview(df)
df_insight
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>total</th>
<th>na</th>
<th>naPercent</th>
<th>nunique</th>
<th>dtype</th>
<th>max</th>
<th>75%</th>
<th>median</th>
<th>25%</th>
<th>min</th>
<th>mean</th>
<th>mode</th>
<th>variation</th>
<th>std</th>
<th>skew</th>
<th>kurt</th>
<th>samples</th>
</tr>
</thead>
<tbody>
<tr>
<th>a</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.53968</td>
<td>36</td>
<td>0.9892</td>
<td>28.848392</td>
<td>-0.000158</td>
<td>-1.196434</td>
<td>(32, 81)</td>
</tr>
<tr>
<th>b</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.41822</td>
<td>40</td>
<td>0.98928</td>
<td>28.937601</td>
<td>0.005974</td>
<td>-1.206987</td>
<td>(76, 28)</td>
</tr>
<tr>
<th>c</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.58261</td>
<td>82</td>
<td>0.98923</td>
<td>28.928019</td>
<td>-0.003537</td>
<td>-1.202994</td>
<td>(21, 68)</td>
</tr>
<tr>
<th>d</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.46308</td>
<td>9</td>
<td>0.98906</td>
<td>28.886459</td>
<td>0.003344</td>
<td>-1.200654</td>
<td>(42, 90)</td>
</tr>
<tr>
<th>e</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>49.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.55014</td>
<td>37</td>
<td>0.98911</td>
<td>28.834041</td>
<td>0.003987</td>
<td>-1.196103</td>
<td>(15, 59)</td>
</tr>
<tr>
<th>f</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.20195</td>
<td>4</td>
<td>0.98926</td>
<td>28.886463</td>
<td>0.009183</td>
<td>-1.203297</td>
<td>(72, 9)</td>
</tr>
<tr>
<th>g</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.62199</td>
<td>4</td>
<td>0.98919</td>
<td>28.849264</td>
<td>-0.012746</td>
<td>-1.199283</td>
<td>(69, 64)</td>
</tr>
<tr>
<th>h</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.58739</td>
<td>40</td>
<td>0.98917</td>
<td>28.83744</td>
<td>-0.004719</td>
<td>-1.193858</td>
<td>(30, 79)</td>
</tr>
<tr>
<th>i</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.41076</td>
<td>10</td>
<td>0.98939</td>
<td>28.910095</td>
<td>0.005218</td>
<td>-1.207459</td>
<td>(36, 54)</td>
</tr>
<tr>
<th>j</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.45686</td>
<td>46</td>
<td>0.98909</td>
<td>28.816681</td>
<td>0.004751</td>
<td>-1.190756</td>
<td>(29, 95)</td>
</tr>
<tr>
<th>k</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.54948</td>
<td>46</td>
<td>0.98914</td>
<td>28.806187</td>
<td>-0.003731</td>
<td>-1.196876</td>
<td>(32, 94)</td>
</tr>
<tr>
<th>l</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.45631</td>
<td>20</td>
<td>0.98923</td>
<td>28.921314</td>
<td>0.002344</td>
<td>-1.205342</td>
<td>(22, 91)</td>
</tr>
<tr>
<th>m</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.43142</td>
<td>49</td>
<td>0.98901</td>
<td>28.852962</td>
<td>0.002507</td>
<td>-1.198267</td>
<td>(94, 26)</td>
</tr>
<tr>
<th>n</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.49325</td>
<td>8</td>
<td>0.98931</td>
<td>28.899022</td>
<td>0.000698</td>
<td>-1.200786</td>
<td>(46, 50)</td>
</tr>
<tr>
<th>o</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.52091</td>
<td>4</td>
<td>0.98923</td>
<td>28.869563</td>
<td>-0.003987</td>
<td>-1.202426</td>
<td>(33, 13)</td>
</tr>
<tr>
<th>p</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.40997</td>
<td>61</td>
<td>0.98918</td>
<td>28.900207</td>
<td>0.007921</td>
<td>-1.204621</td>
<td>(58, 93)</td>
</tr>
<tr>
<th>q</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.62826</td>
<td>33</td>
<td>0.98936</td>
<td>28.831896</td>
<td>-0.003291</td>
<td>-1.201172</td>
<td>(82, 31)</td>
</tr>
<tr>
<th>r</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.47208</td>
<td>60</td>
<td>0.98925</td>
<td>28.873943</td>
<td>0.000515</td>
<td>-1.202925</td>
<td>(0, 26)</td>
</tr>
<tr>
<th>s</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.64847</td>
<td>48</td>
<td>0.9893</td>
<td>28.853741</td>
<td>-0.010258</td>
<td>-1.202701</td>
<td>(94, 37)</td>
</tr>
<tr>
<th>t</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.55305</td>
<td>32</td>
<td>0.98898</td>
<td>28.801028</td>
<td>-0.001721</td>
<td>-1.193403</td>
<td>(85, 10)</td>
</tr>
<tr>
<th>u</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.45428</td>
<td>80</td>
<td>0.98928</td>
<td>28.876812</td>
<td>0.002018</td>
<td>-1.201612</td>
<td>(56, 16)</td>
</tr>
<tr>
<th>v</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>75.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.59953</td>
<td>16</td>
<td>0.98945</td>
<td>28.891313</td>
<td>-0.006261</td>
<td>-1.199011</td>
<td>(60, 39)</td>
</tr>
<tr>
<th>w</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.34131</td>
<td>4</td>
<td>0.98915</td>
<td>28.925175</td>
<td>0.009523</td>
<td>-1.203308</td>
<td>(78, 96)</td>
</tr>
<tr>
<th>x</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>49.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.45791</td>
<td>95</td>
<td>0.98933</td>
<td>28.860322</td>
<td>0.007199</td>
<td>-1.198962</td>
<td>(93, 79)</td>
</tr>
<tr>
<th>y</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>50.0</td>
<td>25.0</td>
<td>0.0</td>
<td>49.58517</td>
<td>34</td>
<td>0.98929</td>
<td>28.765474</td>
<td>-0.000497</td>
<td>-1.193016</td>
<td>(80, 42)</td>
</tr>
<tr>
<th>z</th>
<td>100000</td>
<td>0</td>
<td>0.0</td>
<td>100</td>
<td>int64</td>
<td>99.0</td>
<td>74.0</td>
<td>50.0</td>
<td>24.0</td>
<td>0.0</td>
<td>49.44355</td>
<td>21</td>
<td>0.98876</td>
<td>28.85751</td>
<td>0.000819</td>
<td>-1.201063</td>
<td>(25, 25)</td>
</tr>
</tbody>
</table>
</div>
## Large data training and testing set splitting functions
```python
# make a toy dataset: 26 integer columns 'a' to 'z', 100,000 rows each
import pandas as pd
import numpy as np

df = pd.DataFrame({
    col: np.random.randint(0, 100, 100000)
    for col in 'abcdefghijklmnopqrstuvwxyz'
})

# split the dataframe into training, validation and testing sets
# returns numpy.ndarray
from spinesUtils import train_test_split_bigdata
from spinesUtils.feature_tools import get_x_cols

X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(
    df=df,
    x_cols=get_x_cols(df, y_col='a'),
    y_col='a',
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)
print(X_train.shape, X_valid.shape, X_test.shape, y_train.shape, y_valid.shape, y_test.shape)
X_train[:5]
```
(80000, 25) (80000,) (10000, 25) (10000,) (10000, 25) (10000,)
array([[45, 83, 43, 94, 1, 86, 56, 0, 78, 60, 79, 42, 24, 43, 94, 83,
45, 50, 59, 50, 17, 99, 40, 95, 70],
[ 4, 81, 9, 25, 54, 18, 14, 6, 17, 39, 0, 36, 82, 33, 11, 76,
92, 29, 33, 50, 44, 11, 87, 86, 31],
[72, 82, 52, 96, 55, 89, 35, 71, 48, 73, 34, 19, 53, 89, 46, 57,
84, 67, 10, 40, 50, 61, 10, 76, 84],
[46, 45, 79, 53, 80, 85, 58, 65, 26, 49, 46, 97, 83, 47, 77, 97,
26, 4, 33, 79, 36, 65, 50, 94, 87],
[36, 7, 46, 10, 11, 33, 3, 7, 82, 29, 28, 2, 42, 89, 42, 66,
79, 51, 49, 43, 63, 14, 13, 74, 26]])
```python
# return pandas.DataFrame
from spinesUtils import train_test_split_bigdata_df
from spinesUtils.feature_tools import get_x_cols

train_df, valid_df, test_df = train_test_split_bigdata_df(
    df=df,
    x_cols=get_x_cols(df, y_col='a'),
    y_col='a',
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)
print(train_df.shape, valid_df.shape, test_df.shape)
train_df.head()
```
(8000000, 26) (1000000, 26) (1000000, 26)
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>b</th>
<th>c</th>
<th>d</th>
<th>e</th>
<th>f</th>
<th>g</th>
<th>h</th>
<th>i</th>
<th>j</th>
<th>k</th>
<th>...</th>
<th>r</th>
<th>s</th>
<th>t</th>
<th>u</th>
<th>v</th>
<th>w</th>
<th>x</th>
<th>y</th>
<th>z</th>
<th>a</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>14</td>
<td>67</td>
<td>41</td>
<td>87</td>
<td>68</td>
<td>87</td>
<td>27</td>
<td>67</td>
<td>26</td>
<td>62</td>
<td>...</td>
<td>63</td>
<td>43</td>
<td>77</td>
<td>4</td>
<td>6</td>
<td>72</td>
<td>5</td>
<td>63</td>
<td>73</td>
<td>27</td>
</tr>
<tr>
<th>1</th>
<td>47</td>
<td>37</td>
<td>43</td>
<td>98</td>
<td>55</td>
<td>68</td>
<td>82</td>
<td>48</td>
<td>37</td>
<td>35</td>
<td>...</td>
<td>99</td>
<td>92</td>
<td>23</td>
<td>44</td>
<td>92</td>
<td>14</td>
<td>54</td>
<td>95</td>
<td>58</td>
<td>59</td>
</tr>
<tr>
<th>2</th>
<td>52</td>
<td>97</td>
<td>71</td>
<td>62</td>
<td>18</td>
<td>54</td>
<td>22</td>
<td>2</td>
<td>57</td>
<td>93</td>
<td>...</td>
<td>82</td>
<td>6</td>
<td>61</td>
<td>41</td>
<td>24</td>
<td>40</td>
<td>54</td>
<td>11</td>
<td>9</td>
<td>5</td>
</tr>
<tr>
<th>3</th>
<td>48</td>
<td>45</td>
<td>22</td>
<td>46</td>
<td>32</td>
<td>37</td>
<td>6</td>
<td>13</td>
<td>42</td>
<td>67</td>
<td>...</td>
<td>9</td>
<td>1</td>
<td>65</td>
<td>84</td>
<td>11</td>
<td>86</td>
<td>54</td>
<td>22</td>
<td>89</td>
<td>85</td>
</tr>
<tr>
<th>4</th>
<td>26</td>
<td>23</td>
<td>55</td>
<td>31</td>
<td>61</td>
<td>72</td>
<td>68</td>
<td>82</td>
<td>6</td>
<td>19</td>
<td>...</td>
<td>13</td>
<td>44</td>
<td>3</td>
<td>93</td>
<td>66</td>
<td>53</td>
<td>75</td>
<td>93</td>
<td>53</td>
<td>43</td>
</tr>
</tbody>
</table>
<p>5 rows × 26 columns</p>
</div>
```python
# performance comparison
from sklearn.model_selection import train_test_split
from spinesUtils import train_test_split_bigdata, train_test_split_bigdata_df
from spinesUtils.feature_tools import get_x_cols

# make a toy dataset: 26 integer columns 'a' to 'z', 10,000 rows each
import pandas as pd
import numpy as np

df = pd.DataFrame({
    col: np.random.randint(0, 100, 10000)
    for col in 'abcdefghijklmnopqrstuvwxyz'
})

# define a function that also splits off a validation set using sklearn's train_test_split
def train_test_split_sklearn(df, x_cols, y_col, shuffle, train_size, valid_size):
    X_train, X_test, y_train, y_test = train_test_split(
        df[x_cols], df[y_col], test_size=1 - train_size, random_state=0, shuffle=shuffle)
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_test, y_test, test_size=valid_size, random_state=0, shuffle=shuffle)
    return X_train, X_valid, X_test, y_train, y_valid, y_test

%timeit X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_sklearn(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, train_size=0.8, valid_size=0.5)
%timeit X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, return_valid=True, train_size=0.8, valid_size=0.5)
%timeit train_df, valid_df, test_df = train_test_split_bigdata_df(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, return_valid=True, train_size=0.8, valid_size=0.5)
```
1.28 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
1.05 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
1.36 ms ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
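The underlying idea of a shuffled three-way split can be sketched with NumPy index arrays (`split_indices` is a hypothetical helper for illustration, not the library's implementation): take `train_size` of a shuffled index for training, then divide the remainder between validation and test by `valid_size`.

```python
import numpy as np

def split_indices(n, train_size=0.8, valid_size=0.5, seed=0):
    # shuffle row indices once, then slice: train gets train_size of the
    # rows, and the leftover is split between valid and test by valid_size
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * train_size)
    rest = idx[cut:]
    mid = int(len(rest) * valid_size)
    return idx[:cut], rest[:mid], rest[mid:]

train, valid, test = split_indices(10000)
len(train), len(valid), len(test)  # (8000, 1000, 1000)
```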
## An intuitive timer
```python
from spinesUtils.timer import Timer

# create a timer instance
timer = Timer()

# start the timer
timer.start()

# do something
for i in range(10):
    # sleep for 1 second
    timer.sleep(1)
    # print the elapsed time since the last sleep
    print("Elapsed time: {} seconds".format(timer.last_timestamp_diff()))

# print the total elapsed time
print("Total elapsed time: {} seconds".format(timer.total_elapsed_time()))

# stop the timer
timer.end()
```
Elapsed time: 1.0117900371551514 seconds
Elapsed time: 2.016140937805176 seconds
Elapsed time: 3.0169479846954346 seconds
Elapsed time: 4.0224690437316895 seconds
Elapsed time: 5.027086019515991 seconds
Elapsed time: 6.0309507846832275 seconds
Elapsed time: 7.035104036331177 seconds
Elapsed time: 8.040709972381592 seconds
Elapsed time: 9.042311906814575 seconds
Elapsed time: 10.046867847442627 seconds
Total elapsed time: 10.047839879989624 seconds
10.047943830490112
```python
from spinesUtils.timer import Timer

# you can also use the timer as a context manager
t = Timer()

with t.session():
    t.sleep(1)
    print("Last step elapsed time:", round(t.last_timestamp_diff(), 2), 'seconds')
    t.middle_point()

    t.sleep(2)
    print("Last step elapsed time:", round(t.last_timestamp_diff(), 2), 'seconds')

    total_elapsed_time = t.total_elapsed_time()
    print("Total Time:", round(total_elapsed_time, 2), 'seconds')
```
Last step elapsed time: 1.01 seconds
Last step elapsed time: 2.01 seconds
Total Time: 3.01 seconds
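A session-style timer like this can be built on `contextlib.contextmanager`; the minimal sketch below mimics the `session()`/`total_elapsed_time()` interface shown above but is not the spinesUtils implementation:

```python
import time
from contextlib import contextmanager

class SimpleTimer:
    """Minimal illustration of a session-style timer (hypothetical class,
    mirroring only the session/total_elapsed_time interface shown above)."""
    def __init__(self):
        self._start = None

    @contextmanager
    def session(self):
        # record the start time for the duration of the with-block
        self._start = time.perf_counter()
        yield self

    def total_elapsed_time(self):
        # seconds elapsed since the session started
        return time.perf_counter() - self._start

t = SimpleTimer()
with t.session():
    time.sleep(0.1)
    elapsed = t.total_elapsed_time()
```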
Raw data
{
"_id": null,
"home_page": "https://github.com/BirchKwok/spinesUtils",
"name": "spinesUtils",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "machine learning",
"author": "Birch Kwok",
"author_email": "birchkwok@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/fb/18/576358a108b185844376794b8550f3ffe5f664853d72bf0ac4b582016922/spinesutils-0.4.5.tar.gz",
"platform": null,
"description": "# spinesUtils \n*Dedicated to helping users do more in less time.*\n\n<big><i><b>spinesUtils</b></i></big>\n is a user-friendly toolkit for the machine learning ecosystem, offering ready-to-use features such as\n\n- [x] Logging functionality\n- [x] Type checking and parameter generation\n- [x] CSV file reading acceleration\n- [x] Classifiers for imbalanced data\n- [x] Pandas Dataframe data compression\n- [x] Pandas DataFrame insight tools\n- [x] Large data training and testing set splitting functions\n- [x] An intuitive timer.\n\nIt is currently undergoing rapid iteration. If you encounter any issues with its functionalities, feel free to raise an issue.\n\n# Installation\nYou can install spinesUtils from PyPI:\n```bash\npip install spinesUtils\n```\n\n# Logger\n\nYou can use the Logger class to print your logs without worrying about handler conflicts with the native Python logging module. \n\nThis class provides log/debug/info/warning/error/critical methods, where debug/info/warning/error/critical are partial versions of the log method, available for use as needed.\n\n\n```python\n# load spinesUtils module\nfrom spinesUtils.logging import Logger\n\n# create a logger instance, with name \"MyLogger\", and no file handler, the default level is \"INFO\"\n# You can specify a file path `fp` during instantiation. 
If not specified, logs will not be written to a file.\nlogger = Logger(name=\"MyLogger\", fp=None, level=\"DEBUG\")\n\nlogger.log(\"This is an info log emitted by the log function.\", level='INFO')\nlogger.debug(\"This is an debug message\")\nlogger.info(\"This is an info message.\")\nlogger.warning(\"This is an warning message.\")\nlogger.error(\"This is an error message.\")\nlogger.critical(\"This is an critical message.\")\n```\n\n 2024-01-19 15:02:51 - MyLogger - INFO - This is an info log emitted by the log function.\n 2024-01-19 15:02:51 - MyLogger - DEBUG - This is an debug message\n 2024-01-19 15:02:51 - MyLogger - INFO - This is an info message.\n 2024-01-19 15:02:51 - MyLogger - WARNING - This is an warning message.\n 2024-01-19 15:02:51 - MyLogger - ERROR - This is an error message.\n 2024-01-19 15:02:51 - MyLogger - CRITICAL - This is an critical message.\n\n\n## Type checking and parameter generation \n\n\n```python\nfrom spinesUtils.asserts import *\n\n# check parameter type\n@ParameterTypeAssert({\n 'a': (int, float),\n 'b': (int, float)\n})\ndef add(a, b):\n pass\n\n# try to pass a string to the function, and it will raise an ParametersTypeError error\nadd(a=1, b='2')\n```\n\n\n ---------------------------------------------------------------------------\n\n ParametersTypeError Traceback (most recent call last)\n\n Cell In[2], line 12\n 9 pass\n 11 # try to pass a string to the function, and it will raise an ParametersTypeError error\n ---> 12 add(a=1, b='2')\n\n\n File ~/projects/spinesUtils/spinesUtils/asserts/_inspect.py:196, in ParameterTypeAssert.__call__.<locals>.wrapper(*args, **kwargs)\n 194 if mismatched_params:\n 195 error_msg = self.build_type_error_msg(mismatched_params)\n --> 196 raise ParametersTypeError(error_msg)\n 198 return func(**kwargs)\n\n\n ParametersTypeError: Function 'add' parameter(s) type mismatch: b only accept '['int', 'float']' type.\n\n\n\n```python\n# check parameter value\n@ParameterValuesAssert({\n 'a': lambda x: x > 
0,\n 'b': lambda x: x > 0\n})\ndef add(a, b):\n pass\n\n# try to pass a negative number to the function, and it will raise an ParametersValueError error\nadd(a=1, b=-2)\n```\n\n\n ---------------------------------------------------------------------------\n\n ParametersValueError Traceback (most recent call last)\n\n Cell In[3], line 10\n 7 pass\n 9 # try to pass a negative number to the function, and it will raise an ParametersValueError error\n ---> 10 add(a=1, b=-2)\n\n\n File ~/projects/spinesUtils/spinesUtils/asserts/_inspect.py:258, in ParameterValuesAssert.__call__.<locals>.wrapper(*args, **kwargs)\n 256 if mismatched_params:\n 257 error_msg = self.build_values_error_msg(mismatched_params)\n --> 258 raise ParametersValueError(error_msg)\n 260 return func(**kwargs)\n\n\n ParametersValueError: Function 'add' parameter(s) values mismatch: `b` must in or satisfy ''b': lambda x: x > 0' condition(s).\n\n\n\n```python\n# generate a dictionary of keyword arguments for a given function using provided arguments\ngenerate_function_kwargs(add, a=1, b=2)\n```\n\n\n\n\n {'a': 1, 'b': 2}\n\n\n\n\n```python\n# isinstance function with support for None\naugmented_isinstance(1, (int, float, None))\n```\n\n\n\n\n True\n\n\n\n\n```python\n# raise_if and raise_if_not functions\nraise_if(ValueError, 1 == 1, \"test raise_if\")\n```\n\n\n ---------------------------------------------------------------------------\n\n ValueError Traceback (most recent call last)\n\n Cell In[6], line 2\n 1 # raise_if and raise_if_not functions\n ----> 2 raise_if(ValueError, 1 == 1, \"test raise_if\")\n\n\n File ~/projects/spinesUtils/spinesUtils/asserts/_type_and_exceptions.py:115, in raise_if(exception, condition, error_msg)\n 112 assert issubclass(exception, BaseException), \"Exception must be a subclass of BaseException.\"\n 114 if condition:\n --> 115 raise exception(error_msg)\n\n\n ValueError: test raise_if\n\n\n\n```python\nraise_if_not(ZeroDivisionError, 1 != 1, \"test 
raise_if_not\")\n```\n\n\n ---------------------------------------------------------------------------\n\n ZeroDivisionError Traceback (most recent call last)\n\n Cell In[7], line 1\n ----> 1 raise_if_not(ZeroDivisionError, 1 != 1, \"test raise_if_not\")\n\n\n File ~/projects/spinesUtils/spinesUtils/asserts/_type_and_exceptions.py:144, in raise_if_not(exception, condition, error_msg)\n 141 assert issubclass(exception, BaseException), \"Exception must be a subclass of BaseException.\"\n 143 if not condition:\n --> 144 raise exception(error_msg)\n\n\n ZeroDivisionError: test raise_if_not\n\n\n## Faster CSV reader\n\n\n```python\nfrom spinesUtils import read_csv\n\nyour_df = read_csv(\n fp='/path/to/your/file.csv',\n sep=',', # same as the pandas read_csv sep parameter\n turbo_method='polars', # backend used to speed up loading\n chunk_size=None, # can be an integer if you want to use the pandas backend\n transform2low_mem=True, # compress the loaded data to save memory\n verbose=False\n)\n```\n\n## Classifiers for imbalanced data\n\n\n```python\nfrom spinesUtils.models import MultiClassBalanceClassifier\n```\n\n\n```python\n# make a toy dataset\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import classification_report\n\ndataset = make_classification(\n n_samples=10000,\n n_features=2,\n n_informative=2,\n n_redundant=0,\n n_repeated=0,\n n_classes=3,\n n_clusters_per_class=1,\n weights=[0.01, 0.05, 0.94],\n class_sep=0.8,\n random_state=0\n)\n\nX, y = dataset\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n```\n\n\n```python\nfrom sklearn.ensemble import RandomForestClassifier\n\nclassifier = MultiClassBalanceClassifier(\n base_estimator=RandomForestClassifier(n_estimators=100),\n n_classes=3,\n random_state=0,\n verbose=0\n)\n\n# fit the classifier\nclassifier.fit(X_train, y_train)\n\n# predict\ny_pred = classifier.predict(X_test)\n\n# print 
classification report\nprint(classification_report(y_test, y_pred))\n```\n\n precision recall f1-score support\n \n 0 0.74 0.72 0.73 32\n 1 0.91 0.71 0.80 111\n 2 0.98 1.00 0.99 1857\n \n accuracy 0.98 2000\n macro avg 0.88 0.81 0.84 2000\n weighted avg 0.98 0.98 0.98 2000\n\n\n## Pandas dataframe data compression\n\n\n```python\n# make a toy dataset\nimport pandas as pd\nimport numpy as np\n\ndf = pd.DataFrame({\n 'a': np.random.randint(0, 100, 100000),\n 'b': np.random.randint(0, 100, 100000),\n 'c': np.random.randint(0, 100, 100000),\n 'd': np.random.randint(0, 100, 100000),\n 'e': np.random.randint(0, 100, 100000),\n 'f': np.random.randint(0, 100, 100000),\n 'g': np.random.randint(0, 100, 100000),\n 'h': np.random.randint(0, 100, 100000),\n 'i': np.random.randint(0, 100, 100000),\n 'j': np.random.randint(0, 100, 100000),\n 'k': np.random.randint(0, 100, 100000),\n 'l': np.random.randint(0, 100, 100000),\n 'm': np.random.randint(0, 100, 100000),\n 'n': np.random.randint(0, 100, 100000),\n 'o': np.random.randint(0, 100, 100000),\n 'p': np.random.randint(0, 100, 100000),\n 'q': np.random.randint(0, 100, 100000),\n 'r': np.random.randint(0, 100, 100000),\n 's': np.random.randint(0, 100, 100000),\n 't': np.random.randint(0, 100, 100000),\n 'u': np.random.randint(0, 100, 100000),\n 'v': np.random.randint(0, 100, 100000),\n 'w': np.random.randint(0, 100, 100000),\n 'x': np.random.randint(0, 100, 100000),\n 'y': np.random.randint(0, 100, 100000),\n 'z': np.random.randint(0, 100, 100000),\n})\n\n# compress dataframe\nfrom spinesUtils import transform_dtypes_low_mem\n\ntransform_dtypes_low_mem(df, verbose=True, inplace=True)\n```\n\n\n Converting ...: 0%| | 0/26 [00:00<?, ?it/s]\n\n\n [log] INFO - Memory usage before conversion is: 19.84 MB \n [log] INFO - Memory usage after conversion is: 2.48 MB \n [log] INFO - After conversion, the percentage of memory fluctuation is 87.5 %\n\n\n\n```python\n# batch compress dataframes\nfrom spinesUtils import 
transform_batch_dtypes_low_mem\n\n# make some toy datasets\ndf1 = pd.DataFrame({\n 'a': np.random.randint(0, 100, 100000),\n 'b': np.random.randint(0, 100, 100000),\n 'c': np.random.randint(0, 100, 100000),\n 'd': np.random.randint(0, 100, 100000),\n 'e': np.random.randint(0, 100, 100000),\n 'f': np.random.randint(0, 100, 100000),\n 'g': np.random.randint(0, 100, 100000),\n 'h': np.random.randint(0, 100, 100000),\n 'i': np.random.randint(0, 100, 100000),\n 'j': np.random.randint(0, 100, 100000),\n 'k': np.random.randint(0, 100, 100000),\n 'l': np.random.randint(0, 100, 100000),\n 'm': np.random.randint(0, 100, 100000),\n 'n': np.random.randint(0, 100, 100000),\n 'o': np.random.randint(0, 100, 100000),\n 'p': np.random.randint(0, 100, 100000),\n 'q': np.random.randint(0, 100, 100000),\n 'r': np.random.randint(0, 100, 100000),\n 's': np.random.randint(0, 100, 100000),\n 't': np.random.randint(0, 100, 100000),\n 'u': np.random.randint(0, 100, 100000),\n 'v': np.random.randint(0, 100, 100000),\n 'w': np.random.randint(0, 100, 100000),\n 'x': np.random.randint(0, 100, 100000),\n 'y': np.random.randint(0, 100, 100000),\n 'z': np.random.randint(0, 100, 100000),\n})\n\ndf2 = df1.copy()\ndf3 = df1.copy()\ndf4 = df1.copy()\n\n# batch compress dataframes\ntransform_batch_dtypes_low_mem([df1, df2, df3, df4], verbose=True, inplace=True)\n```\n\n\n Batch converting ...: 0%| | 0/4 [00:00<?, ?it/s]\n\n\n [log] INFO - Memory usage before conversion is: 79.35 MB \n [log] INFO - Memory usage after conversion is: 9.92 MB \n [log] INFO - After conversion, the percentage of memory fluctuation is 87.5 %\n\n\n## Pandas DataFrame insight tools\n\n\n```python\nfrom spinesUtils import df_preview, classify_samples_dist\n\n# make a toy dataset\nimport pandas as pd\nimport numpy as np\n\ndf = pd.DataFrame({\n 'a': np.random.randint(0, 100, 100000),\n 'b': np.random.randint(0, 100, 100000),\n 'c': np.random.randint(0, 100, 100000),\n 'd': np.random.randint(0, 100, 100000),\n 'e': 
np.random.randint(0, 100, 100000),\n 'f': np.random.randint(0, 100, 100000),\n 'g': np.random.randint(0, 100, 100000),\n 'h': np.random.randint(0, 100, 100000),\n 'i': np.random.randint(0, 100, 100000),\n 'j': np.random.randint(0, 100, 100000),\n 'k': np.random.randint(0, 100, 100000),\n 'l': np.random.randint(0, 100, 100000),\n 'm': np.random.randint(0, 100, 100000),\n 'n': np.random.randint(0, 100, 100000),\n 'o': np.random.randint(0, 100, 100000),\n 'p': np.random.randint(0, 100, 100000),\n 'q': np.random.randint(0, 100, 100000),\n 'r': np.random.randint(0, 100, 100000),\n 's': np.random.randint(0, 100, 100000),\n 't': np.random.randint(0, 100, 100000),\n 'u': np.random.randint(0, 100, 100000),\n 'v': np.random.randint(0, 100, 100000),\n 'w': np.random.randint(0, 100, 100000),\n 'x': np.random.randint(0, 100, 100000),\n 'y': np.random.randint(0, 100, 100000),\n 'z': np.random.randint(0, 100, 100000),\n})\n\ndf_insight = df_preview(df)\n\ndf_insight\n```\n\n\n\n\n<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>total</th>\n <th>na</th>\n <th>naPercent</th>\n <th>nunique</th>\n <th>dtype</th>\n <th>max</th>\n <th>75%</th>\n <th>median</th>\n <th>25%</th>\n <th>min</th>\n <th>mean</th>\n <th>mode</th>\n <th>variation</th>\n <th>std</th>\n <th>skew</th>\n <th>kurt</th>\n <th>samples</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>a</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.53968</td>\n <td>36</td>\n <td>0.9892</td>\n <td>28.848392</td>\n <td>-0.000158</td>\n <td>-1.196434</td>\n <td>(32, 81)</td>\n </tr>\n <tr>\n <th>b</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.41822</td>\n <td>40</td>\n <td>0.98928</td>\n <td>28.937601</td>\n <td>0.005974</td>\n 
<td>-1.206987</td>\n <td>(76, 28)</td>\n </tr>\n <tr>\n <th>c</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.58261</td>\n <td>82</td>\n <td>0.98923</td>\n <td>28.928019</td>\n <td>-0.003537</td>\n <td>-1.202994</td>\n <td>(21, 68)</td>\n </tr>\n <tr>\n <th>d</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.46308</td>\n <td>9</td>\n <td>0.98906</td>\n <td>28.886459</td>\n <td>0.003344</td>\n <td>-1.200654</td>\n <td>(42, 90)</td>\n </tr>\n <tr>\n <th>e</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>49.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.55014</td>\n <td>37</td>\n <td>0.98911</td>\n <td>28.834041</td>\n <td>0.003987</td>\n <td>-1.196103</td>\n <td>(15, 59)</td>\n </tr>\n <tr>\n <th>f</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.20195</td>\n <td>4</td>\n <td>0.98926</td>\n <td>28.886463</td>\n <td>0.009183</td>\n <td>-1.203297</td>\n <td>(72, 9)</td>\n </tr>\n <tr>\n <th>g</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.62199</td>\n <td>4</td>\n <td>0.98919</td>\n <td>28.849264</td>\n <td>-0.012746</td>\n <td>-1.199283</td>\n <td>(69, 64)</td>\n </tr>\n <tr>\n <th>h</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.58739</td>\n <td>40</td>\n <td>0.98917</td>\n <td>28.83744</td>\n <td>-0.004719</td>\n <td>-1.193858</td>\n <td>(30, 79)</td>\n </tr>\n <tr>\n <th>i</th>\n 
<td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.41076</td>\n <td>10</td>\n <td>0.98939</td>\n <td>28.910095</td>\n <td>0.005218</td>\n <td>-1.207459</td>\n <td>(36, 54)</td>\n </tr>\n <tr>\n <th>j</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.45686</td>\n <td>46</td>\n <td>0.98909</td>\n <td>28.816681</td>\n <td>0.004751</td>\n <td>-1.190756</td>\n <td>(29, 95)</td>\n </tr>\n <tr>\n <th>k</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.54948</td>\n <td>46</td>\n <td>0.98914</td>\n <td>28.806187</td>\n <td>-0.003731</td>\n <td>-1.196876</td>\n <td>(32, 94)</td>\n </tr>\n <tr>\n <th>l</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.45631</td>\n <td>20</td>\n <td>0.98923</td>\n <td>28.921314</td>\n <td>0.002344</td>\n <td>-1.205342</td>\n <td>(22, 91)</td>\n </tr>\n <tr>\n <th>m</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.43142</td>\n <td>49</td>\n <td>0.98901</td>\n <td>28.852962</td>\n <td>0.002507</td>\n <td>-1.198267</td>\n <td>(94, 26)</td>\n </tr>\n <tr>\n <th>n</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.49325</td>\n <td>8</td>\n <td>0.98931</td>\n <td>28.899022</td>\n <td>0.000698</td>\n <td>-1.200786</td>\n <td>(46, 50)</td>\n </tr>\n <tr>\n <th>o</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n 
<td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.52091</td>\n <td>4</td>\n <td>0.98923</td>\n <td>28.869563</td>\n <td>-0.003987</td>\n <td>-1.202426</td>\n <td>(33, 13)</td>\n </tr>\n <tr>\n <th>p</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.40997</td>\n <td>61</td>\n <td>0.98918</td>\n <td>28.900207</td>\n <td>0.007921</td>\n <td>-1.204621</td>\n <td>(58, 93)</td>\n </tr>\n <tr>\n <th>q</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.62826</td>\n <td>33</td>\n <td>0.98936</td>\n <td>28.831896</td>\n <td>-0.003291</td>\n <td>-1.201172</td>\n <td>(82, 31)</td>\n </tr>\n <tr>\n <th>r</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.47208</td>\n <td>60</td>\n <td>0.98925</td>\n <td>28.873943</td>\n <td>0.000515</td>\n <td>-1.202925</td>\n <td>(0, 26)</td>\n </tr>\n <tr>\n <th>s</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.64847</td>\n <td>48</td>\n <td>0.9893</td>\n <td>28.853741</td>\n <td>-0.010258</td>\n <td>-1.202701</td>\n <td>(94, 37)</td>\n </tr>\n <tr>\n <th>t</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.55305</td>\n <td>32</td>\n <td>0.98898</td>\n <td>28.801028</td>\n <td>-0.001721</td>\n <td>-1.193403</td>\n <td>(85, 10)</td>\n </tr>\n <tr>\n <th>u</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n 
<td>49.45428</td>\n <td>80</td>\n <td>0.98928</td>\n <td>28.876812</td>\n <td>0.002018</td>\n <td>-1.201612</td>\n <td>(56, 16)</td>\n </tr>\n <tr>\n <th>v</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>75.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.59953</td>\n <td>16</td>\n <td>0.98945</td>\n <td>28.891313</td>\n <td>-0.006261</td>\n <td>-1.199011</td>\n <td>(60, 39)</td>\n </tr>\n <tr>\n <th>w</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.34131</td>\n <td>4</td>\n <td>0.98915</td>\n <td>28.925175</td>\n <td>0.009523</td>\n <td>-1.203308</td>\n <td>(78, 96)</td>\n </tr>\n <tr>\n <th>x</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>49.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.45791</td>\n <td>95</td>\n <td>0.98933</td>\n <td>28.860322</td>\n <td>0.007199</td>\n <td>-1.198962</td>\n <td>(93, 79)</td>\n </tr>\n <tr>\n <th>y</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>50.0</td>\n <td>25.0</td>\n <td>0.0</td>\n <td>49.58517</td>\n <td>34</td>\n <td>0.98929</td>\n <td>28.765474</td>\n <td>-0.000497</td>\n <td>-1.193016</td>\n <td>(80, 42)</td>\n </tr>\n <tr>\n <th>z</th>\n <td>100000</td>\n <td>0</td>\n <td>0.0</td>\n <td>100</td>\n <td>int64</td>\n <td>99.0</td>\n <td>74.0</td>\n <td>50.0</td>\n <td>24.0</td>\n <td>0.0</td>\n <td>49.44355</td>\n <td>21</td>\n <td>0.98876</td>\n <td>28.85751</td>\n <td>0.000819</td>\n <td>-1.201063</td>\n <td>(25, 25)</td>\n </tr>\n </tbody>\n</table>\n</div>\n\n\n\n## Large data training and testing set splitting functions\n\n\n```python\n# make a toy dataset\nimport pandas as pd\nimport numpy as np\n\ndf = pd.DataFrame({\n 'a': np.random.randint(0, 100, 100000),\n 'b': 
np.random.randint(0, 100, 100000),\n 'c': np.random.randint(0, 100, 100000),\n 'd': np.random.randint(0, 100, 100000),\n 'e': np.random.randint(0, 100, 100000),\n 'f': np.random.randint(0, 100, 100000),\n 'g': np.random.randint(0, 100, 100000),\n 'h': np.random.randint(0, 100, 100000),\n 'i': np.random.randint(0, 100, 100000),\n 'j': np.random.randint(0, 100, 100000),\n 'k': np.random.randint(0, 100, 100000),\n 'l': np.random.randint(0, 100, 100000),\n 'm': np.random.randint(0, 100, 100000),\n 'n': np.random.randint(0, 100, 100000),\n 'o': np.random.randint(0, 100, 100000),\n 'p': np.random.randint(0, 100, 100000),\n 'q': np.random.randint(0, 100, 100000),\n 'r': np.random.randint(0, 100, 100000),\n 's': np.random.randint(0, 100, 100000),\n 't': np.random.randint(0, 100, 100000),\n 'u': np.random.randint(0, 100, 100000),\n 'v': np.random.randint(0, 100, 100000),\n 'w': np.random.randint(0, 100, 100000),\n 'x': np.random.randint(0, 100, 100000),\n 'y': np.random.randint(0, 100, 100000),\n 'z': np.random.randint(0, 100, 100000),\n})\n\n# split dataframe into training and testing sets\n\n# return numpy.ndarray\nfrom spinesUtils import train_test_split_bigdata\nfrom spinesUtils.feature_tools import get_x_cols\n\nX_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(\n df=df, \n x_cols=get_x_cols(df, y_col='a'),\n y_col='a', \n shuffle=True,\n return_valid=True,\n train_size=0.8,\n valid_size=0.5\n)\n\nprint(X_train.shape, X_valid.shape, X_test.shape, y_train.shape, y_valid.shape, y_test.shape)\nX_train[:5]\n```\n\n (80000, 25) (80000,) (10000, 25) (10000,) (10000, 25) (10000,)\n\n\n\n\n\n array([[45, 83, 43, 94, 1, 86, 56, 0, 78, 60, 79, 42, 24, 43, 94, 83,\n 45, 50, 59, 50, 17, 99, 40, 95, 70],\n [ 4, 81, 9, 25, 54, 18, 14, 6, 17, 39, 0, 36, 82, 33, 11, 76,\n 92, 29, 33, 50, 44, 11, 87, 86, 31],\n [72, 82, 52, 96, 55, 89, 35, 71, 48, 73, 34, 19, 53, 89, 46, 57,\n 84, 67, 10, 40, 50, 61, 10, 76, 84],\n [46, 45, 79, 53, 80, 85, 58, 65, 26, 49, 
46, 97, 83, 47, 77, 97,\n 26, 4, 33, 79, 36, 65, 50, 94, 87],\n [36, 7, 46, 10, 11, 33, 3, 7, 82, 29, 28, 2, 42, 89, 42, 66,\n 79, 51, 49, 43, 63, 14, 13, 74, 26]])\n\n\n\n\n```python\n# return pandas.DataFrame\nfrom spinesUtils import train_test_split_bigdata_df\nfrom spinesUtils.feature_tools import get_x_cols\n\ntrain_df, valid_df, test_df = train_test_split_bigdata_df(\n df=df, \n x_cols=get_x_cols(df, y_col='a'),\n y_col='a', \n shuffle=True,\n return_valid=True,\n train_size=0.8,\n valid_size=0.5\n)\n\nprint(train_df.shape, valid_df.shape, test_df.shape)\ntrain_df.head()\n```\n\n (8000000, 26) (1000000, 26) (1000000, 26)\n\n\n\n\n\n<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>b</th>\n <th>c</th>\n <th>d</th>\n <th>e</th>\n <th>f</th>\n <th>g</th>\n <th>h</th>\n <th>i</th>\n <th>j</th>\n <th>k</th>\n <th>...</th>\n <th>r</th>\n <th>s</th>\n <th>t</th>\n <th>u</th>\n <th>v</th>\n <th>w</th>\n <th>x</th>\n <th>y</th>\n <th>z</th>\n <th>a</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>14</td>\n <td>67</td>\n <td>41</td>\n <td>87</td>\n <td>68</td>\n <td>87</td>\n <td>27</td>\n <td>67</td>\n <td>26</td>\n <td>62</td>\n <td>...</td>\n <td>63</td>\n <td>43</td>\n <td>77</td>\n <td>4</td>\n <td>6</td>\n <td>72</td>\n <td>5</td>\n <td>63</td>\n <td>73</td>\n <td>27</td>\n </tr>\n <tr>\n <th>1</th>\n <td>47</td>\n <td>37</td>\n <td>43</td>\n <td>98</td>\n <td>55</td>\n <td>68</td>\n <td>82</td>\n <td>48</td>\n <td>37</td>\n <td>35</td>\n <td>...</td>\n <td>99</td>\n <td>92</td>\n <td>23</td>\n <td>44</td>\n <td>92</td>\n <td>14</td>\n <td>54</td>\n <td>95</td>\n <td>58</td>\n <td>59</td>\n </tr>\n <tr>\n <th>2</th>\n <td>52</td>\n <td>97</td>\n <td>71</td>\n <td>62</td>\n <td>18</td>\n <td>54</td>\n <td>22</td>\n <td>2</td>\n <td>57</td>\n <td>93</td>\n <td>...</td>\n <td>82</td>\n <td>6</td>\n <td>61</td>\n <td>41</td>\n <td>24</td>\n <td>40</td>\n <td>54</td>\n <td>11</td>\n <td>9</td>\n 
<td>5</td>\n </tr>\n <tr>\n <th>3</th>\n <td>48</td>\n <td>45</td>\n <td>22</td>\n <td>46</td>\n <td>32</td>\n <td>37</td>\n <td>6</td>\n <td>13</td>\n <td>42</td>\n <td>67</td>\n <td>...</td>\n <td>9</td>\n <td>1</td>\n <td>65</td>\n <td>84</td>\n <td>11</td>\n <td>86</td>\n <td>54</td>\n <td>22</td>\n <td>89</td>\n <td>85</td>\n </tr>\n <tr>\n <th>4</th>\n <td>26</td>\n <td>23</td>\n <td>55</td>\n <td>31</td>\n <td>61</td>\n <td>72</td>\n <td>68</td>\n <td>82</td>\n <td>6</td>\n <td>19</td>\n <td>...</td>\n <td>13</td>\n <td>44</td>\n <td>3</td>\n <td>93</td>\n <td>66</td>\n <td>53</td>\n <td>75</td>\n <td>93</td>\n <td>53</td>\n <td>43</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows \u00d7 26 columns</p>\n</div>\n\n\n\n\n```python\n# performances comparison\nfrom sklearn.model_selection import train_test_split\nfrom spinesUtils import train_test_split_bigdata, train_test_split_bigdata_df\nfrom spinesUtils.feature_tools import get_x_cols\n\n# make a toy dataset\nimport pandas as pd\nimport numpy as np\n\ndf = pd.DataFrame({\n 'a': np.random.randint(0, 100, 10000),\n 'b': np.random.randint(0, 100, 10000),\n 'c': np.random.randint(0, 100, 10000),\n 'd': np.random.randint(0, 100, 10000),\n 'e': np.random.randint(0, 100, 10000),\n 'f': np.random.randint(0, 100, 10000),\n 'g': np.random.randint(0, 100, 10000),\n 'h': np.random.randint(0, 100, 10000),\n 'i': np.random.randint(0, 100, 10000),\n 'j': np.random.randint(0, 100, 10000),\n 'k': np.random.randint(0, 100, 10000),\n 'l': np.random.randint(0, 100, 10000),\n 'm': np.random.randint(0, 100, 10000),\n 'n': np.random.randint(0, 100, 10000),\n 'o': np.random.randint(0, 100, 10000),\n 'p': np.random.randint(0, 100, 10000),\n 'q': np.random.randint(0, 100, 10000),\n 'r': np.random.randint(0, 100, 10000),\n 's': np.random.randint(0, 100, 10000),\n 't': np.random.randint(0, 100, 10000),\n 'u': np.random.randint(0, 100, 10000),\n 'v': np.random.randint(0, 100, 10000),\n 'w': np.random.randint(0, 100, 10000),\n 'x': 
np.random.randint(0, 100, 10000),\n 'y': np.random.randint(0, 100, 10000),\n 'z': np.random.randint(0, 100, 10000),\n})\n\n# define a function to split a valid set for sklearn train_test_split\ndef train_test_split_sklearn(df, x_cols, y_col, shuffle, train_size, valid_size):\n X_train, X_test, y_train, y_test = train_test_split(df[x_cols], df[y_col], test_size=1-train_size, random_state=0, shuffle=shuffle)\n X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=valid_size, random_state=0, shuffle=shuffle)\n return X_train, X_valid, X_test, y_train, y_valid, y_test\n\n%timeit X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_sklearn(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, train_size=0.8, valid_size=0.5)\n%timeit X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, return_valid=True, train_size=0.8, valid_size=0.5)\n%timeit train_df, valid_df, test_df = train_test_split_bigdata_df(df=df, x_cols=get_x_cols(df, y_col='a'), y_col='a', shuffle=True, return_valid=True, train_size=0.8, valid_size=0.5)\n```\n\n 1.28 ms \u00b1 20.5 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n 1.05 ms \u00b1 14.1 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 1,000 loops each)\n 1.36 ms \u00b1 11.7 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 1,000 loops each)\n\n\n## An intuitive timer\n\n\n```python\nfrom spinesUtils.timer import Timer\n\n# create a timer instance\ntimer = Timer()\n\n# start the timer\ntimer.start()\n\n# do something\nfor i in range(10):\n # sleep for 1 second via the timer\n timer.sleep(1)\n # print the elapsed time since the last recorded timestamp\n print(\"Elapsed time: {} seconds\".format(timer.last_timestamp_diff()))\n\n# print the total elapsed time\nprint(\"Total elapsed time: {} seconds\".format(timer.total_elapsed_time()))\n\n# stop the timer\ntimer.end()\n```\n\n Elapsed time: 1.0117900371551514 seconds\n Elapsed time: 2.016140937805176 seconds\n Elapsed time: 3.0169479846954346 seconds\n Elapsed time: 4.0224690437316895 seconds\n Elapsed time: 5.027086019515991 seconds\n Elapsed time: 6.0309507846832275 seconds\n Elapsed time: 7.035104036331177 seconds\n Elapsed time: 8.040709972381592 seconds\n Elapsed time: 9.042311906814575 seconds\n Elapsed time: 10.046867847442627 seconds\n Total elapsed time: 10.047839879989624 seconds\n\n\n\n\n 10.047943830490112\n\n\n\n\n```python\nfrom spinesUtils.timer import Timer\n\n# you can also use the timer as a context manager\nt = Timer()\nwith t.session():\n t.sleep(1)\n print(\"Last step elapsed time:\", round(t.last_timestamp_diff(), 2), 'seconds')\n t.middle_point()\n t.sleep(2)\n print(\"Last step elapsed time:\", round(t.last_timestamp_diff(), 2), 'seconds')\n \n total_elapsed_time = t.total_elapsed_time()\n \nprint(\"Total Time:\", round(total_elapsed_time, 2), 'seconds')\n```\n\n Last step elapsed time: 1.01 seconds\n Last step elapsed time: 2.01 seconds\n Total Time: 3.01 seconds\n\n",
"bugtrack_url": null,
"license": null,
"summary": "spinesUtils is a user-friendly toolkit for the machine learning ecosystem.",
"version": "0.4.5",
"project_urls": {
"Homepage": "https://github.com/BirchKwok/spinesUtils"
},
"split_keywords": [
"machine",
"learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "321a54bd70530ad0e33e85d94cf5fb472395bbafaf28c81e2d02385ca8ff4870",
"md5": "e434c52b29bf9c1178fdc408785727ae",
"sha256": "8a3a986c1dafd34262b5f3b24055b2072b6e52856989d5158458ea1bca01faa6"
},
"downloads": -1,
"filename": "spinesUtils-0.4.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e434c52b29bf9c1178fdc408785727ae",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 42332,
"upload_time": "2024-08-31T05:37:54",
"upload_time_iso_8601": "2024-08-31T05:37:54.787493Z",
"url": "https://files.pythonhosted.org/packages/32/1a/54bd70530ad0e33e85d94cf5fb472395bbafaf28c81e2d02385ca8ff4870/spinesUtils-0.4.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fb18576358a108b185844376794b8550f3ffe5f664853d72bf0ac4b582016922",
"md5": "2512e3216989ba83f532359a99a25975",
"sha256": "4c679e55613146d7da5aa4a71ca74bfaef348119e555d12d76a960a436e785cd"
},
"downloads": -1,
"filename": "spinesutils-0.4.5.tar.gz",
"has_sig": false,
"md5_digest": "2512e3216989ba83f532359a99a25975",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 47906,
"upload_time": "2024-08-31T05:37:56",
"upload_time_iso_8601": "2024-08-31T05:37:56.680897Z",
"url": "https://files.pythonhosted.org/packages/fb/18/576358a108b185844376794b8550f3ffe5f664853d72bf0ac4b582016922/spinesutils-0.4.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-31 05:37:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BirchKwok",
"github_project": "spinesUtils",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "spinesutils"
}