data-science-utils

Name	data-science-utils
Version	1.7.3
Summary	This project is an ensemble of methods which are frequently used in python Data Science projects.
upload_time	2024-02-11 09:44:34
docs_url	None
license	MIT License Copyright (c) 2018 Idan Morad Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	data-science utilities python machine-learning scikit-learn matplotlib
requirements	numpy scipy pandas matplotlib seaborn scikit-learn pydotplus joblib

# Data Science Utils: Frequently Used Methods for Data Science
[![License: MIT](https://img.shields.io/github/license/idanmoradarthas/DataScienceUtils)](https://opensource.org/licenses/MIT)
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/idanmoradarthas/DataScienceUtils)
[![GitHub issues](https://img.shields.io/github/issues/idanmoradarthas/DataScienceUtils)](https://github.com/idanmoradarthas/DataScienceUtils/issues)
[![Documentation Status](https://readthedocs.org/projects/datascienceutils/badge/?version=latest)](https://datascienceutils.readthedocs.io/en/latest/?badge=latest)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/data-science-utils)
![PyPI - Wheel](https://img.shields.io/pypi/wheel/data-science-utils)
[![PyPI version](https://badge.fury.io/py/data-science-utils.svg)](https://badge.fury.io/py/data-science-utils)
[![Anaconda-Server Badge](https://anaconda.org/idanmorad/data-science-utils/badges/version.svg)](https://anaconda.org/idanmorad/data-science-utils)
[![Build Status](https://travis-ci.org/idanmoradarthas/DataScienceUtils.svg?branch=master)](https://travis-ci.org/idanmoradarthas/DataScienceUtils)
[![Coverage Status](https://coveralls.io/repos/github/idanmoradarthas/DataScienceUtils/badge.svg?branch=master)](https://coveralls.io/github/idanmoradarthas/DataScienceUtils?branch=master)


Data Science Utils extends the Scikit-Learn API and Matplotlib API to provide simple methods that simplify common tasks
and visualizations over data.

# Code Examples and Documentation
**Let's see some code examples and outputs.** 

**You can read the full documentation, with all the code examples, at:
[https://datascienceutils.readthedocs.io/en/latest/](https://datascienceutils.readthedocs.io/en/latest/)**

In the documentation you can find more methods and more examples.

The API of the package is built to work with the Scikit-Learn API and Matplotlib API. Here are some of the capabilities
of this package:

## Metrics
### Plot Confusion Matrix
Computes and plots the confusion matrix, False Positive Rate, False Negative Rate, Accuracy and F1 score of a classification.

```python
from ds_utils.metrics import plot_confusion_matrix



# y_test holds the true labels, y_pred the predicted labels; [0, 1, 2] lists the class labels
plot_confusion_matrix(y_test, y_pred, [0, 1, 2])
```

![multi label classification confusion matrix](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_print_confusion_matrix.png)
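
As a rough illustration of the quantities shown in this plot, the following sketch computes a confusion matrix, accuracy, and one-vs-rest False Positive Rate, False Negative Rate and F1 in plain Python. The helper names `confusion_counts` and `per_class_rates` and the toy labels are hypothetical; this is not the package's implementation.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    pairs = Counter(zip(y_true, y_pred))
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def per_class_rates(matrix, label):
    """One-vs-rest FPR, FNR and F1 for a single label (hypothetical helper)."""
    labels = list(matrix)
    tp = matrix[label][label]
    fn = sum(matrix[label][p] for p in labels if p != label)
    fp = sum(matrix[t][label] for t in labels if t != label)
    tn = sum(matrix[t][p] for t in labels for p in labels
             if t != label and p != label)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"fpr": fpr, "fnr": fnr, "f1": f1}


# Toy labels, made up for the illustration
y_test = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

matrix = confusion_counts(y_test, y_pred, [0, 1, 2])
accuracy = sum(matrix[label][label] for label in matrix) / len(y_test)
rates_for_2 = per_class_rates(matrix, 2)
print(matrix, accuracy, rates_for_2)
```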

### Plot Metric Growth per Labeled Instances

Receives train and test sets and plots how a given metric changes as the number of training instances increases.

```python
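from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

from ds_utils.metrics import plot_metric_growth_per_labeled_instances

# x_train, y_train, x_test and y_test are assumed to be prepared beforehand
plot_metric_growth_per_labeled_instances(
    x_train, y_train, x_test, y_test,
    {"DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
     "RandomForestClassifier": RandomForestClassifier(random_state=0, n_estimators=5)})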
```

![metric growth per labeled instances with n samples](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_plot_metric_growth_per_labeled_instances_with_n_samples.png)
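
The idea behind this plot can be sketched with scikit-learn alone: train on growing slices of the training set and score each model on the same test set. The synthetic data, the slice sizes and the choice of accuracy as the metric are assumptions made for the illustration, not the package's defaults.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for this sketch
x, y = make_classification(n_samples=300, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)

# Assumed training-set sizes; the last one uses every training instance
sizes = [20, 50, 100, len(x_train)]
scores = []
for n in sizes:
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(x_train[:n], y_train[:n])  # train on the first n instances only
    scores.append(accuracy_score(y_test, clf.predict(x_test)))

for n, score in zip(sizes, scores):
    print(f"{n} training instances: accuracy {score:.3f}")
```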

### Visualize Accuracy Grouped by Probability

Receives the true test labels and the classifier's predicted probabilities, divides the predictions into probability
ranges, classifies the results in each range, and plots a stacked bar chart of the outcome. [Original code](https://github.com/EthicalML/XAI)

```python
from ds_utils.metrics import visualize_accuracy_grouped_by_probability


visualize_accuracy_grouped_by_probability(test["target"], 1, 
                                          classifier.predict_proba(test[selected_features]),
                                          display_breakdown=False)
```

Without breakdown:

![visualize_accuracy_grouped_by_probability](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_visualize_accuracy_grouped_by_probability.png)

With breakdown:

![visualize_accuracy_grouped_by_probability_with_breakdown](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_visualize_accuracy_grouped_by_probability_with_breakdown.png)
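
A minimal sketch of the grouping step, assuming equal-width probability bins and a 0.5 decision threshold (both are assumptions, as is the helper name `accuracy_by_probability_bin`). It only counts correct and incorrect predictions per bin, which is the data a stacked bar chart would need.

```python
def accuracy_by_probability_bin(y_true, positive_proba, positive_label=1,
                                bins=5, threshold=0.5):
    """Count correct and incorrect predictions inside equal-width probability bins."""
    counts = [{"correct": 0, "incorrect": 0} for _ in range(bins)]
    for truth, proba in zip(y_true, positive_proba):
        predicted_positive = proba >= threshold
        is_correct = predicted_positive == (truth == positive_label)
        index = min(int(proba * bins), bins - 1)  # proba == 1.0 goes to the last bin
        counts[index]["correct" if is_correct else "incorrect"] += 1
    return counts


# Toy labels and probabilities of the positive class, made up for the illustration
y_true = [1, 0, 1, 0, 1, 0]
proba = [0.95, 0.10, 0.55, 0.65, 0.30, 0.45]

result = accuracy_by_probability_bin(y_true, proba)
for i, bucket in enumerate(result):
    print(f"[{i / 5:.1f}, {(i + 1) / 5:.1f}): {bucket}")
```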

## Preprocess
### Visualize Feature

Receives a feature and visualizes its values on a graph:

* If the feature is float then the method plots a distribution plot.
* If the feature is datetime then the method plots a line plot of the progression of the amount through time.
* If the feature is object, categorical, boolean or integer then the method plots a count plot (histogram).

```python
from ds_utils.preprocess import visualize_feature



visualize_feature(X_train["feature"])
```

|Feature Type      |Plot|
|------------------|----|
|Float             |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_float.png)|
|Integer           |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_int.png)|
|Datetime          |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_datetime.png)|
|Category / Object |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_category_more_than_10_categories.png)|
|Boolean           |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_bool.png)|
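
The dtype-based choice described in the list above could be expressed roughly as follows. `choose_plot_kind` is a hypothetical helper that only returns the name of the plot; it does not draw anything and is not part of the package.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype


def choose_plot_kind(series):
    """Map a pandas Series to the plot kind described above (hypothetical helper)."""
    if is_float_dtype(series):
        return "distribution"
    if is_datetime64_any_dtype(series):
        return "line"
    # object, categorical, boolean and integer features all get a count plot
    return "count"


print(choose_plot_kind(pd.Series([1.5, 2.0, 3.25])))
print(choose_plot_kind(pd.to_datetime(pd.Series(["2024-01-01", "2024-01-02"]))))
print(choose_plot_kind(pd.Series([True, False, True])))
```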

### Get Correlated Features

Calculates which features are correlated above a threshold and extracts a data frame with the pairwise correlations
and each feature's correlation to the target feature.

```python
from ds_utils.preprocess import get_correlated_features



correlations = get_correlated_features(train, features, target)
```

|level_0               |level_1               |level_0_level_1_corr|level_0_target_corr|level_1_target_corr|
|----------------------|----------------------|--------------------|-------------------|-------------------|
|income_category_Low   |income_category_Medium| 1.0                | 0.1182165609358650|0.11821656093586504|
|term\_ 36 months      |term\_ 60 months      | 1.0                | 0.1182165609358650|0.11821656093586504|
|interest_payments_High|interest_payments_Low | 1.0                | 0.1182165609358650|0.11821656093586504|
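
For intuition, a table of this shape can be approximated with plain pandas. The helper name `correlated_pairs`, the 0.95 threshold, the use of absolute correlation, and the toy data are all assumptions for this sketch; the package's own defaults may differ.

```python
import pandas as pd


def correlated_pairs(frame, features, target, threshold=0.95):
    """List feature pairs whose absolute correlation reaches the threshold."""
    corr = frame[features + [target]].corr()
    rows = []
    for i, first in enumerate(features):
        for second in features[i + 1:]:
            if abs(corr.loc[first, second]) >= threshold:
                rows.append({
                    "level_0": first,
                    "level_1": second,
                    "level_0_level_1_corr": corr.loc[first, second],
                    "level_0_target_corr": corr.loc[first, target],
                    "level_1_target_corr": corr.loc[second, target],
                })
    columns = ["level_0", "level_1", "level_0_level_1_corr",
               "level_0_target_corr", "level_1_target_corr"]
    return pd.DataFrame(rows, columns=columns)


# Toy data: "b" is exactly twice "a", "c" is only weakly related to both
frame = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                      "b": [2, 4, 6, 8, 10],
                      "c": [5, 1, 4, 2, 3],
                      "target": [0, 0, 1, 1, 1]})
result = correlated_pairs(frame, ["a", "b", "c"], "target")
print(result)
```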

### Visualize Correlations
Computes the pairwise correlation of columns, excluding NA/null values, and visualizes it with a heat map.
[Original code](https://seaborn.pydata.org/examples/many_pairwise_correlations.html)

```python
from ds_utils.preprocess import visualize_correlations



visualize_correlations(data)
```

![visualize features](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_correlations.png)

### Plot Correlation Dendrogram
Plots a dendrogram of a correlation matrix. The chart shows hierarchically which variables are most correlated
through the connecting trees. The closer to the right the connection is, the more correlated the
features are. [Original code](https://github.com/EthicalML/XAI)

```python
from ds_utils.preprocess import plot_correlation_dendrogram



plot_correlation_dendrogram(data)
```

![plot correlation dendrogram](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_correlation_dendrogram.png)

### Plot Features' Interaction
Plots the joint distribution between two features:
* If both features are either categorical, boolean or object then the method plots a shared histogram.
* If one feature is either categorical, boolean or object and the other is numeric then the method plots a boxplot chart.
* If one feature is datetime and the other is numeric or datetime then the method plots a line plot.
* If one feature is datetime and the other is either categorical, boolean or object then the method plots a violin plot (a combination of boxplot and kernel density estimate).
* If both features are numeric then the method plots a scatter plot.

```python
from ds_utils.preprocess import plot_features_interaction



plot_features_interaction("feature_1", "feature_2", data)
```

|               | Numeric | Categorical | Boolean | Datetime |
|---------------|---------|-------------|---------|---------|
|**Numeric**    |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_numeric.png)| | | |
|**Categorical**|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_numeric_categorical.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_categorical.png)| | |
|**Boolean**    |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_numeric_boolean.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_categorical_bool.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_bool.png)| |
|**Datetime**   |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_numeric.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_categorical.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_bool.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_datetime.png)|
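
The selection rules listed above can be summarized as a small lookup function. The kind labels (`"numeric"`, `"categorical"`, `"boolean"`, `"object"`, `"datetime"`) and the helper name `choose_interaction_plot` are hypothetical and serve only to restate the rules in code.

```python
def choose_interaction_plot(kind_1, kind_2):
    """Return the plot type for a pair of feature kinds, following the rules above."""
    categorical = {"categorical", "boolean", "object"}
    kinds = {kind_1, kind_2}
    if "datetime" in kinds:
        other = kind_2 if kind_1 == "datetime" else kind_1
        # datetime with a categorical-like feature -> violin, otherwise a line plot
        return "violin" if other in categorical else "line"
    if kinds <= categorical:
        return "shared histogram"
    if kinds & categorical:
        return "boxplot"
    return "scatter"


for pair in [("numeric", "numeric"), ("numeric", "boolean"),
             ("categorical", "object"), ("datetime", "categorical"),
             ("datetime", "datetime")]:
    print(pair, "->", choose_interaction_plot(*pair))
```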

## Strings
### Append Tags to Frame

Extracts tags from a given field and appends them to the data frame as columns.

A dataset that looks like this:

``x_train``:

|article_name|article_tags|
|------------|------------|
|1           |ds,ml,dl    |
|2           |ds,ml       |

``x_test``:

|article_name|article_tags|
|------------|------------|
|3           |ds,ml,py    |

Using this code:
```python
import pandas as pd

from ds_utils.strings import append_tags_to_frame


x_train = pd.DataFrame([{"article_name": "1", "article_tags": "ds,ml,dl"},
                        {"article_name": "2", "article_tags": "ds,ml"}])
x_test = pd.DataFrame([{"article_name": "3", "article_tags": "ds,ml,py"}])

x_train_with_tags, x_test_with_tags = append_tags_to_frame(x_train, x_test, "article_tags", "tag_")
```

will be parsed into this:

``x_train_with_tags``:

|article_name|tag_ds|tag_ml|tag_dl|
|------------|------|------|------|
|1           |1     |1     |1     |
|2           |1     |1     |0     |

``x_test_with_tags``:

|article_name|tag_ds|tag_ml|tag_dl|
|------------|------|------|------|
|3           |1     |1     |0     |
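
The tables above suggest that the tag vocabulary is learned from the train set only, so a tag that appears only in the test set (`py`) produces no column. A plain-Python sketch of that behavior, using a hypothetical helper `tags_to_columns` on lists of dicts rather than data frames:

```python
def tags_to_columns(train_rows, test_rows, field, prefix, sep=","):
    """Turn a delimited tag field into 0/1 columns, using tags seen in train only."""
    vocabulary = []
    for row in train_rows:
        for tag in row[field].split(sep):
            if tag not in vocabulary:
                vocabulary.append(tag)  # keep first-seen order

    def encode(row):
        tags = set(row[field].split(sep))
        encoded = {key: value for key, value in row.items() if key != field}
        for tag in vocabulary:
            encoded[prefix + tag] = int(tag in tags)
        return encoded

    return [encode(r) for r in train_rows], [encode(r) for r in test_rows]


train = [{"article_name": "1", "article_tags": "ds,ml,dl"},
         {"article_name": "2", "article_tags": "ds,ml"}]
test = [{"article_name": "3", "article_tags": "ds,ml,py"}]

train_with_tags, test_with_tags = tags_to_columns(train, test, "article_tags", "tag_")
print(train_with_tags)
print(test_with_tags)
```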

### Extract Significant Terms from Subset
Returns interesting or unusual occurrences of terms in a subset. Based on the [elasticsearch significant_text aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#_scripted).

```python
import pandas as pd

from ds_utils.strings import extract_significant_terms_from_subset

corpus = ['This is the first document.', 'This document is the second document.',
          'And this is the third one.', 'Is this the first document?']
data_frame = pd.DataFrame(corpus, columns=["content"])
# Let's differentiate between the last two documents from the full corpus
subset_data_frame = data_frame[data_frame.index > 1]
terms = extract_significant_terms_from_subset(data_frame, subset_data_frame,
                                              "content")
```
And the following table will be the output for ``terms``:

|third|one|and|this|the |is  |first|document|second|
|-----|---|---|----|----|----|-----|--------|------|
|1.0  |1.0|1.0|0.67|0.67|0.67|0.5  |0.25    |0.0   |

## Unsupervised
### Cluster Cardinality
Cluster cardinality is the number of examples per cluster. This method plots the number of points per cluster as a bar 
chart.

```python
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans

from ds_utils.unsupervised import plot_cluster_cardinality


data = pd.read_csv("path/to/dataset")  # placeholder path to your dataset
estimator = KMeans(n_clusters=8, random_state=42)
estimator.fit(data)

plot_cluster_cardinality(estimator.labels_)

plt.show()
```
![Cluster Cardinality](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_cluster_cardinality.png)
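
The quantity behind this chart is simply a count of labels. A minimal sketch with made-up cluster labels:

```python
from collections import Counter

# Made-up cluster assignments, one label per example
labels = [0, 1, 1, 2, 2, 2, 0, 1]

# Cardinality: number of examples assigned to each cluster
cardinality = dict(sorted(Counter(labels).items()))
print(cardinality)  # {0: 2, 1: 3, 2: 3}
```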

### Plot Cluster Magnitude
Cluster magnitude is the sum of distances from all examples to the centroid of the cluster. This method plots the 
Total Point-to-Centroid Distance per cluster as a bar chart.

```python
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import euclidean

from ds_utils.unsupervised import plot_cluster_magnitude

data = pd.read_csv("path/to/dataset")  # placeholder path to your dataset
estimator = KMeans(n_clusters=8, random_state=42)
estimator.fit(data)

plot_cluster_magnitude(data, estimator.labels_, estimator.cluster_centers_, euclidean)

plt.show()
```
![Plot Cluster Magnitude](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_cluster_magnitude.png)
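
As a sketch of the definition above, the magnitude of each cluster can be computed by summing the Euclidean distance from every point to its assigned centroid. The points, labels, centers and the helper name `cluster_magnitudes` are made up for the illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def cluster_magnitudes(points, labels, centers):
    """Sum of point-to-centroid distances per cluster (hypothetical helper)."""
    magnitudes = [0.0] * len(centers)
    for point, label in zip(points, labels):
        magnitudes[label] += dist(point, centers[label])
    return magnitudes


points = [(0, 0), (0, 2), (4, 0), (10, 10)]
labels = [0, 0, 0, 1]
centers = [(0, 0), (10, 10)]

magnitudes = cluster_magnitudes(points, labels, centers)
print(magnitudes)  # [6.0, 0.0]
```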

### Magnitude vs. Cardinality
Higher cluster cardinality tends to result in a higher cluster magnitude, which intuitively makes sense. Clusters
are anomalous when cardinality doesn't correlate with magnitude relative to the other clusters. Find anomalous 
clusters by plotting magnitude against cardinality as a scatter plot.
```python
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import euclidean

from ds_utils.unsupervised import plot_magnitude_vs_cardinality

data = pd.read_csv("path/to/dataset")  # placeholder path to your dataset
estimator = KMeans(n_clusters=8, random_state=42)
estimator.fit(data)

plot_magnitude_vs_cardinality(data, estimator.labels_, estimator.cluster_centers_, euclidean)

plt.show()
```
![Magnitude vs. Cardinality](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_magnitude_vs_cardinality.png)

### Optimum Number of Clusters
k-means requires you to decide the number of clusters ``k`` beforehand. This method runs the KMeans algorithm
repeatedly, increasing the number of clusters on each run. The total magnitude (the sum of distances) is used as the loss.

Right now the method only works with ``sklearn.cluster.KMeans``.

```python
import pandas as pd

from matplotlib import pyplot as plt
from scipy.spatial.distance import euclidean

from ds_utils.unsupervised import plot_loss_vs_cluster_number



data = pd.read_csv("path/to/dataset")  # placeholder path to your dataset

plot_loss_vs_cluster_number(data, 3, 20, euclidean)

plt.show()
```
![Optimum Number of Clusters](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_loss_vs_cluster_number.png)
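
The loop behind such a plot might look like the sketch below, which fits `sklearn.cluster.KMeans` for a range of `k` values and records the total point-to-centroid distance. The synthetic blobs, the range of `k` and the `n_init` value are assumptions for the illustration; this is not the package's own code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four well-separated groups, used only for this sketch
data, _ = make_blobs(n_samples=200, centers=4, random_state=42)

losses = {}
for k in range(2, 9):
    estimator = KMeans(n_clusters=k, n_init=10, random_state=42).fit(data)
    # Centroid assigned to each point, then the Euclidean distance to it
    assigned_centers = estimator.cluster_centers_[estimator.labels_]
    losses[k] = float(np.linalg.norm(data - assigned_centers, axis=1).sum())

for k, loss in losses.items():
    print(f"k={k}: total distance {loss:.1f}")
```

Plotting `losses` against `k` and looking for the point where the curve flattens is one common way to choose the number of clusters.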

## XAI
### Generate Decision Paths
Receives a decision tree and returns the underlying decision rules (or 'decision paths') as text (valid Python syntax).
[Original code](https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree)

```python
from sklearn.tree import DecisionTreeClassifier

from ds_utils.xai import generate_decision_paths
    

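# x, y, feature_names and target_names are assumed to be defined beforehand;
# the output below appears to come from the iris dataset.

# Create decision tree classifier object
clf = DecisionTreeClassifier(max_depth=3)

# Train model
clf.fit(x, y)
print(generate_decision_paths(clf, feature_names, target_names.tolist(),
                              "iris_tree"))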
```
The following text will be printed:
```
def iris_tree(petal width (cm), petal length (cm)):
    if petal width (cm) <= 0.8000:
        # return class setosa with probability 0.9804
        return ("setosa", 0.9804)
    else:  # if petal width (cm) > 0.8000
        if petal width (cm) <= 1.7500:
            if petal length (cm) <= 4.9500:
                # return class versicolor with probability 0.9792
                return ("versicolor", 0.9792)
            else:  # if petal length (cm) > 4.9500
                # return class virginica with probability 0.6667
                return ("virginica", 0.6667)
        else:  # if petal width (cm) > 1.7500
            if petal length (cm) <= 4.8500:
                # return class virginica with probability 0.6667
                return ("virginica", 0.6667)
            else:  # if petal length (cm) > 4.8500
                # return class virginica with probability 0.9773
                return ("virginica", 0.9773)
```
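
To illustrate how nested rules of this shape can be produced, here is a sketch that walks a toy tree stored as nested dicts and emits similar text. The dict layout, the helper `render_paths`, and the toy tree are hypothetical; the package works on a fitted scikit-learn tree instead.

```python
def render_paths(node, indent="    "):
    """Recursively render a nested-dict tree as if/else source lines."""
    if "label" in node:  # leaf: return the class and its probability
        return [f'{indent}return ("{node["label"]}", {node["probability"]:.4f})']
    lines = [f'{indent}if {node["feature"]} <= {node["threshold"]:.4f}:']
    lines += render_paths(node["left"], indent + "    ")
    lines.append(f'{indent}else:  # if {node["feature"]} > {node["threshold"]:.4f}')
    lines += render_paths(node["right"], indent + "    ")
    return lines


# Toy tree with a single split; all values are illustrative only
tree = {
    "feature": "petal_width", "threshold": 0.8,
    "left": {"label": "setosa", "probability": 0.9804},
    "right": {"label": "versicolor", "probability": 0.9792},
}

code = "\n".join(["def toy_tree(petal_width):"] + render_paths(tree))
print(code)
```

Because the toy feature name is a valid identifier, the generated text can be executed with `exec` and called like any other function.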

### Plot Features' Importance

Plots feature importance as a bar chart.

```python
import pandas as pd

from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeClassifier

from ds_utils.xai import plot_features_importance


data = pd.read_csv("path/to/dataset")  # placeholder path to your dataset
target = data["target"]
features = data.columns.to_list()
features.remove("target")

clf = DecisionTreeClassifier(random_state=42)
clf.fit(data[features], target)
plot_features_importance(features, clf.feature_importances_)

plt.show()
```
![Plot Features Importance](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_xai/test_plot_features_importance.png)

Excited?

Read about all the modules here and see more abilities: 
* [Metrics](https://datascienceutils.readthedocs.io/en/latest/metrics.html) - The module of metrics contains methods that help to calculate and/or visualize evaluation performance of an algorithm.
* [Preprocess](https://datascienceutils.readthedocs.io/en/latest/preprocess.html) - The module of preprocess contains methods for processing and inspecting data before training.
* [Strings](https://datascienceutils.readthedocs.io/en/latest/strings.html) - The module of strings contains methods that help manipulate and process strings in a dataframe.
* [Unsupervised](https://datascienceutils.readthedocs.io/en/latest/unsupervised.html) - The module of unsupervised contains methods that calculate and/or visualize evaluation performance of an unsupervised model.
* [XAI](https://datascienceutils.readthedocs.io/en/latest/xai.html) - The module of xai contains methods that help explain a model's decisions.

## Contributing
Interested in contributing to Data Science Utils? Great! You're welcome, and we would love to have you. We follow
the [Python Software Foundation Code of Conduct](http://www.python.org/psf/codeofconduct/) and 
[Matplotlib Usage Guide](https://matplotlib.org/tutorials/introductory/usage.html#coding-styles).

No matter your level of technical skill, you can be helpful. We appreciate bug reports, user testing, feature 
requests, bug fixes, product enhancements, and documentation improvements.

Thank you for your contributions!

## Find a Bug?
Check if there's already an open [issue](https://github.com/idanmoradarthas/DataScienceUtils/issues) on the topic. If 
needed, file an issue.

## Open Source
Data Science Utils is licensed under the [MIT License](https://opensource.org/licenses/MIT).

## Installing Data Science Utils
Data Science Utils is compatible with Python 3.6 or later. The simplest way to install Data Science Utils and its 
dependencies is from PyPI with pip, Python's preferred package installer:
```bash
pip install data-science-utils
```
Note that this package is an active project and routinely publishes new releases with more methods. To
upgrade Data Science Utils to the latest version, use pip as follows:
```bash
pip install -U data-science-utils
```
Alternatively you can install from source by cloning the repo and running:
```bash
git clone https://github.com/idanmoradarthas/DataScienceUtils.git
cd DataScienceUtils
pip install .
```
Or install directly from the repository with pip:
```bash
pip install git+https://github.com/idanmoradarthas/DataScienceUtils.git
```
If you're using Anaconda, you can install using conda:
```bash
conda install idanmorad::data-science-utils
```
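
To confirm the installation from Python, one option is to query the installed distribution's version with the standard library (Python 3.8 or later). The helper name `installed_version` is hypothetical; it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="data-science-utils"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version())
```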

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "data-science-utils",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "data-science,utilities,python,machine-learning,scikit-learn,matplotlib",
    "author": "",
    "author_email": "Idan Morad <idanmorad.arthas@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/20/79/c563a1417b59ca2890a90b1a46c0afeb78723dc947ac45cbc7e2a3f0a111/data_science_utils-1.7.3.tar.gz",
    "platform": null,
    "description": "# Data Science Utils: Frequently Used Methods for Data Science\r\n[![License: MIT](https://img.shields.io/github/license/idanmoradarthas/DataScienceUtils)](https://opensource.org/licenses/MIT)\r\n![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/idanmoradarthas/DataScienceUtils)\r\n[![GitHub issues](https://img.shields.io/github/issues/idanmoradarthas/DataScienceUtils)](https://github.com/idanmoradarthas/DataScienceUtils/issues)\r\n[![Documentation Status](https://readthedocs.org/projects/datascienceutils/badge/?version=latest)](https://datascienceutils.readthedocs.io/en/latest/?badge=latest)\r\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/data-science-utils)\r\n![PyPI - Wheel](https://img.shields.io/pypi/wheel/data-science-utils)\r\n[![PyPI version](https://badge.fury.io/py/data-science-utils.svg)](https://badge.fury.io/py/data-science-utils)\r\n[![Anaconda-Server Badge](https://anaconda.org/idanmorad/data-science-utils/badges/version.svg)](https://anaconda.org/idanmorad/data-science-utils)\r\n[![Build Status](https://travis-ci.org/idanmoradarthas/DataScienceUtils.svg?branch=master)](https://travis-ci.org/idanmoradarthas/DataScienceUtils)\r\n[![Coverage Status](https://coveralls.io/repos/github/idanmoradarthas/DataScienceUtils/badge.svg?branch=master)](https://coveralls.io/github/idanmoradarthas/DataScienceUtils?branch=master)\r\n\r\n\r\nData Science Utils extends the Scikit-Learn API and Matplotlib API to provide simple methods that simplify task and \r\nvisualization over data. 
\r\n\r\n# Code Examples and Documentation\r\n**Let's see some code examples and outputs.** \r\n\r\n**You can read the full documentation with all the code examples from:\r\n[https://datascienceutils.readthedocs.io/en/latest/](https://datascienceutils.readthedocs.io/en/latest/)**\r\n\r\nIn the documentation you can find more methods and more examples.\r\n\r\nThe API of the package is build to work with Scikit-Learn API and Matplotlib API. Here are some of capabilities of this\r\npackage:\r\n\r\n## Metrics\r\n### Plot Confusion Matrix\r\nComputes and plot confusion matrix, False Positive Rate, False Negative Rate, Accuracy and F1 score of a classification.\r\n\r\n```python\r\nfrom ds_utils.metrics import plot_confusion_matrix\r\n\r\n\r\n\r\nplot_confusion_matrix(y_test, y_pred, [0, 1, 2])\r\n```\r\n\r\n![multi label classification confusion matrix](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_print_confusion_matrix.png)\r\n\r\n### Plot Metric Growth per Labeled Instances\r\n\r\nReceives a train and test sets, and plots given metric change in increasing amount of trained instances.\r\n\r\n```python\r\nfrom ds_utils.metrics import plot_metric_growth_per_labeled_instances\r\n\r\n\r\n\r\nplot_metric_growth_per_labeled_instances(x_train, y_train, x_test, y_test,\r\n                                             {\"DecisionTreeClassifier\":\r\n                                                DecisionTreeClassifier(random_state=0),\r\n                                              \"RandomForestClassifier\":\r\n                                                RandomForestClassifier(random_state=0, n_estimators=5)})\r\n```\r\n\r\n![metric growth per labeled instances with n samples](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_plot_metric_growth_per_labeled_instances_with_n_samples.png)\r\n\r\n### Visualize Accuracy Grouped by 
Probability\r\n\r\nReceives test true labels and classifier probabilities predictions, divide and classify the results and finally\r\nplots a stacked bar chart with the results. [Original code](https://github.com/EthicalML/XAI)\r\n\r\n```python\r\nfrom ds_utils.metrics import visualize_accuracy_grouped_by_probability\r\n\r\n\r\nvisualize_accuracy_grouped_by_probability(test[\"target\"], 1, \r\n                                          classifier.predict_proba(test[selected_features]),\r\n                                          display_breakdown=False)\r\n```\r\n\r\nWithout breakdown:\r\n\r\n![visualize_accuracy_grouped_by_probability](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_visualize_accuracy_grouped_by_probability.png)\r\n\r\nWith breakdown:\r\n\r\n![visualize_accuracy_grouped_by_probability_with_breakdown](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_metrics/test_visualize_accuracy_grouped_by_probability_with_breakdown.png)\r\n\r\n## Preprocess\r\n### Visualize Feature\r\n\r\nReceives a feature and visualize its values on a graph:\r\n\r\n* If the feature is float then the method plots the distribution plot.\r\n* If the feature is datetime then the method plots a line plot of progression of amount thought time.\r\n* If the feature is object, categorical, boolean or integer then the method plots count plot (histogram).\r\n\r\n```python\r\nfrom ds_utils.preprocess import visualize_feature\r\n\r\n\r\n\r\nvisualize_feature(X_train[\"feature\"])\r\n```\r\n\r\n|Feature Type      |Plot|\r\n|------------------|----|\r\n|Float             |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_float.png)|\r\n|Integer           
|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_int.png)|\r\n|Datetime          |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_datetime.png)|\r\n|Category / Object |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_category_more_than_10_categories.png)|\r\n|Boolean           |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_feature_bool.png)|\r\n\r\n### Get Correlated Features\r\n\r\nCalculate which features correlated above a threshold and extract a data frame with the correlations and correlation to \r\nthe target feature.\r\n\r\n```python\r\nfrom ds_utils.preprocess import get_correlated_features\r\n\r\n\r\n\r\ncorrelations = get_correlated_features(train, features, target)\r\n```\r\n\r\n|level_0               |level_1               |level_0_level_1_corr|level_0_target_corr|level_1_target_corr|\r\n|----------------------|----------------------|--------------------|-------------------|-------------------|\r\n|income_category_Low   |income_category_Medium| 1.0                | 0.1182165609358650|0.11821656093586504|\r\n|term\\_ 36 months      |term\\_ 60 months      | 1.0                | 0.1182165609358650|0.11821656093586504|\r\n|interest_payments_High|interest_payments_Low | 1.0                | 0.1182165609358650|0.11821656093586504|\r\n\r\n### Visualize Correlations\r\nCompute pairwise correlation of columns, excluding NA/null values, and visualize it with heat map.\r\n[Original code](https://seaborn.pydata.org/examples/many_pairwise_correlations.html)\r\n\r\n```python\r\nfrom ds_utils.preprocess import 
visualize_correlations\r\n\r\n\r\n\r\nvisualize_correlations(data)\r\n```\r\n\r\n![visualize features](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_visualize_correlations.png)\r\n\r\n### Plot Correlation Dendrogram\r\nPlot dendrogram of a correlation matrix. This consists of a chart that that shows hierarchically the variables that are \r\nmost correlated by the connecting trees. The closer to the right that the connection is, the more correlated the \r\nfeatures are. [Original code](https://github.com/EthicalML/XAI)\r\n\r\n```python\r\nfrom ds_utils.preprocess import plot_correlation_dendrogram\r\n\r\n\r\n\r\nplot_correlation_dendrogram(data)\r\n```\r\n\r\n![plot correlation dendrogram](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_correlation_dendrogram.png)\r\n\r\n### Plot Features' Interaction\r\nPlots the joint distribution between two features:\r\n* If both features are either categorical, boolean or object then the method plots the shared histogram.\r\n* If one feature is either categorical, boolean or object and the other is numeric then the method plots a boxplot chart.\r\n* If one feature is datetime and the other is numeric or datetime then the method plots a line plot graph.\r\n* If one feature is datetime and the other is either categorical, boolean or object the method plots a violin plot (combination of boxplot and kernel density estimate).\r\n* If both features are numeric then the method plots scatter graph.\r\n\r\n```python\r\nfrom ds_utils.preprocess import plot_features_interaction\r\n\r\n\r\n\r\nplot_features_interaction(\"feature_1\", \"feature_2\", data)\r\n```\r\n\r\n|               | Numeric | Categorical | Boolean | Datetime\r\n|---------------|---------|-------------|---------|---------|\r\n|**Numeric**    
|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_numeric.png)| | | |\r\n|**Categorical**|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_numeric_categorical.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_categorical.png)| | |\r\n|**Boolean**    |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_numeric_boolean.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_categorical_bool.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_both_bool.png)| |\r\n|**Datetime**   |![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_numeric.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_categorical.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_bool.png)|![](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_visualization_aids/test_plot_relationship_between_features_datetime_datetime.png)|\r\n\r\n## Strings\r\n### Append Tags to Frame\r\n\r\nExtracts tags 
from a given field and appends them to the dataframe as columns.\r\n\r\nGiven a dataset that looks like this:\r\n\r\n``x_train``:\r\n\r\n|article_name|article_tags|\r\n|------------|------------|\r\n|1           |ds,ml,dl    |\r\n|2           |ds,ml       |\r\n\r\n``x_test``:\r\n\r\n|article_name|article_tags|\r\n|------------|------------|\r\n|3           |ds,ml,py    |\r\n\r\nrunning this code:\r\n```python\r\nimport pandas as pd\r\n\r\nfrom ds_utils.strings import append_tags_to_frame\r\n\r\n\r\nx_train = pd.DataFrame([{\"article_name\": \"1\", \"article_tags\": \"ds,ml,dl\"},\r\n                        {\"article_name\": \"2\", \"article_tags\": \"ds,ml\"}])\r\nx_test = pd.DataFrame([{\"article_name\": \"3\", \"article_tags\": \"ds,ml,py\"}])\r\n\r\nx_train_with_tags, x_test_with_tags = append_tags_to_frame(x_train, x_test, \"article_tags\", \"tag_\")\r\n```\r\n\r\nparses it into this:\r\n\r\n``x_train_with_tags``:\r\n\r\n|article_name|tag_ds|tag_ml|tag_dl|\r\n|------------|------|------|------|\r\n|1           |1     |1     |1     |\r\n|2           |1     |1     |0     |\r\n\r\n``x_test_with_tags``:\r\n\r\n|article_name|tag_ds|tag_ml|tag_dl|\r\n|------------|------|------|------|\r\n|3           |1     |1     |0     |\r\n\r\n### Extract Significant Terms from Subset\r\nReturns interesting or unusual occurrences of terms in a subset. 
Based on the [Elasticsearch significant_text aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#_scripted).\r\n\r\n```python\r\nimport pandas as pd\r\n\r\nfrom ds_utils.strings import extract_significant_terms_from_subset\r\n\r\ncorpus = ['This is the first document.', 'This document is the second document.',\r\n          'And this is the third one.', 'Is this the first document?']\r\ndata_frame = pd.DataFrame(corpus, columns=[\"content\"])\r\n# Let's differentiate between the last two documents from the full corpus\r\nsubset_data_frame = data_frame[data_frame.index > 1]\r\nterms = extract_significant_terms_from_subset(data_frame, subset_data_frame, \r\n                                               \"content\")\r\n\r\n```\r\nThe output for ``terms`` is the following table:\r\n\r\n|third|one|and|this|the |is  |first|document|second|\r\n|-----|---|---|----|----|----|-----|--------|------|\r\n|1.0  |1.0|1.0|0.67|0.67|0.67|0.5  |0.25    |0.0   |\r\n\r\n## Unsupervised\r\n### Cluster Cardinality\r\nCluster cardinality is the number of examples per cluster. This method plots the number of points per cluster as a bar \r\nchart.\r\n\r\n```python\r\nimport pandas as pd\r\nfrom matplotlib import pyplot as plt\r\nfrom sklearn.cluster import KMeans\r\n\r\nfrom ds_utils.unsupervised import plot_cluster_cardinality\r\n\r\n\r\ndata = pd.read_csv(\"path/to/dataset\")\r\nestimator = KMeans(n_clusters=8, random_state=42)\r\nestimator.fit(data)\r\n\r\nplot_cluster_cardinality(estimator.labels_)\r\n\r\nplt.show()\r\n```\r\n![Cluster Cardinality](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_cluster_cardinality.png)\r\n\r\n### Plot Cluster Magnitude\r\nCluster magnitude is the sum of distances from all examples to the centroid of the cluster. 
This method plots the \r\ntotal point-to-centroid distance per cluster as a bar chart.\r\n\r\n```python\r\nimport pandas as pd\r\nfrom matplotlib import pyplot as plt\r\nfrom sklearn.cluster import KMeans\r\nfrom scipy.spatial.distance import euclidean\r\n\r\nfrom ds_utils.unsupervised import plot_cluster_magnitude\r\n\r\ndata = pd.read_csv(\"path/to/dataset\")\r\nestimator = KMeans(n_clusters=8, random_state=42)\r\nestimator.fit(data)\r\n\r\nplot_cluster_magnitude(data, estimator.labels_, estimator.cluster_centers_, euclidean)\r\n\r\nplt.show()\r\n```\r\n![Plot Cluster Magnitude](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_cluster_magnitude.png)\r\n\r\n### Magnitude vs. Cardinality\r\nHigher cluster cardinality tends to result in a higher cluster magnitude, which intuitively makes sense. Clusters\r\nare anomalous when cardinality doesn't correlate with magnitude relative to the other clusters. Find anomalous \r\nclusters by plotting magnitude against cardinality as a scatter plot.\r\n```python\r\nimport pandas as pd\r\nfrom matplotlib import pyplot as plt\r\nfrom sklearn.cluster import KMeans\r\nfrom scipy.spatial.distance import euclidean\r\n\r\nfrom ds_utils.unsupervised import plot_magnitude_vs_cardinality\r\n\r\ndata = pd.read_csv(\"path/to/dataset\")\r\nestimator = KMeans(n_clusters=8, random_state=42)\r\nestimator.fit(data)\r\n\r\nplot_magnitude_vs_cardinality(data, estimator.labels_, estimator.cluster_centers_, euclidean)\r\n\r\nplt.show()\r\n```\r\n![Magnitude vs. Cardinality](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_magnitude_vs_cardinality.png)\r\n\r\n### Optimum Number of Clusters\r\nk-means requires you to decide the number of clusters ``k`` beforehand. This method runs the KMeans algorithm and \r\nincreases the number of clusters on each run. 
The total magnitude (the sum of distances) is used as the loss.\r\n\r\nRight now the method only works with ``sklearn.cluster.KMeans``.\r\n\r\n```python\r\nimport pandas as pd\r\n\r\nfrom matplotlib import pyplot as plt\r\nfrom scipy.spatial.distance import euclidean\r\n\r\nfrom ds_utils.unsupervised import plot_loss_vs_cluster_number\r\n\r\n\r\n\r\ndata = pd.read_csv(\"path/to/dataset\")\r\n\r\nplot_loss_vs_cluster_number(data, 3, 20, euclidean)\r\n\r\nplt.show()\r\n```\r\n![Optimum Number of Clusters](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_unsupervised/test_plot_loss_vs_cluster_number.png)\r\n\r\n## XAI\r\n### Generate Decision Paths\r\nReceives a decision tree and returns the underlying decision rules (or 'decision paths') as text (valid python syntax). \r\n[Original code](https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree)\r\n\r\n```python\r\nfrom sklearn.tree import DecisionTreeClassifier\r\n\r\nfrom ds_utils.xai import generate_decision_paths\r\n\r\n\r\n# Create decision tree classifier object\r\nclf = DecisionTreeClassifier(max_depth=3)\r\n\r\n# Train model\r\nclf.fit(x, y)\r\nprint(generate_decision_paths(clf, feature_names, target_names.tolist(), \r\n                              \"iris_tree\"))\r\n```\r\nThe following text will be printed:\r\n```\r\ndef iris_tree(petal width (cm), petal length (cm)):\r\n    if petal width (cm) <= 0.8000:\r\n        # return class setosa with probability 0.9804\r\n        return (\"setosa\", 0.9804)\r\n    else:  # if petal width (cm) > 0.8000\r\n        if petal width (cm) <= 1.7500:\r\n            if petal length (cm) <= 4.9500:\r\n                # return class versicolor with probability 0.9792\r\n                return (\"versicolor\", 0.9792)\r\n            else:  # if petal length (cm) > 4.9500\r\n                # return class virginica with probability 0.6667\r\n                return (\"virginica\", 
0.6667)\r\n        else:  # if petal width (cm) > 1.7500\r\n            if petal length (cm) <= 4.8500:\r\n                # return class virginica with probability 0.6667\r\n                return (\"virginica\", 0.6667)\r\n            else:  # if petal length (cm) > 4.8500\r\n                # return class virginica with probability 0.9773\r\n                return (\"virginica\", 0.9773)\r\n```\r\n\r\n### Plot Features' Importance\r\n\r\nPlots feature importance as a bar chart.\r\n\r\n```python\r\nimport pandas as pd\r\n\r\nfrom matplotlib import pyplot as plt\r\nfrom sklearn.tree import DecisionTreeClassifier\r\n\r\nfrom ds_utils.xai import plot_features_importance\r\n\r\n\r\ndata = pd.read_csv(\"path/to/dataset\")\r\ntarget = data[\"target\"]\r\nfeatures = data.columns.to_list()\r\nfeatures.remove(\"target\")\r\n\r\nclf = DecisionTreeClassifier(random_state=42)\r\nclf.fit(data[features], target)\r\nplot_features_importance(features, clf.feature_importances_)\r\n\r\nplt.show()\r\n```\r\n![Plot Features Importance](https://raw.githubusercontent.com/idanmoradarthas/DataScienceUtils/master/tests/baseline_images/test_xai/test_plot_features_importance.png)\r\n\r\nExcited?\r\n\r\nRead about all the modules here and see more abilities: \r\n* [Metrics](https://datascienceutils.readthedocs.io/en/latest/metrics.html) - The metrics module contains methods that help to calculate and/or visualize evaluation performance of an algorithm.\r\n* [Preprocess](https://datascienceutils.readthedocs.io/en/latest/preprocess.html) - The preprocess module contains methods for processing data before training.\r\n* [Strings](https://datascienceutils.readthedocs.io/en/latest/strings.html) - The strings module contains methods that help manipulate and process strings in a dataframe.\r\n* [Unsupervised](https://datascienceutils.readthedocs.io/en/latest/unsupervised.html) - The unsupervised module contains methods that calculate and/or visualize evaluation 
performance of an unsupervised model.\r\n* [XAI](https://datascienceutils.readthedocs.io/en/latest/xai.html) - The xai module contains methods that help explain a model's decisions.\r\n\r\n## Contributing\r\nInterested in contributing to Data Science Utils? Great! You're welcome, and we would love to have you. We follow \r\nthe [Python Software Foundation Code of Conduct](http://www.python.org/psf/codeofconduct/) and \r\n[Matplotlib Usage Guide](https://matplotlib.org/tutorials/introductory/usage.html#coding-styles).\r\n\r\nNo matter your level of technical skill, you can be helpful. We appreciate bug reports, user testing, feature \r\nrequests, bug fixes, product enhancements, and documentation improvements.\r\n\r\nThank you for your contributions!\r\n\r\n## Find a Bug?\r\nCheck if there's already an open [issue](https://github.com/idanmoradarthas/DataScienceUtils/issues) on the topic. If \r\nneeded, file an issue.\r\n\r\n## Open Source\r\nData Science Utils is licensed under the [MIT License](https://opensource.org/licenses/MIT). \r\n\r\n## Installing Data Science Utils\r\nData Science Utils is compatible with Python 3.6 or later. The simplest way to install Data Science Utils and its \r\ndependencies is from PyPI with pip, Python's preferred package installer:\r\n```bash\r\npip install data-science-utils\r\n```\r\nNote that this package is an active project and routinely publishes new releases with more methods.  
In order to \r\nupgrade Data Science Utils to the latest version, use pip as follows:\r\n```bash\r\npip install -U data-science-utils\r\n```\r\nAlternatively, you can install from source by cloning the repo and running:\r\n```bash\r\ngit clone https://github.com/idanmoradarthas/DataScienceUtils.git\r\ncd DataScienceUtils\r\npip install .\r\n```\r\nOr install directly from the repository using pip:\r\n```bash\r\npip install git+https://github.com/idanmoradarthas/DataScienceUtils.git\r\n```\r\nIf you're using Anaconda, you can install using conda:\r\n```bash\r\nconda install idanmorad::data-science-utils\r\n```\r\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2018 Idan Morad  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "This project is an ensemble of methods which are frequently used in python Data Science projects.",
    "version": "1.7.3",
    "project_urls": {
        "Changelog": "https://github.com/idanmoradarthas/DataScienceUtils/blob/master/CHANGELOG.md",
        "Documentation": "https://datascienceutils.readthedocs.io/en/latest/",
        "Homepage": "https://github.com/idanmoradarthas/DataScienceUtils",
        "Issues": "https://github.com/idanmoradarthas/DataScienceUtils/issues",
        "Repository": "https://github.com/idanmoradarthas/DataScienceUtils.git"
    },
    "split_keywords": [
        "data-science",
        "utilities",
        "python",
        "machine-learning",
        "scikit-learn",
        "matplotlib"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "06a3aec8d27e52645cca4b344b41287339a0b5c77eeb583a818df05291454c95",
                "md5": "8f349fb7b6e429eac4f42cd450b1bd45",
                "sha256": "92b7484f26eecc1079c8893919e7d0a3197c012b623f17afb077b227a0a0a2d9"
            },
            "downloads": -1,
            "filename": "data_science_utils-1.7.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8f349fb7b6e429eac4f42cd450b1bd45",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 22350,
            "upload_time": "2024-02-11T09:44:32",
            "upload_time_iso_8601": "2024-02-11T09:44:32.925398Z",
            "url": "https://files.pythonhosted.org/packages/06/a3/aec8d27e52645cca4b344b41287339a0b5c77eeb583a818df05291454c95/data_science_utils-1.7.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2079c563a1417b59ca2890a90b1a46c0afeb78723dc947ac45cbc7e2a3f0a111",
                "md5": "a6d59f54b1428445c09d937f42c18c8f",
                "sha256": "f8fed3cc251d3cbac1b53f0c7e091b3db7f698df97d753656683d8bdd9002664"
            },
            "downloads": -1,
            "filename": "data_science_utils-1.7.3.tar.gz",
            "has_sig": false,
            "md5_digest": "a6d59f54b1428445c09d937f42c18c8f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 27740,
            "upload_time": "2024-02-11T09:44:34",
            "upload_time_iso_8601": "2024-02-11T09:44:34.857746Z",
            "url": "https://files.pythonhosted.org/packages/20/79/c563a1417b59ca2890a90b1a46c0afeb78723dc947ac45cbc7e2a3f0a111/data_science_utils-1.7.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-11 09:44:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "idanmoradarthas",
    "github_project": "DataScienceUtils",
    "travis_ci": true,
    "coveralls": true,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.26.3"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.11.4"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.1.4"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.8.0"
                ]
            ]
        },
        {
            "name": "seaborn",
            "specs": [
                [
                    ">=",
                    "0.12.2"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.2.2"
                ]
            ]
        },
        {
            "name": "pydotplus",
            "specs": [
                [
                    ">=",
                    "2.0.2"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    ">=",
                    "1.2.0"
                ]
            ]
        }
    ],
    "tox": true,
    "lcname": "data-science-utils"
}