statsToolkit

Name	statsToolkit
Version	1.0.1
Summary	Statistical Methods for Data Science Toolkit
upload_time	2024-10-28 12:35:48
author	Eng. Marco Schivo and Eng. Alberto Biscalchin
keywords	statistics, data science, machine learning, data analysis, descriptive statistics, probability distributions, visualizations

# Statistical Methods for Data Science (statsToolkit)
The project is sponsored by **Malmö Universitet** and developed by Eng. Marco Schivo and Eng. Alberto Biscalchin under the supervision of Associate Professor Yuanji Cheng. It is released under the **MIT License**, is open source, and is available for anyone to use and contribute to.

Internal course code reference: MA660E

---

# Descriptive Statistics

This module contains basic descriptive statistics functions that allow users to perform statistical analysis on numerical datasets. The functions are designed for flexibility and ease of use, and they provide essential statistical metrics such as mean, median, range, variance, standard deviation, and quantiles.

## Functions Overview

### 1. `mean(X)`
Calculates the arithmetic mean (average) of a list of numbers.

#### Example Usage:

```python
from statstoolkit.statistics import mean

data = [1, 2, 3, 4, 5]
print(mean(data))  # Output: 3.0
```

### 2. `median(X)`
Calculates the median of a list of numbers. The median is the middle value that separates the higher half from the lower half of the dataset.

#### Example Usage:

```python
from statstoolkit.statistics import median

data = [1, 2, 3, 4, 5]
print(median(data))  # Output: 3
```

### 3. `range_(X)`
Calculates the range, which is the difference between the maximum and minimum values in the dataset.

#### Example Usage:

```python
from statstoolkit.statistics import range_

data = [1, 2, 3, 4, 5]
print(range_(data))  # Output: 4
```

### 4. `var(X, ddof=0)`
Calculates the variance of the dataset. Variance measures the spread of the data around the mean. You can calculate either the population variance (`ddof=0`) or the sample variance (`ddof=1`).

#### Example Usage:

```python
from statstoolkit.statistics import var

data = [1, 2, 3, 4, 5]
print(var(data))  # Output: 2.0  (Population variance)
print(var(data, ddof=1))  # Output: 2.5  (Sample variance)
```
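
The effect of `ddof` can be illustrated with a minimal pure-Python sketch. The helper `variance_sketch` is hypothetical and not part of statsToolkit; it assumes the usual convention of dividing the sum of squared deviations by `n - ddof`.

```python
import math

def variance_sketch(values, ddof=0):
    """Illustrative only: sum of squared deviations divided by (n - ddof)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more observations than ddof")
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - ddof)

data = [1, 2, 3, 4, 5]
pop_var = variance_sketch(data)             # divides by n = 5
sample_var = variance_sketch(data, ddof=1)  # divides by n - 1 = 4
print(pop_var, sample_var)
print(math.sqrt(pop_var))  # the standard deviation is the square root of the variance
```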

### 5. `std(X, ddof=0)`
Calculates the standard deviation, which is the square root of the variance. It indicates how much the data varies from the mean.

#### Example Usage:

```python
from statstoolkit.statistics import std

data = [1, 2, 3, 4, 5]
print(std(data))  # Output: 1.4142135623730951  (Population standard deviation)
print(std(data, ddof=1))  # Output: 1.5811388300841898  (Sample standard deviation)
```

### 6. `quantile(X, Q)`
Calculates the quantile, which is the value below which a given fraction `Q` of the data falls. For example, the 0.25 quantile is the first quartile (25th percentile).

#### Example Usage:

```python
from statstoolkit.statistics import quantile

data = [1, 2, 3, 4, 5]
print(quantile(data, 0.25))  # Output: 2.0 (25th percentile)
print(quantile(data, 0.5))  # Output: 3.0 (Median)
print(quantile(data, 0.75))  # Output: 4.0 (75th percentile)
```
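
The outputs above are consistent with linear interpolation between order statistics, which is also the default method of `numpy.quantile`. Whether statsToolkit uses exactly this rule is an assumption; the hypothetical helper `quantile_sketch` below only illustrates the idea.

```python
import math

def quantile_sketch(values, q):
    """Hypothetical helper: linear interpolation between order statistics."""
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

data = [1, 2, 3, 4, 5]
quartiles = [quantile_sketch(data, q) for q in (0.25, 0.5, 0.75)]
print(quartiles)
```

With an even number of points the interpolation becomes visible: the median of `[1, 2, 3, 4]` falls halfway between 2 and 3.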

### 7. `corrcoef(x, y, alternative='two-sided', method=None)`
Calculates the Pearson correlation coefficient between two datasets and provides the p-value for testing non-correlation.

#### Example Usage:

```python
from statstoolkit.statistics import corrcoef

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
R, P = corrcoef(x, y)
print("Correlation coefficient:", R)
print("P-value matrix:", P)
```
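
For reference, the Pearson coefficient itself can be computed from centered sums. The sketch below uses a hypothetical helper `pearson_r`, independent of statsToolkit, and does not compute the p-value.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from centered sums of products."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)  # y is an exact linear function of x
print(r)
```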

### 8. `partialcorr(X, columns=None)`
Computes the partial correlation matrix, controlling for the influence of all other variables.

#### Example Usage:

```python
from statstoolkit.statistics import partialcorr
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 4, 6, 8, 10],
    'C': [5, 6, 7, 8, 9]
})
partial_corr_matrix = partialcorr(data)
print(partial_corr_matrix)
```
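
For three variables, the partial correlation of `x` and `y` given `z` has a closed form in terms of the pairwise Pearson coefficients. The sketch below is an illustration with hypothetical helpers, not the library's implementation. Note that the columns in the example above are exact linear functions of one another, so this formula would divide by zero on them; the data below avoid that.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from centered sums."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr_sketch(x, y, z):
    """Partial correlation of x and y, controlling for a single variable z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom

x = [1, 2, 3, 4, 5]
y = [-8, -1, 0, 1, 8]
z = [1, -2, 2, -2, 1]  # constructed to be uncorrelated with both x and y
partial = partial_corr_sketch(x, y, z)
print(partial, pearson_r(x, y))
```

Because `z` is uncorrelated with both variables here, controlling for it leaves the plain correlation unchanged.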

### 9. `cov(data1, data2=None)`
Calculates the covariance matrix between two datasets or within a single dataset.

#### Example Usage:

```python
from statstoolkit.statistics import cov

data1 = [1, 2, 3, 4]
data2 = [2, 4, 6, 8]
print(cov(data1, data2))
```
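
As a rough cross-check, a 2x2 covariance matrix can be assembled by hand. The sketch assumes the sample normalization (division by `n - 1`), which is the default of `numpy.cov`; whether statsToolkit uses the same normalization is an assumption. `sample_cov` is a hypothetical helper.

```python
def sample_cov(x, y):
    """Hypothetical helper: sample covariance, dividing by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

data1 = [1, 2, 3, 4]
data2 = [2, 4, 6, 8]
c12 = sample_cov(data1, data2)
# Diagonal entries are variances, off-diagonal entries the covariance
matrix = [
    [sample_cov(data1, data1), c12],
    [c12, sample_cov(data2, data2)],
]
print(matrix)
```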

### 10. `fitlm(x, y)`
Performs simple linear regression of `y` on `x`, returning a dictionary of regression results.

#### Example Usage:

```python
from statstoolkit.statistics import fitlm

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
result = fitlm(x, y)
print(result)
```
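
The quantities behind a simple linear regression can be sketched with the closed-form least-squares estimates. The helper `fit_line` is hypothetical and returns a plain tuple; the keys of the dictionary returned by `fitlm` are not documented here and are not assumed.

```python
def fit_line(x, y):
    """Hypothetical helper: least-squares slope, intercept and R^2."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
slope, intercept, r_squared = fit_line(x, y)
print(slope, intercept, r_squared)
```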

### 11. `anova(y=None, factors=None, data=None, formula=None, response=None, sum_of_squares='type I')`
Performs one-way, two-way, or N-way Analysis of Variance (ANOVA) on data, supporting custom models.

#### Example Usage:

```python
from statstoolkit.statistics import anova
import pandas as pd

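# Two observations per A/B combination, so the A:B interaction can be estimated
data = pd.DataFrame({
    'y': [23, 25, 20, 21, 24, 26, 19, 22],
    'A': ['High', 'High', 'Low', 'Low', 'High', 'High', 'Low', 'Low'],
    'B': ['Type1', 'Type2', 'Type1', 'Type2', 'Type1', 'Type2', 'Type1', 'Type2']
})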
result = anova(y='y', data=data, formula='y ~ A + B + A:B')
print(result)
```
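
To make the one-way case concrete, the F statistic can be computed directly from between-group and within-group sums of squares. This is a minimal sketch with a hypothetical helper `one_way_f`; it does not reproduce the table returned by `anova`.

```python
def one_way_f(groups):
    """Hypothetical helper: one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print(f_stat)
```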

### 12. `kruskalwallis(x, group=None, displayopt=False)`
Performs the Kruskal-Wallis H-test for independent samples, a non-parametric alternative to one-way ANOVA.

#### Example Usage:

```python
from statstoolkit.statistics import kruskalwallis

x = [1.2, 3.4, 5.6, 1.1, 3.6, 5.5]
group = ['A', 'A', 'A', 'B', 'B', 'B']
p_value, anova_table, stats = kruskalwallis(x, group=group, displayopt=True)
print("P-value:", p_value)
print(anova_table)
```
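
The H statistic is built from the ranks of the pooled observations. The sketch below uses hypothetical helpers and omits the tie correction factor, so it is only an illustration of the basic formula, not the library's implementation.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_h(values, labels):
    """Kruskal-Wallis H without tie correction (illustrative only)."""
    ranks = average_ranks(values)
    n_total = len(values)
    rank_sums, counts = {}, {}
    for r, g in zip(ranks, labels):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    weighted = sum(rank_sums[g] ** 2 / counts[g] for g in rank_sums)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

x = [1.2, 3.4, 5.6, 1.1, 3.6, 5.5]
group = ['A', 'A', 'A', 'B', 'B', 'B']
h = kruskal_h(x, group)
print(h)
```

For these data the rank sums are 11 (group A) and 10 (group B), so H is close to zero and the groups are not distinguishable.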

---

# Visualization Functions

This module contains several flexible visualization functions built using **Matplotlib** and **Seaborn**, allowing users to generate commonly used plots such as bar charts, pie charts, histograms, boxplots, and scatter plots. The visualizations are designed for customization, giving the user control over various parameters such as color, labels, figure size, and more.

## Functions Overview

### 1. `bar_chart(x, y, title=None, xlabel=None, ylabel=None, color=None, figsize=(10, 6), **kwargs)`
Creates a bar chart with customizable x-axis labels, y-axis values, title, colors, and more.

#### Example Usage:
```python
from statstoolkit.visualization import bar_chart

categories = ['Category A', 'Category B', 'Category C']
values = [10, 20, 30]

bar_chart(categories, values, title="Example Bar Chart", xlabel="Category", ylabel="Values", color="blue")
```
![Example Bar Chart](examples/bar_chart.png)

---

### 2. `pie_chart(sizes, labels=None, title=None, colors=None, explode=None, autopct='%1.1f%%', shadow=False, startangle=90, **kwargs)`
Creates a pie chart with options for custom labels, colors, explode effect, and more.

#### Example Usage:
```python
from statstoolkit.visualization import pie_chart

sizes = [15, 30, 45, 10]
labels = ['Category A', 'Category B', 'Category C', 'Category D']

pie_chart(sizes, labels=labels, title="Example Pie Chart", autopct='%1.1f%%', shadow=True)
```
![Example Pie Chart](examples/pie_chart.png)
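
The `autopct` format string is applied to each wedge's share of the total, expressed as a percentage. The short sketch below reproduces those labels without drawing anything; the variable names are illustrative.

```python
sizes = [15, 30, 45, 10]
total = sum(sizes)

# '%1.1f%%' formats a percentage with one decimal and a literal percent sign
percent_labels = ['%1.1f%%' % (100.0 * s / total) for s in sizes]
print(percent_labels)
```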

---

### 3. `histogram(x, bins=10, title=None, xlabel=None, ylabel=None, color=None, figsize=(10, 6), **kwargs)`
Creates a histogram for visualizing the frequency distribution of data. The number of bins or bin edges can be adjusted.

#### Example Usage:
```python
from statstoolkit.visualization import histogram

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

histogram(data, bins=4, title="Example Histogram", xlabel="Value", ylabel="Frequency", color="green")
```
![Example Histogram](examples/histogram.png)
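
To see what `bins=4` means for this data, the bin edges and counts can be computed by hand. The sketch assumes equal-width bins over the data range with the maximum value counted in the last bin, which is the Matplotlib/NumPy convention; `bin_counts` is a hypothetical helper.

```python
def bin_counts(values, bins):
    """Hypothetical helper: equal-width bin edges and counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
edges, counts = bin_counts(data, bins=4)
print(edges)
print(counts)
```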

---

### 4. `boxplot(MPG, origin=None, title=None, xlabel=None, ylabel=None, color=None, figsize=(10, 6), **kwargs)`
Creates a boxplot for visualizing the distribution of a dataset. It supports optional grouping variables, such as plotting the distribution of MPG (miles per gallon) by car origin.

#### Example Usage:
```python
from statstoolkit.visualization import boxplot

MPG = [15, 18, 21, 24, 30]
origin = ['USA', 'Japan', 'Japan', 'USA', 'Europe']

boxplot(MPG, origin=origin, title="Example Boxplot by Car Origin", xlabel="Origin", ylabel="Miles per Gallon")
```
![Example Boxplot](examples/boxplot.png)
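
A grouped boxplot draws one box per distinct label. The sketch below shows how the values could be split by group before plotting; `group_values` is a hypothetical helper, not a statsToolkit function.

```python
def group_values(values, labels):
    """Hypothetical helper: collect values under their group label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return groups

MPG = [15, 18, 21, 24, 30]
origin = ['USA', 'Japan', 'Japan', 'USA', 'Europe']
grouped = group_values(MPG, origin)
print(grouped)
```

With so few observations per group the boxes are degenerate; a realistic dataset would have many values per origin.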

---

### 5. `scatterplot(x, y, z=None, symbol='o', title=None, xlabel=None, ylabel=None, color=None, figsize=(10, 6), **kwargs)`
Creates a scatter plot, optionally supporting 3D-like plots where a third variable `z` can be mapped to point sizes or colors. Marker symbols and other plot properties can be customized.

#### Example Usage (2D Scatter Plot):
```python
from statstoolkit.visualization import scatterplot

x = [1, 2, 3, 4]
y = [10, 20, 30, 40]

scatterplot(x, y, title="Example 2D Scatter Plot", xlabel="X-Axis", ylabel="Y-Axis", color="red")
```
![Example 2D Scatter Plot](examples/scatterplot_2d.png)

#### Example Usage (3D-like Scatter Plot):
```python
from statstoolkit.visualization import scatterplot_3d

x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
z = [50, 100, 200, 300]

scatterplot_3d(x, y, z, symbol="o", title="Example 3D Scatter Plot", xlabel="X-Axis", ylabel="Y-Axis", zlabel="Z-Axis", color="blue")
```
![Example 3D Scatter Plot](examples/scatterplot_3d.png)

---
# Probability

### 1. `binopdf(k, n, p)`
Calculates the binomial probability mass function for a given number of trials and success probability.

#### Example Usage:

```python
from statstoolkit.probability import binopdf

k = 3  # number of successes
n = 10  # number of trials
p = 0.5  # probability of success
print(binopdf(k, n, p))  # Output: probability of exactly 3 successes
```
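
The value can be checked against the textbook formula C(n, k) p^k (1 - p)^(n - k). The helper `binom_pmf` below is a hypothetical stand-alone sketch (requires Python 3.8+ for `math.comb`).

```python
import math

def binom_pmf(k, n, p):
    """Hypothetical helper: probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

prob = binom_pmf(3, 10, 0.5)  # 120 / 1024
print(prob)
```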

### 2. `poisspdf(k, mu)`
Calculates the Poisson probability mass function for a given mean number of events.

#### Example Usage:

```python
from statstoolkit.probability import poisspdf

k = 5  # number of events
mu = 3  # mean number of events
print(poisspdf(k, mu))  # Output: probability of exactly 5 events
```
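
For comparison, the Poisson formula exp(-mu) mu^k / k! can be evaluated directly. `poisson_pmf` is a hypothetical helper used only for illustration.

```python
import math

def poisson_pmf(k, mu):
    """Hypothetical helper: probability of exactly k events with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

prob = poisson_pmf(5, 3)
print(prob)
```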

### 3. `geopdf(k, p)`
Calculates the geometric probability mass function for the number of trials needed to get the first success.

#### Example Usage:

```python
from statstoolkit.probability import geopdf

k = 3  # number of trials
p = 0.5  # probability of success
print(geopdf(k, p))  # Output: probability of success on the 3rd trial
```
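
Two conventions exist for the geometric distribution: counting trials up to and including the first success (as in SciPy's `geom`), or counting failures before the first success (as in MATLAB's `geopdf`). The description above suggests the trials convention, but this is worth verifying against the installed version. The sketch shows both with hypothetical helpers.

```python
def geom_pmf_trials(k, p):
    """First success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

def geom_pmf_failures(k, p):
    """Exactly k failures occur before the first success (k = 0, 1, ...)."""
    return (1 - p) ** k * p

p_trials = geom_pmf_trials(3, 0.5)
p_failures = geom_pmf_failures(3, 0.5)
print(p_trials, p_failures)
```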

### 4. `nbinpdf(k, r, p)`
Calculates the negative binomial probability mass function, where `k` is the number of failures until `r` successes occur.

#### Example Usage:

```python
from statstoolkit.probability import nbinpdf

k = 4  # number of failures
r = 2  # number of successes
p = 0.5  # probability of success
print(nbinpdf(k, r, p))  # Output: probability of exactly 4 failures before 2 successes
```
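
Under the parameterization described above (`k` failures before the `r`-th success), the formula is C(k + r - 1, k) p^r (1 - p)^k. The helper `nbinom_pmf` is a hypothetical sketch of that formula, restricted to integer `r`.

```python
import math

def nbinom_pmf(k, r, p):
    """Hypothetical helper: k failures before the r-th success (integer r)."""
    return math.comb(k + r - 1, k) * p ** r * (1 - p) ** k

prob = nbinom_pmf(4, 2, 0.5)  # C(5, 4) * 0.25 * 0.0625
print(prob)
```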

### 5. `hygepdf(k, M, n, N)`
Calculates the hypergeometric probability mass function: the probability of drawing exactly `k` successes in a sample of size `n` taken without replacement from a population of size `M` that contains `N` successes.

#### Example Usage:

```python
from statstoolkit.probability import hygepdf

k = 3  # successes in sample
M = 50  # population size
n = 20  # sample size
N = 10  # number of successes in population
print(hygepdf(k, M, n, N))  # Output: probability of drawing 3 successes
```
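
The hypergeometric formula can be written with binomial coefficients. The sketch below follows the roles given in the example comments (population size, sample size, successes in the population); the helper and its argument names are hypothetical. Because the formula is symmetric in the sample size and the number of successes, swapping those two arguments gives the same probability.

```python
import math

def hypergeom_pmf(k, population, sample_size, successes):
    """Hypothetical helper: k successes in a sample drawn without replacement."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, sample_size - k)
        / math.comb(population, sample_size)
    )

prob = hypergeom_pmf(3, 50, 20, 10)
swapped = hypergeom_pmf(3, 50, 10, 20)  # roles of sample size and successes exchanged
print(prob, swapped)
```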

### 6. `betapdf(x, a, b)`
Calculates the beta probability density function.

#### Example Usage:

```python
from statstoolkit.probability import betapdf

x = 0.5
a = 2  # shape parameter alpha
b = 2  # shape parameter beta
print(betapdf(x, a, b))  # Output: beta distribution density at x=0.5
```
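
The beta density is x^(a-1) (1 - x)^(b-1) / B(a, b), where B is the beta function. A minimal sketch using `math.gamma`, with a hypothetical helper name:

```python
import math

def beta_pdf(x, a, b):
    """Hypothetical helper: beta density on the open interval (0, 1)."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn

density = beta_pdf(0.5, 2, 2)  # 0.25 / (1/6)
print(density)
```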

### 7. `chi2pdf(x, df)`
Calculates the chi-squared probability density function.

#### Example Usage:

```python
from statstoolkit.probability import chi2pdf

x = 5
df = 3  # degrees of freedom
print(chi2pdf(x, df))  # Output: chi-squared density at x=5
```
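
The chi-squared density with `df` degrees of freedom is x^(df/2 - 1) exp(-x/2) / (2^(df/2) Γ(df/2)) for x > 0. The helper below is a hypothetical sketch of that formula.

```python
import math

def chi2_pdf(x, df):
    """Hypothetical helper: chi-squared density for x > 0."""
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

density = chi2_pdf(5, 3)
print(density)
```

With `df=2` the formula reduces to the exponential density `0.5 * exp(-x / 2)`, which gives a quick sanity check.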

### 8. `exppdf(x, scale)`
Calculates the exponential probability density function.

#### Example Usage:

```python
from statstoolkit.probability import exppdf

x = 2
scale = 1  # inverse of rate parameter lambda
print(exppdf(x, scale=scale))  # Output: exponential density at x=2
```
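
With `scale` equal to the inverse of the rate, the exponential density is exp(-x / scale) / scale for x >= 0. A small sketch with a hypothetical helper:

```python
import math

def exp_pdf(x, scale=1.0):
    """Hypothetical helper: exponential density with mean `scale`."""
    if x < 0:
        return 0.0
    return math.exp(-x / scale) / scale

density = exp_pdf(2, scale=1)
print(density)
```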

### 9. `fpdf(x, dfn, dfd)`
Calculates the F-distribution probability density function.

#### Example Usage:

```python
from statstoolkit.probability import fpdf

x = 1.5
dfn = 5  # degrees of freedom numerator
dfd = 2  # degrees of freedom denominator
print(fpdf(x, dfn, dfd))  # Output: F-distribution density at x=1.5
```
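
The F density can be written in terms of the beta function. The sketch below is a hypothetical stand-alone version of the standard formula; for simplicity it returns 0 for `x <= 0` and ignores the boundary behavior at `x = 0`.

```python
import math

def f_pdf(x, dfn, dfd):
    """Hypothetical helper: F-distribution density for x > 0."""
    if x <= 0:
        return 0.0  # simplification: boundary behavior at x = 0 is ignored
    beta_fn = math.gamma(dfn / 2) * math.gamma(dfd / 2) / math.gamma((dfn + dfd) / 2)
    core = math.sqrt((dfn * x) ** dfn * dfd ** dfd / (dfn * x + dfd) ** (dfn + dfd))
    return core / (x * beta_fn)

density = f_pdf(1.5, 5, 2)
print(density)
```

For `dfn = dfd = 2` the density simplifies to `1 / (1 + x) ** 2`, which is a convenient check of the implementation.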

### 10. `normpdf(x, mu, sigma)`
Calculates the normal (Gaussian) probability density function.

#### Example Usage:

```python
from statstoolkit.probability import normpdf

x = 0
mu = 0  # mean
sigma = 1  # standard deviation
print(normpdf(x, mu, sigma))  # Output: normal density at x=0
```
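
The normal density is exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). At the mean of a standard normal this equals 1 / sqrt(2 pi). A hypothetical stand-alone sketch:

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Hypothetical helper: normal density with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

density = norm_pdf(0, 0, 1)
print(density)
```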

### 11. `lognpdf(x, s, scale)`
Calculates the log-normal probability density function.

#### Example Usage:

```python
from statstoolkit.probability import lognpdf

x = 1.5
s = 0.5  # shape parameter
scale = 1  # scale parameter
print(lognpdf(x, s, scale=scale))  # Output: log-normal density at x=1.5
```
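
The parameter names `s` and `scale` resemble SciPy's `lognorm`, where `s` is the standard deviation of log(x) and `scale` is exp(mean of log(x)). Assuming that parameterization (an assumption, not confirmed by this README), the density can be sketched as:

```python
import math

def lognorm_pdf(x, s, scale=1.0):
    """Hypothetical helper: log-normal density, SciPy-style parameters assumed."""
    if x <= 0:
        return 0.0
    log_ratio = math.log(x / scale)
    return math.exp(-log_ratio ** 2 / (2 * s ** 2)) / (s * x * math.sqrt(2 * math.pi))

density = lognorm_pdf(1.5, 0.5, scale=1)
print(density)
```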

### 12. `tpdf(x, df)`
Calculates the Student's t probability density function.

#### Example Usage:

```python
from statstoolkit.probability import tpdf

x = 2
df = 10  # degrees of freedom
print(tpdf(x, df))  # Output: t-distribution density at x=2
```
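
Student's t density is Γ((df + 1)/2) / (sqrt(df π) Γ(df/2)) · (1 + x²/df)^(-(df + 1)/2). The helper below is a hypothetical sketch of that formula.

```python
import math

def t_pdf(x, df):
    """Hypothetical helper: Student's t density."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

density = t_pdf(2, 10)
print(density)
```

With `df=1` the distribution is the standard Cauchy, whose density at 0 is `1 / pi`; the density is also symmetric around 0.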

### 13. `wblpdf(x, c, scale)`
Calculates the Weibull probability density function.

#### Example Usage:

```python
from statstoolkit.probability import wblpdf

x = 1.2
c = 2  # shape parameter
scale = 1  # scale parameter
print(wblpdf(x, c, scale=scale))  # Output: Weibull density at x=1.2
```
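
Assuming the common shape/scale parameterization (as in SciPy's `weibull_min`, which is an assumption about this library), the Weibull density is (c / scale) (x / scale)^(c - 1) exp(-(x / scale)^c) for x >= 0. A hypothetical sketch:

```python
import math

def weibull_pdf(x, c, scale=1.0):
    """Hypothetical helper: Weibull density with shape c and the given scale."""
    if x < 0:
        return 0.0
    ratio = x / scale
    return (c / scale) * ratio ** (c - 1) * math.exp(-ratio ** c)

density = weibull_pdf(1.2, 2, scale=1)  # 2.4 * exp(-1.44)
print(density)
```

With `c=1` the Weibull density reduces to the exponential density with the same scale.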

---

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---
