# SimpleEDA
SimpleEDA is a Python library for simple exploratory data analysis tasks. It provides functions to compute outlier bounds, find special characters, calculate the Variance Inflation Factor (VIF), summarize duplicates, visualize continuous data with box plots, and produce an enhanced DataFrame summary.
## Installation
SimpleEDA is published on PyPI under the name `SimplyEDA`. Install it with pip:
```bash
pip install SimplyEDA
```
## Usage
Below are examples of how to use the various functions provided by SimpleEDA.
### Importing the Library
```python
import SimplyEDA as eda
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
})
```
### remove_outlier
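This function computes the lower and upper bounds for identifying outliers in a column with the Interquartile Range (IQR) method. Despite its name, it returns the bounds rather than a filtered column; values outside the bounds are considered outliers.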
```python
lower, upper = eda.remove_outlier(df['A'])
print(f"Lower bound: {lower}, Upper bound: {upper}")
```
**Parameters:**
- `col` (pd.Series): The column for which to compute outlier bounds.
- `multiplier` (float): The IQR multiplier that defines the outlier bounds. Default is 1.5.

**Returns:**
- `tuple`: Lower and upper range for outlier detection.
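Under the standard IQR rule, the bounds are `Q1 - multiplier * IQR` and `Q3 + multiplier * IQR`, where `IQR = Q3 - Q1`. The sketch below reproduces that rule with plain pandas so the numbers printed by `remove_outlier` can be sanity-checked. It is not SimpleEDA's source: `iqr_bounds` is a hypothetical name, and pandas' default linear quantile interpolation is an assumption that may differ from what the library uses.

```python
from typing import Tuple

import pandas as pd


def iqr_bounds(col: pd.Series, multiplier: float = 1.5) -> Tuple[float, float]:
    """Hypothetical helper: IQR-based outlier bounds (not SimpleEDA's code)."""
    q1 = col.quantile(0.25)  # first quartile, pandas' default linear interpolation
    q3 = col.quantile(0.75)  # third quartile
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


a = pd.Series([1, 2, 2, 4, 5, 6, 7, 8, 9, 10], name="A")
lower, upper = iqr_bounds(a)
print(f"Lower bound: {lower}, Upper bound: {upper}")
# Lower bound: -5.375, Upper bound: 15.625

# The bounds can then be used to drop outliers explicitly.
filtered = a[a.between(lower, upper)]
```

Keeping the bounds separate from the filtering step lets you inspect or log them before deciding whether to drop, cap, or keep the flagged values.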
### find_specialchar
This function finds special characters in a DataFrame.
```python
eda.find_specialchar(df)
```
**Parameters:**
- `df` (pd.DataFrame): The DataFrame to check.

**Returns:**
- None
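The README does not define what counts as a special character. As an illustration only, the sketch below treats any character other than a letter, digit, underscore, or whitespace as special (the regex `[^\w\s]`) and counts the matching cells in each non-numeric column. `count_special_chars` is a hypothetical helper, not part of SimpleEDA.

```python
import pandas as pd

# Assumption: "special" means anything that is not a word character or whitespace.
SPECIAL = r"[^\w\s]"


def count_special_chars(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: cells per non-numeric column containing a special character."""
    counts = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # only scan text-like columns
        has_special = df[col].astype(str).str.contains(SPECIAL, regex=True)
        counts[col] = int(has_special.sum())
    return pd.Series(counts, dtype="int64")


sample = pd.DataFrame({"D": ["a", "b?", "c", "#d"], "N": [1, 2, 3, 4]})
print(count_special_chars(sample))  # column D has 2 cells with special characters
```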
### vif_cal
This function calculates the Variance Inflation Factor (VIF) for each feature in the DataFrame.
```python
eda.vif_cal(df[['A', 'B', 'C']])
```
**Parameters:**
- `input_data` (pd.DataFrame): The DataFrame for which to calculate VIF.

**Returns:**
- None
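The VIF of feature *j* is `1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature *j* on the remaining features. The sketch below computes it with NumPy least squares. It is not SimpleEDA's implementation: `vif_sketch` is a hypothetical name, and fitting an intercept in each auxiliary regression is an assumption (the README does not say whether `vif_cal` does), so its numbers may differ from the library's.

```python
import numpy as np
import pandas as pd


def vif_sketch(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), regressing column j on the others."""
    X = data.to_numpy(dtype=float)
    n = X.shape[0]
    result = {}
    for j, name in enumerate(data.columns):
        y = X[:, j]
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:
            result[name] = np.nan  # a constant column has no variance to explain
            continue
        # Auxiliary regression: intercept plus every other feature (assumption).
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - float(resid @ resid) / ss_tot
        # Perfect collinearity (R^2 == 1) makes the VIF unbounded.
        result[name] = np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)
    return pd.Series(result, name="VIF")


df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
})
print(vif_sketch(df))  # B and C come out as inf
```

In the README's sample frame `C` equals `B + 10` in every row, so `B` and `C` are perfectly collinear and their VIF is unbounded; this sketch reports `inf` for both rather than raising.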
### dups
This function displays a summary of the duplicates in a DataFrame.
```python
eda.dups(df)
```
**Parameters:**
- `df` (pd.DataFrame): The DataFrame to check for duplicates.

**Returns:**
- None
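The README does not specify what the duplicate summary contains. A minimal sketch with pandas' `DataFrame.duplicated`, assuming "duplicate" means a fully repeated row, could look like this; `duplicate_summary` and its dictionary keys are hypothetical and not part of SimpleEDA.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count fully duplicated rows and list every copy involved."""
    dup_mask = df.duplicated(keep="first")  # True for the second and later occurrences
    return {
        "total_rows": len(df),
        "duplicate_rows": int(dup_mask.sum()),
        "rows_involved": df[df.duplicated(keep=False)],  # all copies, including the first
    }


sample = pd.DataFrame({"A": [1, 2, 2, 4], "D": ["a", "b", "b", "d"]})
summary = duplicate_summary(sample)
print(summary["duplicate_rows"])  # 1
print(summary["rows_involved"])
```

On the README's sample frame this sketch reports zero duplicate rows: column `A` repeats the value 2, but those two rows differ in `B`, `C`, and `D`.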
### boxplt_continous
This function plots boxplots for all continuous features in the DataFrame.
```python
eda.boxplt_continous(df)
```
**Parameters:**
- `df` (pd.DataFrame): The DataFrame to plot.

**Returns:**
- None
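A comparable plot can be sketched with matplotlib, assuming "continuous" means any numeric column (so `D` is skipped). `boxplot_numeric` is a hypothetical helper, not SimpleEDA's code, and the `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def boxplot_numeric(df: pd.DataFrame):
    """Hypothetical helper: one box plot per numeric column, side by side."""
    numeric = df.select_dtypes(include="number")  # assumption: numeric == continuous
    n_cols = numeric.shape[1]
    if n_cols == 0:
        raise ValueError("no numeric columns to plot")
    fig, axes = plt.subplots(1, n_cols, figsize=(3 * n_cols, 4), squeeze=False)
    for ax, col in zip(axes[0], numeric.columns):
        ax.boxplot(numeric[col].dropna().to_numpy())
        ax.set_title(str(col))
    fig.tight_layout()
    return fig


df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    "D": list("abcdefghij"),
})
fig = boxplot_numeric(df)
```

Giving each column its own axis keeps columns with very different scales readable, at the cost of not sharing a common y-axis.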
### enhance_summary
This function provides an enhanced summary of a pandas DataFrame, including custom percentiles, IQR, outliers, duplicates, missing values, and skewness. It handles both numerical and categorical columns.
```python
summary = eda.enhance_summary(df, custom_percentiles=[5, 95])
print(summary)
```
**Parameters:**
- `dataframe` (pd.DataFrame): The DataFrame to summarize.
- `custom_percentiles` (list, optional): A list of custom percentiles to include in the summary, for example `[5, 95]`.

**Returns:**
- `pd.DataFrame`: A DataFrame containing the enhanced summary statistics.
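The README does not list the exact statistics or say how duplicates are counted. The sketch below is one plausible reading, not the library's implementation: `enhanced_summary_sketch` is hypothetical, percentiles are assumed to be on a 0-100 scale (as the example `[5, 95]` suggests), duplicates are counted as repeated values within each column, and outliers use the `1.5 * IQR` rule described for `remove_outlier`.

```python
from typing import Optional, Sequence

import pandas as pd


def enhanced_summary_sketch(
    df: pd.DataFrame, custom_percentiles: Optional[Sequence[float]] = None
) -> pd.DataFrame:
    """Hypothetical helper: one row of statistics per column of df."""
    rows = {}
    for col in df.columns:
        s = df[col]
        stats = {
            "dtype": str(s.dtype),
            "missing": int(s.isna().sum()),
            "duplicates": int(s.duplicated().sum()),  # repeated values within the column
        }
        if pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            q1, q3 = s.quantile(0.25), s.quantile(0.75)
            iqr = q3 - q1
            outside = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
            stats.update(
                mean=s.mean(),
                std=s.std(),
                IQR=iqr,
                outliers=int(outside.sum()),
                skewness=s.skew(),
            )
            for p in custom_percentiles or []:
                stats[f"p{p}"] = s.quantile(p / 100)  # assumes a 0-100 percentile scale
        else:
            # Categorical/text column: cardinality and most frequent value.
            stats.update(
                unique=int(s.nunique()),
                top=s.mode().iloc[0] if s.notna().any() else None,
            )
        rows[col] = stats
    # Columns that do not apply to a given row (e.g. IQR for text) are left as NaN.
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    "D": list("abcdefghij"),
})
summary = enhanced_summary_sketch(df, custom_percentiles=[5, 95])
print(summary)
```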
## Example
Here's a complete example of using SimplyEDA with a sample DataFrame:
```python
import SimplyEDA as eda
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
})
# Remove outliers
lower, upper = eda.remove_outlier(df['A'])
print(f"Lower bound: {lower}, Upper bound: {upper}")
# Find special characters
eda.find_specialchar(df)
# Calculate VIF
vif = eda.vif_cal(df[['A', 'B', 'C']])
print(vif)
# Detect duplicates
eda.dups(df)
# Plot boxplots for continuous features
eda.boxplt_continous(df)
# Enhanced summary
summary = eda.enhance_summary(df, custom_percentiles=[5, 95])
print(summary)
```
## Author
This project was created by M.R.Vijay Krishnan. You can reach me at [vijaykrishnanmr@gmail.com](mailto:vijaykrishnanmr@gmail.com).
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/kitranet/SimpleEDA",
"name": "SimplyEDA",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "EDA data-analysis",
"author": "M.R.Vijay Krishnan",
"author_email": "vijaykrishnanmr@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/33/dc/9c24bd46b3888ae6d98f632b770b40c49d2a69197ecaed6425ac86bd85e5/SimplyEDA-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A simple library for exploratory data analysis",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/kitranet/SimpleEDA"
},
"split_keywords": [
"eda",
"data-analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d89ed518481a4b33c1f7f43eaa191167b5dade1e98f75ae02c662c403890f2d8",
"md5": "52e0077eecf61fd1aef5d78396c1ec75",
"sha256": "050e71f1aee2503e85d4e999b6b6a7b2207f7e4b57beb964ff8aca7d8f955a54"
},
"downloads": -1,
"filename": "SimplyEDA-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "52e0077eecf61fd1aef5d78396c1ec75",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 4849,
"upload_time": "2024-06-23T17:52:36",
"upload_time_iso_8601": "2024-06-23T17:52:36.688488Z",
"url": "https://files.pythonhosted.org/packages/d8/9e/d518481a4b33c1f7f43eaa191167b5dade1e98f75ae02c662c403890f2d8/SimplyEDA-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "33dc9c24bd46b3888ae6d98f632b770b40c49d2a69197ecaed6425ac86bd85e5",
"md5": "a5373330e061c067bbac796b81264f2a",
"sha256": "f7b693c7a4d1c79cdde43c65c05445977401d077cb1cd8038394c2f197ebb6eb"
},
"downloads": -1,
"filename": "SimplyEDA-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "a5373330e061c067bbac796b81264f2a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 5588,
"upload_time": "2024-06-23T17:52:38",
"upload_time_iso_8601": "2024-06-23T17:52:38.113856Z",
"url": "https://files.pythonhosted.org/packages/33/dc/9c24bd46b3888ae6d98f632b770b40c49d2a69197ecaed6425ac86bd85e5/SimplyEDA-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-23 17:52:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kitranet",
"github_project": "SimpleEDA",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "simplyeda"
}