# Data Analysis Toolkit
[![Upload Python Package](https://github.com/thomasthaddeus/DataAnalysisToolkit/actions/workflows/python-publish.yml/badge.svg)](https://github.com/thomasthaddeus/DataAnalysisToolkit/actions/workflows/python-publish.yml) [![PyPI](https://img.shields.io/pypi/v/DataAnalysisToolkit.svg)](https://pypi.org/project/DataAnalysisToolkit/) ![License](https://img.shields.io/github/license/thomasthaddeus/DataAnalysisToolkit.svg) ![Python Version](https://img.shields.io/pypi/pyversions/DataAnalysisToolkit.svg) ![Code Size](https://img.shields.io/github/languages/code-size/thomasthaddeus/DataAnalysisToolkit.svg) ![Last Commit](https://img.shields.io/github/last-commit/thomasthaddeus/DataAnalysisToolkit.svg) ![Issues](https://img.shields.io/github/issues-raw/thomasthaddeus/DataAnalysisToolkit.svg) ![Pull Requests](https://img.shields.io/github/issues-pr/thomasthaddeus/DataAnalysisToolkit.svg) [![Documentation Status](https://readthedocs.org/projects/dataanalysistoolkit/badge/?version=latest)](https://dataanalysistoolkit.readthedocs.io/en/latest/?badge=latest)
DataAnalysisToolkit is a Python package that bundles tools for common data analysis tasks: loading CSV data, computing statistics, cleaning data, and visualizing results. It is aimed at data analysts, data scientists, and anyone getting started with data exploration and machine learning.
## Features
- **Data Loading**: Load data directly from CSV files into a Python environment.
- **Statistical Analysis**: Perform calculations like mean, median, mode, and trimmed mean.
- **Outlier Detection**: Identify outliers using the z-score method.
- **Data Cleaning**: Handle missing values, drop duplicates, and encode categorical data.
- **Data Splitting**: Easily split data into training and testing sets for machine learning models.
- **Data Visualization**: Create histograms and other plots to explore data visually.
- **Data Export**: Export cleaned and processed data back into CSV format.
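
To make the statistics and outlier bullets above concrete, here is a minimal standard-library sketch of a trimmed mean and z-score outlier detection. The function names `trimmed_mean` and `zscore_outliers`, their parameters, and the sample data are hypothetical and chosen for illustration; they are not the toolkit's API, and the toolkit's own implementation may differ (for example in how it trims or which standard deviation it uses).

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut per side
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(trimmed)


def zscore_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sample column with one obviously extreme value
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

print("median:", statistics.median(data))
print("mode:", statistics.mode(data))
print("trimmed mean:", trimmed_mean(data, proportion=0.1))
print("outliers:", zscore_outliers(data, threshold=2.0))
```

The threshold is lowered to 2.0 here because in very small samples a single extreme value inflates the standard deviation enough that its z-score may never reach 3.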
## Enhanced Functionalities
- **Advanced Visualization**: Use a dedicated visualizer to create a variety of data plots.
- **Feature Engineering**: Enhance your data with new, informative features.
- **Model Evaluation**: Assess the performance of machine learning models.
- **Report Generation**: Automatically generate comprehensive HTML reports with summaries and visualizations.
- **Data Imputation**: Implement advanced imputation techniques to handle missing data.
The toolkit is suited to preliminary data analysis and can serve as one step in a larger data processing workflow.
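
As an illustration of the imputation idea listed above, the sketch below fills missing entries with the median of the observed values. This is only the simplest baseline, written with the standard library; `impute_median` and the sample column are hypothetical, and the toolkit's own imputation features are not shown here.

```python
import statistics


def impute_median(column):
    """Return a copy of `column` with None replaced by the observed median."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing to estimate the median from
    fill = statistics.median(observed)
    return [fill if v is None else v for v in column]


# Hypothetical column with two missing entries
ages = [34, None, 29, 41, None, 38]
print(impute_median(ages))
```

Median imputation is less sensitive to extreme values than mean imputation, which is why it is often a reasonable first choice for skewed columns.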
## Getting Started
Here's how you can get started with DataAnalysisToolkit:
<!-- TODO: This coding example needs to be updated to the most recent release of the package. -->
```python
from data_analysis_toolkit import DataAnalysisToolkit
# Initialize the analyzer with the path to a CSV file
analyzer = DataAnalysisToolkit('../data/test.csv')
# Calculate the mean, median, mode, and trimmed mean of a column
statistics = analyzer.calculate_budget_statistics('column_name')
print(statistics)
# Detect outliers in a column using the z-score method
outliers = analyzer.detect_outliers('column_name')
print(outliers)
# Handle missing values in a column
analyzer.handle_missing_values('column_name', strategy='fill', fill_value=0)
# Drop duplicate rows in the DataFrame
analyzer.drop_duplicates()
# Encode categorical features in the DataFrame
analyzer.encode_categorical_features()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = analyzer.split_data('target_column')
# Plot a histogram of a column
analyzer.plot_data('column_name')
# Export the data to a CSV file
analyzer.export_data('new_file.csv')
```
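
The example above expects a CSV file to exist at the given path. If you do not have one at hand, a small input file can be generated with the standard library. The sketch below is an assumption-laden helper, not part of the toolkit: `write_sample_csv`, the file name, and the row values are hypothetical, and the column names simply mirror the placeholders used above.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(path):
    """Write a tiny CSV with a numeric, a categorical, and a target column."""
    rows = [
        {"column_name": 1.5, "category": "a", "target_column": 0},
        {"column_name": 2.0, "category": "b", "target_column": 1},
        {"column_name": 2.5, "category": "a", "target_column": 0},
        {"column_name": 40.0, "category": "c", "target_column": 1},
    ]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical location; any writable path works
sample_path = Path(tempfile.gettempdir()) / "toolkit_sample.csv"
count = write_sample_csv(sample_path)
print(count, "rows written to", sample_path)
```

Pass the resulting path to the analyzer in place of `'../data/test.csv'`.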
## Installation
Install DataAnalysisToolkit using pip:
```bash
pip install dataanalysistoolkit
```
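
After installing, you can check that the distribution is visible to your interpreter and that the interpreter meets the Python 3.9+ requirement declared in the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The package metadata declares requires_python ">=3.9,<4.0"
print("Python 3.9+:", sys.version_info >= (3, 9))
print("Installed version:", installed_version("dataanalysistoolkit"))
```

If the second line prints `None`, the package is not installed in the environment that is running the script.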
## Documentation
For detailed documentation, examples, and usage guides, please visit [DataAnalysisToolkit Documentation](https://dataanalysistoolkit.readthedocs.io/en/latest/).
## Contributing
Contributions are welcome! For guidelines on how to contribute, please refer to our [Contribution Guide](https://github.com/thomasthaddeus/DataAnalysisToolkit/CONTRIBUTING.md).
## License
DataAnalysisToolkit is open-sourced under the MIT License. For more details, see the [LICENSE](./LICENSE) file.
---
Developed with ❤ by the DataAnalysisToolkit Team.
Raw data
{
"_id": null,
"home_page": "https://github.com/thomasthaddeus/dataanalysistoolkit",
"name": "DataAnalysisToolkit",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "data analysis, CSV, statistics, data cleaning, data visualization",
"author": "Thaddeus Thomas",
"author_email": "thaddeus.r.thomas@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/88/6b/d73bcf92b3afbfb76e29382fc3bec865e9a85750d920010a620ab08c0ca1/dataanalysistoolkit-1.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "The DataAnalysisToolkit project is a Python-based data analysis tool designed to streamline various data analysis tasks. It allows users to load data from CSV files and perform operations such as statistical calculations, outlier detection, data cleaning, and visualization.",
"version": "1.2.2",
"project_urls": {
"Documentation": "https://dataanalysistoolkit.readthedocs.io/en/latest/",
"Homepage": "https://github.com/thomasthaddeus/dataanalysistoolkit",
"Repository": "https://github.com/thomasthaddeus/dataanalysistoolkit"
},
"split_keywords": [
"data analysis",
" csv",
" statistics",
" data cleaning",
" data visualization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f4eb81fcf52d2347049ede1328c2429164076a290a1466b56fef488491fbddd2",
"md5": "9bb9f8f4b94c0234b5f9fceb291db384",
"sha256": "c58a38fd5f1a8f438c6392769c87b2a87291d78984054cebb838bdc3066d7212"
},
"downloads": -1,
"filename": "dataanalysistoolkit-1.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9bb9f8f4b94c0234b5f9fceb291db384",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 67813,
"upload_time": "2024-05-09T04:56:53",
"upload_time_iso_8601": "2024-05-09T04:56:53.038930Z",
"url": "https://files.pythonhosted.org/packages/f4/eb/81fcf52d2347049ede1328c2429164076a290a1466b56fef488491fbddd2/dataanalysistoolkit-1.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "886bd73bcf92b3afbfb76e29382fc3bec865e9a85750d920010a620ab08c0ca1",
"md5": "9f683c9078d1343979aae6abf453a037",
"sha256": "800964229bbc5c911aaf52e4f6ee61d84f27e75a89d2ead2a9c470b5992b8f5b"
},
"downloads": -1,
"filename": "dataanalysistoolkit-1.2.2.tar.gz",
"has_sig": false,
"md5_digest": "9f683c9078d1343979aae6abf453a037",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 61778,
"upload_time": "2024-05-09T04:56:55",
"upload_time_iso_8601": "2024-05-09T04:56:55.341509Z",
"url": "https://files.pythonhosted.org/packages/88/6b/d73bcf92b3afbfb76e29382fc3bec865e9a85750d920010a620ab08c0ca1/dataanalysistoolkit-1.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-09 04:56:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thomasthaddeus",
"github_project": "dataanalysistoolkit",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "dataanalysistoolkit"
}