# datahelp
datahelp is a Python library designed to assist data science and data analysis teams in their workflow. It provides various utility functions and tools to streamline common data science tasks.
## Features
datahelp offers the following key features:
1. **Project Management:** datahelp simplifies the creation of standard data science project structures. With a single function call, you can generate a well-organized project directory with predefined folders for datasets, processed data, raw data, outputs, models, scripts, notebooks, and more.
2. **Model Saving and Loading:** datahelp provides easy-to-use functions for saving and loading trained machine learning models. It supports various formats such as joblib, pickle, and keras, enabling seamless integration with different model types.
3. **Data Exploration and Visualization:** The library includes functions for data exploration, summary statistics, and visualization. You can quickly generate feature visualization plots and visualize missing data to gain insights into your datasets.
4. **Feature Engineering:** datahelp includes methods for handling missing data and noise in your datasets. It offers functions for dropping missing columns based on a specified threshold and detecting outliers using Tukey's Interquartile Range (IQR) method.
5. **Model Evaluation and Cross-Validation:** datahelp provides tools to evaluate model performance, including functions to calculate accuracy, F1-score, precision, recall, and generate classification reports. It also supports cross-validation for model evaluation.
6. **Scaling and Normalization:** The library offers functions for min-max scaling and z-score normalization of data to bring features to a common scale.
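The techniques named in items 4 and 6 can be sketched in plain Python. This is an illustration of the underlying math only, not datahelp's implementation: the function names `tukey_outliers`, `min_max_scale`, and `z_score` are hypothetical, as are the multiplier `k=1.5` and the choice of the "inclusive" quartile method. It relies on `statistics.quantiles` and `statistics.fmean`, which require Python 3.8 or later.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range to scale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


sample = [4.0, 5.0, 5.0, 6.0, 7.0, 30.0]
print(tukey_outliers(sample))  # values flagged as outliers
print(min_max_scale(sample))   # values rescaled to [0, 1]
print(z_score(sample))         # values with mean 0 and unit variance
```

Different quartile conventions move the fences slightly, so a library may flag a different set of borderline points than this sketch does.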
## Quickstart
To use datahelp in your data science projects, you can install it via pip:
```bash
pip install datahelp
```
Once installed, you can import the library and explore its functionality:
```python
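import pandas as pd    # needed for pd.read_csv below
import datahelp as dh  # import the datahelp library

df = pd.read_csv("data/iris.csv")  # load the iris dataset
print(df.head())

# Categorical and numerical variables
cats = dh.eda.get_cat_vars(df)
print(cats)
num_var = dh.eda.get_num_vars(df)
print(num_var)

# Counts for the categorical variables
cat_count = dh.eda.get_cat_counts(df)
print(cat_count)

# Missing-value summary
missing = dh.eda.display_missing(df)
print(missing)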
```
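The README does not show what these helpers return. As a rough point of comparison, the sketch below computes similar summaries with pandas alone. It is not datahelp's code: the sample frame, its column names, and the `missing_count`/`missing_pct` labels are all made up for illustration.

```python
import pandas as pd

# Small stand-in frame; the data and column names are hypothetical.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, None, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, None],
        "species": ["setosa", "setosa", "versicolor", None],
    }
)

# Split the columns into categorical and numerical by dtype.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
num_cols = df.select_dtypes(include="number").columns.tolist()

# Value counts for each categorical column (missing values are ignored).
cat_counts = {col: df[col].value_counts().to_dict() for col in cat_cols}

# Missing values per column, as a count and as a percentage of rows.
missing = pd.DataFrame(
    {
        "missing_count": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
    }
)

print(cat_cols)
print(num_cols)
print(cat_counts)
print(missing)
```

Comparing this output with what `dh.eda` prints on the same frame is a quick way to check how the library treats missing values and non-numeric columns.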
## Links
- Project: https://github.com/kimxons/datahelp
- PyPI: https://pypi.python.org/pypi/datahelp/
## Documentation
For detailed usage instructions and the API reference, see the official documentation at [https://datahelp-docs.example.com](https://datahelp-docs.example.com).
## Contribution
datahelp is an open-source project, and we welcome contributions from the data science community. If you find a bug, have a feature request, or want to contribute improvements, please open an issue or submit a pull request on our GitHub repository at [https://github.com/datahelp/datahelp](https://github.com/datahelp/datahelp).
## License
datahelp is licensed under the MIT License. See the [LICENSE](https://github.com/datahelp/datahelp/blob/main/LICENSE) file for more details.
## Contact
If you have any questions or feedback, feel free to reach out to our support team at dev.kitonga@gmail.com or join our community forum at [https://community.datahelp.com](https://community.datahelp.com). We are here to assist you in making your data science journey smooth and successful!
## Raw data
{
"_id": null,
"home_page": "https://github.com/kimxons/datahelp",
"name": "datahelp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "Meshack Kitonga",
"author_email": "dev.kitonga@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d8/f1/d5302c2685c42ffed4061d183c9188126e925d243c3884db4d16e0b49185/datahelp-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Data science library for data science / data analysis teams",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/kimxons/datahelp"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2962b36a4f23417193dfe808c67b85f41df316cc02465dbe6b0844d7795cb0d2",
"md5": "f0336d8dffcc9a8c6e00fa9168ac4ae6",
"sha256": "0343b4e0f86845f4b9ec649caee632727c8ed01d75379e03035c80efba9f8e76"
},
"downloads": -1,
"filename": "datahelp-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f0336d8dffcc9a8c6e00fa9168ac4ae6",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 20647,
"upload_time": "2023-09-21T11:04:17",
"upload_time_iso_8601": "2023-09-21T11:04:17.582091Z",
"url": "https://files.pythonhosted.org/packages/29/62/b36a4f23417193dfe808c67b85f41df316cc02465dbe6b0844d7795cb0d2/datahelp-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d8f1d5302c2685c42ffed4061d183c9188126e925d243c3884db4d16e0b49185",
"md5": "ee794c8b31cbd30411d2da3eef3d105f",
"sha256": "6407694f441ec10597cf91a7774e5c34554a3f665a56509bb35571aafb26b646"
},
"downloads": -1,
"filename": "datahelp-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "ee794c8b31cbd30411d2da3eef3d105f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 18746,
"upload_time": "2023-09-21T11:04:19",
"upload_time_iso_8601": "2023-09-21T11:04:19.780454Z",
"url": "https://files.pythonhosted.org/packages/d8/f1/d5302c2685c42ffed4061d183c9188126e925d243c3884db4d16e0b49185/datahelp-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-21 11:04:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kimxons",
"github_project": "datahelp",
"travis_ci": true,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "datahelp"
}