Name | atom-ml |
Version | 6.1.0 |
home_page | None |
Summary | A Python package for fast exploration of machine learning pipelines |
upload_time | 2024-07-05 09:46:04 |
maintainer | None |
docs_url | None |
author | None |
requires_python | <3.13,>=3.10 |
license | MIT License Copyright (c) 2024 Mavs Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | python package, machine learning, modeling, data pipeline |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<div align="center">
<p align="center">
<img src="https://github.com/tvdboom/ATOM/blob/master/images/logo.png?raw=true" alt="ATOM" title="ATOM" height="130" width="500"/>
</p>
# Automated Tool for Optimized Modeling
### A Python package for fast exploration of machine learning pipelines
</div>
<br><br>
📜 Overview
-----------
<p align="center" style="font-size: 1.4em">
<a href="https://github.com/tvdboom" style="text-decoration: none" draggable="false"><img src="https://github.com/tvdboom/ATOM/blob/master/docs_sources/img/icons/avatar.png?raw=true" alt="Author" height=15 width=15 draggable="false" /> Mavs</a>
<a href="mailto:m.524687@gmail.com" style="text-decoration: none" draggable="false"><img src="https://github.com/tvdboom/ATOM/blob/master/docs_sources/img/icons/email.png?raw=true" alt="Email" height=13 width=17 draggable="false" /> m.524687@gmail.com</a>
<a href="https://tvdboom.github.io/ATOM/" style="text-decoration: none" draggable="false"><img src="https://github.com/tvdboom/ATOM/blob/master/docs_sources/img/icons/documentation.png?raw=true" alt="Documentation" height=17 width=17 draggable="false" /> Documentation</a>
<a href="https://join.slack.com/t/atom-alm7229/shared_invite/zt-upd8uc0z-LL63MzBWxFf5tVWOGCBY5g" style="text-decoration: none" draggable="false"><img src="https://github.com/tvdboom/ATOM/blob/master/docs_sources/img/icons/slack.png?raw=true" alt="Slack" height=16 width=16 draggable="false"/> Slack</a>
</p>
<br>
**General Information** | |
--- | ---
**Repository** | [![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Conda Recipe](https://img.shields.io/badge/recipe-atom--ml-green.svg)](https://anaconda.org/conda-forge/atom-ml) [![License: MIT](https://img.shields.io/github/license/tvdboom/ATOM)](https://opensource.org/licenses/MIT) [![Downloads](https://static.pepy.tech/badge/atom-ml)](https://pepy.tech/project/atom-ml)
**Release** | [![pdm-managed](https://img.shields.io/badge/pdm-managed-blueviolet)](https://pdm.fming.dev) [![PyPI version](https://img.shields.io/pypi/v/atom-ml)](https://pypi.org/project/atom-ml/) [![Conda Version](https://img.shields.io/conda/vn/conda-forge/atom-ml.svg)](https://anaconda.org/conda-forge/atom-ml) [![DOI](https://zenodo.org/badge/195069958.svg)](https://zenodo.org/badge/latestdoi/195069958)
**Compatibility** | [![Python 3.10\|3.11\|3.12](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue?logo=python)](https://www.python.org) [![Conda Platforms](https://img.shields.io/conda/pn/conda-forge/atom-ml.svg)](https://anaconda.org/conda-forge/atom-ml)
**Build status** | [![Build and release](https://github.com/tvdboom/ATOM/actions/workflows/release.yml/badge.svg)](https://github.com/tvdboom/ATOM/actions/workflows/release.yml) [![Azure Pipelines](https://dev.azure.com/conda-forge/feedstock-builds/_apis/build/status/atom-ml-feedstock?branchName=main&jobName=linux&configuration=linux%20linux_64_python3.11.____cpython)](https://dev.azure.com/conda-forge/feedstock-builds/_build/latest?definitionId=10822&branchName=master) [![codecov](https://codecov.io/gh/tvdboom/ATOM/branch/master/graph/badge.svg)](https://codecov.io/gh/tvdboom/ATOM)
**Code analysis** | [![Linting and tests](https://github.com/tvdboom/ATOM/actions/workflows/config.yml/badge.svg)](https://github.com/tvdboom/ATOM/actions/workflows/config.yml) [![PEP8](https://img.shields.io/badge/code%20style-pep8-orange.svg)](https://www.python.org/dev/peps/pep-0008/) [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/) [![ruff](https://img.shields.io/badge/ruff-checked-blue)](https://docs.astral.sh/ruff/) [![mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://www.mypy-lang.org/)
<br><br>
💡 Introduction
---------------
During the exploration phase of a machine learning project, a data
scientist tries to find the optimal pipeline for their specific use case.
This usually involves applying standard data cleaning steps, creating
or selecting useful features, trying out different models, etc. Testing
multiple pipelines requires many lines of code, and writing it all in
the same notebook often makes it long and cluttered. On the other hand,
using multiple notebooks makes it harder to compare the results and to
keep an overview. On top of that, refactoring the code for every test
can be quite time-consuming. How many times have you performed the same
steps to pre-process a raw dataset? How many times have you copied and
pasted code from an old repository to reuse it in a new use case?

ATOM is here to help solve these common issues. The package acts as
a wrapper around the whole machine learning pipeline, helping the data
scientist rapidly find a good model for their problem. Avoid
endless imports and documentation lookups. Avoid rewriting the same
code over and over again. With just a few lines of code, it's now
possible to perform basic data cleaning steps, select relevant
features and compare the performance of multiple models on a given
dataset, providing quick insights into which pipeline performs best
for the task at hand.
Example steps taken by ATOM's pipeline:
1. Data cleaning
    * Handle missing values
    * Encode categorical features
    * Detect and remove outliers
    * Balance the training set
2. Feature engineering
    * Create new non-linear features
    * Select the most promising features
3. Train and validate multiple models
    * Apply hyperparameter tuning
    * Fit the models on the training set
    * Evaluate the results on the test set
4. Analyze the results
    * Get the scores on various metrics
    * Make plots to compare the model performances
<br/><br/>
<img src="https://github.com/tvdboom/ATOM/blob/master/images/diagram_pipeline.png?raw=true" alt="diagram_pipeline" title="diagram_pipeline" width="900" height="300" />
<br><br>
❗ Why you should use ATOM
-------------------------
* [Multiple data cleaning and feature engineering classes](https://tvdboom.github.io/ATOM/latest/user_guide/data_cleaning/)
* [55+ classification, regression and forecast models to choose from](https://tvdboom.github.io/ATOM/latest/user_guide/models/)
* [Possibility to train multiple models with one line of code](https://tvdboom.github.io/ATOM/latest/getting_started/#usage)
* [Fast implementation of hyperparameter tuning](https://tvdboom.github.io/ATOM/latest/user_guide/training/#hyperparameter-tuning)
* [Easy way to compare the results from different models](https://tvdboom.github.io/ATOM/latest/user_guide/training/)
* [50+ plots to analyze the data and model performance](https://tvdboom.github.io/ATOM/latest/user_guide/plots/#available-plots)
* [Avoid refactoring to test new pipelines](https://tvdboom.github.io/ATOM/latest/user_guide/data_management/#branches)
* [Native support for GPU training](https://tvdboom.github.io/ATOM/latest/user_guide/accelerating/#gpu-acceleration)
* [Integration with polars, pyspark and pyarrow](https://tvdboom.github.io/ATOM/latest/user_guide/data_management/#data-engines)
* [30+ example notebooks to get you started](https://tvdboom.github.io/ATOM/latest/examples/accelerating_cuml/)
* [Full integration with multilabel and multioutput datasets](https://tvdboom.github.io/ATOM/latest/user_guide/data_management/#multioutput-tasks)
* [Native support for sparse datasets](https://tvdboom.github.io/ATOM/latest/user_guide/data_management/#sparse-datasets)
* [Built-in transformers for NLP pipelines](https://tvdboom.github.io/ATOM/latest/user_guide/nlp/)
* [Avoid endless imports and documentation lookups](https://tvdboom.github.io/ATOM/latest/getting_started/#usage)
<br><br>
🛠️ Installation
---------------
Install ATOM's newest release easily via `pip`:

    $ pip install -U atom-ml

or via `conda`:

    $ conda install -c conda-forge atom-ml

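The package metadata declares `requires_python` as `<3.13,>=3.10`, so installation is expected to fail on interpreters outside that range. As a minimal sketch, the check below tests the running interpreter against those bounds before installing; the helper name `is_supported` is hypothetical and not part of ATOM.

```python
import sys


def is_supported(version_info=None):
    """Return True if the interpreter satisfies >=3.10,<3.13.

    `is_supported` is a hypothetical helper for illustration; the bounds
    are taken from the `requires_python` field of the 6.1.0 release.
    """
    major, minor = tuple(version_info or sys.version_info)[:2]
    return (3, 10) <= (major, minor) < (3, 13)


if __name__ == "__main__":
    print("Supported interpreter:", is_supported())
```
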
<br><br>
⚡ Usage
-------
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1H8pL-iAICeaKqWQxWsb6fN9zPNZK722s#scrollTo=LrtjgDQFvU2z&forceEdit=true&sandboxMode=true)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tvdboom/ATOM/HEAD)
ATOM contains a variety of classes and functions to perform data cleaning,
feature engineering, model training, plotting and much more. The easiest
way to use everything ATOM has to offer is through one of the main classes:
* [ATOMClassifier](https://tvdboom.github.io/ATOM/latest//API/ATOM/atomclassifier) for binary or multiclass classification tasks.
* [ATOMForecaster](https://tvdboom.github.io/ATOM/latest//API/ATOM/atomforecaster) for forecasting tasks.
* [ATOMRegressor](https://tvdboom.github.io/ATOM/latest//API/ATOM/atomregressor) for regression tasks.
Let's walk through an example. Click on the Colab or Binder badge
at the top of this section to run this example yourself.
Make the necessary imports and load the data.
```python
import pandas as pd
from atom import ATOMClassifier
# Load the Australian Weather dataset
X = pd.read_csv("https://raw.githubusercontent.com/tvdboom/ATOM/master/examples/datasets/weatherAUS.csv")
X.head()
```
Initialize the ATOMClassifier or ATOMRegressor class. These two classes
are convenient wrappers for the whole machine learning pipeline. Unlike
sklearn's estimators, they are initialized with the data you want to
manipulate.
```python
atom = ATOMClassifier(X, y="RainTomorrow", n_rows=1000, verbose=2)
```
Data transformations are applied through atom's methods. For example,
calling the [impute](https://tvdboom.github.io/ATOM/latest/API/ATOM/atomclassifier/#impute)
method will initialize an [Imputer](https://tvdboom.github.io/ATOM/latest/API/data_cleaning/imputer)
instance, fit it on the training set and transform the whole dataset.
The transformations are applied immediately after calling the method
(no fit and transform commands necessary).
```python
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="target", max_onehot=8)
```
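To make the "fit on the training set, transform the whole dataset" behaviour concrete, here is a minimal standard-library sketch of median imputation. The `MedianImputer` class and the toy rows are hypothetical stand-ins for illustration only; this is not ATOM's `Imputer`, and missing values are assumed to be encoded as `None`.

```python
from statistics import median


class MedianImputer:
    """Hypothetical stand-in for an imputer; not ATOM's implementation."""

    def fit(self, rows):
        # Learn one median per column, using only the rows passed in
        # (the training set), and ignoring missing entries.
        columns = list(zip(*rows))
        self.medians_ = [
            median(v for v in col if v is not None) for col in columns
        ]
        return self

    def transform(self, rows):
        # Replace each missing entry with the median learned during fit.
        return [
            [m if v is None else v for v, m in zip(row, self.medians_)]
            for row in rows
        ]


# Toy data: the statistics come from `train` only, yet both sets are filled.
train = [[1.0, None], [3.0, 10.0], [None, 30.0]]
test = [[None, None]]

imputer = MedianImputer().fit(train)
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)
```

Learning the medians from the training rows alone keeps information from the test set out of the transformation, which is the usual reason for this split between fitting and transforming.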
Similarly, models are [trained and evaluated](https://tvdboom.github.io/ATOM/latest/user_guide/training)
using the [run](https://tvdboom.github.io/ATOM/latest/API/ATOM/atomclassifier/#run)
method. Here, we fit both a [LinearDiscriminantAnalysis](https://tvdboom.github.io/ATOM/latest/API/models/lda)
and [AdaBoost](https://tvdboom.github.io/ATOM/latest/API/models/adab) model,
and apply [hyperparameter tuning](https://tvdboom.github.io/ATOM/latest/user_guide/training/#hyperparameter-tuning).
```python
atom.run(models=["LDA", "AdaB"], metric="auc", n_trials=10)
```
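Conceptually, tuning with `n_trials=10` means each model gets ten attempts at finding good hyperparameters, after which the best attempt per model is compared. The sketch below illustrates that loop with plain random search over toy objective functions. It is an assumption-laden illustration: the search strategy ATOM actually uses is not described here, and `random_search`, the objectives, and the search space are all hypothetical.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample `n_trials` parameter sets uniformly and keep the best one.

    Hypothetical helper: plain random search, swapped in for whatever
    tuning strategy the real library applies.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high) for name, (low, high) in space.items()
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objectives standing in for "fit the model and score it on held-out data".
objectives = {
    "model_a": lambda p: -((p["x"] - 0.3) ** 2),
    "model_b": lambda p: 1.0 - abs(p["x"] - 0.7),
}
space = {"x": (0.0, 1.0)}

# Tune every model with the same trial budget, then pick the best scorer.
results = {
    name: random_search(objective, space, n_trials=10)
    for name, objective in objectives.items()
}
best_model = max(results, key=lambda name: results[name][1])
```
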
Lastly, analyze the results.
```python
atom.results
atom.plot_roc()
```
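For readers unfamiliar with the metric behind `metric="auc"` and `plot_roc`, the following sketch computes the area under the ROC curve from scratch, assuming `auc` refers to ROC AUC for a binary target. It uses the pairwise-ranking formulation and is not how ATOM or scikit-learn implement it; the function name `roc_auc` is hypothetical.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive sample is
    scored higher than a randomly chosen negative one (ties count half).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute the AUC.")

    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.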
<br><br>
<img src="https://github.com/tvdboom/ATOM/blob/master/docs_sources/img/icons/documentation.png?raw=true" alt="Documentation" height=28 width=28 draggable="false" /> Documentation
----------------
**Relevant links** | |
--- | ---
⭐ **[About](https://tvdboom.github.io/ATOM/latest/release_history/)** | Learn more about the package.
🚀 **[Getting started](https://tvdboom.github.io/ATOM/latest/getting_started/)** | New to ATOM? Here's how to get started!
👨‍💻 **[User guide](https://tvdboom.github.io/ATOM/latest/user_guide/introduction/)** | How to use ATOM and its features.
🎛️ **[API Reference](https://tvdboom.github.io/ATOM/latest/API/ATOM/atomclassifier/)** | The detailed reference for ATOM's API.
📋 **[Examples](https://tvdboom.github.io/ATOM/latest/examples/binary_classification/)** | Example notebooks show you what can be done and how.
📢 **[Changelog](https://tvdboom.github.io/ATOM/latest/changelog/)** | What are the new features in the latest release?
❔ **[FAQ](https://tvdboom.github.io/ATOM/latest/faq/)** | Get answers to frequently asked questions.
🔧 **[Contributing](https://tvdboom.github.io/ATOM/latest/contributing/)** | Do you want to contribute to the project? Read this before creating a PR.
🌳 **[Dependencies](https://tvdboom.github.io/ATOM/latest/dependencies/)** | Which other packages does ATOM depend on?
📃 **[License](https://tvdboom.github.io/ATOM/latest/license/)** | Copyright and permissions under the MIT license.
Raw data
{
"_id": null,
"home_page": null,
"name": "atom-ml",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": "Python package, Machine Learning, Modeling, Data Pipeline",
"author": null,
"author_email": "Mavs <m.524687@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/bc/dd/dabcf7f5a023974820dcb6b9e0aa3582d4fa82e38b3dc096ba2bf51df30e/atom_ml-6.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Mavs Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "A Python package for fast exploration of machine learning pipelines",
"version": "6.1.0",
"project_urls": {
"Documentation": "https://tvdboom.github.io/ATOM/",
"Issues": "https://github.com/tvdboom/ATOM/issues",
"Repository": "https://github.com/tvdboom/ATOM",
"Slack": "https://join.slack.com/t/atom-alm7229/shared_invite/zt-upd8uc0z-LL63MzBWxFf5tVWOGCBY5g"
},
"split_keywords": [
"python package",
" machine learning",
" modeling",
" data pipeline"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9b6bf0009c74783daa565577613462fecdc411be280e0f75e04173b7a2e171d2",
"md5": "cd339e3fd6049b4b2f2f8ed5a3ec41ad",
"sha256": "656896e88e40d6c7e9e331f2c3182f401e7e9946ae4a33ec0d60d0cf46130251"
},
"downloads": -1,
"filename": "atom_ml-6.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cd339e3fd6049b4b2f2f8ed5a3ec41ad",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 268996,
"upload_time": "2024-07-05T09:46:02",
"upload_time_iso_8601": "2024-07-05T09:46:02.497908Z",
"url": "https://files.pythonhosted.org/packages/9b/6b/f0009c74783daa565577613462fecdc411be280e0f75e04173b7a2e171d2/atom_ml-6.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "bcdddabcf7f5a023974820dcb6b9e0aa3582d4fa82e38b3dc096ba2bf51df30e",
"md5": "8c4d03e9f511d1b2bfa8e0edd8fc9ea5",
"sha256": "90669d0ed075a5b07053a9d8d5f8db3389f78267045566e2c0eb78cd6948a833"
},
"downloads": -1,
"filename": "atom_ml-6.1.0.tar.gz",
"has_sig": false,
"md5_digest": "8c4d03e9f511d1b2bfa8e0edd8fc9ea5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 304648,
"upload_time": "2024-07-05T09:46:04",
"upload_time_iso_8601": "2024-07-05T09:46:04.865220Z",
"url": "https://files.pythonhosted.org/packages/bc/dd/dabcf7f5a023974820dcb6b9e0aa3582d4fa82e38b3dc096ba2bf51df30e/atom_ml-6.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-05 09:46:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tvdboom",
"github_project": "ATOM",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "atom-ml"
}