composeml


| Field | Value |
| --- | --- |
| Name | composeml |
| Version | 0.10.1 |
| Summary | A framework for automated prediction engineering |
| Upload time | 2023-01-07 03:28:34 |
| Requires Python | <4,>=3.8 |
| License | BSD 3-clause |
| Keywords | prediction engineering, data science, machine learning |
| Requirements | No requirements were recorded. |

<p align="center"><img width=50% src="https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/compose.png" alt="Compose" /></p>
<p align="center"><i>"Build better training examples in a fraction of the time."</i></p>
<p align="center">
    <a href="https://github.com/alteryx/compose/actions?query=workflow%3ATests" target="_blank">
        <img src="https://github.com/alteryx/compose/workflows/Tests/badge.svg" alt="Tests" />
    </a>
    <a href="https://codecov.io/gh/alteryx/compose">
        <img src="https://codecov.io/gh/alteryx/compose/branch/main/graph/badge.svg?token=mDz4ueTUEO"/>
    </a>
    <a href="https://compose.alteryx.com/en/stable/?badge=stable" target="_blank">
        <img src="https://readthedocs.com/projects/feature-labs-inc-compose/badge/?version=stable&token=5c3ace685cdb6e10eb67828a4dc74d09b20bb842980c8ee9eb4e9ed168d05b00"
            alt="ReadTheDocs" />
    </a>
    <a href="https://badge.fury.io/py/composeml" target="_blank">
        <img src="https://badge.fury.io/py/composeml.svg?maxAge=2592000" alt="PyPI Version" />
    </a>
    <a href="https://stackoverflow.com/questions/tagged/compose-ml" target="_blank">
        <img src="https://img.shields.io/badge/questions-on_stackoverflow-blue.svg?" alt="StackOverflow" />
    </a>
    <a href="https://pepy.tech/project/composeml" target="_blank">
        <img src="https://pepy.tech/badge/composeml/month" alt="PyPI Downloads" />
    </a>
</p>
<hr>

[Compose](https://compose.alteryx.com) is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. An end user defines an outcome of interest by writing a *labeling function*, then runs a search to automatically extract training examples from historical data. Its result is then provided to [Featuretools](https://docs.featuretools.com/) for automated feature engineering and subsequently to [EvalML](https://evalml.alteryx.com/) for automated machine learning. The workflow of an applied machine learning engineer then becomes:

<br><p align="center"><img width=90% src="https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/workflow.png" alt="Compose" /></p><br>

By automating the early stage of the machine learning pipeline, our end user can easily define a task and solve it. See the [documentation](https://compose.alteryx.com) for more information.

## Installation
Install with pip:

```
python -m pip install composeml
```

or from the conda-forge channel with [conda](https://anaconda.org/conda-forge/composeml):

```
conda install -c conda-forge composeml
```
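
After installing, it can help to confirm that the interpreter and the installed package line up with the published metadata (this release declares `requires_python` as `<4,>=3.8`). The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Compose.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# composeml 0.10.1 declares requires_python "<4,>=3.8".
python_ok = (3, 8) <= sys.version_info[:2] < (4, 0)
print("Python version supported:", python_ok)
print("composeml version:", installed_version("composeml"))
```

If the second line prints `None`, the package is not visible to the interpreter that ran the script, which usually means pip or conda installed it into a different environment.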

### Add-ons

**Update checker** - Receive automatic notifications of new Compose releases

```
python -m pip install "composeml[update_checker]"
```

## Example
> Will a customer spend more than 300 in the next hour of transactions?

In this example, we automatically generate new training examples from a historical dataset of transactions.

```python
import composeml as cp
df = cp.demos.load_transactions()
df = df[df.columns[:7]]
df.head()
```

<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>transaction_id</th>
      <th>session_id</th>
      <th>transaction_time</th>
      <th>product_id</th>
      <th>amount</th>
      <th>customer_id</th>
      <th>device</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>298</td>
      <td>1</td>
      <td>2014-01-01 00:00:00</td>
      <td>5</td>
      <td>127.64</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>10</td>
      <td>1</td>
      <td>2014-01-01 00:09:45</td>
      <td>5</td>
      <td>57.39</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>495</td>
      <td>1</td>
      <td>2014-01-01 00:14:05</td>
      <td>5</td>
      <td>69.45</td>
      <td>2</td>
      <td>desktop</td>
    </tr>
    <tr>
      <td>460</td>
      <td>10</td>
      <td>2014-01-01 02:33:50</td>
      <td>5</td>
      <td>123.19</td>
      <td>2</td>
      <td>tablet</td>
    </tr>
    <tr>
      <td>302</td>
      <td>10</td>
      <td>2014-01-01 02:37:05</td>
      <td>5</td>
      <td>64.47</td>
      <td>2</td>
      <td>tablet</td>
    </tr>
  </tbody>
</table>

First, we represent the prediction problem with a labeling function and a label maker.

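```python
def total_spent(ds):
    # `ds` is the slice of transactions for one customer within one window.
    return ds['amount'].sum()

label_maker = cp.LabelMaker(
    target_dataframe_index="customer_id",  # generate labels per customer
    time_index="transaction_time",         # column that orders the data in time
    labeling_function=total_spent,
    window_size="1h",                      # each label covers one hour of data
)
```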

Then, we run a search to automatically generate the training examples.

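```python
label_times = label_maker.search(
    df.sort_values('transaction_time'),  # the data must be sorted by time
    num_examples_per_instance=2,         # two examples per customer
    minimum_data='2014-01-01',           # time at which the search starts
    drop_empty=False,                    # keep windows with no transactions
    verbose=False,                       # suppress the progress bar
)

# Convert each total into a binary label: did the customer spend more than 300?
label_times = label_times.threshold(300)
label_times.head()
```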

<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>customer_id</th>
      <th>time</th>
      <th>total_spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2014-01-01 00:00:00</td>
      <td>True</td>
    </tr>
    <tr>
      <td>1</td>
      <td>2014-01-01 01:00:00</td>
      <td>True</td>
    </tr>
    <tr>
      <td>2</td>
      <td>2014-01-01 00:00:00</td>
      <td>False</td>
    </tr>
    <tr>
      <td>2</td>
      <td>2014-01-01 01:00:00</td>
      <td>False</td>
    </tr>
    <tr>
      <td>3</td>
      <td>2014-01-01 00:00:00</td>
      <td>False</td>
    </tr>
  </tbody>
</table>
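
To make the windowing and threshold steps concrete, here is a plain-Python sketch that applies the same idea to the five `customer_id == 2` rows shown in the transactions table above. It is an illustration under stated assumptions, not how Compose is implemented: `label_windows` is a hypothetical helper, windows are assumed to be consecutive and half-open (`[start, start + 1h)`), and only the five displayed rows are used, so the totals are not the ones Compose computes on the full dataset.

```python
from datetime import datetime, timedelta

# The five customer_id == 2 rows from the table above: (transaction_time, amount).
transactions = [
    (datetime(2014, 1, 1, 0, 0, 0), 127.64),
    (datetime(2014, 1, 1, 0, 9, 45), 57.39),
    (datetime(2014, 1, 1, 0, 14, 5), 69.45),
    (datetime(2014, 1, 1, 2, 33, 50), 123.19),
    (datetime(2014, 1, 1, 2, 37, 5), 64.47),
]


def total_spent(amounts):
    """Labeling function: total amount spent inside one window."""
    return sum(amounts)


def label_windows(rows, start, window, num_windows):
    """Hypothetical helper: label consecutive, non-overlapping windows.

    Each window covers [window_start, window_start + window).
    Empty windows are kept, similar in spirit to drop_empty=False.
    """
    labels = []
    for i in range(num_windows):
        window_start = start + i * window
        window_end = window_start + window
        amounts = [amt for ts, amt in rows if window_start <= ts < window_end]
        labels.append((window_start, total_spent(amounts)))
    return labels


labels = label_windows(
    transactions,
    start=datetime(2014, 1, 1),
    window=timedelta(hours=1),
    num_windows=2,
)

# Thresholding turns each continuous total into a binary label.
binary_labels = [(ts, total > 300) for ts, total in labels]
for ts, spent_over_300 in binary_labels:
    print(ts, spent_over_300)
```

In this sketch the first window sums the three transactions before 01:00 and the second window is empty, so both labels come out `False`.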

We now have labels that are ready to use in [Featuretools](https://docs.featuretools.com/) to generate features.

## Support

The Innovation Labs open source community is happy to provide support to users of Compose. Project support can be found in four places, depending on the type of question:

1. For usage questions, use [Stack Overflow](https://stackoverflow.com/questions/tagged/compose-ml) with the `compose-ml` tag.
2. For bugs, issues, or feature requests, start a GitHub [issue](https://github.com/alteryx/compose/issues/new).
3. For discussion regarding development on the core library, use [Slack](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA).
4. For everything else, the core developers can be reached by email at open_source_support@alteryx.com.

## Citing Compose
Compose is built upon a newly defined part of the machine learning process: prediction engineering. If you use Compose, please consider citing this paper:

James Max Kanter, Owen Gillespie, Kalyan Veeramachaneni. [Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering.](https://dai.lids.mit.edu/wp-content/uploads/2017/10/Pred_eng1.pdf) IEEE DSAA 2016.

BibTeX entry:

```bibtex
@inproceedings{kanter2016label,
  title={Label, segment, featurize: a cross domain framework for prediction engineering},
  author={Kanter, James Max and Gillespie, Owen and Veeramachaneni, Kalyan},
  booktitle={2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},
  pages={430--439},
  year={2016},
  organization={IEEE}
}
```

## Acknowledgements

The open source development has been supported in part by DARPA's Data-Driven Discovery of Models (D3M) program.

## Alteryx

**Compose** is an open source project maintained by [Alteryx](https://www.alteryx.com). We developed Compose to enable flexible definition of the machine learning task. To see the other open source projects we’re working on, visit [Alteryx Open Source](https://www.alteryx.com/open-source). If building impactful data science pipelines is important to you or your business, please get in touch.

<p align="center">
  <a href="https://www.alteryx.com/open-source">
    <img src="https://alteryx-oss-web-images.s3.amazonaws.com/OpenSource_Logo-01.png" alt="Alteryx Open Source" width="800"/>
  </a>
</p>

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "composeml",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": "\"Alteryx, Inc.\" <open_source_support@alteryx.com>",
    "keywords": "prediction engineering,data science,machine learning",
    "author": "",
    "author_email": "\"Alteryx, Inc.\" <open_source_support@alteryx.com>",
    "download_url": "https://files.pythonhosted.org/packages/98/d7/70264fb178f79f6c7b1981cf6780dc0a2feb4ed76bdea771882b38b971f7/composeml-0.10.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD 3-clause",
    "summary": "a framework for automated prediction engineering",
    "version": "0.10.1",
    "split_keywords": [
        "prediction engineering",
        "data science",
        "machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f509d830ebb1401a2e2ca445e22b91d338c7056a2e53776afed9fc422356a91e",
                "md5": "9a9d7a8050364b91ba61834630de958f",
                "sha256": "e6d292cbd8619d8e5649207be3954f1b04900b651e51ee85d838ab05d51b7bdb"
            },
            "downloads": -1,
            "filename": "composeml-0.10.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9a9d7a8050364b91ba61834630de958f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 39408,
            "upload_time": "2023-01-07T03:28:32",
            "upload_time_iso_8601": "2023-01-07T03:28:32.634270Z",
            "url": "https://files.pythonhosted.org/packages/f5/09/d830ebb1401a2e2ca445e22b91d338c7056a2e53776afed9fc422356a91e/composeml-0.10.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "98d770264fb178f79f6c7b1981cf6780dc0a2feb4ed76bdea771882b38b971f7",
                "md5": "acb86c6efe955e5a43dcfed52a5d6278",
                "sha256": "d87fc181be72fadec16ea0e44ebd363b4e2e3620c87c8b2cae6e236fd00761d4"
            },
            "downloads": -1,
            "filename": "composeml-0.10.1.tar.gz",
            "has_sig": false,
            "md5_digest": "acb86c6efe955e5a43dcfed52a5d6278",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 32480,
            "upload_time": "2023-01-07T03:28:34",
            "upload_time_iso_8601": "2023-01-07T03:28:34.579635Z",
            "url": "https://files.pythonhosted.org/packages/98/d7/70264fb178f79f6c7b1981cf6780dc0a2feb4ed76bdea771882b38b971f7/composeml-0.10.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-07 03:28:34",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "composeml"
}
        