trane


- **Name:** trane
- **Version:** 0.8.0
- **Summary:** automatically generate prediction problems and labels for supervised learning.
- **Upload time:** 2024-01-02 15:50:36
- **Requires Python:** `<4,>=3.8`
- **License:** MIT License
- **Keywords:** trane, data science, machine learning

<p align="center">
<img width=50% src="https://github.com/trane-dev/Trane/blob/main/docs/trane-header.png" alt="Trane Logo" />
</p>

<p align="center">
    <a href="https://github.com/trane-dev/Trane/actions/workflows/tests.yaml" target="_blank">
      <img src="https://github.com/trane-dev/Trane/actions/workflows/tests.yaml/badge.svg" alt="Tests Status" />
    </a>
    <a href="https://codecov.io/gh/trane-dev/Trane" target="_blank">
      <img src="https://codecov.io/gh/trane-dev/Trane/branch/main/graph/badge.svg?token=HafAlYGH8F" alt="Code Coverage" />
    </a>
    <a href="https://badge.fury.io/py/Trane" target="_blank">
        <img src="https://badge.fury.io/py/Trane.svg?maxAge=2592000" alt="PyPI Version" />
    </a>
    <a href="https://pepy.tech/project/Trane" target="_blank">
        <img src="https://static.pepy.tech/badge/trane" alt="PyPI Downloads" />
    </a>
</p>

<hr>

**Trane** is a software package that automatically generates prediction problems for temporal datasets and produces the corresponding labels for supervised learning. Its goal is to streamline the machine learning problem-solving process.

## Install

Install Trane using pip:

```shell
python -m pip install trane
```
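
To confirm the environment before running the examples, a short check along these lines may help. It is a sketch that uses only the standard library; the helper names `python_is_supported` and `installed_version` are made up for illustration, and the version bounds are taken from the `<4,>=3.8` requirement listed for Trane 0.8.0.

```python
import sys
from importlib import metadata


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter satisfies '<4,>=3.8'."""
    return (3, 8) <= tuple(version_info[:2]) and version_info[0] < 4


def installed_version(package="trane"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("trane version:", installed_version() or "not installed")
```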

## Usage

Here's a quick demonstration of Trane in action:

```python
import trane

data, metadata = trane.load_airbnb()
problem_generator = trane.ProblemGenerator(
    metadata=metadata,
    entity_columns=["location"],
)
problems = problem_generator.generate()

for problem in problems[:5]:
    print(problem)
```

A few of the generated problems:
```
==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>
```
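
To make these statements concrete, the sketch below shows one way to read a problem such as `For each <location> predict if there exists a record with <rating> equal to <str>`: group the records by the entity column, then check whether any record in each group passes the filter. It is plain Python written for illustration, not Trane's implementation, and it ignores the time windows a temporal dataset would involve. The sample records, the value `"5"`, and the function name `exists_with_value` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; the column names follow the problem text above.
records = [
    {"location": "London", "rating": "5"},
    {"location": "London", "rating": "3"},
    {"location": "Paris", "rating": "4"},
]


def exists_with_value(rows, entity_column, column, value):
    """Label each entity True if any of its records has column == value."""
    labels = defaultdict(bool)  # every entity starts as False
    for row in rows:
        labels[row[entity_column]] |= row[column] == value
    return dict(labels)


labels = exists_with_value(records, "location", "rating", "5")
print(labels)
```

Each key of `labels` is one entity (a location) and each value is the classification target for that entity.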

With Trane's LLM add-on (`pip install "trane[llm]"`; the quotes keep shells such as zsh from interpreting the brackets), we can use OpenAI to select the most relevant problems:
```python
from trane.llm import analyze

instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."
relevant_problems = analyze(
    problems=problems,
    instructions=instructions,
    context=context,
    model="gpt-3.5-turbo-16k"
)
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')
```
Output:
```text
For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.

For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.

For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.

For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.

For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.
```
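
The sample output still contains "predict the first/last" problems even though the instructions asked to leave them out, so a deterministic check after the LLM step may be worthwhile. A minimal sketch, assuming that `str(problem)` yields the statement text (as the `print(problem)` calls above suggest); the helper name `drop_first_last` is hypothetical, and plain strings are used here so the example runs without Trane installed:

```python
import re

# Matches statements containing "predict the first" or "predict the last".
FIRST_LAST = re.compile(r"\bpredict the (first|last)\b")


def drop_first_last(statements):
    """Keep only items whose text is not a 'predict the first/last X' problem."""
    return [s for s in statements if not FIRST_LAST.search(str(s))]


statements = [
    "For each <location> predict if there exists a record",
    "For each <location> predict the first <rating> in all related records",
    "For each <location> predict the last <location> in all related records",
]
print(drop_first_last(statements))
```

Because the function returns the original items rather than their string form, it could in principle be applied to `relevant_problems` directly, under the assumption stated above.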

## Community

- **Questions or Issues?** Create a [GitHub issue](https://github.com/trane-dev/Trane/issues).
- **Want to Chat?** [Join our Slack community](https://join.slack.com/t/trane-dev/shared_invite/zt-1zglnh25c-ryuQFarw0rVgKHC6ywUOlg).

## Cite Trane

If you find Trane beneficial, consider citing our paper:

Ben Schreck, Kalyan Veeramachaneni. [What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems.](https://dai.lids.mit.edu/wp-content/uploads/2017/10/Trane1.pdf) *IEEE DSAA 2016*, 440-451.

BibTeX entry:

```bibtex
@inproceedings{schreck2016would,
  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
  pages={440--451},
  year={2016},
  organization={IEEE}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "trane",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": "MIT Data to AI Lab <dai-lab-trane@mit.edu>",
    "keywords": "trane,data science,machine learning",
    "author": "",
    "author_email": "MIT Data to AI Lab <dai-lab-trane@mit.edu>",
    "download_url": "https://files.pythonhosted.org/packages/2c/87/77b9b61a74c9b66392b9b383efdd2c572bffd4b40fd4d64e9fdb3f19a805/trane-0.8.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "automatically generate prediction problems and labels for supervised learning.",
    "version": "0.8.0",
    "project_urls": {
        "Changes": "https://github.com/trane-dev/Trane/blob/main/docs/changelog.md",
        "Chat": "https://join.slack.com/t/trane-dev/shared_invite/zt-1zglnh25c-ryuQFarw0rVgKHC6ywUOlg",
        "Issue Tracker": "https://github.com/trane-dev/Trane/issues",
        "Source Code": "https://github.com/trane-dev/Trane/",
        "Twitter": "https://twitter.com/lab_dai"
    },
    "split_keywords": [
        "trane",
        "data science",
        "machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f2f01755d68322eca0c1344c5786650ec0d4d1f2d141d1b3e9135fff28090d64",
                "md5": "7fd6e736471214a7059e6ce19fe38a18",
                "sha256": "9f69b86da4bd3226a1b25bb7f6fafb91ae47b9e7ef21a9dc99d4e200f6c9a8b5"
            },
            "downloads": -1,
            "filename": "trane-0.8.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7fd6e736471214a7059e6ce19fe38a18",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 4390115,
            "upload_time": "2024-01-02T15:50:33",
            "upload_time_iso_8601": "2024-01-02T15:50:33.652073Z",
            "url": "https://files.pythonhosted.org/packages/f2/f0/1755d68322eca0c1344c5786650ec0d4d1f2d141d1b3e9135fff28090d64/trane-0.8.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2c8777b9b61a74c9b66392b9b383efdd2c572bffd4b40fd4d64e9fdb3f19a805",
                "md5": "1ce664566a94b7eb49792eb64af887cf",
                "sha256": "677514a691ba5a49a4b4569605a23990005549cd7943c71c8fc8e4ccef60684f"
            },
            "downloads": -1,
            "filename": "trane-0.8.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1ce664566a94b7eb49792eb64af887cf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 4366668,
            "upload_time": "2024-01-02T15:50:36",
            "upload_time_iso_8601": "2024-01-02T15:50:36.697110Z",
            "url": "https://files.pythonhosted.org/packages/2c/87/77b9b61a74c9b66392b9b383efdd2c572bffd4b40fd4d64e9fdb3f19a805/trane-0.8.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-02 15:50:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "trane-dev",
    "github_project": "Trane",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "trane"
}
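
The `digests` entries in the metadata can be used to check a downloaded file before installing it. The following is a sketch using only `hashlib`; the helper names `sha256_of` and `matches_digest` are made up for illustration, and the local file name in the final comment assumes the sdist was saved under its original name.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_sha256):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected_sha256.lower()


# sha256 of trane-0.8.0.tar.gz, copied from the metadata above.
EXPECTED = "677514a691ba5a49a4b4569605a23990005549cd7943c71c8fc8e4ccef60684f"

# Self-check on a throwaway file so the sketch runs without downloading anything.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = matches_digest(tmp.name, hashlib.sha256(b"abc").hexdigest())
os.remove(tmp.name)
print(demo_ok)

# With the real file on disk: matches_digest("trane-0.8.0.tar.gz", EXPECTED)
```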
        