Triage
======
ML/Data Science Toolkit for Social Good and Public Policy Problems
[![image](https://travis-ci.com/dssg/triage.svg?branch=master)](https://travis-ci.org/dssg/triage)
[![image](https://codecov.io/gh/dssg/triage/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/triage)
[![image](https://codeclimate.com/github/dssg/triage.png)](https://codeclimate.com/github/dssg/triage)
Building ML/Data Science systems requires answering many design questions and turning them into modeling choices, which in turn define the machine learning models. Questions such as cohort selection, unit of analysis determination, outcome determination, feature (explanatory variables or predictors) generation, model/classifier training, evaluation, selection, bias audits, interpretation, and list generation are often complicated, and it is hard to make design choices around them a priori. In addition, once these choices are made, they have to be combined in different ways throughout the course of a project.
Triage is designed to:
- Guide users (data scientists, analysts, researchers) through these design choices by highlighting critical operational use questions.
- Provide an integrated interface to the components needed throughout an ML/data science project workflow.
## Getting Started with Triage
- Are you completely new to Triage? Run through a quick tutorial hosted on Google Colab (no setup necessary) to see what Triage can do: [Tutorial hosted on Google Colab](https://colab.research.google.com/github/dssg/triage/blob/master/example/colab/colab_triage.ipynb)
- Run it locally on an [example problem and data set from Donors Choose](https://github.com/dssg/donors-choose)
- [Dirty Duck Tutorial](https://dssg.github.io/triage/dirtyduck/) - Want a more in-depth walkthrough of Triage's functionality and concepts? Go through the Dirty Duck tutorial, which you can install on your local machine with sample data.
- [QuickStart Guide](https://dssg.github.io/triage/quickstart/) - Try Triage out with your own project and data
- [Triage Documentation Site](https://dssg.github.io/triage/) - Used Triage before and want more reference documentation?
- [Development](https://github.com/dssg/triage#development) - Contribute to Triage development.
## Installation
To install Triage locally, you need:
- Ubuntu or RedHat
- Python 3.8+
- A PostgreSQL 9.6+ database with your source data (events,
  geographical data, etc.) loaded.
  - **NOTE**: PostgreSQL 11+ brings some speed improvements, so we
    recommend upgrading to a recent version of PostgreSQL.
- Ample disk space (locally, or in a service such as Amazon Web
  Services' S3) to store the matrices and models that your experiments
  will create.
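As a rough pre-flight check, the version requirements above can be expressed in a few lines of Python. This is a minimal sketch, not part of Triage: `check_python` and `check_postgres` are hypothetical helpers, and the PostgreSQL check assumes you pass it the integer reported by `SHOW server_version_num;` (for example, `90600` for 9.6.0 and `110000` for 11.0).

```python
import sys

# Thresholds taken from the requirements list above (hypothetical helper constants).
MIN_PYTHON = (3, 8)
MIN_POSTGRES = 90600          # PostgreSQL 9.6, in server_version_num format
RECOMMENDED_POSTGRES = 110000  # PostgreSQL 11, which brings speed improvements


def check_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) Python version is at least 3.8."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def check_postgres(server_version_num: int) -> str:
    """Classify a PostgreSQL server_version_num against the stated requirements."""
    if server_version_num < MIN_POSTGRES:
        return "unsupported"
    if server_version_num < RECOMMENDED_POSTGRES:
        return "supported"
    return "recommended"
```

For example, run `SHOW server_version_num;` in `psql`, convert the result with `int`, and pass it to `check_postgres`; a return value of `"supported"` means Triage should work but upgrading is advised.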
We recommend starting with a new Python virtual environment and installing Triage there with pip.
```bash
$ virtualenv triage-env
$ . triage-env/bin/activate
(triage-env) $ pip install triage
```
If you get an error related to the `pg_config` executable, run the following command (you need sudo access; replace `python3.9-dev` with the package matching your Python version):
```bash
(triage-env) $ sudo apt-get install libpq-dev python3.9-dev
```
Then rerun the installation:
```bash
(triage-env) $ pip install triage
```
To test if triage was installed correctly, type:
```bash
(triage-env) $ triage -h
```
## Data
Triage needs your data in a PostgreSQL database, plus a configuration file with the credentials for that database. By default, the Triage CLI reads database connection information from a file named `database.yaml` (see [example/database.yaml](https://github.com/dssg/triage/blob/master/example/database.yaml) for an example).
If you don't want to install PostgreSQL yourself, try `triage db up` to create a vanilla PostgreSQL 12 database using Docker. For more details on this command, see [Triage Database Provisioner](db.md).
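When using Triage as a Python package, you pass your own SQLAlchemy engine, so the credentials need to become a connection URL. The sketch below assumes `database.yaml` holds the keys `host`, `port`, `user`, `pass`, and `db`; those key names are an assumption to check against the linked example file, and `database_url` is a hypothetical helper, not a Triage function.

```python
from urllib.parse import quote_plus


def database_url(creds: dict) -> str:
    """Build a PostgreSQL connection URL from a dict loaded from database.yaml.

    Assumes the keys host, user, pass, db, and (optionally) port.
    """
    # Percent-encode credentials so characters like '@' or '/' don't break the URL.
    user = quote_plus(str(creds["user"]))
    password = quote_plus(str(creds["pass"]))
    port = int(creds.get("port", 5432))  # fall back to the default PostgreSQL port
    return f"postgresql://{user}:{password}@{creds['host']}:{port}/{creds['db']}"
```

In practice you would load the dictionary with PyYAML's `yaml.safe_load` and pass `database_url(creds)` to SQLAlchemy's `create_engine`. For instance, a password of `p@ss` becomes `p%40ss` in the URL.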
## Configure Triage for your project
Triage is configured with a YAML file (e.g. `config.yaml`) that defines parameters for each component. See this [sample configuration with explanations](https://github.com/dssg/triage/blob/master/example/config/experiment.yaml) for what a configuration looks like.
## Using Triage
1. Via CLI:
```bash
triage experiment example/config/experiment.yaml
```
2. Import as a python package:
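```python
import yaml
from sqlalchemy import create_engine
from triage.experiments import SingleThreadedExperiment

# Load the experiment configuration into a dictionary
with open('example/config/experiment.yaml') as f:
    experiment_config = yaml.safe_load(f)

experiment = SingleThreadedExperiment(
    config=experiment_config,  # a dictionary
    db_engine=create_engine(...),  # http://docs.sqlalchemy.org/en/latest/core/engines.html
    project_path='/path/to/directory/to/save/data'  # could be an S3 path too: 's3://mybucket/myprefix/'
)
experiment.run()
```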
Many options are available for running experiments, affecting things like parallelization and storage. They are detailed on the [Running an Experiment](https://dssg.github.io/triage/experiments/running/) page.
## Development
Triage was initially developed at the [University of Chicago's Center For Data Science and Public Policy](http://dsapp.uchicago.edu) and is now maintained at Carnegie Mellon University.
To build this package without installing it, you can instead install its dependencies from the terminal using `pip`:
```bash
pip install -r requirement/main.txt
```
### Testing
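To add test dependencies (and, optionally, development dependencies), use `requirement/test.txt` (and `requirement/dev.txt`):
```bash
pip install -r requirement/test.txt
# or, to include development dependencies as well:
pip install -r requirement/test.txt -r requirement/dev.txt
```
Then, to run tests:
```bash
pytest
```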
### Development Environment
To quickly bootstrap a development environment after cloning the
repository, run the executable `develop` script from your system
shell:
```bash
./develop
```
A "wizard" will suggest set-up steps and optionally execute them, for
example:
```
(install) begin
(pyenv) installed
(python-3.9.10) installed
(virtualenv) installed
(activation) installed
(libs) install?
1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt}
2) no, ignore
#? 1
```
### Contributing
If you'd like to contribute to Triage development, see the [CONTRIBUTING.md](https://github.com/dssg/triage/blob/master/CONTRIBUTING.md) document.
Raw data
{
"_id": null,
"home_page": "https://dssg.github.io/triage/",
"name": "triage",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "triage",
"author": "Center for Data Science and Public Policy",
"author_email": "datascifellows@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ae/85/858b180f23020cf45e20a8667feb9b1aae5984b74a2016d88bb45ec22810/triage-5.3.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Risk modeling and prediction",
"version": "5.3.3",
"project_urls": {
"Documentation": "https://dssg.github.io/triage/",
"Homepage": "https://dssg.github.io/triage/",
"Source Code": "https://github.com/dssg/triage",
"Tutorial": "https://dssg.github.io/triage/dirtyduck/"
},
"split_keywords": [
"triage"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f506bf51d217b58fc170c039c086d621cff06a75f9692a90a6035862312b6443",
"md5": "90cade68bd1d86bc39e86e8fcda60e73",
"sha256": "9f1a022c9f7a36c2d3bd3eb968666e8b240cb8d1ce6bebac7936a908e87363f7"
},
"downloads": -1,
"filename": "triage-5.3.3-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "90cade68bd1d86bc39e86e8fcda60e73",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 258144,
"upload_time": "2024-10-21T16:04:00",
"upload_time_iso_8601": "2024-10-21T16:04:00.657030Z",
"url": "https://files.pythonhosted.org/packages/f5/06/bf51d217b58fc170c039c086d621cff06a75f9692a90a6035862312b6443/triage-5.3.3-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ae85858b180f23020cf45e20a8667feb9b1aae5984b74a2016d88bb45ec22810",
"md5": "e87b6b3164be4d134b0823e1e67b679b",
"sha256": "c439cc4d82179617ee25f3c5f730fa4a6481037b98d8b39b4c4e32fe964462b2"
},
"downloads": -1,
"filename": "triage-5.3.3.tar.gz",
"has_sig": false,
"md5_digest": "e87b6b3164be4d134b0823e1e67b679b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 3816822,
"upload_time": "2024-10-21T16:04:03",
"upload_time_iso_8601": "2024-10-21T16:04:03.053620Z",
"url": "https://files.pythonhosted.org/packages/ae/85/858b180f23020cf45e20a8667feb9b1aae5984b74a2016d88bb45ec22810/triage-5.3.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-21 16:04:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dssg",
"github_project": "triage",
"travis_ci": true,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "triage"
}