:Name: skll
:Version: 3.2.0
:Home page: http://github.com/EducationalTestingService/skll
:Summary: SciKit-Learn Laboratory makes it easier to run machine learning experiments with scikit-learn.
:Upload time: 2023-01-19 17:26:51
:Author: Nitin Madnani
:License: BSD 3 clause
:Keywords: learning, scikit-learn

SciKit-Learn Laboratory
-----------------------

.. image:: https://gitlab.com/EducationalTestingService/skll/badges/main/pipeline.svg
   :target: https://gitlab.com/EducationalTestingService/skll/-/pipelines
   :alt: Gitlab CI status

.. image:: https://dev.azure.com/EducationalTestingService/SKLL/_apis/build/status/EducationalTestingService.skll
   :target: https://dev.azure.com/EducationalTestingService/SKLL/_build?view=runs
   :alt: Azure Pipelines status

.. image:: https://codecov.io/gh/EducationalTestingService/skll/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/EducationalTestingService/skll

.. image:: https://img.shields.io/pypi/v/skll.svg
   :target: https://pypi.org/project/skll/
   :alt: Latest version on PyPI

.. image:: https://img.shields.io/pypi/l/skll.svg
   :alt: License

.. image:: https://img.shields.io/conda/v/ets/skll.svg
   :target: https://anaconda.org/ets/skll
   :alt: Conda package for SKLL

.. image:: https://img.shields.io/pypi/pyversions/skll.svg
   :target: https://pypi.org/project/skll/
   :alt: Supported python versions for SKLL

.. image:: https://img.shields.io/badge/DOI-10.5281%2Fzenodo.12825-blue.svg
   :target: http://dx.doi.org/10.5281/zenodo.12825
   :alt: DOI for citing SKLL 1.0.0

.. image:: https://mybinder.org/badge_logo.svg
   :target: https://mybinder.org/v2/gh/EducationalTestingService/skll/main?filepath=examples%2FTutorial.ipynb


This Python package provides command-line utilities that make it easier to run
machine learning experiments with scikit-learn. One of the project's primary
goals is to let you run scikit-learn experiments without writing any code
beyond what you used to generate or extract the features.

Installation
~~~~~~~~~~~~

You can install SKLL using either ``pip`` or ``conda``. See the `installation instructions <https://skll.readthedocs.io/en/latest/getting_started.html>`__ for details.

Requirements
~~~~~~~~~~~~

-  Python 3.8, 3.9, or 3.10
-  `beautifulsoup4 <http://www.crummy.com/software/BeautifulSoup/>`__
-  `gridmap <https://pypi.org/project/gridmap/>`__ (only required if you plan
   to run things in parallel on a DRMAA-compatible cluster)
-  `joblib <https://pypi.org/project/joblib/>`__
-  `pandas <http://pandas.pydata.org>`__
-  `ruamel.yaml <http://yaml.readthedocs.io/en/latest/overview.html>`__
-  `scikit-learn <http://scikit-learn.org/stable/>`__
-  `seaborn <http://seaborn.pydata.org>`__
-  `tabulate <https://pypi.org/project/tabulate/>`__
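
The requirements listed above can be checked with a short script. This is a
minimal sketch that uses only the Python standard library; the helper names
``installed_version`` and ``check_environment`` are illustrative and are not
part of SKLL. It assumes the distribution names match the PyPI names linked
in the list.

.. code:: python

  import sys
  from importlib import metadata

  # Distribution names as listed above; gridmap is optional (cluster use only).
  REQUIRED = ["beautifulsoup4", "joblib", "pandas", "ruamel.yaml",
              "scikit-learn", "seaborn", "tabulate"]
  OPTIONAL = ["gridmap"]
  SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10)}


  def installed_version(dist_name):
      """Return the installed version string, or None if it is not installed."""
      try:
          return metadata.version(dist_name)
      except metadata.PackageNotFoundError:
          return None


  def check_environment():
      """Map each requirement to its installed version (None if missing)."""
      report = {name: installed_version(name) for name in REQUIRED + OPTIONAL}
      # True only when the running interpreter is one of the listed versions.
      report["python_supported"] = sys.version_info[:2] in SUPPORTED_PYTHONS
      return report


  if __name__ == "__main__":
      for name, value in check_environment().items():
          print(f"{name}: {value}")

A value of ``None`` for ``gridmap`` is harmless unless you plan to run
experiments on a DRMAA-compatible cluster.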

Command-line Interface
~~~~~~~~~~~~~~~~~~~~~~

The main utility we provide is ``run_experiment``, which runs a series of
learners on datasets specified in a configuration file such as:

.. code:: ini

  [General]
  experiment_name = Titanic_Evaluate_Tuned
  # valid tasks: cross_validate, evaluate, predict, train
  task = evaluate

  [Input]
  # these directories could also be absolute paths
  # (and must be if you're not running things in local mode)
  train_directory = train
  test_directory = dev
  # Can specify multiple sets of feature files that are merged together automatically
  featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
  # List of scikit-learn learners to use
  learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
  # Column in CSV containing labels to predict
  label_col = Survived
  # Column in CSV containing instance IDs (if any)
  id_col = PassengerId

  [Tuning]
  # Should we tune parameters of all learners by searching provided parameter grids?
  grid_search = true
  # Function to maximize when performing grid search
  objectives = ['accuracy']

  [Output]
  # Also compute the area under the ROC curve as an additional metric
  metrics = ['roc_auc']
  # The following can also be absolute paths
  logs = output
  results = output
  predictions = output
  probability = true
  models = output

For more information about getting started with ``run_experiment``, please check
out `our tutorial <https://skll.readthedocs.org/en/latest/tutorial.html>`__, or
`our config file specs <https://skll.readthedocs.org/en/latest/run_experiment.html>`__.
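
If you want to drive ``run_experiment`` from a script, the sketch below shows
one possible approach. It inspects a configuration file with the standard
library and then builds the command line. The function names are
illustrative, the trimmed configuration is a shortened copy of the example
above, and the ``--local`` flag is assumed to select local mode; check
``run_experiment --help`` for the options in your installed version. SKLL
does its own configuration parsing, so this is only a quick sanity check.

.. code:: python

  import ast
  import configparser
  import subprocess

  # Trimmed copy of the example configuration, used for illustration only.
  EXAMPLE_CONFIG = """
  [General]
  experiment_name = Titanic_Evaluate_Tuned
  task = evaluate

  [Input]
  featuresets = [["family.csv", "misc.csv"]]
  learners = ["RandomForestClassifier", "SVC"]

  [Tuning]
  grid_search = true
  objectives = ['accuracy']
  """.replace("\n  ", "\n")  # drop display indentation if the text was copied indented


  def summarize_config(config_text):
      """Parse the INI text and decode the list-valued fields."""
      parser = configparser.ConfigParser()
      parser.read_string(config_text)
      return {
          "task": parser["General"]["task"],
          "featuresets": ast.literal_eval(parser["Input"]["featuresets"]),
          "learners": ast.literal_eval(parser["Input"]["learners"]),
          "objectives": ast.literal_eval(parser["Tuning"]["objectives"]),
          "grid_search": parser["Tuning"].getboolean("grid_search"),
      }


  def build_command(config_path, local=True):
      """Build the argument list for run_experiment (flag name is an assumption)."""
      command = ["run_experiment"]
      if local:
          command.append("--local")
      command.append(config_path)
      return command


  def run(config_path, local=True):
      """Run the experiment; raises CalledProcessError on a non-zero exit code."""
      return subprocess.run(build_command(config_path, local), check=True)

Calling ``run("evaluate.cfg")`` requires SKLL to be installed so that
``run_experiment`` is on your ``PATH``; ``evaluate.cfg`` is a hypothetical
file name.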

You can also follow this `interactive Jupyter tutorial <https://mybinder.org/v2/gh/AVajpayeeJr/skll/feature/448-interactive-binder?filepath=examples>`__.

We also provide utilities for:

-  `converting between machine learning toolkit formats <https://skll.readthedocs.org/en/latest/utilities.html#skll-convert>`__
   (e.g., ARFF, CSV)
-  `filtering feature files <https://skll.readthedocs.org/en/latest/utilities.html#filter-features>`__
-  `joining feature files <https://skll.readthedocs.org/en/latest/utilities.html#join-features>`__
-  `other common tasks <https://skll.readthedocs.org/en/latest/utilities.html>`__


Python API
~~~~~~~~~~

If you just want to avoid writing a lot of boilerplate learning code, you can
use our simple Python API, which also supports pandas DataFrames.
The main entry points are the ``Learner`` and ``Reader`` classes. For more
details on our API, see
`the documentation <https://skll.readthedocs.org/en/latest/api.html>`__.
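
As a rough illustration of how ``Reader`` and ``Learner`` fit together, the
sketch below trains a learner on one feature file and predicts on another.
It is written from the API documentation rather than copied from it, so
treat the exact keyword arguments as assumptions to verify against your
installed version. The function name ``train_and_predict`` and the file paths
are hypothetical; the column names come from the configuration example above.

.. code:: python

  def train_and_predict(train_path, test_path,
                        learner_name="RandomForestClassifier"):
      """Train a SKLL learner on one file and return predictions for another."""
      # Imported here so that this module can be loaded without SKLL installed.
      from skll.data import Reader
      from skll.learner import Learner

      # Column names follow the Titanic configuration example above.
      reader_kwargs = {"label_col": "Survived", "id_col": "PassengerId"}
      train_fs = Reader.for_path(train_path, **reader_kwargs).read()
      test_fs = Reader.for_path(test_path, **reader_kwargs).read()

      learner = Learner(learner_name)
      # Skip grid search to keep the sketch short; tuning needs an objective.
      learner.train(train_fs, grid_search=False)
      return learner.predict(test_fs)


  if __name__ == "__main__":
      # Hypothetical paths; each file must contain the label and ID columns.
      print(train_and_predict("train/vitals.csv", "dev/vitals.csv"))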

While our API can be broadly useful, the command-line utilities are intended
as the primary way of using SKLL. The API is just a nice side effect of our
developing the utilities.


A Note on Pronunciation
~~~~~~~~~~~~~~~~~~~~~~~

.. image:: doc/skll.png
   :alt: SKLL logo
   :align: right

.. container:: clear

  .. image:: doc/spacer.png

SciKit-Learn Laboratory (SKLL) is pronounced "skull": that's where the learning
happens.

Talks
~~~~~

-  *Simpler Machine Learning with SKLL 1.0*, Dan Blanchard, PyData NYC 2014 (`video <https://www.youtube.com/watch?v=VEo2shBuOrc&feature=youtu.be&t=1s>`__ | `slides <http://www.slideshare.net/DanielBlanchard2/py-data-nyc-2014>`__)
-  *Simpler Machine Learning with SKLL*, Dan Blanchard, PyData NYC 2013 (`video <http://vimeo.com/79511496>`__ | `slides <http://www.slideshare.net/DanielBlanchard2/simple-machine-learning-with-skll>`__)

Citing
~~~~~~

If you are using SKLL in your work, you can cite it as follows: "We used scikit-learn (Pedregosa et al., 2011) via the SKLL toolkit (https://github.com/EducationalTestingService/skll)."

Books
~~~~~

SKLL is featured in `Data Science at the Command Line <http://datascienceatthecommandline.com>`__
by `Jeroen Janssens <http://jeroenjanssens.com>`__.

Changelog
~~~~~~~~~

See `GitHub releases <https://github.com/EducationalTestingService/skll/releases>`__.

Contribute
~~~~~~~~~~

Thank you for your interest in contributing to SKLL! See `CONTRIBUTING.md <https://github.com/EducationalTestingService/skll/blob/main/CONTRIBUTING.md>`__ for instructions on how to get started.

            

        