:Name: skrub
:Version: 0.4.1
:Summary: Prepping tables for machine learning
:Requires Python: >=3.9
:Upload time: 2024-12-11 19:28:08

skrub
=====

.. image:: https://skrub-data.github.io/stable/_static/skrub.svg
   :align: center
   :width: 50 %
   :alt: skrub logo


|py_ver| |pypi_var| |pypi_dl| |codecov| |circleci| |black|

.. |py_ver| image:: https://img.shields.io/pypi/pyversions/skrub
.. |pypi_var| image:: https://img.shields.io/pypi/v/skrub?color=informational
.. |pypi_dl| image:: https://img.shields.io/pypi/dm/skrub
.. |codecov| image:: https://img.shields.io/codecov/c/github/skrub-data/skrub/main
.. |circleci| image:: https://img.shields.io/circleci/build/github/skrub-data/skrub/main?label=CircleCI
.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg


**skrub** (formerly *dirty_cat*) is a Python
library that helps you prepare tables for machine learning.

If you like the package, spread the word and ⭐ this repository!
You can also join the `Discord server <https://discord.gg/ABaPnm7fDC>`_.

Website: https://skrub-data.org/

What can skrub do?
------------------

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...),
encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline
(``TableVectorizer``, ``tabular_learner``, ...), and exploring your data interactively (``TableReport``).
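``MinHashEncoder`` is one of the column encoders listed above. The snippet below is a stdlib-only illustration of the general min-hash idea applied to character n-grams; it is not skrub's implementation, and the hashing scheme, the n-gram range, and the helper names (``char_ngrams``, ``seeded_hash``, ``minhash_signature``, ``estimated_similarity``) are assumptions made for this sketch. Strings that share many n-grams, such as a category name and its misspelling, get signatures that agree on many components.

```python
import hashlib


def char_ngrams(text, ngram_range=(2, 4)):
    """Return the set of character n-grams of ``text`` for every n in ``ngram_range`` (inclusive)."""
    low, high = ngram_range
    return {
        text[i : i + n]
        for n in range(low, high + 1)
        for i in range(len(text) - n + 1)
    }


def seeded_hash(gram, seed):
    """Deterministic 64-bit hash of ``gram``; each seed acts as a different hash function."""
    salt = seed.to_bytes(16, "little")  # blake2b accepts a salt of up to 16 bytes
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, n_components=30, ngram_range=(2, 4)):
    """For each of ``n_components`` hash functions, keep the minimum hash over the n-grams."""
    grams = char_ngrams(text, ngram_range)
    if not grams:
        # Too short to contain any n-gram: fall back to a constant signature.
        return [0] * n_components
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(n_components)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of equal components, an estimate of the Jaccard similarity of the n-gram sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


police = minhash_signature("Department of Police")
police_typo = minhash_signature("Departement of Police")  # most n-grams shared
fire = minhash_signature("Fire and Rescue Services")  # few n-grams shared
print(estimated_similarity(police, police_typo))
print(estimated_similarity(police, fire))
```

The probability that two strings have the same minimum under one hash function equals the Jaccard similarity of their n-gram sets, so near-duplicate category names end up with nearby vectors. In practice you would use skrub's own ``MinHashEncoder`` inside a pipeline rather than this sketch.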

.. figure::
   https://github.com/rcap107/skrub-datasets/blob/master/data/output.gif?raw=true
   :alt: An animation showing how TableReport works

   An animation showing how TableReport works


>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986

>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])

See our `examples <https://skrub-data.org/stable/auto_examples>`_.

Installation
------------

skrub can be installed with ``pip install skrub`` or, from conda-forge, with
``conda install -c conda-forge skrub``. For more details, see the
`installation instructions <https://skrub-data.org/stable/install.html>`_.
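After installing, a quick sanity check can confirm that the interpreter meets the ``>=3.9`` requirement from the package metadata and report which skrub version is installed. This is a minimal sketch using only the standard library; ``check_environment`` is a hypothetical helper, not part of skrub.

```python
import sys
from importlib import metadata


def check_environment(package="skrub", min_python=(3, 9)):
    """Return ``(python_ok, version)``; ``version`` is None when ``package`` is not installed."""
    python_ok = sys.version_info >= min_python
    try:
        # Reads the installed distribution's metadata without importing the package.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


python_ok, version = check_environment()
print(f"Python >= 3.9: {python_ok}")
print(f"skrub: {version or 'not installed'}")
```

Using ``importlib.metadata`` instead of ``import skrub`` keeps the check cheap and works even if importing the package would fail.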

Contributing
------------

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the `Discussions <https://github.com/skrub-data/skrub/discussions>`_ section.

To report a bug or suggest enhancements, please
`open an issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue>`_.

If you want to contribute directly to the library, then check the
`how to contribute <https://skrub-data.org/stable/CONTRIBUTING.html>`_ page on
the website for more information.

Package metadata
----------------

:Author: Patricio Cerda <patricio.cerda@inria.fr>
:Homepage: https://skrub-data.org/
:Issues: https://github.com/skrub-data/skrub/issues
:Source: https://github.com/skrub-data/skrub
:Wheel: `skrub-0.4.1-py3-none-any.whl <https://files.pythonhosted.org/packages/e6/9a/b77226bf12a8690a5d8fa7f1198bc4fdd967dc0138f14549d687ea94daea/skrub-0.4.1-py3-none-any.whl>`_
    (327,645 bytes, uploaded 2024-12-11T19:28:02Z),
    SHA256 ``011940ec1a0c79cbaaf0cd18e83aad09f7071011b8e3e2cebe658c8bfa969d64``
:Source distribution: `skrub-0.4.1.tar.gz <https://files.pythonhosted.org/packages/95/b4/947b51a9b47fb5301ac14a6759f4d4fc2baa09e0059167de482a5779b822/skrub-0.4.1.tar.gz>`_
    (6,510,113 bytes, uploaded 2024-12-11T19:28:08Z),
    SHA256 ``2d32267fcae3aec0af187f209039d78b283fe37ddbee112862b7cefc51f0c2d4``
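The SHA256 digests listed for the wheel and the source archive let you check that a downloaded file is intact before installing it. Below is a minimal stdlib sketch, assuming the file keeps its original name; ``EXPECTED_SHA256``, ``sha256_of`` and ``verify_download`` are hypothetical names introduced here.

```python
import hashlib
from pathlib import Path

# SHA256 digests listed in the package metadata for skrub 0.4.1.
EXPECTED_SHA256 = {
    "skrub-0.4.1-py3-none-any.whl": "011940ec1a0c79cbaaf0cd18e83aad09f7071011b8e3e2cebe658c8bfa969d64",
    "skrub-0.4.1.tar.gz": "2d32267fcae3aec0af187f209039d78b283fe37ddbee112862b7cefc51f0c2d4",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's digest matches the one listed for its filename."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise ValueError(f"no listed digest for {path.name!r}")
    return sha256_of(path) == expected
```

For example, ``verify_download("skrub-0.4.1.tar.gz")`` returns ``True`` only for an unmodified copy of the sdist. pip can also enforce digests itself in hash-checking mode (``--require-hashes``, with ``--hash=sha256:...`` entries in a requirements file).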