formulaic


- **Name**: formulaic
- **Version**: 1.1.1
- **Summary**: An implementation of Wilkinson formulas.
- **Author**: Matthew Wardrop
- **Upload time**: 2024-12-20 19:37:04
- **Requires Python**: >=3.7.2

# <img src="https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png" alt="Formulaic" height=100/>

[![PyPI - Version](https://img.shields.io/pypi/v/formulaic.svg)](https://pypi.org/project/formulaic/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/formulaic.svg)
![PyPI - Status](https://img.shields.io/pypi/status/formulaic.svg)
[![build](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/tests.yml?branch=main)](https://github.com/matthewwardrop/formulaic/actions?query=workflow%3A%22Run+Tox+Tests%22)
[![docs](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/publish_docs.yml?label=docs)](https://matthewwardrop.github.io/formulaic/)
[![codecov](https://codecov.io/gh/matthewwardrop/formulaic/branch/main/graph/badge.svg)](https://codecov.io/gh/matthewwardrop/formulaic)
[![Code Style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

- **Documentation**: https://matthewwardrop.github.io/formulaic
- **Source Code**: https://github.com/matthewwardrop/formulaic
- **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues


It provides:

- high-performance dataframe to model-matrix conversions.
- support for reusing the encoding choices made while converting one dataset when converting other datasets.
- extensible formula parsing.
- extensible data input/output plugins, with implementations for:
  - input:
    - `pandas.DataFrame`
    - `pyarrow.Table`
  - output:
    - `pandas.DataFrame`
    - `numpy.ndarray`
    - `scipy.sparse.csc_matrix`
- support for symbolic differentiation of formulas (and hence model matrices).
- and much more.

## Example code

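```python
import pandas
from formulaic import Formula

# A small dataset: numeric response `y`, categorical `x`, numeric `z`.
df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

# The left-hand side of the formula yields `y`; the right-hand side yields `X`.
y, X = Formula('y ~ x + z').get_model_matrix(df)
```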

`y = `
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>y</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2</td>
    </tr>
  </tbody>
</table>

`X = `
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Intercept</th>
      <th>x[T.B]</th>
      <th>x[T.C]</th>
      <th>z</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.0</td>
      <td>0</td>
      <td>0</td>
      <td>0.3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1.0</td>
      <td>1</td>
      <td>0</td>
      <td>0.1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.0</td>
      <td>0</td>
      <td>1</td>
      <td>0.2</td>
    </tr>
  </tbody>
</table>
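
The `x[T.B]` and `x[T.C]` columns in `X` come from treatment (dummy) coding of the categorical column `x`, with the first level `A` acting as the reference and therefore getting no column of its own. The following is a conceptual sketch in plain Python of that encoding; the helper `treatment_code` is hypothetical and is not how Formulaic implements it internally.

```python
def treatment_code(values, name="x", levels=None):
    """Dummy-code `values`, dropping the first level as the reference.

    Passing `levels` from an earlier call reuses the same encoding
    on new data (hypothetical helper, for illustration only).
    """
    if levels is None:
        levels = sorted(set(values))
    non_reference = levels[1:]
    rows = [
        {f"{name}[T.{level}]": int(value == level) for level in non_reference}
        for value in values
    ]
    return levels, rows


# Encode the example column; "A" is the reference level.
levels, rows = treatment_code(["A", "B", "C"])
for row in rows:
    print(row)

# Reuse the recorded levels on new data that lacks level "B".
_, new_rows = treatment_code(["C", "A"], levels=levels)
for row in new_rows:
    print(row)
```

Each printed row corresponds to the two indicator columns of one row in the `X` table; the `Intercept` and `z` columns are left out of the sketch.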

Note that the above can be shortened to:

```python
from formulaic import model_matrix

y, X = model_matrix('y ~ x + z', df)
```

## Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms `patsy` (the existing implementation for Python) for dense matrices (`patsy` does not support sparse model matrix output).

![Benchmarks](https://github.com/matthewwardrop/formulaic/raw/main/benchmarks/benchmarks.png)

For more details, see the [benchmarks README](benchmarks/README.md).

## Related projects and prior art

- [Patsy](https://github.com/pydata/patsy): a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
- [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.
- [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, which is thoroughly introduced [here](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). (R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), in which formulas were first made popular.)
- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. Journal of the Royal Statistical Society, Series C (Applied Statistics) 22(3), pp. 392–399, 1973.

## Used by

Below are some of the projects that use Formulaic:

- [Glum](https://github.com/Quantco/glum): High performance Python GLMs with all the features.
- [Lifelines](https://github.com/camDavidsonPilon/lifelines): Survival analysis in Python.
- [Linearmodels](https://github.com/bashtage/linearmodels): Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
- [Pyfixest](https://github.com/s3alfisc/pyfixest): Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
- [Tabmat](https://github.com/Quantco/tabmat): Efficient matrix representations for working with tabular data.
- Add your project here!

            
