formulaic


- **Name**: formulaic
- **Version**: 1.0.1
- **Summary**: An implementation of Wilkinson formulas.
- **Upload time**: 2023-12-25 05:43:04
- **Requires Python**: >=3.7.2

# <img src="https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png" alt="Formulaic" height=100/>

[![PyPI - Version](https://img.shields.io/pypi/v/formulaic.svg)](https://pypi.org/project/formulaic/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/formulaic.svg)
![PyPI - Status](https://img.shields.io/pypi/status/formulaic.svg)
[![build](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/tests.yml?branch=main)](https://github.com/matthewwardrop/formulaic/actions?query=workflow%3A%22Run+Tox+Tests%22)
[![docs](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/publish_docs.yml?label=docs)](https://matthewwardrop.github.io/formulaic/)
[![codecov](https://codecov.io/gh/matthewwardrop/formulaic/branch/main/graph/badge.svg)](https://codecov.io/gh/matthewwardrop/formulaic)
[![Code Style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

- **Documentation**: https://matthewwardrop.github.io/formulaic
- **Source Code**: https://github.com/matthewwardrop/formulaic
- **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues


It provides:

- high-performance dataframe to model-matrix conversions.
- support for reusing the encoding choices made when converting one dataset on other datasets.
- extensible formula parsing.
- extensible data input/output plugins, with implementations for:
  - input:
    - `pandas.DataFrame`
    - `pyarrow.Table`
  - output:
    - `pandas.DataFrame`
    - `numpy.ndarray`
    - `scipy.sparse.csc_matrix`
- support for symbolic differentiation of formulas (and hence model matrices).
- and much more.
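
Reusing encoding choices matters most for categorical data: a new dataset may contain only some of the levels seen originally, yet its model matrix needs the same columns. The following is a minimal, hand-rolled sketch of the idea in plain Python, assuming treatment coding with sorted levels. `EncodingState`, `fit`, `column_names` and `transform` are hypothetical names for illustration and are not formulaic's API (formulaic keeps the equivalent state on the `model_spec` attached to each model matrix).

```python
from dataclasses import dataclass
from typing import List, Sequence


# Hypothetical illustration only; this is not formulaic's API.
@dataclass
class EncodingState:
    """Record of the encoding choices made for one categorical column."""

    levels: List[str]

    @classmethod
    def fit(cls, values: Sequence[str]) -> "EncodingState":
        # Decide the levels once, from the original dataset (sorted; first = reference).
        return cls(levels=sorted(set(values)))

    def column_names(self, name: str) -> List[str]:
        # Treatment coding: the reference level gets no column of its own.
        return [f"{name}[T.{level}]" for level in self.levels[1:]]

    def transform(self, values: Sequence[str]) -> List[List[int]]:
        # Encode new data using the stored levels, never the new data's own levels.
        rows = []
        for value in values:
            if value not in self.levels:
                raise ValueError(f"unseen level: {value!r}")
            rows.append([int(value == level) for level in self.levels[1:]])
        return rows


state = EncodingState.fit(["A", "B", "C"])  # choices made on the first dataset
print(state.column_names("x"))  # ['x[T.B]', 'x[T.C]']
print(state.transform(["C", "A"]))  # [[0, 1], [0, 0]], even though "B" is absent
```

Rejecting unseen levels is just one possible policy; the key point is that the column layout comes from the stored state rather than from whichever dataset is being encoded.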

## Example code

```python
import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)
```

`y = `
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>y</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2</td>
    </tr>
  </tbody>
</table>

`X = `
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Intercept</th>
      <th>x[T.B]</th>
      <th>x[T.C]</th>
      <th>z</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.0</td>
      <td>0</td>
      <td>0</td>
      <td>0.3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1.0</td>
      <td>1</td>
      <td>0</td>
      <td>0.1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.0</td>
      <td>0</td>
      <td>1</td>
      <td>0.2</td>
    </tr>
  </tbody>
</table>

The same result can be obtained more concisely with `model_matrix`:

```python
from formulaic import model_matrix

y, X = model_matrix('y ~ x + z', df)
```
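
The `x[T.B]` and `x[T.C]` columns in `X` come from treatment (dummy) coding: `A` acts as the reference level and is absorbed into `Intercept`, while every other level gets its own 0/1 indicator column. As an illustration only, the plain-Python sketch below rebuilds the `X` table by hand. `treatment_model_matrix` is a hypothetical helper, and this is not how formulaic computes model matrices internally.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper for illustration; formulaic's own implementation differs.
def treatment_model_matrix(
    x: Sequence[str], z: Sequence[float]
) -> Tuple[List[str], List[List[float]]]:
    """Hand-built model matrix for `x + z` with an intercept."""
    levels = sorted(set(x))  # sorted levels; the first one is the reference
    non_reference = levels[1:]
    columns = ["Intercept"] + [f"x[T.{level}]" for level in non_reference] + ["z"]
    rows = []
    for x_value, z_value in zip(x, z):
        # One 0/1 indicator per non-reference level; the reference row is all zeros.
        indicators = [int(x_value == level) for level in non_reference]
        rows.append([1.0, *indicators, float(z_value)])
    return columns, rows


columns, rows = treatment_model_matrix(["A", "B", "C"], [0.3, 0.1, 0.2])
print(columns)  # ['Intercept', 'x[T.B]', 'x[T.C]', 'z']
for row in rows:
    print(row)
# [1.0, 0, 0, 0.3]
# [1.0, 1, 0, 0.1]
# [1.0, 0, 1, 0.2]
```

Dropping the reference level is what keeps the indicator columns from being collinear with the intercept column.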

## Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and is much faster than `patsy` (the long-standing Python implementation) for dense matrices; `patsy` does not support sparse model-matrix output at all.

![Benchmarks](https://github.com/matthewwardrop/formulaic/raw/main/benchmarks/benchmarks.png)

For more details, see the [benchmarks README](https://github.com/matthewwardrop/formulaic/blob/main/benchmarks/README.md).
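
Sparse output helps because indicator columns such as `x[T.B]` are mostly zeros. SciPy's compressed sparse column (CSC) format keeps only the non-zero entries, stored column by column in three arrays: `data` (the values), `indices` (their row numbers) and `indptr` (where each column starts in `data`). The plain-Python sketch below converts the dense `X` from the example into those three arrays. `to_csc` is a hypothetical helper for illustration, and real libraries build these arrays far more efficiently.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper for illustration; not part of formulaic or SciPy.
def to_csc(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a row-major dense matrix into CSC (data, indices, indptr) arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    data: List[float] = []  # non-zero values, column by column
    indices: List[int] = []  # row index of each stored value
    indptr: List[int] = [0]  # column j occupies data[indptr[j]:indptr[j + 1]]
    for j in range(n_cols):
        for i in range(n_rows):
            value = dense[i][j]
            if value != 0:
                data.append(value)
                indices.append(i)
        indptr.append(len(data))
    return data, indices, indptr


# Dense X from the example, columns: Intercept, x[T.B], x[T.C], z
dense_X = [
    [1.0, 0, 0, 0.3],
    [1.0, 1, 0, 0.1],
    [1.0, 0, 1, 0.2],
]
data, indices, indptr = to_csc(dense_X)
print(data)  # [1.0, 1.0, 1.0, 1, 1, 0.3, 0.1, 0.2]
print(indices)  # [0, 1, 2, 1, 2, 0, 1, 2]
print(indptr)  # [0, 3, 4, 5, 8]
```

Here 8 of the 12 entries are stored; the saving grows with the number of levels, because each row has at most one non-zero indicator per categorical variable.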

## Related projects and prior art

- [Patsy](https://github.com/pydata/patsy): a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. by statsmodels). It has fantastic documentation (which helped bootstrap this project) and a rich array of features.
- [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.
- [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, introduced in detail in the [`Formula` package vignette](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). (R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), the language in which formulas were first popularized.)
- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistical Society, Series C (Applied Statistics) 22, pp. 392–399, 1973.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "formulaic",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7.2",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Matthew Wardrop <mpwardrop@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/a8/31/b260a714a78f83da15b3410904a6e12a8ad25a3286b8819e083ff167173b/formulaic-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "An implementation of Wilkinson formulas.",
    "version": "1.0.1",
    "project_urls": {
        "documentation": "https://matthewwardrop.github.io/formulaic",
        "repository": "https://github.com/matthewwardrop/formulaic"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c097a9f95d35106d882f79ddabc2d33d8f2a262863f1f5d6fd00f46c5fc90aa",
                "md5": "b3499bb51f096d3209067ba21a052568",
                "sha256": "be9e6127b98ba0293654be528fd070ede264f64d0247477c4d5af799b0b12cf2"
            },
            "downloads": -1,
            "filename": "formulaic-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b3499bb51f096d3209067ba21a052568",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7.2",
            "size": 94197,
            "upload_time": "2023-12-25T05:43:06",
            "upload_time_iso_8601": "2023-12-25T05:43:06.504867Z",
            "url": "https://files.pythonhosted.org/packages/2c/09/7a9f95d35106d882f79ddabc2d33d8f2a262863f1f5d6fd00f46c5fc90aa/formulaic-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a831b260a714a78f83da15b3410904a6e12a8ad25a3286b8819e083ff167173b",
                "md5": "61f81f6b93ec712fa551f7e5b1415633",
                "sha256": "64dd7992a7aa5bbceb1e40679d0f01fc6f0ba12b7d23d78094a88c2edc68fba1"
            },
            "downloads": -1,
            "filename": "formulaic-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "61f81f6b93ec712fa551f7e5b1415633",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7.2",
            "size": 430137,
            "upload_time": "2023-12-25T05:43:04",
            "upload_time_iso_8601": "2023-12-25T05:43:04.661666Z",
            "url": "https://files.pythonhosted.org/packages/a8/31/b260a714a78f83da15b3410904a6e12a8ad25a3286b8819e083ff167173b/formulaic-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-25 05:43:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "matthewwardrop",
    "github_project": "formulaic",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "formulaic"
}
        