Name | formulaic |
Version | 1.1.1 |
home_page | None |
Summary | An implementation of Wilkinson formulas. |
upload_time | 2024-12-20 19:37:04 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.7.2 |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# <img src="https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png" alt="Formulaic" height=100/>
[![PyPI - Version](https://img.shields.io/pypi/v/formulaic.svg)](https://pypi.org/project/formulaic/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/formulaic.svg)
![PyPI - Status](https://img.shields.io/pypi/status/formulaic.svg)
[![build](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/tests.yml?branch=main)](https://github.com/matthewwardrop/formulaic/actions?query=workflow%3A%22Run+Tox+Tests%22)
[![docs](https://img.shields.io/github/actions/workflow/status/matthewwardrop/formulaic/publish_docs.yml?label=docs)](https://matthewwardrop.github.io/formulaic/)
[![codecov](https://codecov.io/gh/matthewwardrop/formulaic/branch/main/graph/badge.svg)](https://codecov.io/gh/matthewwardrop/formulaic)
[![Code Style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)
Formulaic is a high-performance implementation of Wilkinson formulas for Python.
- **Documentation**: https://matthewwardrop.github.io/formulaic
- **Source Code**: https://github.com/matthewwardrop/formulaic
- **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues
It provides:
- high-performance dataframe to model-matrix conversions.
- support for reusing the encoding choices made while converting one dataset when converting other datasets.
- extensible formula parsing.
- extensible data input/output plugins, with implementations for:
  - input:
    - `pandas.DataFrame`
    - `pyarrow.Table`
  - output:
    - `pandas.DataFrame`
    - `numpy.ndarray`
    - `scipy.sparse.csc_matrix`
- support for symbolic differentiation of formulas (and hence model matrices).
- and much more.
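
To make the idea of "reusing encoding choices" concrete, here is a minimal pure-Python sketch of treatment (dummy) coding in which the category levels are recorded from one dataset and then applied to another. It only illustrates the concept and is not how Formulaic is implemented; the helper names `fit_levels` and `treatment_code` are hypothetical.

```python
# Conceptual sketch only: this is NOT Formulaic's implementation.
# `fit_levels` and `treatment_code` are hypothetical helper names.

def fit_levels(values):
    """Record the sorted distinct categories seen in the reference dataset."""
    return sorted(set(values))


def treatment_code(values, levels):
    """Encode `values` as 0/1 indicator columns, using the first level as the baseline."""
    names = [f"x[T.{level}]" for level in levels[1:]]
    rows = [[int(value == level) for level in levels[1:]] for value in values]
    return names, rows


train = ["A", "B", "C"]
levels = fit_levels(train)  # 'A' becomes the baseline level
train_names, train_rows = treatment_code(train, levels)

# A later dataset that lacks level 'C' still gets the same columns, because
# the levels come from the reference dataset rather than from the new data.
new_names, new_rows = treatment_code(["B", "A"], levels)
```

Without the remembered levels, the second dataset would produce a single indicator column, and a model fitted on the first dataset could not be applied to it.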
## Example code
```
import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)
```
`y = `
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>y</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
</tr>
<tr>
<th>1</th>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>2</td>
</tr>
</tbody>
</table>
`X = `
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Intercept</th>
<th>x[T.B]</th>
<th>x[T.C]</th>
<th>z</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1.0</td>
<td>0</td>
<td>0</td>
<td>0.3</td>
</tr>
<tr>
<th>1</th>
<td>1.0</td>
<td>1</td>
<td>0</td>
<td>0.1</td>
</tr>
<tr>
<th>2</th>
<td>1.0</td>
<td>0</td>
<td>1</td>
<td>0.2</td>
</tr>
</tbody>
</table>
Note that the above can be shortened to:
```
from formulaic import model_matrix
model_matrix('y ~ x + z', df)
```
## Benchmarks
Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms `patsy` (the existing implementation for Python) for dense matrices (`patsy` does not support sparse model matrix output).
![Benchmarks](https://github.com/matthewwardrop/formulaic/raw/main/benchmarks/benchmarks.png)
For more details, see the [benchmarks README](benchmarks/README.md).
## Related projects and prior art
- [Patsy](https://github.com/pydata/patsy): a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
- [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.
- [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, which is thoroughly introduced [here](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). (R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), in which formulas were first popularized.)
- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistical Society 22, pp. 392–399, 1973.
## Used by
Below are some of the projects that use Formulaic:
- [Glum](https://github.com/Quantco/glum): High-performance Python GLMs with all the features.
- [Lifelines](https://github.com/camDavidsonPilon/lifelines): Survival analysis in Python.
- [Linearmodels](https://github.com/bashtage/linearmodels): Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
- [Pyfixest](https://github.com/s3alfisc/pyfixest): Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
- [Tabmat](https://github.com/Quantco/tabmat): Efficient matrix representations for working with tabular data.
- Add your project here!
Raw data
{
"_id": null,
"home_page": null,
"name": "formulaic",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7.2",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Matthew Wardrop <mpwardrop@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/5c/30/03b5e3bb62374db3f665ca3020fdfc4304e98ceeaaa9dcd7a47a6b574ebf/formulaic-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "An implementation of Wilkinson formulas.",
"version": "1.1.1",
"project_urls": {
"documentation": "https://matthewwardrop.github.io/formulaic",
"repository": "https://github.com/matthewwardrop/formulaic"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "d2c2a34097e53efe70a538ae97574ff9e9866e60fc1c792c19da5fd6b56ce7b5",
"md5": "ed2f10cf8c26dc8be0e0253f8ed71972",
"sha256": "bbb7e38f99e4bcdc62cb0a6a818ad33b370b4e98e9e4f0b276561448482c8268"
},
"downloads": -1,
"filename": "formulaic-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ed2f10cf8c26dc8be0e0253f8ed71972",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7.2",
"size": 115718,
"upload_time": "2024-12-20T19:37:07",
"upload_time_iso_8601": "2024-12-20T19:37:07.055056Z",
"url": "https://files.pythonhosted.org/packages/d2/c2/a34097e53efe70a538ae97574ff9e9866e60fc1c792c19da5fd6b56ce7b5/formulaic-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5c3003b5e3bb62374db3f665ca3020fdfc4304e98ceeaaa9dcd7a47a6b574ebf",
"md5": "0c72819f100bd5d6fbe4ae1be894534e",
"sha256": "ddf80e4bef976dd99698aa27512015276c7b86c314b601ae6fd360c7741b7231"
},
"downloads": -1,
"filename": "formulaic-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "0c72819f100bd5d6fbe4ae1be894534e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7.2",
"size": 652602,
"upload_time": "2024-12-20T19:37:04",
"upload_time_iso_8601": "2024-12-20T19:37:04.660107Z",
"url": "https://files.pythonhosted.org/packages/5c/30/03b5e3bb62374db3f665ca3020fdfc4304e98ceeaaa9dcd7a47a6b574ebf/formulaic-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-20 19:37:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "matthewwardrop",
"github_project": "formulaic",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "formulaic"
}