| Field | Value |
| --- | --- |
| Name | formulaic |
| Version | 1.2.0 |
| home_page | None |
| Summary | An implementation of Wilkinson formulas. |
| upload_time | 2025-07-14 06:40:34 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# <img src="https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png" alt="Formulaic" height=100/>
Formulaic is a high-performance implementation of Wilkinson formulas for Python.
- **Documentation**: https://matthewwardrop.github.io/formulaic
- **Source Code**: https://github.com/matthewwardrop/formulaic
- **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues
It provides:
- high-performance dataframe to model-matrix conversions.
- support for reusing the encoding choices made during conversion of one dataset on other datasets.
- extensible formula parsing.
- extensible data input/output plugins, with implementations for:
    - input:
        - `pandas.DataFrame`
        - Any dataframe representation supported by [`narwhals`](https://narwhals-dev.github.io/narwhals/), including:
            - `pyarrow.Table`
            - `polars.DataFrame`
            - ...
    - output:
        - `pandas.DataFrame`
        - `numpy.ndarray`
        - `scipy.sparse.csc_matrix`
        - `narwhals` dataframe passthrough when using narwhals dataframes.
- support for symbolic differentiation of formulas (and hence model matrices).
- and much more.
## Example code
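```
import pandas
from formulaic import Formula

# Small example dataset: a numeric response, a categorical factor, and a numeric covariate.
df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

# A two-sided formula yields both the response `y` and the model matrix `X`.
y, X = Formula('y ~ x + z').get_model_matrix(df)
```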
`y = `
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>y</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
</tr>
<tr>
<th>1</th>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>2</td>
</tr>
</tbody>
</table>
`X = `
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Intercept</th>
<th>x[T.B]</th>
<th>x[T.C]</th>
<th>z</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1.0</td>
<td>0</td>
<td>0</td>
<td>0.3</td>
</tr>
<tr>
<th>1</th>
<td>1.0</td>
<td>1</td>
<td>0</td>
<td>0.1</td>
</tr>
<tr>
<th>2</th>
<td>1.0</td>
<td>0</td>
<td>1</td>
<td>0.2</td>
</tr>
</tbody>
</table>
Note that the above can be shortened to:
```
from formulaic import model_matrix
model_matrix('y ~ x + z', df)
```
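
To make the columns of `X` in the example less opaque, here is a hand-rolled, standard-library-only sketch of the treatment (dummy) coding that produces them. It is not how Formulaic is implemented; `treatment_code` is a hypothetical helper written for this illustration, and it assumes the reference level is simply the first level in sorted order.

```python
def treatment_code(rows, categorical, numeric):
    """Build an intercept + treatment-coded design matrix by hand."""
    # Sorted unique levels; the first one is the reference level and
    # therefore gets no column of its own.
    levels = sorted({row[categorical] for row in rows})
    columns = (
        ["Intercept"]
        + [f"{categorical}[T.{level}]" for level in levels[1:]]
        + [numeric]
    )
    matrix = []
    for row in rows:
        # One 0/1 indicator per non-reference level.
        dummies = [1 if row[categorical] == level else 0 for level in levels[1:]]
        matrix.append([1.0, *dummies, row[numeric]])
    return columns, matrix


rows = [
    {"y": 0, "x": "A", "z": 0.3},
    {"y": 1, "x": "B", "z": 0.1},
    {"y": 2, "x": "C", "z": 0.2},
]

columns, X = treatment_code(rows, categorical="x", numeric="z")
print(columns)
for line in X:
    print(line)
```

The printed rows match the `X` table above: level `A` is absorbed into the intercept, while `B` and `C` each get an indicator column.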
## Benchmarks
Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms `patsy` (the existing implementation for Python) for dense matrices (`patsy` does not support sparse model matrix output).

For more details, see [here](benchmarks/README.md).
## Related projects and prior art
- [Patsy](https://github.com/pydata/patsy): a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
- [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.
- [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, which is thoroughly introduced [here](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). (R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), in which formulas were first made popular.)
- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistical Society 22, pp. 392–399, 1973.
## Used by
Below are some of the projects that use Formulaic:
- [Glum](https://github.com/Quantco/glum): High-performance Python GLMs with all the features.
- [Lifelines](https://github.com/camDavidsonPilon/lifelines): Survival analysis in Python.
- [Linearmodels](https://github.com/bashtage/linearmodels): Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
- [Pyfixest](https://github.com/s3alfisc/pyfixest): Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
- [Tabmat](https://github.com/Quantco/tabmat): Efficient matrix representations for working with tabular data.
- Add your project here!
Raw data
{
"_id": null,
"home_page": null,
"name": "formulaic",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Matthew Wardrop <mpwardrop@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b0/9d/e9c835c98cac77d76087ae88f1ced12c9a093ca5b7c5f07e2dae2876b7b6/formulaic-1.2.0.tar.gz",
"platform": null,
"description": "# <img src=\"https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png\" alt=\"Formulaic\" height=100/>\n\n[](https://pypi.org/project/formulaic/)\n\n\n[](https://github.com/matthewwardrop/formulaic/actions?query=workflow%3A%22Run+Tox+Tests%22)\n[](https://matthewwardrop.github.io/formulaic/)\n[](https://codecov.io/gh/matthewwardrop/formulaic)\n[](https://github.com/psf/black)\n\nFormulaic is a high-performance implementation of Wilkinson formulas for Python.\n\n- **Documentation**: https://matthewwardrop.github.io/formulaic\n- **Source Code**: https://github.com/matthewwardrop/formulaic\n- **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues\n\n\nIt provides:\n\n- high-performance dataframe to model-matrix conversions.\n- support for reusing the encoding choices made during conversion of one data-set on other datasets.\n- extensible formula parsing.\n- extensible data input/output plugins, with implementations for:\n - input:\n - `pandas.DataFrame`\n - Any dataframe representation supported by [`narwhals`](https://narwhals-dev.github.io/narwhals/) including\n - `pyarrow.Table`\n - `polars.DataFrame`\n - ...\n - output:\n - `pandas.DataFrame`\n - `numpy.ndarray`\n - `scipy.sparse.CSCMatrix`\n - `narwhals` dataframe passthrough when using narwhals dataframes.\n- support for symbolic differentiation of formulas (and hence model matrices).\n- and much more.\n\n## Example code\n\n```\nimport pandas\nfrom formulaic import Formula\n\ndf = pandas.DataFrame({\n 'y': [0, 1, 2],\n 'x': ['A', 'B', 'C'],\n 'z': [0.3, 0.1, 0.2],\n})\n\ny, X = Formula('y ~ x + z').get_model_matrix(df)\n```\n\n`y = `\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>y</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2</td>\n </tr>\n </tbody>\n</table>\n\n`X = 
`\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Intercept</th>\n <th>x[T.B]</th>\n <th>x[T.C]</th>\n <th>z</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1.0</td>\n <td>0</td>\n <td>0</td>\n <td>0.3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1.0</td>\n <td>1</td>\n <td>0</td>\n <td>0.1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1.0</td>\n <td>0</td>\n <td>1</td>\n <td>0.2</td>\n </tr>\n </tbody>\n</table>\n\nNote that the above can be short-handed to:\n\n```\nfrom formulaic import model_matrix\nmodel_matrix('y ~ x + z', df)\n```\n\n## Benchmarks\n\nFormulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms `patsy` (the existing implementation for Python) for dense matrices (`patsy` does not support sparse model matrix output).\n\n\n\nFor more details, see [here](benchmarks/README.md).\n\n## Related projects and prior art\n\n- [Patsy](https://github.com/pydata/patsy): a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.\n- [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.\n- [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, which is thoroughly introduced [here](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). [R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), in which formulas were first made popular].\n- The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistics Society 22, pp. 
392\u2013399, 1973.\n\n## Used by\n\nBelow are some of the projects that use Formulaic:\n\n- [Glum](https://github.com/Quantco/glum): High performance Python GLM's with all the features.\n- [Lifelines](https://github.com/camDavidsonPilon/lifelines): Survival analysis in Python.\n- [Linearmodels](https://github.com/bashtage/linearmodels): Additional linear models including instrumental variable and panel data models that are missing from statsmodels.\n- [Pyfixest](https://github.com/s3alfisc/pyfixest): Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.\n- [Tabmat](https://github.com/Quantco/tabmat): Efficient matrix representations for working with tabular data.\n- Add your project here!\n",
"bugtrack_url": null,
"license": null,
"summary": "An implementation of Wilkinson formulas.",
"version": "1.2.0",
"project_urls": {
"documentation": "https://matthewwardrop.github.io/formulaic",
"repository": "https://github.com/matthewwardrop/formulaic"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "ddc47e791ff347c3f8eb394596d7a6e6e0a48ef12bbbc3fcc99240e36a6ddd01",
"md5": "dc3638e5aa8f23cc615a33e75523eff7",
"sha256": "2c5f48886a979f034e9fce67b8a36223aa22020256a291abceecd6763291193b"
},
"downloads": -1,
"filename": "formulaic-1.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dc3638e5aa8f23cc615a33e75523eff7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 117219,
"upload_time": "2025-07-14T06:40:32",
"upload_time_iso_8601": "2025-07-14T06:40:32.596232Z",
"url": "https://files.pythonhosted.org/packages/dd/c4/7e791ff347c3f8eb394596d7a6e6e0a48ef12bbbc3fcc99240e36a6ddd01/formulaic-1.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b09de9c835c98cac77d76087ae88f1ced12c9a093ca5b7c5f07e2dae2876b7b6",
"md5": "d0cc610a39ba87cf99d4938eed7d160e",
"sha256": "ab5c43a5b107d1b1e87f55cf0a01245531cce793d3dcab433dae12053c3cb2d6"
},
"downloads": -1,
"filename": "formulaic-1.2.0.tar.gz",
"has_sig": false,
"md5_digest": "d0cc610a39ba87cf99d4938eed7d160e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 654869,
"upload_time": "2025-07-14T06:40:34",
"upload_time_iso_8601": "2025-07-14T06:40:34.229453Z",
"url": "https://files.pythonhosted.org/packages/b0/9d/e9c835c98cac77d76087ae88f1ced12c9a093ca5b7c5f07e2dae2876b7b6/formulaic-1.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-14 06:40:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "matthewwardrop",
"github_project": "formulaic",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "formulaic"
}