# Probabilistic Targeted Factor Analysis (PTFA)
`ptfa` provides an implementation of Probabilistic Targeted Factor Analysis, a probabilistic extension of Partial Least Squares (PLS) designed to extract latent factors from features $(X)$ to optimally predict a set of pre-specified target variables $(Y)$. It leverages an Expectation-Maximization (EM) algorithm for robust parameter estimation, accommodating challenges such as missing data, stochastic volatility, and dynamic factors.
The framework balances flexibility and efficiency, providing an alternative to traditional methods like Principal Component Analysis (PCA) and standard PLS by incorporating probabilistic foundations.
## Features
- Joint estimation of latent factors and parameters.
- Robust against noise, missing data, and model uncertainty.
- Extensible to stochastic volatility, mixed-frequency data and dynamic factor models.
- Competitive performance in high-dimensional forecasting tasks.
## Installation
You can install `ptfa` from PyPI (the package metadata lists Python 3.8 or later as a requirement):
```bash
pip install ptfa
```
## Routines
The `ptfa` module includes several classes aimed at implementing PTFA in a variety of real-world data settings:
- `ProbabilisticTFA`: main workhorse class; extracts `n_components` common latent factors from features `X` to predict targets `Y`.
- `ProbabilisticTFA_MixedFrequency`: handles settings where `X` is measured at a higher frequency than `Y` (e.g., using monthly information to predict quarterly variables).
- `ProbabilisticTFA_StochasticVolatility`: adapts the main class to deal with stochastic volatility (variance changing over time) in features and targets.
- `ProbabilisticTFA_DynamicFactors`: for factors that exhibit time-series persistence; fits a vector autoregressive process of order 1, VAR(1), on the latent factors.
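To make the dynamic-factor setting concrete, the sketch below simulates latent factors that follow a VAR(1) process using NumPy only. It illustrates the kind of persistence `ProbabilisticTFA_DynamicFactors` is described as modelling and is not the package's implementation; the coefficient matrix `A`, the dimensions, and the standard normal innovations are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 200, 2                      # time periods and latent factors (illustrative)
A = np.array([[0.7, 0.1],
              [0.0, 0.5]])         # hypothetical VAR(1) coefficient matrix

# A VAR(1) is stationary when all eigenvalues of A lie inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)

factors = np.zeros((T, k))         # the process starts at zero
for t in range(1, T):
    # f_t = A f_{t-1} + eta_t, with standard normal innovations eta_t
    factors[t] = A @ factors[t - 1] + rng.standard_normal(k)

print(factors.shape)
```

Features and targets generated from such persistent factors are the situation the dynamic-factor class is meant for.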
All classes have the following methods in common:
- `__init__(self, n_components)`: creates the class instance with the specified number of latent components.
- `fit(self, X, Y, ...)`: fits the PTFA model to the given data using an EM algorithm tailored to each class and extracts the latent factors.
- `fitted(self, ...)`: computes the in-sample fitted values for the targets.
- `predict(self, X)`: computes out-of-sample predicted values of the targets using new features `X`.
In addition, each class comes equipped with specific functions to handle the respective data-generating processes. More details on the routines and the additional arguments `...` each method can take can be found in the documentation for each class in the [GitHub repository](https://github.com/smonto2/PTFA/tree/main/src/ptfa/).
Finally, all classes can handle missing-at-random data in the form of [`numpy.nan` entries](https://numpy.org/doc/stable/reference/constants.html#numpy.nan) in the data arrays `X` and `Y`. Alternatively, these arrays can be passed directly as [`numpy.ma.MaskedArray` objects](https://numpy.org/doc/stable/reference/maskedarray.html#masked-arrays).
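As a minimal sketch of the two input formats just described, the snippet below builds a feature array with `numpy.nan` entries and the equivalent masked array. It uses NumPy only; the 10% missing rate, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))   # 100 observations, 10 features

# Mark roughly 10% of the feature entries as missing, completely at random
missing = rng.random(X.shape) < 0.10
X_nan = X.copy()
X_nan[missing] = np.nan

# Equivalent representation: NaN entries become masked entries
X_masked = np.ma.masked_invalid(X_nan)

print(int(np.isnan(X_nan).sum()), "missing entries")
print(isinstance(X_masked, np.ma.MaskedArray))
```

Per the description above, either `X_nan` or `X_masked` could then be supplied as the feature array when fitting.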
## Usage
A more extensive example showcasing the capabilities of PTFA is provided in the package repository: [Example Notebook](https://github.com/smonto2/PTFA/blob/main/example.ipynb).
Here is a quick example of how to use the main class for factor extraction and forecasting, called `ProbabilisticTFA`:
```python
import numpy as np
from ptfa import ProbabilisticTFA
# Example data: predictors (X) and targets (Y)
X = np.random.rand(100, 10) # 100 observations, 10 predictors
Y = np.random.rand(100, 2) # 100 observations, 2 targets
# Initialize PTFA model with desired number of components
model = ProbabilisticTFA(n_components=3)
# Fit the model to data X and Y using EM algorithm
model.fit(X, Y)
# Calculate in-sample fitted values
Y_fitted = model.fitted()
# Calculate out-of-sample forecasts from a fresh draw of features
X_new = np.random.rand(100, 10)
Y_predicted = model.predict(X_new)
print("Fitted targets:")
print(Y_fitted)
print("Predicted targets:")
print(Y_predicted)
# Calling .fit() stores the common factors extracted from
# features and targets on the model object
print("Recovered factors:")
print(model.factors)
```
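The quick example uses independent uniform draws, so `X` carries no information about `Y`. For experimentation, a sketch like the following generates data in which features and targets share common factors, and sets up a hold-out split with a naive benchmark. The linear factor structure, the loading matrices `P` and `Q`, the noise scale, and the `rmse` helper are all assumptions made for this illustration; none of them are part of `ptfa`.

```python
import numpy as np

rng = np.random.default_rng(42)
T, p, q, k = 120, 10, 2, 3           # observations, features, targets, factors

F = rng.standard_normal((T, k))      # latent factors shared by X and Y
P = rng.standard_normal((p, k))      # hypothetical feature loadings
Q = rng.standard_normal((q, k))      # hypothetical target loadings

X = F @ P.T + 0.5 * rng.standard_normal((T, p))
Y = F @ Q.T + 0.5 * rng.standard_normal((T, q))

# Hold out the last 20 observations for out-of-sample evaluation
X_train, X_test = X[:100], X[100:]
Y_train, Y_test = Y[:100], Y[100:]

def rmse(y_true, y_pred):
    """Root mean squared error for each target column."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

# Naive benchmark: forecast every test observation with the training mean
baseline = np.tile(Y_train.mean(axis=0), (len(Y_test), 1))
print(rmse(Y_test, baseline))
```

After `model.fit(X_train, Y_train)`, the value of `rmse(Y_test, model.predict(X_test))` can be compared against this benchmark.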
## Contributing
Feel free to open issues or contribute to the repository through pull requests. We welcome suggestions and improvements to the package!
## BibTeX Citation
If you use `ptfa`, we would appreciate it if you cited our work as:
```bibtex
@misc{herculano_2024_probabilistic,
title = {Probabilistic Targeted Factor Analysis},
author = {Herculano, Miguel C. and Montoya-Blandón, Santiago},
year = {2024},
eprint = {2412.06688},
archivePrefix = {arXiv},
primaryClass = {econ.EM},
url = {https://arxiv.org/abs/2412.06688},
}
```
## Licence
This project is licensed under the MIT License.