# xRFM - Recursive Feature Machines optimized for tabular data
**xRFM** is a scalable implementation of Recursive Feature Machines (RFMs) optimized for tabular data. This library provides both the core RFM algorithm and a tree-based extension (xRFM) that enables efficient processing of large datasets through recursive data splitting.
## Core Components
```
xRFM/
├── xrfm/
│   ├── xrfm.py                          # Main xRFM class (tree-based)
│   ├── tree_utils.py                    # Tree manipulation utilities
│   └── rfm_src/
│       ├── recursive_feature_machine.py # Base RFM class
│       ├── kernels.py                   # Kernel implementations
│       ├── eigenpro.py                  # EigenPro optimization
│       ├── utils.py                     # Utility functions
│       ├── svd.py                       # SVD operations
│       └── gpu_utils.py                 # GPU memory management
├── examples/                            # Usage examples
└── setup.py                             # Package configuration
```
## Installation
```bash
pip install xrfm
```
To use the `KermacProductLaplaceKernel`, install the extra that matches your CUDA version instead. For CUDA 11:
```bash
pip install "xrfm[cu11]"
```
For CUDA 12:
```bash
pip install "xrfm[cu12]"
```
### Development Installation
```bash
git clone https://github.com/dmbeaglehole/xRFM.git
cd xRFM
pip install -e .
```
## Quick Start
### Basic Usage
```python
import torch
from xrfm import xRFM
from sklearn.model_selection import train_test_split

# Synthetic two-output target: one binary indicator per column
def target_function(X):
    return torch.cat([
        (X[:, 0] > 0)[:, None],
        (X[:, 1] < 0.5)[:, None],
    ], dim=1).float()

# Set up the device and model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = xRFM(device=device, tuning_metric='mse')

# Generate data
n_samples = 2000
n_features = 100
X = torch.randn(n_samples, n_features, device=device)
y = target_function(X)

# Hold out 20% for testing, then 20% of the remainder for validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=0)

# Fit with the validation split, then predict on the held-out test set
model.fit(X_train, y_train, X_val, y_val)
y_pred_test = model.predict(X_test)
```
### Custom Configuration
```python
import torch
from xrfm import xRFM

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Custom RFM parameters
rfm_params = {
    'model': {
        'kernel': 'l2',              # Kernel type (see Available Kernels)
        'bandwidth': 5.0,            # Kernel bandwidth
        'exponent': 1.0,             # Kernel exponent
        'diag': False,               # Whether to use a diagonal Mahalanobis matrix
        'bandwidth_mode': 'constant'
    },
    'fit': {
        'reg': 1e-3,                 # Regularization parameter
        'iters': 5,                  # Number of RFM iterations
        'M_batch_size': 1000,        # Batch size for AGOP computation
        'verbose': True,             # Print progress during fitting
        'early_stop_rfm': True       # Enable early stopping
    }
}

# Initialize the model with custom parameters
model = xRFM(
    rfm_params=rfm_params,
    device=device,
    min_subset_size=10000,                    # Minimum subset size for splitting
    tuning_metric='accuracy',                 # Tuning metric
    split_method='top_vector_agop_on_subset'  # Splitting strategy
)
```
## File Structure
### Core Files
| File | Description |
|------|-------------|
| `xrfm/xrfm.py` | Main xRFM class implementing tree-based recursive splitting |
| `xrfm/rfm_src/recursive_feature_machine.py` | Base RFM class with core algorithm |
| `xrfm/rfm_src/kernels.py` | Kernel implementations (Laplace, Product Laplace, etc.) |
| `xrfm/rfm_src/eigenpro.py` | EigenPro optimization for large-scale training |
| `xrfm/rfm_src/utils.py` | Utility functions for matrix operations and metrics |
| `xrfm/rfm_src/svd.py` | SVD utilities for kernel computations |
| `xrfm/rfm_src/gpu_utils.py` | GPU memory management utilities |
| `xrfm/tree_utils.py` | Tree manipulation and parameter extraction utilities |
### Example Files
| File | Description |
|------|-------------|
| `examples/test.py` | Simple regression example with synthetic data |
| `examples/covertype.py` | Forest cover type classification example |
## API Reference
### Main Classes
#### `xRFM`
Tree-based Recursive Feature Machine for scalable learning.
**Constructor Parameters:**
- `rfm_params` (dict): Parameters for base RFM models
- `min_subset_size` (int, default=60000): Minimum subset size for splitting
- `max_depth` (int, default=None): Maximum tree depth
- `device` (str, default=None): Computing device ('cpu' or 'cuda')
- `tuning_metric` (str, default='mse'): Metric for model tuning
- `split_method` (str): Data splitting strategy
**Key Methods:**
- `fit(X, y, X_val, y_val)`: Train the model
- `predict(X)`: Make predictions
- `predict_proba(X)`: Predict class probabilities
- `score(X, y)`: Evaluate model performance
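To make the tree-based idea concrete, here is a dependency-free sketch of recursive splitting and routing. It is not the library's code: the `build_tree` and `route` helpers and the `max_leaf_size` stopping rule are hypothetical, and a single fixed projection direction is reused at every node (closest in spirit to the `'fixed_vector'` strategy listed under Splitting Methods), whereas the AGOP-based strategies presumably compute a new direction per subset. How `min_subset_size` maps onto this stopping rule is not specified here.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class Node:
    """One node of a hypothetical binary split tree."""
    direction: Optional[Vector] = None        # projection vector (internal nodes only)
    threshold: float = 0.0                    # split point on the projected values
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_indices: Optional[List[int]] = None  # training rows owned by a leaf


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def build_tree(X: List[Vector], indices: List[int], direction: Vector,
               max_leaf_size: int) -> Node:
    """Recursively split `indices` at the median projection onto `direction`.

    Hypothetical stopping rule: a node becomes a leaf once it holds at most
    `max_leaf_size` rows.
    """
    if len(indices) <= max_leaf_size:
        return Node(leaf_indices=indices)
    proj = [dot(X[i], direction) for i in indices]
    threshold = statistics.median(proj)
    left = [i for i, p in zip(indices, proj) if p <= threshold]
    right = [i for i, p in zip(indices, proj) if p > threshold]
    if not left or not right:  # degenerate split (e.g. all projections equal): stop
        return Node(leaf_indices=indices)
    return Node(direction, threshold,
                build_tree(X, left, direction, max_leaf_size),
                build_tree(X, right, direction, max_leaf_size))


def route(node: Node, x: Vector) -> Node:
    """Follow split decisions from the root down to the leaf that owns x."""
    while node.leaf_indices is None:
        node = node.left if dot(x, node.direction) <= node.threshold else node.right
    return node


# Usage: eight points on a line, split until leaves hold at most two rows
X = [[float(i), 0.0] for i in range(8)]
root = build_tree(X, list(range(8)), direction=[1.0, 0.0], max_leaf_size=2)
leaf = route(root, [5.2, 0.0])
```

In this sketch each leaf's rows are where a separate base RFM would be fit, and a new point only needs the model of the leaf it is routed to, which is what keeps per-model kernel matrices small.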
#### `RFM`
Base Recursive Feature Machine implementation.
**Constructor Parameters:**
- `kernel` (str or Kernel): Kernel type or kernel object
- `iters` (int, default=5): Number of training iterations
- `bandwidth` (float, default=10.0): Kernel bandwidth
- `device` (str, default=None): Computing device
- `tuning_metric` (str, default='mse'): Evaluation metric
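RFM alternates between fitting a kernel predictor and updating the kernel's feature matrix $M$ with the average gradient outer product (AGOP) of that predictor, $M = \frac{1}{n}\sum_i \nabla f(x_i)\,\nabla f(x_i)^\top$. The sketch below computes an AGOP for any scalar function using central finite differences, purely for illustration; the `numerical_gradient` and `agop` helpers are hypothetical, and how the library actually computes gradients (in batches, per the `M_batch_size` fit parameter) is not shown.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, h: float = 1e-5) -> Vector:
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def agop(f: Callable[[Vector], float], X: List[Vector]) -> Matrix:
    """Average gradient outer product: mean over rows of grad f(x) grad f(x)^T."""
    d = len(X[0])
    M = [[0.0] * d for _ in range(d)]
    for x in X:
        g = numerical_gradient(f, x)
        for a in range(d):
            for b in range(d):
                M[a][b] += g[a] * g[b] / len(X)
    return M


# Usage: f depends only on the first feature
M = agop(lambda x: x[0] ** 2, [[1.0, 5.0], [2.0, -1.0]])
```

Because `f` ignores the second feature, only the top-left entry of `M` is nonzero (about 10, the mean of 2² and 4²), which is the sense in which the AGOP highlights the directions a predictor is sensitive to.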
### Available Kernels
| Kernel | String ID | Description |
|--------|-----------|-------------|
| `LaplaceKernel` | `'laplace'`, `'l2'` | Standard Laplace kernel |
| `LightLaplaceKernel` | `'l2_high_dim'`, `'l2_light'` | Memory-efficient Laplace kernel |
| `ProductLaplaceKernel` | `'product_laplace'`, `'l1'` | Product of Laplace kernels |
| `SumPowerLaplaceKernel` | `'sum_power_laplace'`, `'l1_power'` | Sum of powered Laplace kernels |
| `KermacProductLaplaceKernel` | `'l1_kermac'` | High-performance Product of Laplace kernels (requires install with `[cu11]` or `[cu12]`) |
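For reference, the textbook forms of the two Laplace-type kernels fit in a few lines. This is a sketch under assumptions: we take `'l2'` to mean the Laplace kernel on the Mahalanobis distance $\|x - z\|_M = \sqrt{(x - z)^\top M (x - z)}$ and `'l1'` to mean a product of one-dimensional Laplace kernels, with `bandwidth` as the length scale. The helper names are hypothetical, and the `exponent` and `diag` options are ignored.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def laplace_kernel(x: Vector, z: Vector, M: Matrix, bandwidth: float) -> float:
    """exp(-||x - z||_M / bandwidth) with ||v||_M = sqrt(v^T M v)."""
    v = [a - b for a, b in zip(x, z)]
    d = len(v)
    quad = sum(v[i] * M[i][j] * v[j] for i in range(d) for j in range(d))
    # Clamp tiny negative values caused by rounding before taking the root
    return math.exp(-math.sqrt(max(quad, 0.0)) / bandwidth)


def product_laplace_kernel(x: Vector, z: Vector, bandwidth: float) -> float:
    """Product of 1-D Laplace kernels: prod_i exp(-|x_i - z_i| / bandwidth)."""
    return math.prod(math.exp(-abs(a - b) / bandwidth) for a, b in zip(x, z))


# Usage: identity M versus an M that ignores the second coordinate
identity = [[1.0, 0.0], [0.0, 1.0]]
first_only = [[1.0, 0.0], [0.0, 0.0]]
k_full = laplace_kernel([0.0, 0.0], [3.0, 4.0], identity, bandwidth=5.0)
k_first = laplace_kernel([0.0, 0.0], [3.0, 4.0], first_only, bandwidth=5.0)
k_prod = product_laplace_kernel([0.0, 0.0], [3.0, 4.0], bandwidth=5.0)
```

Zeroing a diagonal entry of `M` makes the kernel ignore that coordinate, which is how a learned `M` can reweight features.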
### Splitting Methods
| Method | Description |
|--------|-------------|
| `'top_vector_agop_on_subset'` | Use the top eigenvector of the AGOP (average gradient outer product) matrix |
| `'random_agop_on_subset'` | Use a random eigenvector of the AGOP matrix |
| `'top_pc_agop_on_subset'` | Use the top principal component of the AGOP |
| `'random_pca'` | Use a vector sampled from a Gaussian distribution with covariance $X^\top X$ |
| `'linear'` | Use the linear regression coefficients |
| `'fixed_vector'` | Use a fixed projection vector |
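The idea behind `'top_vector_agop_on_subset'` can be sketched as: take the AGOP of a subset, find its top eigenvector, project the rows onto it, and threshold. In this sketch power iteration stands in for whatever eigensolver the library uses, the median threshold is an assumption, and both helpers are hypothetical.

```python
import statistics
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def top_eigenvector(M: Matrix, iters: int = 200) -> Vector:
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix.

    Caveat: fails if the start vector happens to be orthogonal to that eigenvector.
    """
    d = len(M)
    v = [1.0 / d ** 0.5] * d  # unit-norm start vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # start vector lies in the null space of M; give up
            break
        v = [x / norm for x in w]
    return v


def split_on_direction(X: List[Vector], direction: Vector) -> Tuple[List[int], List[int]]:
    """Split row indices at the median of the projections of X onto `direction`."""
    proj = [sum(a * b for a, b in zip(x, direction)) for x in X]
    threshold = statistics.median(proj)
    left = [i for i, p in enumerate(proj) if p <= threshold]
    right = [i for i, p in enumerate(proj) if p > threshold]
    return left, right


# Usage: an AGOP in which feature 0 dominates
agop_matrix = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
v = top_eigenvector(agop_matrix)
X = [[0.0, 5.0, 1.0], [1.0, -2.0, 0.0], [2.0, 9.0, 3.0], [3.0, 0.0, 0.0]]
left, right = split_on_direction(X, v)
```

The split ignores features 1 and 2 almost entirely, because the AGOP assigns them little or no weight.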
### Tuning Metrics
| Metric | Description | Task Type |
|--------|-------------|-----------|
| `'mse'` | Mean Squared Error | Regression |
| `'accuracy'` | Classification Accuracy | Classification |
| `'auc'` | Area Under ROC Curve | Classification |
| `'f1'` | F1 Score | Classification |
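The metrics in the table follow their standard definitions, sketched below for binary labels with plain Python helpers (hypothetical names, not the library's functions). How xRFM handles multi-output or multiclass targets is not covered here. Note that MSE is minimized while accuracy, AUC, and F1 are maximized.

```python
from typing import List


def mse(y_true: List[float], y_pred: List[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```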