# xRFM - Recursive Feature Machines optimized for tabular data
**xRFM** is a scalable implementation of Recursive Feature Machines (RFMs) optimized for tabular data. This library provides both the core RFM algorithm and a tree-based extension (xRFM) that enables efficient processing of large datasets through recursive data splitting.
## Core Components
```
xRFM/
├── xrfm/
│   ├── xrfm.py                          # Main xRFM class (tree-based)
│   ├── tree_utils.py                    # Tree manipulation utilities
│   └── rfm_src/
│       ├── recursive_feature_machine.py # Base RFM class
│       ├── kernels.py                   # Kernel implementations
│       ├── eigenpro.py                  # EigenPro optimization
│       ├── utils.py                     # Utility functions
│       ├── svd.py                       # SVD operations
│       └── gpu_utils.py                 # GPU memory management
├── examples/                            # Usage examples
└── setup.py                             # Package configuration
```
## Installation
```bash
pip install xrfm
```
Or, to enable the `KermacProductLaplaceKernel` (requires CUDA 11 or CUDA 12):
```bash
pip install xrfm[cu11]
```
or
```bash
pip install xrfm[cu12]
```
### Development Installation
```bash
git clone https://github.com/dmbeaglehole/xRFM.git
cd xRFM
pip install -e .
```
## Quick Start
### Basic Usage
```python
import torch
from xrfm import xRFM
from sklearn.model_selection import train_test_split
# Create synthetic data
def target_function(X):
    return torch.cat([
        (X[:, 0] > 0)[:, None],
        (X[:, 1] < 0.5)[:, None]
    ], dim=1).float()
# Setup device and model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = xRFM(device=device, tuning_metric='mse')
# Generate data
n_samples = 2000
n_features = 100
X = torch.randn(n_samples, n_features, device=device)
y = target_function(X)
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.2, random_state=0)
model.fit(X_train, y_train, X_val, y_val)
y_pred_test = model.predict(X_test)
```
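A quick way to sanity-check the fit is to score predictions against the held-out labels. A minimal sketch, assuming `predict` returns an array-like aligned with `y_test`:

```python
import numpy as np

# Illustrative check (not part of the library API): test-set MSE
y_pred = y_pred_test.cpu().numpy() if hasattr(y_pred_test, 'cpu') else np.asarray(y_pred_test)
y_true = y_test.cpu().numpy()
print(f"test MSE: {float(((y_pred - y_true) ** 2).mean()):.4f}")
```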
### Custom Configuration
```python
# Custom RFM parameters
rfm_params = {
    'model': {
        'kernel': 'l2',            # Kernel type
        'bandwidth': 5.0,          # Kernel bandwidth
        'exponent': 1.0,           # Kernel exponent
        'diag': False,             # Use a diagonal Mahalanobis matrix (False = full)
        'bandwidth_mode': 'constant'
    },
    'fit': {
        'reg': 1e-3,               # Regularization parameter
        'iters': 5,                # Number of iterations
        'M_batch_size': 1000,      # Batch size for AGOP
        'verbose': True,           # Verbose output
        'early_stop_rfm': True     # Early stopping
    }
}
# Initialize model with custom parameters
model = xRFM(
    rfm_params=rfm_params,
    device=device,
    min_subset_size=10000,                    # Minimum subset size for splitting
    tuning_metric='accuracy',                 # Tuning metric
    split_method='top_vector_agop_on_subset'  # Splitting strategy
)
```
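The customized model exposes the same `fit`/`predict` interface; reusing the data from the Basic Usage example:

```python
# Same workflow as in Basic Usage, now with the custom parameters
model.fit(X_train, y_train, X_val, y_val)
y_pred_test = model.predict(X_test)
```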
## Recommended Preprocessing
- **Standardize numerical columns** using a scaler (e.g., `StandardScaler`).
- **One-hot encode categorical columns** and pass their metadata via `categorical_info`.
- **Do not standardize one-hot categorical features.** Use identity matrices for `categorical_vectors`.
### Example (scikit-learn)
```python
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
# Assume a pandas DataFrame `df` with:
# - numerical feature columns in `num_cols`
# - categorical feature columns in `cat_cols`
# - target column name in `target_col`
# Split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)
train_df, val_df = train_test_split(train_df, test_size=0.2, random_state=0)
# Fit preprocessors on train only
scaler = StandardScaler()
ohe = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
X_num_train = scaler.fit_transform(train_df[num_cols])
X_num_val = scaler.transform(val_df[num_cols])
X_num_test = scaler.transform(test_df[num_cols])
X_cat_train = ohe.fit_transform(train_df[cat_cols])
X_cat_val = ohe.transform(val_df[cat_cols])
X_cat_test = ohe.transform(test_df[cat_cols])
# Concatenate: numerical block first, then categorical block
X_train = np.hstack([X_num_train, X_cat_train]).astype(np.float32)
X_val = np.hstack([X_num_val, X_cat_val]).astype(np.float32)
X_test = np.hstack([X_num_test, X_cat_test]).astype(np.float32)
y_train = train_df[target_col].to_numpy().astype(np.float32)
y_val = val_df[target_col].to_numpy().astype(np.float32)
y_test = test_df[target_col].to_numpy().astype(np.float32)
# Build categorical_info (indices are relative to the concatenated X)
n_num = X_num_train.shape[1]
categorical_indices = []
categorical_vectors = []
start = n_num
for cats in ohe.categories_:
    cat_len = len(cats)
    idxs = torch.arange(start, start + cat_len, dtype=torch.long)
    categorical_indices.append(idxs)
    categorical_vectors.append(torch.eye(cat_len, dtype=torch.float32))  # identity; do not standardize
    start += cat_len
numerical_indices = torch.arange(0, n_num, dtype=torch.long)
categorical_info = dict(
    numerical_indices=numerical_indices,
    categorical_indices=categorical_indices,
    categorical_vectors=categorical_vectors,
)
# Train xRFM with categorical_info
from xrfm import xRFM
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
rfm_params = {
    'model': {
        'kernel': 'l2',
        'bandwidth': 10.0,
        'exponent': 1.0,
        'diag': False,
        'bandwidth_mode': 'constant',
    },
    'fit': {
        'reg': 1e-3,
        'iters': 3,
        'verbose': False,
        'early_stop_rfm': True,
    }
}
model = xRFM(
    rfm_params=rfm_params,
    device=device,
    tuning_metric='mse',
    categorical_info=categorical_info,
)
model.fit(X_train, y_train, X_val, y_val)
y_pred = model.predict(X_test)
```
## File Structure
### Core Files
| File | Description |
|------|-------------|
| `xrfm/xrfm.py` | Main xRFM class implementing tree-based recursive splitting |
| `xrfm/rfm_src/recursive_feature_machine.py` | Base RFM class with core algorithm |
| `xrfm/rfm_src/kernels.py` | Kernel implementations (Laplace, Product Laplace, etc.) |
| `xrfm/rfm_src/eigenpro.py` | EigenPro optimization for large-scale training |
| `xrfm/rfm_src/utils.py` | Utility functions for matrix operations and metrics |
| `xrfm/rfm_src/svd.py` | SVD utilities for kernel computations |
| `xrfm/rfm_src/gpu_utils.py` | GPU memory management utilities |
| `xrfm/tree_utils.py` | Tree manipulation and parameter extraction utilities |
## API Reference
### Main Classes
#### `xRFM`
Tree-based Recursive Feature Machine for scalable learning.
**Key Methods:**
- `fit(X, y, X_val, y_val)`: Train the model
- `predict(X)`: Make predictions
- `predict_proba(X)`: Predict class probabilities
- `score(X, y)`: Evaluate model performance
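For classification, a minimal hedged sketch of this workflow (the label format and shapes accepted by `fit` are assumptions to verify against your task):

```python
import torch
from xrfm import xRFM

# Illustrative binary-classification sketch on synthetic data
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = xRFM(device=device, tuning_metric='accuracy')

X = torch.randn(2000, 20)
y = (X[:, 0] > 0).float()[:, None]     # assumed 0/1 labels as a column vector

model.fit(X[:1200], y[:1200], X[1200:1600], y[1200:1600])
proba = model.predict_proba(X[1600:])  # class probabilities
acc = model.score(X[1600:], y[1600:])  # evaluate model performance
```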
#### `RFM`
Base Recursive Feature Machine implementation.
### Available Kernels
| Kernel | String ID | Description |
|--------|-----------|-------------|
| `LaplaceKernel` | `'laplace'`, `'l2'` | Standard Laplace kernel |
| `KermacProductLaplaceKernel` | `'l1_kermac'` | High-performance Product of Laplace kernels on GPU (requires install with `[cu11]` or `[cu12]`) |
| `KermacLpqLaplaceKernel` | `'lpq_kermac'` | High-performance p-norm, q-exponent Laplace kernels on GPU (requires install with `[cu11]` or `[cu12]`) |
| `LightLaplaceKernel` | `'l2_high_dim'`, `'l2_light'` | Memory-efficient Laplace kernel |
| `ProductLaplaceKernel` | `'product_laplace'`, `'l1'` | Product of Laplace kernels (not recommended; use the Kermac variant if possible) |
| `SumPowerLaplaceKernel` | `'sum_power_laplace'`, `'l1_power'` | Sum of powered Laplace kernels |
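Kernels are selected via the string ID in `rfm_params['model']['kernel']` (see Custom Configuration above). For example, to try the GPU Product Laplace kernel, assuming an `xrfm[cu11]` or `xrfm[cu12]` install:

```python
# Swap in the Kermac kernel by its string ID; requires the CUDA extras
rfm_params['model']['kernel'] = 'l1_kermac'
model = xRFM(rfm_params=rfm_params, device=device, tuning_metric='mse')
```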
### Splitting Methods
| Method | Description |
|--------|-------------|
| `'top_vector_agop_on_subset'` | Use top eigenvector of AGOP matrix |
| `'random_agop_on_subset'` | Use random eigenvector of AGOP matrix |
| `'top_pc_agop_on_subset'` | Use top principal component of AGOP |
| `'random_pca'` | Use a vector sampled from a Gaussian distribution with covariance $X^\top X$ |
| `'linear'` | Use linear regression coefficients |
| `'fixed_vector'` | Use fixed projection vector |
### Tuning Metrics (and creating your own custom metrics)
xRFM scores tuning candidates with the `tuning_metric` string, which applies to both tree splits and leaf RFMs. Built-in options are:
- `mse`, `mae` for regression error
- `accuracy`, `brier`, `logloss`, `f1`, `auc` for classification quality
- `top_agop_vector_auc`, `top_agop_vector_pearson_r`, `top_agop_vectors_ols_auc` for AGOP-aware diagnostics
To register a custom metric:
1. Create a new subclass of `Metric` in `xrfm/rfm_src/metrics.py`, fill in the metadata (`name`, `display_name`, `should_maximize`, `task_types`, `required_quantities`), and implement `_compute(**kwargs)` for the quantities you request.
2. Add the class to the `all_metrics` list inside `Metric.from_name` so the factory can return it by name.
3. Reference the new `name` in the `tuning_metric` argument when constructing `xRFM` or the standalone `RFM`.
Each metric receives tensors on the active device; convert to NumPy as needed. Return higher-is-better values when `should_maximize = True`, otherwise lower-is-better.
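As a sketch of steps 1-3, a hypothetical RMSE metric; the exact base-class hooks (quantity names, `_compute` signature) are assumptions to check against `xrfm/rfm_src/metrics.py`:

```python
# Hypothetical addition to xrfm/rfm_src/metrics.py; quantity names are assumed
import torch

class RMSEMetric(Metric):
    name = 'rmse'
    display_name = 'RMSE'
    should_maximize = False                     # lower is better
    task_types = ('regression',)                # assumed task-type labels
    required_quantities = ('y_true', 'y_pred')  # assumed quantity names

    def _compute(self, y_true=None, y_pred=None, **kwargs):
        # Tensors arrive on the active device; no NumPy conversion needed here
        return torch.sqrt(torch.mean((y_pred - y_true) ** 2)).item()
```

After adding `RMSEMetric` to the `all_metrics` list in `Metric.from_name`, pass `tuning_metric='rmse'` when constructing `xRFM` or the standalone `RFM`.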