arfs_gen


- Name: arfs_gen
- Version: 1.2.1
- Home page: https://github.com/lpfann/arfs_gen
- Summary: Library to generate toy data for machine learning experiments in the context of all-relevant feature selection.
- Upload time: 2025-08-02 10:02:04
- Author: Lukas Pfannschmidt
- Requires Python: `<4.0,>=3.9`
- License: MIT
- Keywords: data generation, toy data, all-relevant feature selection, machine learning
# All Relevant Feature Selection Generator Library (ARFS-Gen)
![PyPI](https://img.shields.io/pypi/v/arfs_gen)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/arfs_gen)
![PyPI - License](https://img.shields.io/pypi/l/arfs_gen)

This repository contains a Python library for generating synthetic (toy) data to evaluate all-relevant feature selection methods in research papers, as done for example in [fri](https://github.com/lpfann/fri).
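
Because the number of relevant features is fixed when the data is generated, the ground truth is known, and a selector's output can be scored against it. A minimal sketch, assuming you already have the indices of the truly relevant columns (this sketch does not cover how arfs_gen orders its columns); `selection_scores` is a hypothetical helper, not part of arfs_gen:

```python
def selection_scores(selected, relevant):
    """Compare a selected feature set with the known relevant set.

    Both arguments are iterables of column indices.
    Returns (precision, recall, f1) as floats.
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)  # correctly selected relevant features
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: columns 0-3 are relevant, a selector picked 0, 1, 2 and 7.
print(selection_scores([0, 1, 2, 7], [0, 1, 2, 3]))  # (0.75, 0.75, 0.75)
```

For all-relevant selection, recall matters in particular: a method that returns only one of several redundant features misses weakly relevant ones even if its precision is perfect.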

It can create datasets with a specified number of strongly and weakly relevant features, plus random noise features.
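
To make the three feature types concrete, here is a small NumPy-only sketch. It illustrates the definitions under the assumption of a linear target; it is not how arfs_gen itself builds its data, and `toy_relevance_data` is a hypothetical name:

```python
import numpy as np


def toy_relevance_data(n_samples=200, n_noise=3, seed=0):
    """Hypothetical illustration of the three feature types (not arfs_gen's algorithm).

    Column layout of X:
      0        strongly relevant: carries signal no other column provides
      1, 2, 3  weakly relevant: column 3 equals column 1 + column 2, so any one
               of them can be reconstructed from the other two
      4 ...    noise: drawn independently of the target
    """
    rng = np.random.default_rng(seed)
    strong = rng.normal(size=n_samples)
    w1 = rng.normal(size=n_samples)
    w2 = rng.normal(size=n_samples)
    w3 = w1 + w2                                  # redundant with (w1, w2)
    noise = rng.normal(size=(n_samples, n_noise))
    y = 2.0 * strong + w3                         # target ignores the noise columns
    X = np.column_stack([strong, w1, w2, w3, noise])
    return X, y


X, y = toy_relevance_data()
print(X.shape)  # (200, 7)
```

Dropping column 1 loses no information about `y`, because it equals column 3 minus column 2; dropping column 0 does, since no other column carries it. An all-relevant selector should therefore keep columns 0 to 3 and discard the noise columns.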

The latest revision also includes methods that generate data with privileged information.

It builds on existing functionality from `numpy` and `scikit-learn`.
# Install
The library is available on [PyPI](https://pypi.org/project/arfs-gen/).
Install via `pip`:
```shell
pip install arfs_gen
```
or clone this repository and use:
```shell
pip install .
```
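
To check which release ended up in your environment (this page describes 1.2.1), the standard library is enough. A minimal sketch; `installed_version` is just an illustrative helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if arfs_gen is not installed.
print(installed_version("arfs_gen"))
```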

# Usage
The following example generates a simple regression dataset with a mix of strongly and weakly relevant features:

```python
# Import the data generator
from arfs_gen import genRegressionData
# Import a model to fit on the generated data
from sklearn.svm import LinearSVR

# Number of samples
n = 100
# Number of strongly relevant features
strRel = 2
# Number of weakly relevant (redundant) features
weakRel = 2
# Overall number of features (the rest is filled with random noise features)
d = 10

# Generate the data
X, y = genRegressionData(
    n_samples=n,
    n_features=d,
    n_redundant=weakRel,
    n_strel=strRel,
    n_repeated=0,
    noise=0,
)
# Fit a model
linsvr = LinearSVR()
linsvr.fit(X, y)
```
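
After fitting, one quick (and limited) way to see which features the model relies on is to rank them by the magnitude of `linsvr.coef_`. A minimal sketch; `rank_by_weight` is a hypothetical helper. Keep in mind that a single linear model is not an all-relevant selector: with redundant (weakly relevant) features, the weight can be split across the redundant copies or concentrated on some of them, which is the kind of ambiguity all-relevant methods such as fri are meant to resolve.

```python
import numpy as np


def rank_by_weight(coef):
    """Return feature indices ordered by decreasing absolute model weight."""
    coef = np.ravel(coef)  # accept (d,) or (1, d) coefficient arrays
    return np.argsort(-np.abs(coef), kind="stable")


# With the fitted model above you would call rank_by_weight(linsvr.coef_);
# here a made-up weight vector stands in for it.
weights = np.array([0.9, -1.5, 0.0, 0.4, -0.05])
print(rank_by_weight(weights))  # [1 0 3 4 2]
```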

# Development
We use [poetry](https://python-poetry.org/) for dependency management.

If you have `poetry` installed, use
```shell
$ poetry install
```  
inside the project folder to create a new `venv` and to install all dependencies.
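To enter the newly created `venv`, use
```shell
$ poetry shell
```
to open a new shell inside it (with Poetry 2.0 or later, this command requires the `poetry-plugin-shell` plugin; `poetry env activate` prints the activation command instead).
Alternatively, run commands inside the `venv` with `poetry run ...`.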

## Test
Run the test suite with `poetry run pytest`.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/lpfann/arfs_gen",
    "name": "arfs_gen",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "data generation, toy data, all-relevant feature selection, machine learning",
    "author": "Lukas Pfannschmidt",
    "author_email": "lukas@lpfann.me",
    "download_url": "https://files.pythonhosted.org/packages/9c/99/6f7373bd3736a9920a07f30d44f54eae5142d6d56aa27fd77fe826a71dbb/arfs_gen-1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Library to generate toy data for machine learning experiments in the context of all-relevant feature selection.",
    "version": "1.2.1",
    "project_urls": {
        "Homepage": "https://github.com/lpfann/arfs_gen",
        "Repository": "https://github.com/lpfann/arfs_gen"
    },
    "split_keywords": [
        "data generation",
        " toy data",
        " all-relevant feature selection",
        " machine learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "96ccd54171261c17d25afea328a2eaf9d1fbf7819c9b3cf8f666ba4bacdebdb5",
                "md5": "0cf042e15f0f82c05e9bb4ef1ddd9e42",
                "sha256": "e7cf151e970407e795ee6502b0dc97a7b31fbfd026bb77693dd8345fbbb84f54"
            },
            "downloads": -1,
            "filename": "arfs_gen-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0cf042e15f0f82c05e9bb4ef1ddd9e42",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 11276,
            "upload_time": "2025-08-02T10:02:03",
            "upload_time_iso_8601": "2025-08-02T10:02:03.632813Z",
            "url": "https://files.pythonhosted.org/packages/96/cc/d54171261c17d25afea328a2eaf9d1fbf7819c9b3cf8f666ba4bacdebdb5/arfs_gen-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9c996f7373bd3736a9920a07f30d44f54eae5142d6d56aa27fd77fe826a71dbb",
                "md5": "53aab4c78eacecf095543b5371b11264",
                "sha256": "f9904cd933f829b9b929f160919cec1410ca5c31b2c234a93c210ed6958b663a"
            },
            "downloads": -1,
            "filename": "arfs_gen-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "53aab4c78eacecf095543b5371b11264",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 9850,
            "upload_time": "2025-08-02T10:02:04",
            "upload_time_iso_8601": "2025-08-02T10:02:04.829664Z",
            "url": "https://files.pythonhosted.org/packages/9c/99/6f7373bd3736a9920a07f30d44f54eae5142d6d56aa27fd77fe826a71dbb/arfs_gen-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-02 10:02:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lpfann",
    "github_project": "arfs_gen",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "arfs_gen"
}
        