brif


Name: brif
Version: 1.4.5
home_page: https://pypi.org/project/brif/
Summary: Build decision trees and random forests for classification and regression.
upload_time: 2024-10-15 19:26:10
maintainer: None
docs_url: None
author: Yanchao Liu
requires_python: >=3.5
license: GPL3
keywords: random forest, classification, regression, prediction
# Description

Build random forests for classification and regression problems. 
The same program is available on [CRAN](https://cran.r-project.org/web/packages/brif/index.html) for R users.

# Installation

For Python:
```bash
pip install brif
```

For R:
```R
install.packages('brif')
```

To use on Google Colab:
```python
!pip install brif
from brif import brif
```

# Examples

```python
from brif import brif
import pandas as pd

# Create a brif object with default parameters.
bf = brif.brif()  

# Display the current parameter values. 
bf.get_param()  

# To change certain parameter values, e.g.:
bf.set_param({'ntrees':100, 'nthreads':2})

# Or simply:
bf.ntrees = 200

# Load input data frame. Data must be a pandas data frame with appropriate headers.
df = pd.read_csv("auto.csv")

# Train the model
bf.fit(df, 'origin')  # specify the target column name

# Or equivalently
bf.fit(df, 7)  # specify the target column index

# Make predictions 
# The target variable column must be excluded, and all other columns should appear in the same order as in training
# Here, predict the first 10 rows of df
pred_labels = bf.predict(df.iloc[0:10, 0:7], type='class')  # return a list containing the predicted class labels
pred_scores = bf.predict(df.iloc[0:10, 0:7], type='score')  # return a data frame containing predicted probabilities by class

# Note: for a regression problem (i.e., when the response variable is numeric type), the predict function will always return a list containing the predicted values

```
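
Because `predict(..., type='class')` returns a plain list of labels, checking predictions against known labels needs nothing beyond standard Python. The following is a minimal sketch: `accuracy` is a hypothetical helper, not part of brif, and the label values are made up rather than real model output. Labels are compared as strings in case the predicted labels differ in type from the original column, which is an assumption and not documented behavior.

```python
def accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the true labels."""
    if len(pred_labels) != len(true_labels):
        raise ValueError("pred_labels and true_labels must have the same length")
    if not pred_labels:
        raise ValueError("cannot compute accuracy on empty input")
    # Compare as strings so that labels such as 1 and '1' count as a match
    # (an assumption of this sketch, not documented brif behavior).
    matches = sum(str(p) == str(t) for p, t in zip(pred_labels, true_labels))
    return matches / len(true_labels)


# Made-up labels standing in for bf.predict(...) output and df['origin'].
pred = ['1', '1', '3', '2']
true = [1, 1, 2, 2]
print(accuracy(pred, true))  # 3 of 4 labels match
```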

# Parameters
**tmp_preddata**
a character string specifying a filename to save the temporary scoring data. Default is "tmp_brif_preddata.txt".

**n_numeric_cuts**	
an integer value indicating the maximum number of split points to generate for each numeric variable.

**n_integer_cuts**	
an integer value indicating the maximum number of split points to generate for each integer variable.

**max_integer_classes**
an integer value. If the target variable is integer and has more than max_integer_classes unique values in the training data, the target variable is grouped into max_integer_classes bins. If the target variable is numeric, it is grouped into a number of bins equal to the smaller of max_integer_classes and the number of unique values, and the regression problem is solved as a classification problem.

**max_depth**
an integer specifying the maximum depth of each tree. Maximum is 40.

**min_node_size**	
an integer specifying the minimum number of training cases a leaf node must contain.

**ntrees**
an integer specifying the number of trees in the forest.

**ps**
an integer indicating the number of predictors to sample at each node split. Default is 0, meaning to use sqrt(p), where p is the number of predictors in the input.
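
To make the `ps` rule concrete, here is a rough sketch of per-split predictor sampling in plain Python. It is not brif's implementation: the function names are hypothetical, and both the rounding of sqrt(p) (rounded down here) and the cap at p are assumptions that the description above does not specify.

```python
import math
import random


def n_predictors_to_sample(ps, p):
    """Return how many predictors to draw at a split, given ps and p predictors."""
    if ps <= 0:
        # ps = 0 means "use sqrt(p)"; rounding down is an assumption of this sketch.
        return max(1, math.isqrt(p))
    # Capping at p is also an assumption; one cannot draw more predictors than exist.
    return min(ps, p)


def sample_predictors(ps, p, rng):
    """Draw a random subset of predictor indices without replacement."""
    return sorted(rng.sample(range(p), n_predictors_to_sample(ps, p)))


rng = random.Random(2024)
print(n_predictors_to_sample(0, 7))   # 2 under the round-down assumption
print(n_predictors_to_sample(5, 7))   # 5
print(sample_predictors(0, 7, rng))   # two distinct indices from 0..6
```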

**max_factor_levels**
an integer. If any factor variable has more than max_factor_levels levels, the program stops and prompts the user to increase the value of this parameter if the large number of levels is indeed intended.

**bagging_method**
an integer indicating the bagging sampling method: 0 for sampling without replacement; 1 for sampling with replacement (bootstrapping).

**bagging_proportion**	
a numeric scalar between 0 and 1, indicating the proportion of training observations to be used in each tree.
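
The interaction of `bagging_method` and `bagging_proportion` can be illustrated with a small sketch. `draw_bag` is a hypothetical function written for this illustration, and the sample size (row count times proportion, rounded down, at least 1) is an assumption rather than documented behavior.

```python
import random


def draw_bag(n_rows, bagging_method, bagging_proportion, rng):
    """Return the row indices used to grow one tree."""
    if not 0 < bagging_proportion <= 1:
        raise ValueError("bagging_proportion must be in (0, 1]")
    # Sample size rule is an assumption of this sketch.
    size = max(1, int(n_rows * bagging_proportion))
    if bagging_method == 0:
        # Without replacement: every selected row appears exactly once.
        return rng.sample(range(n_rows), size)
    if bagging_method == 1:
        # With replacement (bootstrapping): a row may appear several times.
        return [rng.randrange(n_rows) for _ in range(size)]
    raise ValueError("bagging_method must be 0 or 1")


rng = random.Random(0)
bag_without = draw_bag(10, 0, 0.5, rng)
bag_with = draw_bag(10, 1, 0.5, rng)
print(len(bag_without), len(set(bag_without)))  # 5 rows, all distinct
print(len(bag_with))                            # 5 rows, possibly repeated
```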

**split_search**
an integer indicating the choice of the split search method. 0: randomly pick a split point; 1: do a local search; 2: random pick subject to regulation; 3: local search subject to regulation; 4 or above: a mix of options 0 to 3.

**search_radius**
a positive integer indicating the split point search radius. This parameter takes effect only in the self-regulating local search (split_search = 2 or above).

**seed**
a positive integer used as the random number generator seed.

**nthreads**
an integer specifying the number of threads used by the program. This parameter takes effect only on systems supporting OpenMP.

**vote_method**
an integer (0 or 1) specifying the voting method in prediction. 0: each leaf contributes the raw count and an average is taken on the sum over all leaves; 1: each leaf contributes an intra-node fraction which is then averaged over all leaves with equal weight.
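
The difference between the two voting methods is easiest to see on a small example. The sketch below is one reading of the description above, not brif's actual code: `vote` is a hypothetical function, and `leaf_counts` holds, for a single case, the per-class training counts of the leaf that the case reaches in each tree.

```python
def vote(leaf_counts, vote_method):
    """Combine per-leaf class counts into class scores for one case."""
    n_classes = len(leaf_counts[0])
    if vote_method == 0:
        # Pool the raw counts over all leaves, then normalize once.
        totals = [sum(leaf[k] for leaf in leaf_counts) for k in range(n_classes)]
        grand_total = sum(totals)
        return [t / grand_total for t in totals]
    if vote_method == 1:
        # Convert each leaf to fractions first, then average with equal weight.
        fractions = [[c / sum(leaf) for c in leaf] for leaf in leaf_counts]
        return [sum(f[k] for f in fractions) / len(fractions)
                for k in range(n_classes)]
    raise ValueError("vote_method must be 0 or 1")


# Two trees, two classes: a large leaf (9 vs 1) and a small leaf (1 vs 1).
leaves = [[9, 1], [1, 1]]
print(vote(leaves, 0))  # about [0.833, 0.167]: the large leaf dominates
print(vote(leaves, 1))  # about [0.7, 0.3]: both leaves weigh the same
```

Under this reading, method 0 lets leaves with many training cases carry more weight, while method 1 treats every tree's leaf equally regardless of its size.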

**na_numeric**
a numeric value, substitute for 'nan' in numeric variables.

**na_integer**
an integer value, substitute for 'nan' in integer variables.

**na_factor**
a character string, substitute for missing values in factor variables. 
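
The effect of the three `na_*` parameters can be pictured as a per-column substitution. The sketch below only mirrors the descriptions above; how brif applies the substitutes internally is not shown in this document. `fill_missing` is a hypothetical helper and the substitute values are made up.

```python
import math


def fill_missing(values, substitute):
    """Replace None and NaN entries in a column with the given substitute."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [substitute if is_missing(v) else v for v in values]


# Made-up columns and substitutes, one per variable type.
print(fill_missing([1.5, float('nan'), 3.0], -999.0))   # numeric column
print(fill_missing([4, None, 6], -999))                 # integer column
print(fill_missing(['a', None, 'b'], 'missing'))        # factor column
```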

**type**
a character string indicating the return content of the predict function. For a classification problem, "score" means the by-class probabilities and "class" means the class labels (i.e., the target variable levels). For regression, the predicted values are returned. This is a parameter for the predict function, not an attribute of the brif object. 
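
The relationship between the two return types can be sketched by deriving labels from by-class scores. This is an illustration only: `labels_from_scores` is a hypothetical helper, the scores are made up, each row is represented as a plain dictionary instead of a data frame row, and the result is not guaranteed to match the `type='class'` output (for example, when scores are tied).

```python
def labels_from_scores(score_rows):
    """Pick the highest-scoring class for each row of by-class scores."""
    return [max(row, key=row.get) for row in score_rows]


# Made-up scores for two cases and three classes.
scores = [
    {'1': 0.7, '2': 0.2, '3': 0.1},
    {'1': 0.1, '2': 0.3, '3': 0.6},
]
print(labels_from_scores(scores))  # ['1', '3']
```

With a pandas data frame of scores, `DataFrame.idxmax(axis=1)` performs the same row-wise selection.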


            

Raw data

            {
    "_id": null,
    "home_page": "https://pypi.org/project/brif/",
    "name": "brif",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.5",
    "maintainer_email": null,
    "keywords": "random forest, classification, regression, prediction",
    "author": "Yanchao Liu",
    "author_email": "yanchaoliu@wayne.edu",
    "download_url": "https://files.pythonhosted.org/packages/a6/b8/62b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a/brif-1.4.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL3",
    "summary": "Build decision trees and random forests for classification and regression.",
    "version": "1.4.5",
    "project_urls": {
        "Homepage": "https://pypi.org/project/brif/"
    },
    "split_keywords": [
        "random forest",
        " classification",
        " regression",
        " prediction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "169ec9912dfa28b63e7a2cf6a96be1cf6301ccaede00270de0d697426f954f83",
                "md5": "9be1af88a50e8cc7ccc798594912125d",
                "sha256": "ee26a18650e4e68fbde130f7da519ab5a4e67b43bb66d2aef05ccd0dab5a716d"
            },
            "downloads": -1,
            "filename": "brif-1.4.5-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "9be1af88a50e8cc7ccc798594912125d",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.5",
            "size": 31031,
            "upload_time": "2024-10-15T19:30:55",
            "upload_time_iso_8601": "2024-10-15T19:30:55.324507Z",
            "url": "https://files.pythonhosted.org/packages/16/9e/c9912dfa28b63e7a2cf6a96be1cf6301ccaede00270de0d697426f954f83/brif-1.4.5-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "71b7ee005512c96e72e901cb98ff206d7e0fd530cf1e5552fe8e1c0c438f9495",
                "md5": "94cde9389bf3f739fd02a2630b56d5bb",
                "sha256": "e1b9b684f10d5a28b7f5bfd2ff1265c490241edbdaafa015412067bd46309528"
            },
            "downloads": -1,
            "filename": "brif-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "94cde9389bf3f739fd02a2630b56d5bb",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.5",
            "size": 33377,
            "upload_time": "2024-10-15T19:26:09",
            "upload_time_iso_8601": "2024-10-15T19:26:09.791227Z",
            "url": "https://files.pythonhosted.org/packages/71/b7/ee005512c96e72e901cb98ff206d7e0fd530cf1e5552fe8e1c0c438f9495/brif-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a6b862b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a",
                "md5": "3361e910671ebe64c23e50db6c0ab743",
                "sha256": "4dfd6f46b7758303096b8c7107a0ee370817e477d13a811785deef2cdce43b40"
            },
            "downloads": -1,
            "filename": "brif-1.4.5.tar.gz",
            "has_sig": false,
            "md5_digest": "3361e910671ebe64c23e50db6c0ab743",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.5",
            "size": 26725,
            "upload_time": "2024-10-15T19:26:10",
            "upload_time_iso_8601": "2024-10-15T19:26:10.846176Z",
            "url": "https://files.pythonhosted.org/packages/a6/b8/62b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a/brif-1.4.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-15 19:26:10",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "brif"
}
        