# Description
Build random forests for classification and regression problems.
The same program is available on [CRAN](https://cran.r-project.org/web/packages/brif/index.html) for R users.
# Installation
For Python:
```bash
pip install brif
```
For R:
```R
install.packages('brif')
```
To use on Google Colab:
```python
!pip install brif
from brif import brif
```
# Examples
```python
from brif import brif
import pandas as pd
# Create a brif object with default parameters.
bf = brif.brif()
# Display the current parameter values.
bf.get_param()
# To change certain parameter values, e.g.:
bf.set_param({'ntrees':100, 'nthreads':2})
# Or simply:
bf.ntrees = 200
# Load input data frame. Data must be a pandas data frame with appropriate headers.
df = pd.read_csv("auto.csv")
# Train the model
bf.fit(df, 'origin') # specify the target column name
# Or equivalently
bf.fit(df, 7) # specify the target column index
# Make predictions
# The target variable column must be excluded, and all other columns should appear in the same order as in training
# Here, predict the first 10 rows of df
pred_labels = bf.predict(df.iloc[0:10, 0:7], type='class') # return a list containing the predicted class labels
pred_scores = bf.predict(df.iloc[0:10, 0:7], type='score') # return a data frame containing predicted probabilities by class
# Note: for a regression problem (i.e., when the response variable is of numeric type), predict always returns a list of predicted values, regardless of the type argument
```
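The `predict` calls above slice `df.iloc[0:10, 0:7]` because the target `origin` happens to be the last column (index 7). When the target sits elsewhere, the predictor columns can be selected by name instead. The sketch below uses plain Python only; `predictor_columns` is a hypothetical helper, not part of brif, and the column names are assumed for illustration.

```python
def predictor_columns(columns, target):
    """Return the column names to pass to predict, in training order.

    `target` may be a column name or a zero-based column index,
    mirroring the two ways the target is given to fit above.
    """
    columns = list(columns)
    if isinstance(target, int):
        target = columns[target]
    if target not in columns:
        raise KeyError(f"target column {target!r} not found")
    return [name for name in columns if name != target]


# Column names assumed for illustration; use list(df.columns) in practice.
cols = ["mpg", "cylinders", "displacement", "horsepower",
        "weight", "acceleration", "year", "origin"]
by_name = predictor_columns(cols, "origin")
by_index = predictor_columns(cols, 7)
print(by_name == by_index)  # both select the same seven predictors
```

With a data frame, `df[predictor_columns(df.columns, 'origin')]` would then give the predictors in their original order.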
# Parameters
**tmp_preddata**
a character string specifying a filename to save the temporary scoring data. Default is "tmp_brif_preddata.txt".
**n_numeric_cuts**
an integer value indicating the maximum number of split points to generate for each numeric variable.
**n_integer_cuts**
an integer value indicating the maximum number of split points to generate for each integer variable.
**max_integer_classes**
an integer value. If the target variable is integer and has more than max_integer_classes unique values in the training data, the target variable is grouped into max_integer_classes bins. If the target variable is numeric, it is grouped into a number of bins equal to the smaller of max_integer_classes and the number of unique values, and the regression problem is solved as a classification problem.
**max_depth**
an integer specifying the maximum depth of each tree. Maximum is 40.
**min_node_size**
an integer specifying the minimum number of training cases a leaf node must contain.
**ntrees**
an integer specifying the number of trees in the forest.
**ps**
an integer indicating the number of predictors to sample at each node split. Default is 0, meaning to use sqrt(p), where p is the number of predictors in the input.
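
As a rough illustration of the default, the sketch below computes how many predictors would be sampled per split for a given p. The helper name `predictors_per_split` is hypothetical, and rounding sqrt(p) down (with a floor of 1) is an assumption; this description does not say how brif rounds.

```python
import math


def predictors_per_split(p, ps=0):
    """Number of predictors sampled at each split (sketch).

    ps == 0 stands for the default sqrt(p) rule; rounding down and the
    minimum of 1 are assumptions, not documented brif behavior.
    """
    if ps > 0:
        return min(ps, p)  # capping at p is also an assumption
    return max(1, math.isqrt(p))


print(predictors_per_split(7))        # 2, since floor(sqrt(7)) == 2
print(predictors_per_split(7, ps=4))  # 4
```
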
**max_factor_levels**
an integer. If any factor variable has more than max_factor_levels levels, the program stops and prompts the user to increase the value of this parameter if that many levels are indeed intended.
**bagging_method**
an integer indicating the bagging sampling method: 0 for sampling without replacement; 1 for sampling with replacement (bootstrapping).
**bagging_proportion**
a numeric scalar between 0 and 1, indicating the proportion of training observations to be used in each tree.
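
The two bagging options can be pictured with the standard library. This is a generic sketch of drawing row indices with and without replacement, not brif's internal code; `bag_indices` is a hypothetical helper, and rounding the sample size down is an assumption.

```python
import random


def bag_indices(n_rows, bagging_method, bagging_proportion, seed=0):
    """Draw the row indices used to grow one tree (sketch)."""
    rng = random.Random(seed)
    size = max(1, int(n_rows * bagging_proportion))
    if bagging_method == 0:
        # Without replacement: every selected row appears exactly once.
        return rng.sample(range(n_rows), size)
    # With replacement (bootstrapping): rows may repeat.
    return rng.choices(range(n_rows), k=size)


without = bag_indices(100, bagging_method=0, bagging_proportion=0.5)
with_repl = bag_indices(100, bagging_method=1, bagging_proportion=0.5)
print(len(without), len(set(without)))      # 50 50
print(len(with_repl), len(set(with_repl)))  # 50 draws, typically fewer unique rows
```
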
**split_search**
an integer indicating the choice of the split search method. 0: randomly pick a split point; 1: do a local search; 2: random pick subject to regulation; 3: local search subject to regulation; 4 or above: a mix of options 0 to 3.
**search_radius**
a positive integer indicating the split point search radius. This parameter takes effect only in the self-regulating local search (split_search = 2 or above).
**seed**
a positive integer used as the random number generator seed.
**nthreads**
an integer specifying the number of threads used by the program. This parameter takes effect only on systems supporting OpenMP.
**vote_method**
an integer (0 or 1) specifying the voting method in prediction. 0: each leaf contributes its raw class counts, and the average is taken over the sum across all leaves; 1: each leaf contributes its intra-node class fractions, which are then averaged over all leaves with equal weight.
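
One way to read the two voting rules is sketched below on a small worked example. The function `vote` and the leaf counts are hypothetical, and normalizing the pooled counts into scores for method 0 is an interpretation of the description, not confirmed brif behavior.

```python
def vote(leaf_counts, vote_method=0):
    """Combine per-tree leaf class counts into class scores (sketch).

    leaf_counts holds one dict per tree: the class counts of the leaf
    that the case being predicted falls into.
    """
    classes = sorted({c for leaf in leaf_counts for c in leaf})
    if vote_method == 0:
        # Pool raw counts over all leaves, then normalize the totals.
        totals = {c: sum(leaf.get(c, 0) for leaf in leaf_counts) for c in classes}
        grand = sum(totals.values())
        return {c: totals[c] / grand for c in classes}
    # Convert each leaf to fractions first, then average with equal weight.
    scores = {c: 0.0 for c in classes}
    for leaf in leaf_counts:
        size = sum(leaf.values())
        for c in classes:
            scores[c] += leaf.get(c, 0) / size / len(leaf_counts)
    return scores


leaves = [{"a": 90, "b": 10}, {"a": 1, "b": 4}]
print(vote(leaves, vote_method=0))  # the large leaf dominates: a is about 0.87
print(vote(leaves, vote_method=1))  # leaves weigh equally: a is about 0.55
```

In this example the large leaf drives the result under method 0, while method 1 gives the small leaf the same say as the large one.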
**na_numeric**
a numeric value, substitute for 'nan' in numeric variables.
**na_integer**
an integer value, substitute for 'nan' in integer variables.
**na_factor**
a character string, substitute for missing values in factor variables.
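
The effect of these three substitutes can be sketched on plain Python values. The function `fill_missing` and the substitute values used here are illustrative assumptions, not brif defaults or brif code.

```python
import math


def fill_missing(values, substitute):
    """Replace None and float NaN entries with the substitute (sketch)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [substitute if is_missing(v) else v for v in values]


print(fill_missing([1.5, float("nan"), 3.0], -999.0))  # [1.5, -999.0, 3.0]
print(fill_missing([4, None, 6], -9999))               # [4, -9999, 6]
print(fill_missing(["red", None, "blue"], "NA"))       # ['red', 'NA', 'blue']
```
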
**type**
a character string indicating the return content of the predict function. For a classification problem, "score" means the by-class probabilities and "class" means the class labels (i.e., the target variable levels). For regression, the predicted values are returned. This is a parameter for the predict function, not an attribute of the brif object.
Raw data
{
"_id": null,
"home_page": "https://pypi.org/project/brif/",
"name": "brif",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.5",
"maintainer_email": null,
"keywords": "random forest, classification, regression, prediction",
"author": "Yanchao Liu",
"author_email": "yanchaoliu@wayne.edu",
"download_url": "https://files.pythonhosted.org/packages/a6/b8/62b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a/brif-1.4.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL3",
"summary": "Build decision trees and random forests for classification and regression.",
"version": "1.4.5",
"project_urls": {
"Homepage": "https://pypi.org/project/brif/"
},
"split_keywords": [
"random forest",
" classification",
" regression",
" prediction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "169ec9912dfa28b63e7a2cf6a96be1cf6301ccaede00270de0d697426f954f83",
"md5": "9be1af88a50e8cc7ccc798594912125d",
"sha256": "ee26a18650e4e68fbde130f7da519ab5a4e67b43bb66d2aef05ccd0dab5a716d"
},
"downloads": -1,
"filename": "brif-1.4.5-cp311-cp311-win_amd64.whl",
"has_sig": false,
"md5_digest": "9be1af88a50e8cc7ccc798594912125d",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.5",
"size": 31031,
"upload_time": "2024-10-15T19:30:55",
"upload_time_iso_8601": "2024-10-15T19:30:55.324507Z",
"url": "https://files.pythonhosted.org/packages/16/9e/c9912dfa28b63e7a2cf6a96be1cf6301ccaede00270de0d697426f954f83/brif-1.4.5-cp311-cp311-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "71b7ee005512c96e72e901cb98ff206d7e0fd530cf1e5552fe8e1c0c438f9495",
"md5": "94cde9389bf3f739fd02a2630b56d5bb",
"sha256": "e1b9b684f10d5a28b7f5bfd2ff1265c490241edbdaafa015412067bd46309528"
},
"downloads": -1,
"filename": "brif-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl",
"has_sig": false,
"md5_digest": "94cde9389bf3f739fd02a2630b56d5bb",
"packagetype": "bdist_wheel",
"python_version": "cp38",
"requires_python": ">=3.5",
"size": 33377,
"upload_time": "2024-10-15T19:26:09",
"upload_time_iso_8601": "2024-10-15T19:26:09.791227Z",
"url": "https://files.pythonhosted.org/packages/71/b7/ee005512c96e72e901cb98ff206d7e0fd530cf1e5552fe8e1c0c438f9495/brif-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a6b862b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a",
"md5": "3361e910671ebe64c23e50db6c0ab743",
"sha256": "4dfd6f46b7758303096b8c7107a0ee370817e477d13a811785deef2cdce43b40"
},
"downloads": -1,
"filename": "brif-1.4.5.tar.gz",
"has_sig": false,
"md5_digest": "3361e910671ebe64c23e50db6c0ab743",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 26725,
"upload_time": "2024-10-15T19:26:10",
"upload_time_iso_8601": "2024-10-15T19:26:10.846176Z",
"url": "https://files.pythonhosted.org/packages/a6/b8/62b9bab1f7100a6664b9a957b2c82eb09efbc2c3419a530af46ff36d2a8a/brif-1.4.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-15 19:26:10",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "brif"
}