Commit

Merge branch 'master' into split_stats
mnwright committed Nov 8, 2023
2 parents 6305b67 + 5550eaf commit 3cc3b53
Showing 3 changed files with 43 additions and 9 deletions.
25 changes: 21 additions & 4 deletions R/ranger.R
@@ -65,6 +65,22 @@
##' This importance measure can be combined with the methods to estimate p-values in \code{\link{importance_pvalues}}.
##' We recommend not to use the 'impurity_corrected' importance when making predictions since the feature permutation step might reduce predictive performance (a warning is raised when predicting on new data).
##'
##' Note that ranger has different default values than other packages.
##' For example, our default for \code{mtry} is the square root of the number of variables for all tree types, whereas other packages use different values for regression.
##' Also, changing one hyperparameter does not change other hyperparameters (where possible).
##' For example, \code{splitrule="extratrees"} uses randomized splitting but does not disable bagging as in Geurts et al. (2006).
##' To disable bagging, use \code{replace = FALSE, sample.fraction = 1}.
##' This can also be used to grow a single decision tree without bagging and feature subsetting: \code{ranger(..., num.trees = 1, mtry = p, replace = FALSE, sample.fraction = 1)}, where p is the number of independent variables.
##'
##' While random forests are known for their robustness, default hyperparameters do not always work well.
##' For example, for high dimensional data, increasing the \code{mtry} value and the number of trees \code{num.trees} is recommended.
##' For more details and recommendations, see Probst et al. (2019).
##' To find the best hyperparameters, consider hyperparameter tuning with the \code{tuneRanger} or \code{mlr3} packages.
##'
##' Out-of-bag prediction error is calculated as misclassification rate (proportion of misclassified observations) for classification, as Brier score for probability estimation, as mean squared error (MSE) for regression and as one minus Harrell's C-index for survival.
##' Harrell's C-index is calculated based on the sum of the cumulative hazard function (CHF) over all timepoints, i.e., \code{rowSums(chf)}, where \code{chf} is the out-of-bag CHF; for details, see Ishwaran et al. (2008).
##' Calculation of the out-of-bag prediction error can be turned off with \code{oob.error = FALSE}.
##'
##' Regularization works by penalizing new variables by multiplying the splitting criterion by a factor, see Deng & Runger (2012) for details.
##' If \code{regularization.usedepth=TRUE}, \eqn{f^d} is used, where \emph{f} is the regularization factor and \emph{d} the depth of the node.
##' If regularization is used, multithreading is deactivated because all trees need access to the list of variables that are already included in the model.
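Reviewer note: the documented calls for disabling bagging and for growing a single decision tree can be tried out as follows. This is only a minimal sketch, assuming the `ranger` package is installed; the built-in `iris` data set and the object names `rf_extra`, `tree` and `p` are illustrative choices and not part of the commit.

```r
library(ranger)

# Number of independent variables: all columns of iris except the response.
p <- ncol(iris) - 1

# Randomized ("extratrees") splitting with bagging disabled explicitly:
# every tree is grown on the full data set, drawn without replacement.
rf_extra <- ranger(Species ~ ., data = iris, splitrule = "extratrees",
                   replace = FALSE, sample.fraction = 1)

# A single decision tree without bagging and without feature subsetting,
# following the call given in the documentation text.
tree <- ranger(Species ~ ., data = iris, num.trees = 1, mtry = p,
               replace = FALSE, sample.fraction = 1)

tree$num.trees
tree$mtry
```

Because every observation is in-bag for every tree in both calls, no out-of-bag observations remain, so the reported out-of-bag prediction error is not informative for these models.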
@@ -130,12 +146,12 @@
##' @param ... Further arguments passed to or from other methods (currently ignored).
##' @return Object of class \code{ranger} with elements
##' \item{\code{forest}}{Saved forest (If write.forest set to TRUE). Note that the variable IDs in the \code{split.varIDs} object do not necessarily represent the column number in R.}
##' \item{\code{predictions}}{Predicted classes/values, based on out of bag samples (classification and regression only).}
##' \item{\code{predictions}}{Predicted classes/values, based on out-of-bag samples (classification and regression only).}
##' \item{\code{variable.importance}}{Variable importance for each independent variable.}
##' \item{\code{variable.importance.local}}{Variable importance for each independent variable and each sample, if \code{local.importance} is set to TRUE and \code{importance} is set to 'permutation'.}
##' \item{\code{prediction.error}}{Overall out of bag prediction error. For classification this is the fraction of missclassified samples, for probability estimation the Brier score, for regression the mean squared error and for survival one minus Harrell's C-index.}
##' \item{\code{r.squared}}{R squared. Also called explained variance or coefficient of determination (regression only). Computed on out of bag data.}
##' \item{\code{confusion.matrix}}{Contingency table for classes and predictions based on out of bag samples (classification only).}
##' \item{\code{prediction.error}}{Overall out-of-bag prediction error. For classification this is the misclassification rate (proportion of misclassified observations), for probability estimation the Brier score, for regression the mean squared error and for survival one minus Harrell's C-index.}
##' \item{\code{r.squared}}{R squared. Also called explained variance or coefficient of determination (regression only). Computed on out-of-bag data.}
##' \item{\code{confusion.matrix}}{Contingency table for classes and predictions based on out-of-bag samples (classification only).}
##' \item{\code{unique.death.times}}{Unique death times (survival only).}
##' \item{\code{chf}}{Estimated cumulative hazard function for each sample (survival only).}
##' \item{\code{survival}}{Estimated survival function for each sample (survival only).}
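Reviewer note: for classification, the relationship between `predictions` and `prediction.error` described above can be checked by hand. The sketch below assumes the `ranger` package is installed; the `iris` data, the seed and the name `oob_misclass` are illustrative assumptions. The two values are expected to agree because both are based on the same out-of-bag predictions.

```r
library(ranger)

set.seed(42)
rf <- ranger(Species ~ ., data = iris, num.trees = 100)

# rf$predictions holds the out-of-bag class predictions. An observation that
# was never out-of-bag would be NA, hence na.rm = TRUE.
oob_misclass <- mean(as.character(rf$predictions) != as.character(iris$Species),
                     na.rm = TRUE)

# Compare the hand-computed proportion with the value stored in the object.
all.equal(oob_misclass, rf$prediction.error)

# The out-of-bag confusion matrix is built from the same predictions.
rf$confusion.matrix
```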
@@ -206,6 +222,7 @@
##' \item Sandri, M. & Zuccolotto, P. (2008). A bias correction algorithm for the Gini variable importance measure in classification trees. J Comput Graph Stat, 17:611-628. \doi{10.1198/106186008X344522}.
##' \item Coppersmith D., Hong S. J., Hosking J. R. (1999). Partitioning nominal attributes in decision trees. Data Min Knowl Discov 3:197-217. \doi{10.1023/A:1009869804967}.
##' \item Deng & Runger (2012). Feature selection via regularized trees. The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia. \doi{10.1109/IJCNN.2012.6252640}.
##' \item Probst, P., Wright, M. N. & Boulesteix, A-L. (2019). Hyperparameters and tuning strategies for random forest. WIREs Data Mining Knowl Discov 9:e1301. \doi{10.1002/widm.1301}.
##' }
##' @seealso \code{\link{predict.ranger}}
##' @useDynLib ranger, .registration = TRUE
25 changes: 21 additions & 4 deletions man/ranger.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion src/AAA_check_cpp11.cpp
@@ -1,6 +1,6 @@
#ifndef WIN_R_BUILD
#if __cplusplus < 201402L
#error Error: ranger requires a C++14 compiler, e.g., gcc >= 5 or Clang >= 3.4. You probably have to update your C++ compiler.
#error Error: ranger requires C++14. Possible fixes: 1) Update R, 2) Set "CXX = g++ -std=gnu++11" or similar in local Makevars, 3) update C++ compiler. See https://github.com/imbs-hl/ranger/wiki/FAQ.
#endif
#endif
