Feature Request: inclusion of the trivial random forest model #711

Open
DoktorPi opened this issue Nov 26, 2023 · 2 comments

Comments

@DoktorPi

Currently, the `ranger` package does not support fitting a model with no covariates (i.e., the trivial model Y ~ 1). In the context of random forests, this would effectively amount to bootstrapping the mean of Y, along with the respective OOB estimate for each prediction and the prediction error (variance of Y).
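To make "bagging the mean" concrete, here is a minimal sketch in plain Python rather than ranger code. It only illustrates the idea and says nothing about how ranger would implement it; the function name `bag_mean` and its parameters are hypothetical. Each "tree" is just the mean of one bootstrap sample, and the OOB prediction for an observation averages the trees whose sample did not contain it.

```python
import random
from statistics import mean


def bag_mean(y, num_trees=500, seed=0):
    """Bag the mean of y (hypothetical helper, not part of ranger).

    Returns the bagged prediction, per-observation OOB predictions,
    and the OOB mean squared error.
    """
    rng = random.Random(seed)
    n = len(y)
    tree_means = []
    oob_sum = [0.0] * n  # sum of tree predictions for which obs i was out-of-bag
    oob_cnt = [0] * n    # number of trees for which obs i was out-of-bag

    for _ in range(num_trees):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        m = mean(y[i] for i in idx)  # a tree with no splits predicts the in-bag mean
        tree_means.append(m)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                oob_sum[i] += m
                oob_cnt[i] += 1

    prediction = mean(tree_means)
    # An observation that was in-bag for every tree has no OOB prediction.
    oob_pred = [s / c if c else None for s, c in zip(oob_sum, oob_cnt)]
    pairs = [(yi, p) for yi, p in zip(y, oob_pred) if p is not None]
    oob_mse = mean((yi - p) ** 2 for yi, p in pairs)
    return prediction, oob_pred, oob_mse


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
prediction, oob_pred, oob_mse = bag_mean(y)
```

With enough trees, `prediction` should land near the sample mean of `y`, and `oob_mse` should be of the same order as the variance of `y`, which is the sense in which the trivial model's prediction error is the variance of Y.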

The potential benefits of implementing the case of zero covariates could be the following:

  1. The trivial model without covariates often serves as a reference or literal null model.
  2. It fits neatly into the class of random forests as a trivial but essential subclass, i.e. bagging the mean.
  3. Many scenarios involve automated screening over various sets of predictors, which may include the empty set. With native support for the trivial model, no additional code would be needed to handle that special case.

As a side note: when a formula with at least one covariate is supplied, setting mtry = 0 does not force a fit of the trivial model; it seems to result in a fit with mtry = 1 instead.

@mnwright
Member

Thanks, that's a good idea. Could the interface look like this:

  • Formula interface: y ~ 1
  • dependent.variable.name interface: Supply data with just the target column?
  • x/y interface: x = NULL

Or any other ideas?

@DoktorPi
Author

DoktorPi commented Dec 4, 2023

I fully agree with your scheme:

  1. Formula interface: y ~ 1 is the standard way to specify the trivial model in R's formula syntax.
  2. dependent.variable.name interface: Triggering the trivial model by providing only the target column aligns very well with automated data processing pipelines.
  3. x/y interface: providing y without x as a method to trigger the trivial model is convenient and consistent.

So, this seems to be the most consistent interface scheme.

Btw, I also discovered a workaround to emulate the trivial model in ranger by just specifying a non-trivial formula (at least one predictor) but setting min.node.size equal to the sample size.
