
Prediction interval for class probabilities #583

Open

LouisAsh opened this issue Aug 5, 2021 · 6 comments

Comments

@LouisAsh commented Aug 5, 2021

Firstly, thanks for the great, smooth, and stable package that ranger has proven to be.

My question/request is, in a sense, an extension of #136: is there currently a way to have ranger generate prediction intervals for the class probabilities (in the case of binary and multiclass classification with probability = TRUE)?

I know that for regression RFs, standard errors can be estimated with predict.ranger(se.method=), so I wonder if there is an analogous way, in classification tasks, to estimate some sense of the uncertainty about the class probabilities themselves (be it a standard error or an interval).

Searching online, I haven't found much on this particular topic, even though, to my mind, at least bootstrapping seems reasonable here. Anyhow, if this is not currently possible in ranger, it would be great to see it eventually become available.
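
For reference, here is a minimal sketch of the regression usage I'm referring to, assuming the built-in mtcars data (any regression data set would do):

library(ranger)

# Fit a regression forest, keeping inbag counts (needed for standard errors)
fit <- ranger(mpg ~ ., data = mtcars, keep.inbag = TRUE)

# Standard errors via the infinitesimal jackknife (Wager et al., 2014)
pred <- predict(fit, data = mtcars, type = "se", se.method = "infjack")
head(pred$predictions)  # point predictions
head(pred$se)           # corresponding standard errors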

@calebasaraba

I second this comment -- I have also been looking for some way to do this, but have not been able to find an implementation in R or Python. There are quite a few methods out there for prediction intervals around regression predictions, but I haven't seen any ways to do this for predicted class probabilities. It seems like it would be a common use case.

Love the ranger package and would love to see some way of doing this integrated into the package as well.

@bgreenwell

@LouisAsh (and @calebasaraba) Can you not already compute standard errors for predicted class probabilities in ranger? Example below using the well-known email spam classification data:

library(ranger)

# Read in email spam data and split into train/test sets using a 70/30 split
data(spam, package = "kernlab")
set.seed(1258)  # for reproducibility
trn.id <- sample(nrow(spam), size = 0.7 * nrow(spam), replace = FALSE)
spam.trn <- spam[trn.id, ]
spam.tst <- spam[-trn.id, ]

set.seed(1108)  # for reproducibility
rfo <- ranger(type ~ ., data = spam.trn, probability = TRUE, keep.inbag = TRUE)

# Compute standard errors for the predicted class probabilities
se <- predict(rfo, data = spam.trn, type = "se", se.method = "infjack")

# Plot standard errors, flagging misclassified observations (0.5 cutoff)
pal <- palette.colors(2, palette = "Okabe-Ito", alpha = 0.5)
classes <- ifelse(se$predictions[, "spam"] > 0.5, "spam", "nonspam")
id <- (classes == spam.trn$type) + 1
plot(se$predictions[, "spam"], se$se[, "spam"], col = pal[id],
     pch = c(19, 1)[id], xlab = "Predicted probability",
     ylab = "Standard error")

[Figure: predicted spam probability vs. standard error, with misclassified observations shown as open circles]

@allison-patterson

I see this is an old thread, but it is still open, so I hope this is a good place for my question. I'm struggling to understand how to interpret the standard errors from a probability model. I saw in the Wager et al. (2014) paper (footnote on p. 1626) and in comments on issue #136 that the regression standard errors can be converted to Gaussian confidence intervals. Since the response values from a probability random forest are themselves probabilities, it is not clear to me how to interpret the standard errors in this case in any absolute way. Is it possible to get a confidence interval for the probability estimates from the standard errors? If this were a traditional generalized linear model, I would convert the probabilities to the logit scale and calculate the CI that way, but I don't know whether that is a legitimate approach for a RF. Thank you.

@brandongreenwell-8451 commented May 31, 2024

Hi @allison-patterson, I see your point and here are my two cents.

  • I think the Wager paper still reports the standard errors for probabilities as p +/- SE, which is a bit awkward, and I'm not certain how useful that would be as a confidence or prediction interval (nor do they show results on the coverage probability of such intervals).
  • I do see some logic in your thought about doing this on the logit scale and then converting back, which would at least ensure the intervals stay between 0 and 1. But without some experimentation, I'm not sure how well this would hold up (a rough sketch follows this list).
  • Conformal inference has shown some promise, but I'm not sure how easy it is to do in R. I know there's a scikit-learn compatible port of ranger that could work with some of the conformal prediction libraries, like MAPIE.
  • You could use a quantile regression forest, which is implemented in ranger, to achieve something akin to a confidence interval by using the relevant quantiles (e.g., 0.025 and 0.975).
  • I honestly like the idea of natural gradient boosting (see NGBoost), which estimates the entire distribution of each predicted value and can support all sorts of inference on your predictions.
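
To illustrate the logit-scale idea from the second bullet, here is a rough sketch that continues from the spam example above (my own illustration, not a ranger feature): it treats the infinitesimal-jackknife standard error as approximately Gaussian, maps it to the logit scale with the delta method, and back-transforms, which at least keeps the interval inside [0, 1].

# Rough sketch: approximate 95% intervals for the predicted spam probability,
# continuing from the `se` object computed earlier. Purely illustrative.
p <- se$predictions[, "spam"]
s <- se$se[, "spam"]

# Naive Gaussian interval on the probability scale (can fall outside [0, 1])
lwr.raw <- p - 1.96 * s
upr.raw <- p + 1.96 * s

# Delta-method interval on the logit scale, then back-transformed
eps <- 1e-6  # clip probabilities of exactly 0 or 1, which break the logit
p.c <- pmin(pmax(p, eps), 1 - eps)
s.logit <- s / (p.c * (1 - p.c))  # SE of logit(p) via the delta method
lwr <- plogis(qlogis(p.c) - 1.96 * s.logit)
upr <- plogis(qlogis(p.c) + 1.96 * s.logit)

head(cbind(p, lwr.raw, upr.raw, lwr, upr))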

@mnwright and others may hopefully have some additional insights or alternative suggestions!

@allison-patterson

Hi @brandongreenwell-8451, thank you for considering my problem.

I looked into using quantile regression, but I don't think it is possible here. I had to convert my classes to 0/1 to run the quantile regression option (which is probably problematic to start with). The predictions were largely consistent with the probability RF, but the predicted standard errors showed no correlation with the standard errors from the probability RF. When I tried to predict quantiles, the results were either 0 or 1, with each observation switching from 0 to 1 at the quantile corresponding to its predicted probability.

I looked into natural gradient boosting, essentially following the classification tutorial at https://github.com/acca3003/ngboostR. It didn't seem necessary in the classification case: the predictions for a classification model are for a binomial distribution, which has only one parameter, so the predicted probability is the same as the predicted distribution. This left me scratching my head again about whether an SE even makes sense for these classification models.

I found some options for conformal inference in R within the 'probably' package, but none of these were compatible with classification models.

@brandongreenwell-8451 commented Jun 4, 2024

Ahh, yes, I seemingly responded while forgetting you were dealing with a binary outcome! I think NGBoost could still work if the above standard errors are awkward to deal with on a probability scale. Since you're estimating the single parameter of a Bernoulli distribution (i.e., p), you also know the standard deviation, sqrt(p * (1 - p)). I don't see why you couldn't use this with any of the formulas for a confidence interval for a Bernoulli probability!

For instance, as described in the "Exact binomial confidence intervals" section of this SE post.
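
Purely as a sketch of how that might look in R: one way to read this (my assumption, not something established above) is to treat the forest's per-tree class votes as Bernoulli trials. Since the trees are correlated, such intervals will tend to be too narrow, but the mechanics are simple with binom.test():

# Rough sketch only, continuing from the spam example earlier in the thread
p <- se$predictions[, "spam"]  # predicted spam probabilities
n <- rfo$num.trees             # number of trees in the forest (default 500)

# Clopper-Pearson ("exact") 95% interval for the first observation
x <- round(p[1] * n)           # approximate number of trees "voting" spam
binom.test(x, n)$conf.int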
