I am using ranger to predict threshold exceedance probabilities with quantile regression forests (QRF), and I am puzzled by the behaviour with small numbers of trees. With, for example, 100 trees I get nicely continuous probabilities. But with 5 trees it predicts only 0, 0.2, 0.4, 0.6, 0.8 or 1.0, and with 3 trees only 0, 0.5 or 1. It looks as if each tree predicts deterministically and the probabilities are derived by combining these deterministic predictions. As I understand QRF, though, the probability distribution should come from the ECDF of all data points in the terminal nodes, so even with 1 tree many different probabilities should be possible. Any ideas on this?
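To make that expectation concrete, here is a minimal base-R sketch of the weighting I have in mind: each training observation gets a weight from how often it shares a terminal node with the test point, averaged over trees, and the probability is the weighted ECDF at the threshold. The responses and node IDs below are made up for illustration, and `qrf_cdf` is a hypothetical helper, not a ranger function. With a fitted ranger model, the node IDs could presumably be obtained with `predict(..., type = "terminalNodes")`, but I have not checked that this reproduces what ranger does internally.

```r
# Toy inputs (made-up values): 6 training responses and the terminal-node ID
# of each training observation in each of 2 trees.
y <- c(18, 21, 22, 24, 26, 30)
nodes_train <- cbind(tree1 = c(1, 1, 2, 2, 2, 3),
                     tree2 = c(1, 2, 2, 3, 3, 3))
# Terminal node that the test point falls into, per tree.
nodes_test <- c(tree1 = 2, tree2 = 2)

# Hypothetical helper: weighted ECDF at `threshold`, using all training
# observations that share a terminal node with the test point.
qrf_cdf <- function(y, nodes_train, nodes_test, threshold) {
  # TRUE where a training observation sits in the test point's node
  same <- sweep(nodes_train, 2, nodes_test, "==")
  # within each tree, spread weight 1 evenly over the node's members
  per_tree <- sweep(same, 2, colSums(same), "/")
  # average the per-tree weights over trees
  w <- rowMeans(per_tree)
  # weighted ECDF: estimated P(Y <= threshold)
  sum(w[y <= threshold])
}

p <- qrf_cdf(y, nodes_train, nodes_test, threshold = 23)
print(p)
```

In this toy case the result is 2/3 with only 2 trees, which is not a multiple of 1/2. That is the kind of finer-grained probability I expected to see even for very small forests.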
Code that shows this behaviour:
`library(ranger)
# load data set
data(mtcars)
# hold out row 3 as the test (forecast) row
xtest <- mtcars[3, ]
mtcars <- mtcars[-3, ]
# fit model using 50 trees
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg', importance = 'permutation', classification = FALSE, num.trees = 50, min.node.size = 2)
# ecdf(x)(23) is the empirical P(mpg <= 23)
outcomes <- predict(model, xtest, what = function(x) ecdf(x)(23), type = 'quantiles')
print(outcomes$predictions)
# fit model using 3 trees
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg', importance = 'permutation', classification = FALSE, num.trees = 3, min.node.size = 2)
outcomes <- predict(model, xtest, what = function(x) ecdf(x)(23), type = 'quantiles')
print(outcomes$predictions)
# fit model using 1 tree
model <- ranger(data = mtcars, quantreg = TRUE, dependent.variable.name = 'mpg', importance = 'permutation', classification = FALSE, num.trees = 1, min.node.size = 2)
outcomes <- predict(model, xtest, what = function(x) ecdf(x)(23), type = 'quantiles')
print(outcomes$predictions)`
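The stepped output is what you would get if the function passed to `what` only ever sees one value per tree. That is my assumption based on the results, not something I have confirmed in the ranger source. A small base-R sketch, independent of ranger, shows that an ECDF built from n values can only return multiples of 1/n. The helper `ecdf_levels` and the simulated values are purely illustrative.

```r
# Every value an ECDF can take when it is built from n observations.
# (Hypothetical helper, for illustration only.)
ecdf_levels <- function(n) (0:n) / n

levels_5 <- ecdf_levels(5)
print(levels_5)

# Five made-up values standing in for "one value per tree" (an assumption).
set.seed(1)
x <- rnorm(5, mean = 22, sd = 4)

# Empirical P(X <= 23); it has to be one of the levels above.
p <- ecdf(x)(23)
print(p)
```

With n = 5 the possible levels are exactly the 0, 0.2, 0.4, 0.6, 0.8 and 1.0 reported for 5 trees.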