Add a sample feature for ICE/c-ICE/d-ICE curves #74

Open
bgreenwell opened this issue May 30, 2018 · 4 comments
@bgreenwell
Owner

bgreenwell commented May 30, 2018

For example, to plot a random (sub)sample of curves:

partial(fit, pred.var = "x3", ice = TRUE, frac = 0.5, plot = TRUE)

This would be easiest to accomplish before converting to long format; for example:

if (frac < 1) {
  pd.df <- pd.df[sample(nrow(pd.df), size = floor(frac*nrow(pd.df)), replace = FALSE), ]
}
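
To see the row-sampling idea in isolation, here is a minimal base-R sketch. It assumes `pd.df` is a wide table with one row per curve (observation) and one column per grid value; that layout, the toy data, and the `keep` helper are illustrative assumptions, not pdp's actual internals.

```r
# Hypothetical stand-in for the wide ICE table: one row per curve
# (observation), one column per grid value of the predictor.
set.seed(101)
pd.df <- as.data.frame(matrix(rnorm(10 * 4), nrow = 10, ncol = 4))
frac <- 0.5

# Keep a random fraction of the curves (rows), sampled without replacement
if (frac < 1) {
  keep <- sample(nrow(pd.df), size = floor(frac * nrow(pd.df)), replace = FALSE)
  pd.df <- pd.df[keep, , drop = FALSE]
}

nrow(pd.df)  # number of curves retained
```

Using `drop = FALSE` keeps the result a data frame even if only one column is present.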
@DeFilippis

DeFilippis commented Jul 3, 2019

This is exactly what I came here to ask about! I assume this feature isn't yet implemented? Until the feature is implemented, what is the "right" way to go about hacking this together?

I don't want to just restrict the sample of curves plotted. Is there a way to restrict the sample of curves that are computed (not just plotted), so as to reduce computation time? My dataset has 2.5 million observations, so even with parallel = TRUE, it's taking hours to compute and plot a single feature.

I was thinking of just feeding my random forest model a random subset of the data, and inputting that into the partial command, but I'm worried this is not correct.

@bgreenwell
Owner Author

Hey @DeFilippis, the easiest way to accomplish this right now is to provide a sampled version of the original training data via the ‘train’ argument in partial. Fit your model on the full training set, though! I can provide a simple example later on if you need one!
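
A rough sketch of that workaround, assuming the pdp package is installed: the model is fit on the full data, and only the data passed through `train` is subsampled. The `lm` model on `mtcars`, the sample size of 16, and the names `fit`, `train_sub`, and `ice` are illustrative choices, not something from this thread.

```r
library(pdp)

set.seed(101)

# Fit the model on the FULL training set
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Randomly subsample the rows that the ICE curves are computed for
train_sub <- mtcars[sample(nrow(mtcars), size = 16, replace = FALSE), ]

# ICE curves are computed only for the rows supplied via `train`
ice <- partial(fit, pred.var = "wt", ice = TRUE, train = train_sub)
head(ice)
```

Because the curves are computed from the rows in `train`, shrinking that data reduces computation as well as the number of curves plotted.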

Now that vip has been updated on CRAN, I’ve started to work on pdp, so hopefully these features will be available in the next release!

@DeFilippis

Perfect -- that's really easy. I'm using this in case it helps anybody:

partial(model, pred.var = "predictor", ice = TRUE, center = TRUE, plot = TRUE,
        plot.engine = "ggplot2", parallel = TRUE,
        paropts = list(.packages = "ranger"),
        train = sample_frac(data, .5))  # compute curves on a random 50% of the rows

sample_frac() is from dplyr (part of the tidyverse).

@bgreenwell
Owner Author

That should do it! I’ll be sure to include this feature in the next release, so hopefully soon! Same with the squash function as well!
