
Pomona

Silke Szymczak and Cesaire J.K. Fouodo

Introduction

This package provides several methods for identifying relevant variables in omics data sets using random forests. It implements the following approaches: empirical and parametric permutation (Altmann), Boruta, Vita, r2VIM (recurrent relative variable importance), RFE (recursive feature elimination) and Hybrid, which combines Vita and Boruta. All approaches use unscaled permutation variable importance and rely on the R package ranger to generate the forests. The package also includes a function to simulate correlated gene expression data.
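To make the building blocks mentioned above concrete, here is a minimal sketch that calls ranger directly rather than Pomona. It computes unscaled permutation importance and then derives an empirical permutation p-value in the spirit of the Altmann approach. The toy data, the helper `perm_importance`, the number of permutations and the `+1` correction in the p-value are assumptions made for illustration; Pomona's own functions may differ in their details.

```r
library(ranger)

set.seed(42)
n <- 200

# Toy data (assumption): x1 and x2 drive the outcome, x3 to x5 are pure noise.
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(dat) <- paste0("x", 1:5)
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# Hypothetical helper: fit a forest and return unscaled permutation importance.
perm_importance <- function(d, seed) {
  rf <- ranger(y ~ ., data = d, num.trees = 200,
               importance = "permutation",
               scale.permutation.importance = FALSE,
               seed = seed)
  rf$variable.importance
}

# Observed importance on the original data.
vim <- perm_importance(dat, seed = 1)

# Null importances: permute the outcome, refit, and record the importances.
n_perm <- 50
null_vim <- sapply(seq_len(n_perm), function(b) {
  d <- dat
  d$y <- sample(d$y)
  perm_importance(d, seed = 100 + b)
})

# null_vim has one row per variable and one column per permutation.
# Empirical p-value: share of null importances at least as large as the observed one.
p_emp <- (1 + rowSums(null_vim >= vim)) / (n_perm + 1)

print(round(vim, 3))
print(round(p_emp, 3))
```

With only 50 permutations the smallest attainable p-value is 1/51, so real analyses typically use more permutations or switch to a parametric fit of the null distribution.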

Installation

Installation from GitHub:

devtools::install_github("imbs-hl/Pomona")

CRAN release coming soon.

Usage

For usage, see ?Pomona in R, and in particular the Examples section of the help pages. As a first example, you could try:

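library(Pomona)
# Simulate correlated gene expression data; the first column is used as the outcome below
data <- simulation.data.cor(no.samples = 100, group.size = rep(10, 6), no.var.total = 200)
res <- var.sel.hybrid(x = data[, -1], y = data[, 1])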

References

  • Nembrini, S., Koenig, I. R. & Wright, M. N. (2018). The revival of the Gini Importance? Bioinformatics. https://doi.org/10.1093/bioinformatics/bty373.
  • Janitza, S., Celik, E. & Boulesteix, A.-L. (2018). A computationally fast variable importance test for random forests for high-dimensional data. Advances in Data Analysis and Classification. https://doi.org/10.1007/s11634-016-0276-4.
  • Kursa, M. B. and Rudnicki, W. R. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), 1–13. http://www.jstatsoft.org/v36/i11/.
  • Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17.
  • Szymczak, S., Holzinger, E., Dasgupta, A., Malley, J. D., Molloy, A. M., Mills, J. L., Brody, L. C., Stambolian, D., and Bailey-Wilson, J. E. (2016). r2VIM: A new variable selection method for random forests in genome-wide association studies. BioData Mining, 9(1), 7.