Run mikropml with snakemake

Snakemake is a workflow manager that enables massively parallel and reproducible analyses. It is well suited to workflows that can be broken down into discrete steps, each with its own input and output files.

mikropml is an R package for supervised machine learning pipelines. We provide this example workflow as a template to get started running mikropml with snakemake. We hope you will then customize the code to meet the needs of your particular ML task.

For more details on these tools, see the Snakemake tutorial and read the mikropml docs.

The Workflow

The Snakefile contains rules that define the output files we want and how to make them. Snakemake automatically builds a directed acyclic graph (DAG) of jobs to work out the dependencies between the rules and the order in which to run them. This workflow preprocesses the example dataset, calls mikropml::run_ml() for each seed and ML method set in the config file, combines the results files, plots performance results (cross-validation and test AUROCs, hyperparameter AUROCs from cross-validation, and benchmark performance), and renders a simple R Markdown report as a GitHub-flavored markdown file (see the example in report-example.md).
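To illustrate how a dependency graph determines run order, here is a minimal Python sketch. It is not Snakemake's own implementation, and the rule names are hypothetical labels for the steps described above rather than names taken from the Snakefile. It uses the standard-library graphlib module (Python 3.9 or later) to sort the steps topologically.

```python
from graphlib import TopologicalSorter

# Hypothetical rule names (assumptions, not read from the Snakefile).
# Each rule maps to the set of rules whose outputs it needs as inputs.
dependencies = {
    "preprocess_data": set(),
    "run_ml": {"preprocess_data"},
    "combine_results": {"run_ml"},
    "plot_performance": {"combine_results"},
    "render_report": {"plot_performance"},
}


def execution_order(deps):
    """Return one valid order in which the rules could run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

Because each step here depends only on the one before it, there is exactly one valid order. In a real workflow, steps with no dependency on one another can be scheduled at the same time.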

The DAG shows how calls to run_ml can run in parallel if snakemake is allowed to run more than one job at a time. If we use 100 seeds and 4 ML methods, snakemake would call run_ml 400 times. Here's a small example DAG if we were to use only 2 seeds and 1 ML method:

[Figure: example DAG for 2 seeds and 1 ML method]
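The job counts above follow from pairing every seed with every ML method. A minimal Python sketch of that expansion is below. The seed values and the choice of methods are illustrative assumptions, since the actual values come from the config file, and run_ml_jobs is a hypothetical helper, not part of mikropml or snakemake.

```python
from itertools import product


def run_ml_jobs(seeds, methods):
    """Enumerate one (method, seed) pair per call to run_ml."""
    return [(method, seed) for method, seed in product(methods, seeds)]


# Illustrative values only; the real ones are set in the config file.
methods = ["glmnet", "rf", "rpart2", "svmRadial"]
seeds = range(100, 200)  # 100 seeds

jobs = run_ml_jobs(seeds, methods)
print(len(jobs))  # 400

# The small case: 2 seeds and 1 ML method gives 2 independent jobs.
small = run_ml_jobs([100, 101], ["glmnet"])
print(small)  # [('glmnet', 100), ('glmnet', 101)]
```

None of these pairs depends on another, which is why snakemake can run them in parallel when it is allowed more than one job at a time.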

Usage

Full usage instructions recommended by snakemake are available in the snakemake workflow catalog. Snakemake recommends using snakedeploy to use this workflow as a module in your own project.

Alternatively, you can download this repo and modify the code directly to suit your needs. See the instructions in quick-start.md.

Help & Contributing

If you come across a bug, open an issue and include a minimal reproducible example.

If you have questions, create a new post in Discussions.

If you’d like to contribute, see our guidelines here.

Code of Conduct

Please note that the mikropml-snakemake-workflow is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
