Feature snakemake utilities #49

Open
wants to merge 3 commits into base: dev
72 changes: 72 additions & 0 deletions README_snakemake-utils.md
@@ -0,0 +1,72 @@
# Snakemake Utilities / CUBI

This repository contains a script to generate Snakemake profiles from a generic config and resource presets. Currently, the data in this repository support generating Snakemake profiles for the HHU cluster "HILBERT" and for local execution, e.g., on a laptop.

## Usage

If necessary, create the Conda environment specified in `envs/smk_profile.yaml` to have the PyYAML package available.

Run the script `set_profile.py --help` to display the command line help.
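
As a minimal sketch (assuming Conda is available, the paths are relative to the directory containing this README, and the script is executable; the invocation may also be `python set_profile.py`):

```bash
$ conda env create -f envs/smk_profile.yaml
$ conda activate smk_profile
$ ./set_profile.py --help
```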

Briefly, the mode of operation is as follows:

1. specify the infrastructure (`-i`) you are targeting: `local` or `hilbert`
2. if you plan on using Snakemake version 8.x, add the `-smk8` flag, because starting with Snakemake 8.0 some options/commands have been deprecated or renamed
3. for cluster execution, select a resource preset YAML file (`-r`) located in `profiles/<CLUSTER>/resource_presets/<PRESET>.yaml`
- the preset equivalent to the Snakemake profile up to release/tag v1.0.0 of this repository is `mem-mb_walltime_wo-bonus.yaml`
- if you activated `-smk8`, make sure to select a Snakemake 8.x-adjusted YAML file located in `profiles/<CLUSTER>/resource_presets/<PRESET>_smk8.yaml`
4. specify the values to replace the placeholders as an ordered list (`-p`). The currently recognized placeholders are, in that order, the "project" name and the "anchor" name (context: bonus/priority points).
5. specify the Snakemake working directory via `-w`; the profile will be copied to this folder. The copying is necessary because Snakemake does not reliably resolve paths to the files referenced in the profile.
- if you generate several profiles, e.g., one with and one without bonus points, you can also specify a suffix via `-s` that will be appended to the profile folder name (see the example invocation below)
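
As a rough sketch, a profile for the HILBERT cluster could be generated as follows. The values `MYPROJECT`, `MYANCHOR`, the working directory, and the suffix `bonus` are illustrative placeholders, and the assumption that `-r` expects the full preset path may not hold; consult `set_profile.py --help` for the exact argument syntax:

```bash
$ ./set_profile.py -i hilbert \
    -r profiles/hilbert/resource_presets/mem-mb_walltime_wo-bonus.yaml \
    -p MYPROJECT MYANCHOR \
    -w SNAKEMAKE-WORK-DIR/ \
    -s bonus
```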

Having generated your execution profile, you can run `snakemake` as follows:

```bash
$ snakemake -d SNAKEMAKE-WORK-DIR/ --profile SNAKEMAKE-WORK-DIR/prf_<PROJECT>_<SUFFIX> [...]
```

As explained above, the `SUFFIX` part is optional.

If you execute your workflow on an HPC cluster, the created profile folder includes a special config file `env.yaml`
that contains information on available CPU cores and common (and maximal) memory configurations of the
cluster compute nodes (= the job execution servers). Using that information requires loading this configuration
file via the `--configfiles` parameter:

```bash
$ snakemake -d SNAKEMAKE-WORK-DIR/ \
--profile SNAKEMAKE-WORK-DIR/prf_<PROJECT>_<SUFFIX> \
--configfiles SNAKEMAKE-WORK-DIR/prf_<PROJECT>_<SUFFIX>/env.yaml \
[...]
```

Note that the CUBI Snakemake workflow template sets (low) default values for the available CPU cores, so it is
strongly recommended to make use of the `env.yaml` configuration file.

### Cluster logs

Note that the `pbs-submit.py` script includes the option to create the required directories that are the destinations for `stdout` and `stderr` of the cluster jobs:

```bash
pbs-submit.py ++mkdirs clusterLogs/err,clusterLogs/out [...]
```

These directory names match what is then specified further down in the profile:

```
-e clusterLogs/err/{rule}.{jobid}.stderr
-o clusterLogs/out/{rule}.{jobid}.stdout
```

If these folders do not exist at runtime, you'll receive PBS error notifications via e-mail.
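
If you prefer not to rely on `++mkdirs`, the directories can also be created manually beforehand, for example (run from the location the profile paths resolve against, presumably the Snakemake working directory):

```bash
$ mkdir -p clusterLogs/err clusterLogs/out
```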

## Contributors

- HHU/CUBI source
- Developer: Peter Ebert
- HHU source
- Developer: Lukas Rose
- Original source
- Copyright: 2017 Snakemake-Profiles
- License: MIT
- Developer: gh#neilav
- URL: https://github.com/Snakemake-Profiles/pbs-torque
10 changes: 10 additions & 0 deletions cubi-tools/prototypes/envs/smk_profile.yaml
@@ -0,0 +1,10 @@
name: smk_profile
dependencies:
- Python=3.9.*
- pip
- mamba=0.25.0
- pyyaml=6.0
- semver=2.13.0
- pylint=2.14.5
- isort=5.10.1
- black=22.6.0
30 changes: 30 additions & 0 deletions cubi-tools/prototypes/profiles/hilbert/base.yaml
@@ -0,0 +1,30 @@

# Use custom submit script that, by default,
# adds the Singularity envmodule to the jobscript
# to be loaded before the Snakemake job execution.
# To deactivate that behavior, add the parameter
# ++no-singularity-module
cluster: >-
pbs-submit.py ++mkdirs log/cluster_jobs/err,log/cluster_jobs/out
-e log/cluster_jobs/err/{rule}.{jobid}.stderr
-o log/cluster_jobs/out/{rule}.{jobid}.stdout
-N {jobid}_{rule}

cluster-status: pbs-status.py
cluster-cancel: qdel
jobscript: pbs-jobscript.sh
jobs: 100
local-cores: 2
max-jobs-per-second: 5
immediate-submit: false
max-status-checks-per-second: 10
scheduler: ilp
verbose: false
reason: false
latency-wait: 60
keep-going: true
keep-incomplete: false
restart-times: 1
rerun-incomplete: true
nolock: true
conda-frontend: mamba
29 changes: 29 additions & 0 deletions cubi-tools/prototypes/profiles/hilbert/base_smk8.yaml
@@ -0,0 +1,29 @@

# Use custom submit script that, by default,
# adds the Singularity envmodule to the jobscript
# to be loaded before the Snakemake job execution.
# To deactivate that behavior, add the parameter
# ++no-singularity-module
cluster-generic-submit-cmd: >-
pbs-submit.py ++mkdirs log/cluster_jobs/err,log/cluster_jobs/out
-e log/cluster_jobs/err/{rule}.{jobid}.stderr
-o log/cluster_jobs/out/{rule}.{jobid}.stdout
-N {jobid}_{rule}

cluster-generic-status-cmd: pbs-status.py
cluster-generic-cancel-cmd: qdel
jobscript: pbs-jobscript.sh
jobs: 100
local-cores: 2
max-jobs-per-second: 5
immediate-submit: false
max-status-checks-per-second: 10
scheduler: ilp
verbose: false
latency-wait: 60
keep-going: true
keep-incomplete: false
restart-times: 1
rerun-incomplete: true
nolock: true
conda-frontend: mamba
@@ -0,0 +1,83 @@
#!/bin/sh
# properties = {properties}

# If really needed for debugging, uncomment the following two lines:
#echo "Will execute the following jobscript: "
#cat $0

# Will be inserted by pbs-submit.py
# <modules>

# 2022-03-31
# Properly set TMPDIR and change the default location
# of SINGULARITY_CACHEDIR to the (node-local) temp storage.
# At the time of writing, this deals with certain Singularity
# problems when too many containers run in parallel and dump
# their rootfs all to the same location on the /gpfs
# (default: /gpfs/scratch/$USER/.singularity)
# CAVEAT: the node-local temp storage is not monitored and
# cannot be requested as a job resource, which increases
# the risk of job failures because the node is running out
# of temp storage.

# As long as the node-local temp storage is not monitored
# by PBS, track the info in the job output logs for
# potential debugging purposes.

echo "Execution host:"
uname -a
echo "Size of /tmp:"
df -h /tmp

# Unlikely: if a jobscript is not executed via
# the cluster scheduler (PBS), it will nevertheless
# create a temp directory, which needs to be
# cleaned up after the job (no matter the job's exit status)
TMPCLEANUP="MANUAL"

if [[ -d $TMPDIR ]];
then
echo "TMPDIR is set to: $TMPDIR"
TMPCLEANUP="AUTO"
else
echo "No TMPDIR set"
export TMPDIR=$(mktemp -d -t $USER-task-XXXXXXXX)
echo "TMPDIR set to: $TMPDIR"
fi;

# set all of these in case some tool dev doesn't know
# how to properly request a temp file...
export TEMP=$TMPDIR
export TEMPDIR=$TMPDIR
export TMP=$TMPDIR
echo "Set env vars TEMP / TEMPDIR / TMP to $TMPDIR"
export SINGULARITY_CACHEDIR=$TMPDIR/.singularity/cache
export SINGULARITY_TMPDIR=$TMPDIR/.singularity/tmpdir
echo "SINGULARITY_CACHEDIR set to $SINGULARITY_CACHEDIR"
echo "SINGULARITY_TMPDIR set to $SINGULARITY_TMPDIR"

{exec_job}

# 2022-04-07 note: for Snakemake cluster jobs, this last
# part of the jobscript is not triggered if a cluster
# status command script is configured at the Snakemake
# command line (or profile). If so, the Snakemake
# command is extended with " && exit 0 || exit 1"
# (see "executors.py"), presumably to ensure always
# returning 0 or 1. In a cluster run, this seems
# acceptable since the scheduler will take care of
# cleaning up $TMPDIR.

# Capture job's exit status before triggering
# potential cleanup operations

JOBEXIT=$?

if [[ "$TMPCLEANUP" = "MANUAL" ]];
then
echo "Deleting TMPDIR: $TMPDIR"
rm -rf "$TMPDIR"
fi;

echo "Done - job exit status: $JOBEXIT"
exit $JOBEXIT