Home
Note: these pipelines were made for the CRyPTIC project, but in principle can be used on any bacterial species.
- This page - an overview of Clockwork and installation instructions.
- Two walkthroughs, depending on how you decide to run clockwork (see below for the options): clockwork scripts only, or using nextflow and a database
- How to use your own custom data for the remove contamination pipeline, specific to your species.
- Pipeline output files
- A description of the spreadsheet format for importing data
- Spreadsheet validation - instructions for data submitters who want to validate an import spreadsheet.
- Information for developers - how to develop the code and run the tests.
The pipelines are designed for use with paired reads, and assume a pair of FASTQ files for each sequencing run. They are:
- Import (only applicable if tracking using a database as described below)
- Remove contamination - decontaminates reads. This can be customised, but scripts are available specifically for decontaminating M. tuberculosis reads.
- QC - gathers various QC stats from read mapping, SAMtools and FastQC.
- Variant call - the main Clockwork pipeline. Trims reads (Trimmomatic), calls variants (minimap2/SAMtools and Cortex), merges variant calls (Minos) to make final call set.
- Mykrobe - runs mykrobe predict
Installation instructions are at the end of this page. They depend on how you are running the pipelines, so please read the next section before installing anything!
There are two different ways you can run the pipelines:
- Option 1: run each clockwork script manually on each sample. This is appropriate if you want control over all your jobs and/or have a small dataset. For example, you can run clockwork variant_call on one sample to make variant calls for that sample. Some pipelines require running more than one clockwork script (for example, removing contamination needs one script to map the reads, then a second script to make decontaminated FASTQ files). Unlike option 2, this does not require nextflow or MySQL, but it puts the responsibility on you to track all your samples and orchestrate running jobs.
- Option 2, the "full experience": clockwork can track all your samples using a MySQL database. This is applicable if you have a large number of samples. The clockwork scripts handle all interactions with the database, and pipelines are run using nextflow. Roughly, the process is to import your data (using the "import" Clockwork pipeline); then, when you run subsequent pipelines, Clockwork finds new data and runs only on that. Clockwork takes care of putting all files inside its own directory structure. To get, for example, the variant calls (a VCF for each sample), there is a script that outputs a TSV file of file paths. Bear in mind that troubleshooting may be difficult, and could involve running some manual SQL commands to tidy things up if things go wrong.
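If you run the scripts manually, the sample bookkeeping is up to you. As a rough illustration only, the sketch below groups FASTQ filenames into one pair per sample before any clockwork script is run. The naming convention (`<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`) and the helper name `pair_fastq_files` are assumptions made for this example; they are not part of Clockwork.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention (hypothetical, not defined by Clockwork):
# <sample>_1.fastq.gz / <sample>_2.fastq.gz, also accepting .fq and no .gz.
_PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastq_files(filenames):
    """Group FASTQ paths into {sample: (reads_1, reads_2)}.

    Files that do not follow the assumed convention are ignored.
    Raises ValueError if any sample has only one of its two files.
    """
    found = defaultdict(dict)
    for name in filenames:
        match = _PAIR_RE.match(Path(name).name)
        if match is None:
            continue  # not a FASTQ file under the assumed convention
        found[match["sample"]][match["mate"]] = name

    incomplete = sorted(s for s, mates in found.items() if set(mates) != {"1", "2"})
    if incomplete:
        raise ValueError(f"Samples missing a mate file: {incomplete}")

    return {s: (mates["1"], mates["2"]) for s, mates in sorted(found.items())}
```

Each value in the returned dictionary is the pair of files you would then pass to the relevant clockwork script for that sample; see the walkthrough for the actual command-line arguments.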
There is a walkthrough for each of the options: clockwork scripts only (option 1), or using nextflow and a database (option 2).
We recommend that you use either Singularity or Docker (otherwise, have fun trying to install all the dependencies: minimap2, samtools, bcftools, gramtools, ...!).
Singularity containers are available for each release from version v0.11.0 onwards. For example, the file for v0.11.0 is called clockwork_v0.11.0.img and can be downloaded from the v0.11.0 release.
The latest Docker image can be obtained with:
docker pull ghcr.io/iqbal-lab-org/clockwork:latest
All Docker images are listed on the clockwork packages github page.
Alternatively, you can build a container by cloning the repository and running:
singularity build clockwork.img Singularity.def
or
docker build .
from the root of the repository.
If the build fails because of mysql errors: check that your host does not currently have mysql running. The errors will look something like this:
Errors were encountered while processing:
mysql-server-8.0
mysql-server
If mysql is running, then it breaks the build (port conflicts). Look for mysql running like this (on Ubuntu):
$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:33060 0.0.0.0:* LISTEN 738/mysqld
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 738/mysqld
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 604/systemd-resolve
... etc
If you see mysqld in there, then stop it running. On Ubuntu, you can run:
sudo service mysql stop
Other distros may vary.
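If you would rather check for a running mysqld from a script than read the netstat table by eye, something like the following could work. It is a minimal sketch that assumes the column layout shown in the example output above (state in the sixth column, PID/program name in the seventh); the function name `find_mysqld_listeners` is made up for this example.

```python
def find_mysqld_listeners(netstat_output):
    """Return (local_address, pid) for each listening socket owned by mysqld.

    Expects text in the layout printed by `netstat -tulpn`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
    """
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows (no State column) and anything not listening.
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        pid, _, program = fields[6].partition("/")
        if program == "mysqld":
            listeners.append((fields[3], int(pid)))
    return listeners
```

On a real host you could feed it the output of `subprocess.run(["netstat", "-tulpn"], capture_output=True, text=True).stdout`. Without root privileges netstat may not show program names, in which case the function returns an empty list even if mysqld is running.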
If you are running nextflow pipelines - ie option 2 - then you will need nextflow installed. The overall method is to get the clockwork nextflow scripts (eg by cloning the clockwork repository), then run a nextflow script, pointing it at the Singularity or Docker container in which it should run each process. This means that nextflow itself and the clockwork nextflow scripts are not inside the clockwork container. If this is not clear, then please see the walkthrough for examples.
MySQL is only needed if you are running clockwork pipelines using option 2 above. Just like nextflow, MySQL should be installed on your host machine - it is not inside the clockwork container.