This ChIP-seq pipeline is based on the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje) in this Google Doc.
- Flexibility: Support for `docker`, `singularity` and `Conda`.
- Portability: Support for many cloud platforms (Google/DNAnexus) and cluster engines (SLURM/SGE/PBS).
- Resumability: Resume a failed workflow from where it left off.
- User-friendly HTML report: tabulated quality metrics including alignment/peak statistics and FRiP along with many useful plots (IDR/cross-correlation measures).
- Genomes: Pre-built database for GRCh38, hg19, mm10, mm9 and additional support for custom genomes.
This pipeline supports many cloud platforms and cluster engines. It also supports `docker`, `singularity` and `Conda` to resolve the pipeline's complicated software dependencies. Tutorial-based instructions for each platform will help you understand how to run pipelines. There are special instructions for two major Stanford HPC servers (SCG4 and Sherlock).
- Cloud platforms
- Web interface
- CLI (command line interface)
- Stanford HPC servers (CLI)
- Cluster engines (CLI)
- Local computers (CLI)
Output directory specification
There are some useful tools to post-process outputs of the pipeline.
This tool recursively finds and parses all `qc.json` files (the pipeline's final output) under a specified root directory. It generates a TSV file with all quality metrics tabulated in rows, one per experiment and replicate. It also estimates the overall quality of a sample from a criteria-definition JSON file, which can serve as a good guideline for QC'ing experiments.
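The core of such a tool can be sketched in a few steps: walk the output tree for `qc.json` files, flatten each nested metrics dict into dotted column names, and emit one TSV row per file. This is a minimal stdlib-only sketch, not the actual tool's implementation; the metric names in the example are illustrative.

```python
import csv
import io
import json
import os


def find_qc_jsons(root_dir):
    """Recursively yield paths of qc.json files under root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name == "qc.json":
                yield os.path.join(dirpath, name)


def load_qc_jsons(root_dir):
    """Parse every qc.json found under root_dir into a dict."""
    return [json.load(open(path)) for path in find_qc_jsons(root_dir)]


def flatten(d, prefix=""):
    """Flatten nested dicts into dotted keys:
    {'align': {'frip': 0.3}} -> {'align.frip': 0.3}."""
    flat = {}
    for key, value in d.items():
        full_key = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key + "."))
        else:
            flat[full_key] = value
    return flat


def qc_jsons_to_tsv(qc_dicts):
    """Tabulate parsed qc.json dicts into one TSV string,
    one row per experiment/replicate."""
    rows = [flatten(d) for d in qc_dicts]
    # Union of all metric names keeps columns aligned across experiments
    # even when some samples are missing a metric.
    columns = sorted({k for row in rows for k in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

Flattening to dotted keys means one TSV can hold metrics from any nesting depth without a hand-maintained column list.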
This tool parses the metadata JSON file from a previously failed workflow and generates a new input JSON file to resume the pipeline from where it left off.
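The idea behind resuming can be sketched as: start from the previous input JSON, then overwrite any input whose value was already produced as an output of the failed run. This is a hypothetical illustration, not the tool's real logic; the `OUTPUT_TO_INPUT` mapping and the `chip.*` key names are made up for the example (the actual tool knows the pipeline's real key names).

```python
def resume_input_json(prev_input, metadata):
    """Build a new input JSON for a resumed run: keep the original inputs,
    then substitute entries for which the failed workflow already
    produced an output (found in the metadata JSON's "outputs" section)."""
    # Hypothetical mapping from workflow-output keys to input keys.
    OUTPUT_TO_INPUT = {
        "chip.bams": "chip.bams",  # e.g. reuse already-aligned BAMs
    }
    new_input = dict(prev_input)
    for out_key, value in metadata.get("outputs", {}).items():
        in_key = OUTPUT_TO_INPUT.get(out_key)
        if in_key is not None and value is not None:
            new_input[in_key] = value
    return new_input
```

With already-finished outputs fed back in as inputs, the resumed run skips the completed steps instead of recomputing them.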
This tool downloads data of any type (FASTQ, BAM, PEAK, ...) from the ENCODE portal. It also generates a metadata JSON file per experiment, which is very useful for building an input JSON file for the pipeline.
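The ENCODE portal exposes experiment records as JSON (append `?format=json` to a portal URL), where each entry under `files` carries a `file_format` and a download `href`. A minimal sketch of the metadata-extraction side, assuming that schema (the accession and paths in the usage example are made up):

```python
import json

PORTAL = "https://www.encodeproject.org"


def files_of_type(experiment, file_format):
    """From an ENCODE experiment record (a parsed '?format=json' response),
    collect full download URLs for all files of one format."""
    urls = []
    for f in experiment.get("files", []):
        if f.get("file_format") == file_format and "href" in f:
            urls.append(PORTAL + f["href"])
    return urls


def write_metadata(experiment, path):
    """Save per-experiment metadata to a JSON file, e.g. as a starting
    point for building the pipeline's input JSON."""
    with open(path, "w") as fp:
        json.dump({"accession": experiment.get("accession"),
                   "fastqs": files_of_type(experiment, "fastq")},
                  fp, indent=2)
```

The real tool covers more file types and metadata fields, but the pattern is the same: filter the experiment's `files` list by format, then prefix each `href` with the portal host to get a downloadable URL.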