diff --git a/CHANGELOG.md b/CHANGELOG.md
index 39e77a767..25d35d657 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,30 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## dev
+
+### Credits
+
+Special thanks to the following for their contributions to the release:
+
+- [Siddhartha Bagaria](https://github.com/siddharthab)
+
+### Enhancements & fixes
+
+- [PR #1369](https://github.com/nf-core/rnaseq/pull/1369) - Add umicollapse as an alternative to umi-tools
+
+### Software dependencies
+
+| Dependency | Old version | New version |
+| -------------- | ----------- | ----------- |
+| `UMICollapse` | | 1.1.0 |
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+>
+> **NB:** Dependency has been **added** if just the new version information is present.
+>
+> **NB:** Dependency has been **removed** if new version information isn't present.
+
 ## [[3.17.0](https://github.com/nf-core/rnaseq/releases/tag/3.17.0)] - 2024-10-23
 
 ### Credits
@@ -981,429 +1005,4 @@ Note, since the pipeline is now using Nextflow DSL2, each process will be run wi
 ### :warning: Major enhancements
 
 - You will need to install Nextflow `>=20.11.0-edge` to run the pipeline. If you are using Singularity, then features introduced in that release now enable the pipeline to directly download Singularity images hosted by Biocontainers as opposed to performing a conversion from Docker images (see [#496](https://github.com/nf-core/rnaseq/issues/496)).
-- The previous default of aligning BAM files using STAR and quantifying using featureCounts (`--aligner star`) has been removed. The new default is to align with STAR and quantify using Salmon (`--aligner star_salmon`).
-  - This decision was made primarily because of the limitations of featureCounts to appropriately quantify gene expression data.
Please see [Zhao et al., 2015](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141910#pone-0141910-t001) and [Soneson et al., 2015](https://f1000research.com/articles/4-1521/v1)). -- For similar reasons, **quantification will not be performed** if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments. - - This pipeline option is still available for those who have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2. No gene-level quantification results will be generated. - - In a future release we hope to add back quantitation for HISAT2 using different tools. - -### Enhancements & fixes - -- Updated pipeline template to nf-core/tools `1.12.1` -- Bumped Nextflow version `20.07.1` -> `20.11.0-edge` -- Added UCSC `bedClip` module to restrict bedGraph file coordinates to chromosome boundaries -- Check if Bioconda and conda-forge channels are set-up correctly when running with `-profile conda` -- Use `rsem-prepare-reference` and not `gffread` to create transcriptome fasta file -- [[#494](https://github.com/nf-core/rnaseq/issues/494)] - Issue running rnaseq v2.0 (DSL2) with test profile -- [[#496](https://github.com/nf-core/rnaseq/issues/496)] - Direct download of Singularity images via HTTPS -- [[#498](https://github.com/nf-core/rnaseq/issues/498)] - Significantly different versions of STAR in star_rsem (2.7.6a) and star (2.6.1d) -- [[#499](https://github.com/nf-core/rnaseq/issues/499)] - Use of salmon counts for DESeq2 -- [[#500](https://github.com/nf-core/rnaseq/issues/500), [#509](https://github.com/nf-core/rnaseq/issues/509)] - Error with AWS batch params -- [[#511](https://github.com/nf-core/rnaseq/issues/511)] - rsem/star index fails with large genome -- [[#515](https://github.com/nf-core/rnaseq/issues/515)] - Add decoy-aware indexing for salmon -- 
[[#516](https://github.com/nf-core/rnaseq/issues/516)] - Unexpected error [InvocationTargetException] -- [[#525](https://github.com/nf-core/rnaseq/issues/525)] - sra_ids_to_runinfo.py UnicodeEncodeError -- [[#550](https://github.com/nf-core/rnaseq/issues/525)] - handle samplesheets with replicate=0 - -### Parameters - -| Old parameter | New parameter | -| --------------------------- | -------------------------------------- | -| `--fc_extra_attributes` | `--gtf_extra_attributes` | -|  `--fc_group_features` |  `--gtf_group_features` | -|  `--fc_count_type` |  `--gtf_count_type` | -|  `--fc_group_features_type` |  `--gtf_group_features_type` | -|   |  `--singularity_pull_docker_container` | -|  `--skip_featurecounts` |   | - -> **NB:** Parameter has been **updated** if both old and new parameter information is present. -> **NB:** Parameter has been **added** if just the new parameter information is present. -> **NB:** Parameter has been **removed** if parameter information isn't present. - -### Software dependencies - -Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. - -| Dependency | Old version | New version | -| ----------------------------------- | ----------- | ----------- | -| `bioconductor-summarizedexperiment` | 1.18.1 | 1.20.0 | -| `bioconductor-tximeta` | 1.6.3 | 1.8.0 | -| `picard` | 2.23.8 | 2.23.9 | -| `requests` | | 2.24.0 | -| `salmon` | 1.3.0 | 1.4.0 | -| `ucsc-bedclip` | | 377 | -| `umi_tools` | 1.0.1 | 1.1.1 | - -> **NB:** Dependency has been **updated** if both old and new version information is present. -> **NB:** Dependency has been **added** if just the new version information is present. 
-> **NB:** Dependency has been **removed** if version information isn't present. - -## [[2.0](https://github.com/nf-core/rnaseq/releases/tag/2.0)] - 2020-11-12 - -### Major enhancements - -- Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) -- All software containers are now exclusively obtained from [Biocontainers](https://biocontainers.pro/#/registry) -- Added a separate workflow to download FastQ files via SRA, ENA or GEO ids and to auto-create the input samplesheet ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html); see [`--public_data_ids`](https://nf-co.re/rnaseq/parameters#public_data_ids) parameter) -- Added and refined a Groovy `lib/` of functions that include the automatic rendering of parameters defined in the JSON schema for the help and summary log information -- Replace [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html) with [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) for the generation of PCA and heatmaps (also included in the MultiQC report) -- Creation of bigWig coverage files using [BEDTools](https://github.com/arq5x/bedtools2/) and [bedGraphToBigWig](http://hgdownload.soe.ucsc.edu/admin/exe/) -- [[#70](https://github.com/nf-core/rnaseq/issues/70)] - Added new genome mapping and quantification route with [RSEM](https://github.com/deweylab/RSEM) via the `--aligner star_rsem` parameter -- [[#72](https://github.com/nf-core/rnaseq/issues/72)] - Samples skipped due to low alignment reported in the MultiQC report -- [[#73](https://github.com/nf-core/rnaseq/issues/73), [#435](https://github.com/nf-core/rnaseq/pull/435)] - UMI barcode support -- [[#91](https://github.com/nf-core/rnaseq/issues/91)] - Ability to concatenate multiple runs of the same samples via the input samplesheet -- [[#123](https://github.com/nf-core/rnaseq/issues/123)] - The primary input for the pipeline has changed from `--reads` glob to samplesheet 
`--input`. See [usage docs](https://nf-co.re/rnaseq/docs/usage#introduction). -- [[#197](https://github.com/nf-core/rnaseq/issues/197)] - Samples failing strand-specificity checks reported in the MultiQC report -- [[#227](https://github.com/nf-core/rnaseq/issues/227)] - Removal of ribosomal RNA via [SortMeRNA](https://github.com/biocore/sortmerna) -- [[#419](https://github.com/nf-core/rnaseq/pull/419)] - Add `--additional_fasta` parameter to provide ERCC spike-ins, transgenes such as GFP or CAR-T as additional sequences to align to - -### Other enhancements & fixes - -- Updated pipeline template to nf-core/tools `1.11` -- Optimise MultiQC configuration for faster run-time on huge sample numbers -- Add information about SILVA licensing when removing rRNA to `usage.md` -- Fixed ansi colours for pipeline summary, added summary logs of alignment results -- [[#281](https://github.com/nf-core/rnaseq/issues/281)] - Add nag to cite the pipeline in summary -- [[#302](https://github.com/nf-core/rnaseq/issues/302)] - Fixed MDS plot axis labels -- [[#338](https://github.com/nf-core/rnaseq/issues/338)] - Add option for turning on/off STAR command line option (--sjdbGTFfile) -- [[#344](https://github.com/nf-core/rnaseq/issues/344)] - Added multi-core TrimGalore support -- [[#351](https://github.com/nf-core/rnaseq/issues/351)] - Fixes missing Qualimap parameter `-p` -- [[#353](https://github.com/nf-core/rnaseq/issues/353)] - Fixes an issue where MultiQC fails to run with `--skip_biotype_qc` option -- [[#357](https://github.com/nf-core/rnaseq/issues/357)] - Fixes broken links -- [[#362](https://github.com/nf-core/rnaseq/issues/362)] - Fix error with gzipped annotation file -- [[#384](https://github.com/nf-core/rnaseq/issues/384)] - Changed SortMeRNA reference dbs path to use stable URLs (v4.2.0) -- [[#396](https://github.com/nf-core/rnaseq/issues/396)] - Deterministic mapping for STAR aligner -- [[#412](https://github.com/nf-core/rnaseq/issues/412)] - Fix Qualimap not being passed 
on correct strand-specificity parameter -- [[#413](https://github.com/nf-core/rnaseq/issues/413)] - Fix STAR unmapped reads not output -- [[#434](https://github.com/nf-core/rnaseq/issues/434)] - Fix typo reported for work-dir -- [[#437](https://github.com/nf-core/rnaseq/issues/434)] - FastQC uses correct number of threads now -- [[#440](https://github.com/nf-core/rnaseq/issues/440)] - Fixed issue where featureCounts process fails when setting `--fc_count_type` to gene -- [[#452](https://github.com/nf-core/rnaseq/issues/452)] - Fix `--gff` input bug -- [[#345](https://github.com/nf-core/rnaseq/pull/345)] - Fixes label name in FastQC process -- [[#391](https://github.com/nf-core/rnaseq/pull/391)] - Make publishDir mode configurable -- [[#431](https://github.com/nf-core/rnaseq/pull/431)] - Update AWS GitHub actions workflow with organization level secrets -- [[#435](https://github.com/nf-core/rnaseq/pull/435)] - Fix a bug where gzipped references were not extracted when `--additional_fasta` was not specified -- [[#435](https://github.com/nf-core/rnaseq/pull/435)] - Fix a bug where merging of RSEM output would fail if only one fastq provided as input -- [[#435](https://github.com/nf-core/rnaseq/pull/435)] - Correct RSEM output name (was saving counts but calling them TPMs; now saving both properly labelled) -- [[#436](https://github.com/nf-core/rnaseq/pull/436)] - Fix a bug where the RSEM reference could not be built -- [[#458](https://github.com/nf-core/rnaseq/pull/458)] - Fix `TMP_DIR` for process MarkDuplicates and Qualimap - -### Parameters - -#### Updated - -| Old parameter | New parameter | -| ----------------------------- | --------------------------- | -| `--reads` | `--input` | -|  `--igenomesIgnore` |  `--igenomes_ignore` | -|  `--removeRiboRNA` |  `--remove_ribo_rna` | -|  `--rRNA_database_manifest` |  `--ribo_database_manifest` | -|  `--save_nonrRNA_reads` |  `--save_non_ribo_reads` | -|  `--saveAlignedIntermediates` |  `--save_align_intermeds` | -|  
`--saveReference` |  `--save_reference` | -|  `--saveTrimmed` |  `--save_trimmed` | -|  `--saveUnaligned` |  `--save_unaligned` | -|  `--skipAlignment` |  `--skip_alignment` | -|  `--skipBiotypeQC` |  `--skip_biotype_qc` | -|  `--skipDupRadar` |  `--skip_dupradar` | -|  `--skipFastQC` |  `--skip_fastqc` | -|  `--skipMultiQC` |  `--skip_multiqc` | -|  `--skipPreseq` |  `--skip_preseq` | -|  `--skipQC` |  `--skip_qc` | -|  `--skipQualimap` |  `--skip_qualimap` | -|  `--skipRseQC` |  `--skip_rseqc` | -|  `--skipTrimming` |  `--skip_trimming` | -|  `--stringTieIgnoreGTF` |  `--stringtie_ignore_gtf` | - -#### Added - -- `--additional_fasta` - FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences -- `--deseq2_vst` - Use vst transformation instead of rlog with DESeq2 -- `--enable_conda` - Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter -- `--min_mapped_reads` - Minimum percentage of uniquely mapped reads below which samples are removed from further processing -- `--multiqc_title` - MultiQC report title. 
Printed as page header, used for filename if not otherwise specified -- `--public_data_ids` - File containing SRA/ENA/GEO identifiers one per line in order to download their associated FastQ files -- `--publish_dir_mode` - Method used to save pipeline results to output directory -- `--rsem_index` - Path to directory or tar.gz archive for pre-built RSEM index -- `--rseqc_modules` - Specify the RSeQC modules to run -- `--save_merged_fastq` - Save FastQ files after merging re-sequenced libraries in the results directory -- `--save_umi_intermeds` - If this option is specified, intermediate FastQ and BAM files produced by UMI-tools are also saved in the results directory -- `--skip_bigwig` - Skip bigWig file creation -- `--skip_deseq2_qc` - Skip DESeq2 PCA and heatmap plotting -- `--skip_featurecounts` - Skip featureCounts -- `--skip_markduplicates` - Skip picard MarkDuplicates step -- `--skip_sra_fastq_download` - Only download metadata for public data database ids and don't download the FastQ files -- `--skip_stringtie` - Skip StringTie -- `--star_ignore_sjdbgtf` - See [#338](https://github.com/nf-core/rnaseq/issues/338) -- `--umitools_bc_pattern` - The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI -- `--umitools_extract_method` - UMI pattern to use. 
Can be either 'string' (default) or 'regex' -- `--with_umi` - Enable UMI-based read deduplication - -#### Removed - -- `--awsqueue` can now be provided via nf-core/configs if using AWS -- `--awsregion` can now be provided via nf-core/configs if using AWS -- `--compressedReference` now auto-detected -- `--markdup_java_options` in favour of updating centrally on nf-core/modules -- `--project` parameter from old NGI template -- `--readPaths` is not required since these are provided from the input samplesheet -- `--sampleLevel` not required -- `--singleEnd` is now auto-detected from the input samplesheet -- `--skipEdgeR` qc not performed by DESeq2 instead -- `--star_memory` in favour of updating centrally on nf-core/modules if required -- Strandedness is now specified at the sample-level via the input samplesheet - - `--forwardStranded` - - `--reverseStranded` - - `--unStranded` - - `--pico` - -### Software dependencies - -Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. 
- -| Dependency | Old version | New version | -| ----------------------------------- | ----------- | ----------- | -| `bioconductor-dupradar` | 1.14.0 | 1.18.0 | -| `bioconductor-summarizedexperiment` | 1.14.0 | 1.18.1 | -| `bioconductor-tximeta` | 1.2.2 | 1.6.3 | -| `fastqc` | 0.11.8 | 0.11.9 | -| `gffread` | 0.11.4 | 0.12.1 | -| `hisat2` | 2.1.0 | 2.2.0 | -| `multiqc` | 1.7 | 1.9 | -| `picard` | 2.21.1 | 2.23.8 | -| `qualimap` | 2.2.2c | 2.2.2d | -| `r-base` | 3.6.1 | 4.0.3 | -| `salmon` | 0.14.2 | 1.3.0 | -| `samtools` | 1.9 | 1.10 | -| `sortmerna` | 2.1b | 4.2.0 | -| `stringtie` | 2.0 | 2.1.4 | -| `subread` | 1.6.4 | 2.0.1 | -| `trim-galore` | 0.6.4 | 0.6.6 | -| `bedtools` | - | 2.29.2 | -| `bioconductor-biocparallel` | - | 1.22.0 | -| `bioconductor-complexheatmap` | - | 2.4.2 | -| `bioconductor-deseq2` | - | 1.28.0 | -| `bioconductor-tximport` | - | 1.16.0 | -| `perl` | - | 5.26.2 | -| `python` | - | 3.8.3 | -| `r-ggplot2` | - | 3.3.2 | -| `r-optparse` | - | 1.6.6 | -| `r-pheatmap` | - | 1.0.12 | -| `r-rcolorbrewer` | - | 1.1_2 | -| `rsem` | - | 1.3.3 | -| `ucsc-bedgraphtobigwig` | - | 377 | -| `umi_tools` | - | 1.0.1 | -| `bioconductor-edger` | - | - | -| `deeptools` | - | - | -| `matplotlib` | - | - | -| `r-data.table` | - | - | -| `r-gplots` | - | - | -| `r-markdown` | - | - | - -> **NB:** Dependency has been **updated** if both old and new version information is present. -> **NB:** Dependency has been **added** if just the new version information is present. -> **NB:** Dependency has been **removed** if version information isn't present. 
- -## [[1.4.2](https://github.com/nf-core/rnaseq/releases/tag/1.4.2)] - 2019-10-18 - -- Minor version release for keeping Git History in sync -- No changes with respect to 1.4.1 on pipeline level - -## [[1.4.1](https://github.com/nf-core/rnaseq/releases/tag/1.4.1)] - 2019-10-17 - -Major novel changes include: - -- Update `igenomes.config` with NCBI `GRCh38` and most recent UCSC genomes -- Set `autoMounts = true` by default for `singularity` profile - -### Pipeline enhancements & fixes - -- Fixed parameter warnings [#316](https://github.com/nf-core/rnaseq/issues/316) and [318](https://github.com/nf-core/rnaseq/issues/318) -- Fixed [#307](https://github.com/nf-core/rnaseq/issues/307) - Confusing Info Printout about GFF and GTF - -## [[1.4](https://github.com/nf-core/rnaseq/releases/tag/1.4)] - 2019-10-15 - -Major novel changes include: - -- Support for Salmon as an alternative method to STAR and HISAT2 -- Several improvements in `featureCounts` handling of types other than `exon`. It is possible now to handle nuclearRNAseq data. Nuclear RNA has un-spliced RNA, and the whole transcript, including the introns, needs to be counted, e.g. by specifying `--fc_count_type transcript`. -- Support for [outputting unaligned data](https://github.com/nf-core/rnaseq/issues/277) to results folders. 
-- Added options to skip several steps - - Skip trimming using `--skipTrimming` - - Skip BiotypeQC using `--skipBiotypeQC` - - Skip Alignment using `--skipAlignment` to only use pseudoalignment using Salmon - -### Documentation updates - -- Adjust wording of skipped samples [in pipeline output](https://github.com/nf-core/rnaseq/issues/290) -- Fixed link to guidelines [#203](https://github.com/nf-core/rnaseq/issues/203) -- Add `Citation` and `Quick Start` section to `README.md` -- Add in documentation of the `--gff` parameter - -### Reporting Updates - -- Generate MultiQC plots in the results directory [#200](https://github.com/nf-core/rnaseq/issues/200) -- Get MultiQC to save plots as [standalone files](https://github.com/nf-core/rnaseq/issues/183) -- Get MultiQC to write out the software versions in a `.csv` file [#185](https://github.com/nf-core/rnaseq/issues/185) -- Use `file` instead of `new File` to create `pipeline_report.{html,txt}` files, and properly create subfolders - -### Pipeline enhancements & fixes - -- Restore `SummarizedExperimment` object creation in the salmon_merge process avoiding increasing memory with sample size. 
-- Fix sample names in feature counts and dupRadar to remove suffixes added in other processes -- Removed `genebody_coverage` process [#195](https://github.com/nf-core/rnaseq/issues/195) -- Implemented Pearsons correlation instead of Euclidean distance [#146](https://github.com/nf-core/rnaseq/issues/146) -- Add `--stringTieIgnoreGTF` parameter [#206](https://github.com/nf-core/rnaseq/issues/206) -- Removed unused `stringtie` channels for `MultiQC` -- Integrate changes in `nf-core/tools v1.6` template which resolved [#90](https://github.com/nf-core/rnaseq/issues/90) -- Moved process `convertGFFtoGTF` before `makeSTARindex` [#215](https://github.com/nf-core/rnaseq/issues/215) -- Change all boolean parameters from `snake_case` to `camelCase` and vice versa for value parameters -- Add SM ReadGroup info for QualiMap compatibility[#238](https://github.com/nf-core/rnaseq/issues/238) -- Obtain edgeR + dupRadar version information [#198](https://github.com/nf-core/rnaseq/issues/198) and [#112](https://github.com/nf-core/rnaseq/issues/112) -- Add `--gencode` option for compatibility of Salmon and featureCounts biotypes with GENCODE gene annotations -- Added functionality to accept compressed reference data in the pipeline -- Check that gtf features are on chromosomes that exist in the genome fasta file [#274](https://github.com/nf-core/rnaseq/pull/274) -- Maintain all gff features upon gtf conversion (keeps `gene_biotype` or `gene_type` to make `featureCounts` happy) -- Add SortMeRNA as an optional step to allow rRNA removal [#280](https://github.com/nf-core/rnaseq/issues/280) -- Minimal adjustment of memory and CPU constraints for clusters with locked memory / CPU relation -- Cleaned up usage, `parameters.settings.json` and the `nextflow.config` - -### Dependency Updates - -- Dependency list is now sorted appropriately -- Force matplotlib=3.0.3 - -#### Updated Packages - -- Picard 2.20.0 -> 2.21.1 -- bioconductor-dupradar 1.12.1 -> 1.14.0 -- bioconductor-edger 3.24.3 -> 
3.26.5 -- gffread 0.9.12 -> 0.11.4 -- trim-galore 0.6.1 -> 0.6.4 -- gffread 0.9.12 -> 0.11.4 -- rseqc 3.0.0 -> 3.0.1 -- R-Base 3.5 -> 3.6.1 - -#### Added / Removed Packages - -- Dropped CSVtk in favor of Unix's simple `cut` and `paste` utilities -- Added Salmon 0.14.2 -- Added TXIMeta 1.2.2 -- Added SummarizedExperiment 1.14.0 -- Added SortMeRNA 2.1b -- Add tximport and summarizedexperiment dependency [#171](https://github.com/nf-core/rnaseq/issues/171) -- Add Qualimap dependency [#202](https://github.com/nf-core/rnaseq/issues/202) - -## [[1.3](https://github.com/nf-core/rnaseq/releases/tag/1.3)] - 2019-03-26 - -### Pipeline Updates - -- Added configurable options to specify group attributes for featureCounts [#144](https://github.com/nf-core/rnaseq/issues/144) -- Added support for RSeqC 3.0 [#148](https://github.com/nf-core/rnaseq/issues/148) -- Added a `parameters.settings.json` file for use with the new `nf-core launch` helper tool. -- Centralized all configuration profiles using [nf-core/configs](https://github.com/nf-core/configs) -- Fixed all centralized configs [for offline usage](https://github.com/nf-core/rnaseq/issues/163) -- Hide %dup in [multiqc report](https://github.com/nf-core/rnaseq/issues/150) -- Add option for Trimming NextSeq data properly ([@jburos work](https://github.com/jburos)) - -### Bug fixes - -- Fixing HISAT2 Index Building for large reference genomes [#153](https://github.com/nf-core/rnaseq/issues/153) -- Fixing HISAT2 BAM sorting using more memory than available on the system -- Fixing MarkDuplicates memory consumption issues following [#179](https://github.com/nf-core/rnaseq/pull/179) -- Use `file` instead of `new File` to create the `pipeline_report.{html,txt}` files to avoid creating local directories when outputting to AWS S3 folders -- Fix SortMeRNA default rRNA db paths specified in assets/rrna-db-defaults.txt - -### Dependency Updates - -- RSeQC 2.6.4 -> 3.0.0 -- Picard 2.18.15 -> 2.20.0 -- r-data.table 1.11.4 -> 1.12.2 -- 
bioconductor-edger 3.24.1 -> 3.24.3 -- r-markdown 0.8 -> 0.9 -- csvtk 0.15.0 -> 0.17.0 -- stringtie 1.3.4 -> 1.3.6 -- subread 1.6.2 -> 1.6.4 -- gffread 0.9.9 -> 0.9.12 -- multiqc 1.6 -> 1.7 -- deeptools 3.2.0 -> 3.2.1 -- trim-galore 0.5.0 -> 0.6.1 -- qualimap 2.2.2b -- matplotlib 3.0.3 -- r-base 3.5.1 - -## [[1.2](https://github.com/nf-core/rnaseq/releases/tag/1.2)] - 2018-12-12 - -### Pipeline updates - -- Removed some outdated documentation about non-existent features -- Config refactoring and code cleaning -- Added a `--fcExtraAttributes` option to specify more than ENSEMBL gene names in `featureCounts` -- Remove legacy rseqc `strandRule` config code. [#119](https://github.com/nf-core/rnaseq/issues/119) -- Added STRINGTIE ballgown output to results folder [#125](https://github.com/nf-core/rnaseq/issues/125) -- HiSAT index build now requests `200GB` memory, enough to use the exons / splice junction option for building. - - Added documentation about the `--hisatBuildMemory` option. -- BAM indices are stored and re-used between processes [#71](https://github.com/nf-core/rnaseq/issues/71) - -### Bug Fixes - -- Fixed conda bug which caused problems with environment resolution due to changes in bioconda [#113](https://github.com/nf-core/rnaseq/issues/113) -- Fixed wrong gffread command line [#117](https://github.com/nf-core/rnaseq/issues/117) -- Added `cpus = 1` to `workflow summary process` [#130](https://github.com/nf-core/rnaseq/issues/130) - -## [[1.1](https://github.com/nf-core/rnaseq/releases/tag/1.1)] - 2018-10-05 - -### Pipeline updates - -- Wrote docs and made minor tweaks to the `--skip_qc` and associated options -- Removed the depreciated `uppmax-modules` config profile -- Updated the `hebbe` config profile to use the new `withName` syntax too -- Use new `workflow.manifest` variables in the pipeline script -- Updated minimum nextflow version to `0.32.0` - -### Bug Fixes - -- [#77](https://github.com/nf-core/rnaseq/issues/77): Added back `executor = 'local'` 
for the `workflow_summary_mqc` -- [#95](https://github.com/nf-core/rnaseq/issues/95): Check if task.memory is false instead of null -- [#97](https://github.com/nf-core/rnaseq/issues/97): Resolved edge-case where numeric sample IDs are parsed as numbers causing some samples to be incorrectly overwritten. - -## [[1.0](https://github.com/nf-core/rnaseq/releases/tag/1.0)] - 2018-08-20 - -This release marks the point where the pipeline was moved from [SciLifeLab/NGI-RNAseq](https://github.com/SciLifeLab/NGI-RNAseq) -over to the new [nf-core](http://nf-co.re/) community, at [nf-core/rnaseq](https://github.com/nf-core/rnaseq). - -View the previous changelog at [SciLifeLab/NGI-RNAseq/CHANGELOG.md](https://github.com/SciLifeLab/NGI-RNAseq/blob/master/CHANGELOG.md) - -In addition to porting to the new nf-core community, the pipeline has had a number of major changes in this version. -There have been 157 commits by 16 different contributors covering 70 different files in the pipeline: 7,357 additions and 8,236 deletions! 
- -In summary, the main changes are: - -- Rebranding and renaming throughout the pipeline to nf-core -- Updating many parts of the pipeline config and style to meet nf-core standards -- Support for GFF files in addition to GTF files - - Just use `--gff` instead of `--gtf` when specifying a file path -- New command line options to skip various quality control steps -- More safety checks when launching a pipeline - - Several new sanity checks - for example, that the specified reference genome exists -- Improved performance with memory usage (especially STAR and Picard) -- New BigWig file outputs for plotting coverage across the genome -- Refactored gene body coverage calculation, now much faster and using much less memory -- Bugfixes in the MultiQC process to avoid edge cases where it wouldn't run -- MultiQC report now automatically attached to the email sent when the pipeline completes -- New testing method, with data on GitHub - - Now run pipeline with `-profile test` instead of using bash scripts -- Rewritten continuous integration tests with Travis CI -- New explicit support for Singularity containers -- Improved MultiQC support for DupRadar and featureCounts - - Now works for all users instead of just NGI Stockholm -- New configuration for use on AWS batch -- Updated config syntax to support latest versions of Nextflow -- Built-in support for a number of new local HPC systems - - CCGA, GIS, UCT HEX, updates to UPPMAX, CFC, BINAC, Hebbe, c3se -- Slightly improved documentation (more updates to come) -- Updated software packages - -...and many more minor tweaks. - -Thanks to everyone who has worked on this release! 
+- The previous default of aligning BAM files using STAR and quantifying using fe
diff --git a/modules.json b/modules.json
index 9667d1b6b..f51a6431a 100644
--- a/modules.json
+++ b/modules.json
@@ -198,6 +198,7 @@
                         "branch": "master",
                         "git_sha": "b13f07be4c508d6ff6312d354d09f2493243e208",
                         "installed_by": [
+                            "bam_dedup_stats_samtools_umicollapse",
                             "bam_dedup_stats_samtools_umitools",
                             "bam_markduplicates_picard",
                             "bam_sort_stats_samtools"
@@ -266,6 +267,11 @@
                         "git_sha": "49f4e50534fe4b64101e62ea41d5dc43b1324358",
                         "installed_by": ["bedgraph_bedclip_bedgraphtobigwig"]
                     },
+                    "umicollapse": {
+                        "branch": "master",
+                        "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+                        "installed_by": ["bam_dedup_stats_samtools_umicollapse"]
+                    },
                     "umitools/dedup": {
                         "branch": "master",
                         "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
@@ -290,6 +296,11 @@
             },
             "subworkflows": {
                 "nf-core": {
+                    "bam_dedup_stats_samtools_umicollapse": {
+                        "branch": "master",
+                        "git_sha": "763d4b5c05ffda3ac1ac969dc67f7458cfb2eb1d",
+                        "installed_by": ["subworkflows"]
+                    },
                     "bam_dedup_stats_samtools_umitools": {
                         "branch": "master",
                         "git_sha": "763d4b5c05ffda3ac1ac969dc67f7458cfb2eb1d",
@@ -314,6 +325,7 @@
                         "branch": "master",
                         "git_sha": "763d4b5c05ffda3ac1ac969dc67f7458cfb2eb1d",
                         "installed_by": [
+                            "bam_dedup_stats_samtools_umicollapse",
                             "bam_dedup_stats_samtools_umitools",
                             "bam_markduplicates_picard",
                             "bam_sort_stats_samtools"
diff --git a/modules/nf-core/umicollapse/environment.yml b/modules/nf-core/umicollapse/environment.yml
new file mode 100644
index 000000000..3847980dd
--- /dev/null
+++ b/modules/nf-core/umicollapse/environment.yml
@@ -0,0 +1,5 @@
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - bioconda::umicollapse=1.0.0
diff --git a/modules/nf-core/umicollapse/main.nf b/modules/nf-core/umicollapse/main.nf
new file mode 100644
index 000000000..dae290e6e
--- /dev/null
+++ b/modules/nf-core/umicollapse/main.nf
@@ -0,0 +1,73 @@
+process UMICOLLAPSE {
+    tag "$meta.id"
+    label "process_high"
+    label "process_high_memory"
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/umicollapse:1.0.0--hdfd78af_1' :
+        'biocontainers/umicollapse:1.0.0--hdfd78af_1' }"
+
+    input:
+    tuple val(meta), path(input), path(bai)
+    val(mode)
+
+    output:
+    tuple val(meta), path("*.bam"), emit: bam, optional: true
+    tuple val(meta), path("*dedup*fastq.gz"), emit: fastq, optional: true
+    tuple val(meta), path("*_UMICollapse.log"), emit: log
+    path "versions.yml" , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def VERSION = '1.0.0-1' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
+    // Memory allocation: We need to make sure that both heap and stack size is sufficiently large for
+    // umicollapse. We set the stack size to 5% of the available memory, the heap size to 90%
+    // which leaves 5% for stuff happening outside of java without the scheduler killing the process.
+    def max_heap_size_mega = (task.memory.toMega() * 0.9).intValue()
+    def max_stack_size_mega = 999 //most java jdks will not allow Xss > 1GB, so fixing this to the allowed max
+    if ( mode !in [ 'fastq', 'bam' ] ) {
+        error "Mode must be one of 'fastq' or 'bam'."
+    }
+    extension = mode.contains("fastq") ? "fastq.gz" : "bam"
+    """
+    # Getting the umicollapse jar file like this because `umicollapse` is a Python wrapper script generated
+    # by conda that allows to set the heap size (Xmx), but not the stack size (Xss).
+    # `which` allows us to get the directory that contains `umicollapse`, independent of whether we
+    # are in a container or conda environment.
+ UMICOLLAPSE_JAR=\$(dirname \$(which umicollapse))/../share/umicollapse-${VERSION}/umicollapse.jar + java \\ + -Xmx${max_heap_size_mega}M \\ + -Xss${max_stack_size_mega}M \\ + -jar \$UMICOLLAPSE_JAR \\ + $mode \\ + -i ${input} \\ + -o ${prefix}.${extension} \\ + $args | tee ${prefix}_UMICollapse.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + umicollapse: $VERSION + END_VERSIONS + """ + + stub: + def prefix = task.ext.prefix ?: "${meta.id}" + def VERSION = '1.0.0-1' + if ( mode !in [ 'fastq', 'bam' ] ) { + error "Mode must be one of 'fastq' or 'bam'." + } + def extension = mode.contains("fastq") ? "fastq.gz" : "bam" + """ + touch ${prefix}.dedup.${extension} + touch ${prefix}_UMICollapse.log + cat <<-END_VERSIONS > versions.yml + "${task.process}": + umicollapse: $VERSION + END_VERSIONS + """ +} diff --git a/modules/nf-core/umicollapse/meta.yml b/modules/nf-core/umicollapse/meta.yml new file mode 100644 index 000000000..8b366c244 --- /dev/null +++ b/modules/nf-core/umicollapse/meta.yml @@ -0,0 +1,81 @@ +name: "umicollapse" +description: Deduplicate reads based on the mapping co-ordinate and the UMI attached + to the read. +keywords: + - umicollapse + - deduplication + - genomics +tools: + - "umicollapse": + description: "UMICollapse contains tools for dealing with Unique Molecular Identifiers + (UMIs)/Random Molecular Tags (RMTs)." + homepage: "https://github.com/Daniel-Liu-c0deb0t/UMICollapse" + documentation: "https://github.com/Daniel-Liu-c0deb0t/UMICollapse" + tool_dev_url: "https://github.com/Daniel-Liu-c0deb0t/UMICollapse" + doi: "10.7717/peerj.8275" + licence: ["MIT"] + identifier: "" +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: Input BAM file (or gzipped FASTQ file when mode is 'fastq') + pattern: "*.bam" + - bai: + type: file + description: | + BAM index file corresponding to the input BAM file. Can be skipped by passing [] when using FASTQ input.
+ pattern: "*.{bai}" + - - mode: + type: string + description: | + Selects the UMICollapse mode; must be either 'fastq' or 'bam'. + pattern: "{fastq,bam}" +output: + - bam: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.bam": + type: file + description: BAM file with UMI-deduplicated reads. + pattern: "*.{bam}" + - fastq: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*dedup*fastq.gz": + type: file + description: FASTQ file with UMI-deduplicated reads. + pattern: "*dedup*fastq.gz" + - log: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*_UMICollapse.log": + type: file + description: A log file with the deduplication statistics. + pattern: "*_{UMICollapse.log}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@CharlotteAnne" + - "@chris-cheshire" +maintainers: + - "@CharlotteAnne" + - "@chris-cheshire" + - "@apeltzer" + - "@MatthiasZepper" diff --git a/modules/nf-core/umicollapse/tests/main.nf.test b/modules/nf-core/umicollapse/tests/main.nf.test new file mode 100644 index 000000000..cc28359a6 --- /dev/null +++ b/modules/nf-core/umicollapse/tests/main.nf.test @@ -0,0 +1,249 @@ +nextflow_process { + + name "Test Process UMICOLLAPSE" + script "../main.nf" + process "UMICOLLAPSE" + + tag "modules" + tag "modules_nfcore" + tag "umicollapse" + tag "umitools/extract" + tag "samtools/index" + tag "bwa/index" + tag "bwa/mem" + + test("umicollapse single end test") { + setup{ + run("UMITOOLS_EXTRACT"){ + script "../../umitools/extract/main.nf" + config "./nextflow_SE.config" + process{ + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz',
checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + + run("BWA_INDEX"){ + script "../../bwa/index/main.nf" + process{ + """ + input[0] = [[ id:'sarscov2'],file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)] + """ + } + } + run("BWA_MEM"){ + script "../../bwa/mem/main.nf" + process{ + """ + input[0] = UMITOOLS_EXTRACT.out.reads + input[1] = BWA_INDEX.out.index + input[2] = [[ id:'sarscov2'],file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)] + input[3] = true + """ + } + } + run("SAMTOOLS_INDEX"){ + script "../../samtools/index/main.nf" + process{ + """ + input[0] = BWA_MEM.out.bam + """ + } + } + } + + when { + config "./nextflow_SE.config" + process { + """ + input[0] = BWA_MEM.out.bam.join(SAMTOOLS_INDEX.out.bai, by: [0]) + input[1] = 'bam' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.bam, + process.out.versions).match() } + ) + } + + } + + test("umicollapse paired tests") { + setup{ + run("UMITOOLS_EXTRACT"){ + script "../../umitools/extract/main.nf" + config "./nextflow_PE.config" + process{ + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + + run("BWA_INDEX"){ + script "../../bwa/index/main.nf" + process{ + """ + input[0] = [ + [ id:'sarscov2'], + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + """ + } + } + run("BWA_MEM"){ + script "../../bwa/mem/main.nf" + process{ + """ + input[0] = UMITOOLS_EXTRACT.out.reads + input[1] = BWA_INDEX.out.index + input[2] = [[ 
id:'sarscov2'],file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)] + input[3] = true + """ + } + } + run("SAMTOOLS_INDEX"){ + script "../../samtools/index/main.nf" + process{ + """ + input[0] = BWA_MEM.out.bam + """ + } + } + } + + when { + config "./nextflow_PE.config" + process { + """ + input[0] = BWA_MEM.out.bam.join(SAMTOOLS_INDEX.out.bai, by: [0]) + input[1] = 'bam' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.bam, + process.out.versions).match() } + ) + } + + } + + test("umicollapse fastq tests") { + + when { + config "./nextflow_SE.config" + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + [] + ] + input[1] = 'fastq' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.fastq, + process.out.versions).match() } + ) + } + } + + test("umicollapse stub tests") { + options "-stub-run" + setup{ + run("UMITOOLS_EXTRACT"){ + script "../../umitools/extract/main.nf" + config "./nextflow_PE.config" + process{ + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + + run("BWA_INDEX"){ + script "../../bwa/index/main.nf" + process{ + """ + input[0] = [ + [ id:'sarscov2'], + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + """ + } + } + run("BWA_MEM"){ + script "../../bwa/mem/main.nf" + process{ + """ + input[0] = UMITOOLS_EXTRACT.out.reads + input[1] = BWA_INDEX.out.index + input[2] = [[ id:'sarscov2'],file(params.modules_testdata_base_path + 
'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)] + input[3] = true + """ + } + } + run("SAMTOOLS_INDEX"){ + script "../../samtools/index/main.nf" + process{ + """ + input[0] = BWA_MEM.out.bam + """ + } + } + } + when { + config "./nextflow_PE.config" + process { + """ + input[0] = BWA_MEM.out.bam.join(SAMTOOLS_INDEX.out.bai, by: [0]) + input[1] = 'bam' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} \ No newline at end of file diff --git a/modules/nf-core/umicollapse/tests/main.nf.test.snap b/modules/nf-core/umicollapse/tests/main.nf.test.snap new file mode 100644 index 000000000..3f393eac1 --- /dev/null +++ b/modules/nf-core/umicollapse/tests/main.nf.test.snap @@ -0,0 +1,124 @@ +{ + "umicollapse single end test": { + "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.dedup.bam:md5,89e844724f73fae9e7100506d0be5775" + ] + ], + [ + "versions.yml:md5,c1e0275d81b1c97a9344d216f9154996" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-05-20T08:47:11.402203361" + }, + "umicollapse fastq tests": { + "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.dedup.fastq.gz:md5,c9bac08c7fd8df3e0203e3eeafc73155" + ] + ], + [ + "versions.yml:md5,c1e0275d81b1c97a9344d216f9154996" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-01-30T10:45:56.053352008" + }, + "umicollapse stub tests": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.dedup.dedup.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.dedup_UMICollapse.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + "versions.yml:md5,c1e0275d81b1c97a9344d216f9154996" + ], + "bam": [ + [ + { + "id": "test", + "single_end": false + }, + 
"test.dedup.dedup.bam:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "fastq": [ + + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.dedup_UMICollapse.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,c1e0275d81b1c97a9344d216f9154996" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-01-30T10:46:12.482697713" + }, + "umicollapse paired tests": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.dedup.bam:md5,3e2ae4701e3d2ca074ea878a314a3e4f" + ] + ], + [ + "versions.yml:md5,c1e0275d81b1c97a9344d216f9154996" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-05-20T08:47:30.028323337" + } +} \ No newline at end of file diff --git a/modules/nf-core/umicollapse/tests/nextflow.config b/modules/nf-core/umicollapse/tests/nextflow.config new file mode 100644 index 000000000..844edbdc6 --- /dev/null +++ b/modules/nf-core/umicollapse/tests/nextflow.config @@ -0,0 +1,8 @@ +process { + withName: UMITOOLS_EXTRACT { + ext.args = '--bc-pattern="NNNN"' + } + withName: UMICOLLAPSE { + ext.prefix = { "${meta.id}.dedup" } + } +} \ No newline at end of file diff --git a/modules/nf-core/umicollapse/tests/nextflow_PE.config b/modules/nf-core/umicollapse/tests/nextflow_PE.config new file mode 100644 index 000000000..ae4c96320 --- /dev/null +++ b/modules/nf-core/umicollapse/tests/nextflow_PE.config @@ -0,0 +1,10 @@ +process { + + withName: UMITOOLS_EXTRACT { + ext.args = '--bc-pattern="NNNN" --bc-pattern2="NNNN"' + } + + withName: UMICOLLAPSE { + ext.prefix = { "${meta.id}.dedup" } + } +} diff --git a/modules/nf-core/umicollapse/tests/nextflow_SE.config b/modules/nf-core/umicollapse/tests/nextflow_SE.config new file mode 100644 index 000000000..d4b944365 --- /dev/null +++ b/modules/nf-core/umicollapse/tests/nextflow_SE.config @@ -0,0 +1,10 @@ +process { + + withName: UMITOOLS_EXTRACT { + ext.args = 
'--bc-pattern="NNNN"' + } + + withName: UMICOLLAPSE { + ext.prefix = { "${meta.id}.dedup" } + } +} diff --git a/modules/nf-core/umicollapse/tests/tags.yml b/modules/nf-core/umicollapse/tests/tags.yml new file mode 100644 index 000000000..912879c4d --- /dev/null +++ b/modules/nf-core/umicollapse/tests/tags.yml @@ -0,0 +1,2 @@ +umicollapse: + - "modules/nf-core/umicollapse/**" diff --git a/nextflow.config b/nextflow.config index 468792d0f..8e010752f 100644 --- a/nextflow.config +++ b/nextflow.config @@ -30,6 +30,7 @@ params { with_umi = false skip_umi_extract = false umitools_extract_method = 'string' + umi_dedup_tool = 'umitools' umitools_grouping_method = 'directional' umitools_dedup_stats = false umitools_bc_pattern = null diff --git a/nextflow_schema.json b/nextflow_schema.json index 802209b42..d95a0934b 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -305,6 +305,13 @@ "fa_icon": "fas fa-barcode", "description": "Enable UMI-based read deduplication." }, + "umi_dedup_tool": { + "type": "string", + "default": "umitools", + "description": "Specifies the tool to use for UMI deduplication - available options are 'umitools' and 'umicollapse'.", + "fa_icon": "fas fa-barcode", + "enum": ["umitools", "umicollapse"] + }, "umitools_extract_method": { "type": "string", "default": "string", diff --git a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf index aa0dd4ed7..1745f8be6 100644 --- a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf @@ -216,6 +216,10 @@ def validateInputParameters() { } } + if (params.with_umi && params.umi_dedup_tool == "umicollapse" && params.umitools_grouping_method !in ['directional', 'adjacency', 'cluster']) { + error("UMI grouping method '${params.umitools_grouping_method}' is not supported by umicollapse; supported methods are 'directional', 'adjacency' and 'cluster'") + } + if (params.skip_alignment) {
skipAlignmentWarn() } diff --git a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/main.nf b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/main.nf new file mode 100644 index 000000000..54c42b986 --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/main.nf @@ -0,0 +1,55 @@ +// +// umicollapse, index BAM file and run samtools stats, flagstat and idxstats +// + +include { UMICOLLAPSE } from '../../../modules/nf-core/umicollapse/main' +include { SAMTOOLS_INDEX } from '../../../modules/nf-core/samtools/index/main' +include { BAM_STATS_SAMTOOLS } from '../bam_stats_samtools/main' + +workflow BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE { + take: + ch_bam_bai // channel: [ val(meta), path(bam), path(bai/csi) ] + + main: + + ch_versions = Channel.empty() + + // + // umicollapse in bam mode (thus hardcode mode input channel to 'bam') + // + UMICOLLAPSE ( ch_bam_bai, channel.value( 'bam' )) + ch_versions = ch_versions.mix(UMICOLLAPSE.out.versions.first()) + + // + // Index BAM file and run samtools stats, flagstat and idxstats + // + SAMTOOLS_INDEX ( UMICOLLAPSE.out.bam ) + ch_versions = ch_versions.mix(SAMTOOLS_INDEX.out.versions.first()) + + ch_bam_bai_dedup = UMICOLLAPSE.out.bam + .join(SAMTOOLS_INDEX.out.bai, by: [0], remainder: true) + .join(SAMTOOLS_INDEX.out.csi, by: [0], remainder: true) + .map { + meta, bam, bai, csi -> + if (bai) { + [ meta, bam, bai ] + } else { + [ meta, bam, csi ] + } + } + + BAM_STATS_SAMTOOLS ( ch_bam_bai_dedup, [ [:], [] ] ) + ch_versions = ch_versions.mix(BAM_STATS_SAMTOOLS.out.versions) + + emit: + bam = UMICOLLAPSE.out.bam // channel: [ val(meta), path(bam) ] + + bai = SAMTOOLS_INDEX.out.bai // channel: [ val(meta), path(bai) ] + csi = SAMTOOLS_INDEX.out.csi // channel: [ val(meta), path(csi) ] + dedup_stats = UMICOLLAPSE.out.log // channel: [ val(meta), path(stats) ] + stats = BAM_STATS_SAMTOOLS.out.stats // channel: [ val(meta), path(stats) ] + flagstat = BAM_STATS_SAMTOOLS.out.flagstat // channel: 
[ val(meta), path(flagstat) ] + idxstats = BAM_STATS_SAMTOOLS.out.idxstats // channel: [ val(meta), path(idxstats) ] + + versions = ch_versions // channel: [ path(versions.yml) ] +} diff --git a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/meta.yml b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/meta.yml new file mode 100644 index 000000000..a24e0448d --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/meta.yml @@ -0,0 +1,59 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "bam_dedup_stats_samtools_umicollapse" +description: umicollapse, index BAM file and run samtools stats, flagstat and idxstats +keywords: + - umi + - dedup + - index + - bam + - sam + - cram +components: + - umicollapse + - samtools/index + - samtools/stats + - samtools/idxstats + - samtools/flagstat + - bam_stats_samtools +input: + - ch_bam_bai: + description: | + Input BAM file and its index + Structure: [ val(meta), path(bam), path(bai) ] +output: + - bam: + description: | + UMI-deduplicated BAM file + Structure: [ val(meta), path(bam) ] + - bai: + description: | + BAI index of the UMI-deduplicated BAM file + Structure: [ val(meta), path(bai) ] + - csi: + description: | + CSI index of the UMI-deduplicated BAM file + Structure: [ val(meta), path(csi) ] + - dedup_stats: + description: | + UMICollapse log file containing the deduplication statistics + Structure: [ val(meta), path(stats) ] + - stats: + description: | + File containing samtools stats output + Structure: [ val(meta), path(stats) ] + - flagstat: + description: | + File containing samtools flagstat output + Structure: [ val(meta), path(flagstat) ] + - idxstats: + description: | + File containing samtools idxstats output + Structure: [ val(meta), path(idxstats) ] + - versions: + description: | + Files containing software versions + Structure: [ path(versions.yml) ] +authors: + - "@MatthiasZepper" +maintainers: + - "@MatthiasZepper" diff --git
a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test new file mode 100644 index 000000000..dd7f23718 --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test @@ -0,0 +1,103 @@ +// nf-core subworkflows test bam_dedup_stats_samtools_umicollapse +nextflow_workflow { + + name "Test Subworkflow BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE" + script "../main.nf" + workflow "BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/bam_dedup_stats_samtools_umicollapse" + tag "subworkflows/bam_stats_samtools" + tag "bam_stats_samtools" + tag "bwa/index" + tag "bwa/mem" + tag "samtools" + tag "samtools/index" + tag "samtools/stats" + tag "samtools/idxstats" + tag "samtools/flagstat" + tag "umicollapse" + tag "umitools/extract" + + test("sarscov2_bam_bai") { + + setup{ + run("UMITOOLS_EXTRACT"){ + script "../../../../modules/nf-core/umitools/extract/main.nf" + config "./paired-end-umis.config" + process{ + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + + run("BWA_INDEX"){ + script "../../../../modules/nf-core/bwa/index/main.nf" + process{ + """ + input[0] = [ + [ id:'sarscov2'], + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + """ + } + } + run("BWA_MEM"){ + script "../../../../modules/nf-core/bwa/mem/main.nf" + process{ + """ + input[0] = UMITOOLS_EXTRACT.out.reads + input[1] = BWA_INDEX.out.index + input[2] = [[ id:'sarscov2'],file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)] + input[3] = true + """ 
+ } + } + run("SAMTOOLS_INDEX"){ + script "../../../../modules/nf-core/samtools/index/main.nf" + process{ + """ + input[0] = BWA_MEM.out.bam + """ + } + } + } + + when { + config "./paired-end-umis.config" + params { + outdir = "$outputDir" + } + workflow { + """ + + input[0] = BWA_MEM.out.bam.join(SAMTOOLS_INDEX.out.bai, by: [0]) + + """ + } + } + + then { + assertAll( + { assert workflow.success}, + { assert snapshot(workflow.out.bam, workflow.out.versions).match() }, + { assert workflow.out.bam.get(0).get(1) ==~ ".*.bam"}, + { assert workflow.out.bai.get(0).get(1) ==~ ".*.bai"}, + { assert workflow.out.dedup_stats.get(0).get(1) ==~ ".*_UMICollapse.log"}, + { assert snapshot(workflow.out.stats).match("test_bam_dedup_stats_samtools_umicollapse_stats") }, + { assert snapshot(workflow.out.flagstat).match("test_bam_dedup_stats_samtools_umicollapse_flagstats") }, + { assert snapshot(workflow.out.idxstats).match("test_bam_dedup_stats_samtools_umicollapse_idxstats") } + ) + } + +} +} + diff --git a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test.snap b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test.snap new file mode 100644 index 000000000..ccb965763 --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/main.nf.test.snap @@ -0,0 +1,81 @@ +{ + "test_bam_dedup_stats_samtools_umicollapse_stats": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.stats:md5,498621f92e86d55e4f7ae93170e6e733" + ] + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-16T08:04:02.179870196" + }, + "test_bam_dedup_stats_samtools_umicollapse_flagstats": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.flagstat:md5,18d602435a02a4d721b78d1812622159" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T17:05:48.69612524" + }, + "sarscov2_bam_bai": { + "content": [ 
+ [ + [ + { + "id": "test", + "single_end": false + }, + "test.dedup.bam:md5,3e2ae4701e3d2ca074ea878a314a3e4f" + ] + ], + [ + "versions.yml:md5,20605eb79c410c0ed179ba660d82f75b", + "versions.yml:md5,23617661d2c899996bee2b05db027e25", + "versions.yml:md5,268e43f34038d4c6146ed050630f95b4", + "versions.yml:md5,e02a62a393a833778e16542eeed0d148", + "versions.yml:md5,ef00762e264b99ac45713dc0dedf4060" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-16T08:04:02.126366857" + }, + "test_bam_dedup_stats_samtools_umicollapse_idxstats": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.idxstats:md5,85d20a901eef23ca50c323638a2eb602" + ] + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-09T17:05:48.740441747" + } +} \ No newline at end of file diff --git a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/paired-end-umis.config b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/paired-end-umis.config new file mode 100644 index 000000000..602c026f0 --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/paired-end-umis.config @@ -0,0 +1,10 @@ +process { + + withName: UMITOOLS_EXTRACT { + ext.args = '--bc-pattern="NNNN" --bc-pattern2="NNNN"' + } + + withName: UMICOLLAPSE { + ext.prefix = { "${meta.id}.dedup" } + } +} \ No newline at end of file diff --git a/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/tags.yml b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/tags.yml new file mode 100644 index 000000000..a3ba5b726 --- /dev/null +++ b/subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/tests/tags.yml @@ -0,0 +1,2 @@ +subworkflows/bam_dedup_stats_samtools_umicollapse: + - subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse/** diff --git a/workflows/rnaseq/main.nf b/workflows/rnaseq/main.nf index 84bedaeb6..8c3b99ba3 100755 --- a/workflows/rnaseq/main.nf +++ 
b/workflows/rnaseq/main.nf @@ -57,6 +57,8 @@ include { FASTQ_ALIGN_HISAT2 } from '../../subworkflows/nf-core/fa include { BAM_SORT_STATS_SAMTOOLS } from '../../subworkflows/nf-core/bam_sort_stats_samtools' include { BAM_MARKDUPLICATES_PICARD } from '../../subworkflows/nf-core/bam_markduplicates_picard' include { BAM_RSEQC } from '../../subworkflows/nf-core/bam_rseqc' +include { BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE as BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_GENOME } from '../../subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse' +include { BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE as BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_TRANSCRIPTOME } from '../../subworkflows/nf-core/bam_dedup_stats_samtools_umicollapse' include { BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS as BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME } from '../../subworkflows/nf-core/bam_dedup_stats_samtools_umitools' include { BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS as BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME } from '../../subworkflows/nf-core/bam_dedup_stats_samtools_umitools' include { BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG as BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD } from '../../subworkflows/nf-core/bedgraph_bedclip_bedgraphtobigwig' @@ -216,21 +218,32 @@ workflow RNASEQ { // if (params.with_umi) { // Deduplicate genome BAM file before downstream analysis - BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME ( - ch_genome_bam.join(ch_genome_bam_index, by: [0]), - params.umitools_dedup_stats - ) - ch_genome_bam = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.bam - ch_genome_bam_index = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.bai - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.deduplog.collect{it[1]}) - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.stats.collect{it[1]}) - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.flagstat.collect{it[1]}) - ch_multiqc_files = 
ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.idxstats.collect{it[1]}) + if (params.umi_dedup_tool == "umicollapse") { + BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_GENOME ( + ch_genome_bam.join(ch_genome_bam_index, by: [0]) + ) + umi_dedup_genome = BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_GENOME + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.dedup_stats.collect{it[1]}.ifEmpty([])) + } else if (params.umi_dedup_tool == "umitools") { + BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME ( + ch_genome_bam.join(ch_genome_bam_index, by: [0]), + params.umitools_dedup_stats + ) + umi_dedup_genome = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.deduplog.collect{it[1]}) + } else { + error("Unknown umi_dedup_tool '${params.umi_dedup_tool}'") + } + ch_genome_bam = umi_dedup_genome.out.bam + ch_genome_bam_index = umi_dedup_genome.out.bai + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.stats.collect{it[1]}) + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.flagstat.collect{it[1]}) + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.idxstats.collect{it[1]}) if (params.bam_csi_index) { - ch_genome_bam_index = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.csi + ch_genome_bam_index = umi_dedup_genome.out.csi } - ch_versions = ch_versions.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.versions) + ch_versions = ch_versions.mix(umi_dedup_genome.out.versions) // Co-ordinate sort, index and run stats on transcriptome BAM BAM_SORT_STATS_SAMTOOLS ( @@ -241,14 +254,24 @@ workflow RNASEQ { ch_transcriptome_sorted_bai = BAM_SORT_STATS_SAMTOOLS.out.bai // Deduplicate transcriptome BAM file before read counting with Salmon - BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME ( - ch_transcriptome_sorted_bam.join(ch_transcriptome_sorted_bai, by: [0]), - params.umitools_dedup_stats - ) + if (params.umi_dedup_tool == "umicollapse") { + BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_TRANSCRIPTOME ( + 
ch_transcriptome_sorted_bam.join(ch_transcriptome_sorted_bai, by: [0]) + ) + umi_dedup_transcriptome = BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_TRANSCRIPTOME + } else if (params.umi_dedup_tool == "umitools") { + BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME ( + ch_transcriptome_sorted_bam.join(ch_transcriptome_sorted_bai, by: [0]), + params.umitools_dedup_stats + ) + umi_dedup_transcriptome = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME + } else { + error("Unknown umi_dedup_tool '${params.umi_dedup_tool}'") + } // Name sort BAM before passing to Salmon SAMTOOLS_SORT ( - BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME.out.bam, + umi_dedup_transcriptome.out.bam, ch_fasta.map { [ [:], it ] } ) @@ -263,16 +286,16 @@ workflow RNASEQ { paired_end: !meta.single_end return [ meta, bam ] } - .set { ch_umitools_dedup_bam } + .set { ch_dedup_bam } // Fix paired-end reads in name sorted BAM file // See: https://github.com/nf-core/rnaseq/issues/828 UMITOOLS_PREPAREFORSALMON ( - ch_umitools_dedup_bam.paired_end.map { meta, bam -> [ meta, bam, [] ] } + ch_dedup_bam.paired_end.map { meta, bam -> [ meta, bam, [] ] } ) ch_versions = ch_versions.mix(UMITOOLS_PREPAREFORSALMON.out.versions.first()) - ch_umitools_dedup_bam + ch_dedup_bam .single_end .mix(UMITOOLS_PREPAREFORSALMON.out.bam) .set { ch_transcriptome_bam } @@ -371,20 +394,31 @@ workflow RNASEQ { // SUBWORKFLOW: Remove duplicate reads from BAM file based on UMIs // if (params.with_umi) { - BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME ( - ch_genome_bam.join(ch_genome_bam_index, by: [0]), - params.umitools_dedup_stats - ) - ch_genome_bam = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.bam - ch_genome_bam_index = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.bai - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.deduplog.collect{it[1]}) - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.stats.collect{it[1]}) - ch_multiqc_files = 
ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.flagstat.collect{it[1]}) - ch_multiqc_files = ch_multiqc_files.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.idxstats.collect{it[1]}) + if (params.umi_dedup_tool == "umicollapse") { + BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_GENOME ( + ch_genome_bam.join(ch_genome_bam_index, by: [0]) + ) + umi_dedup_genome = BAM_DEDUP_STATS_SAMTOOLS_UMICOLLAPSE_GENOME + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.dedup_stats.collect{it[1]}.ifEmpty([])) + } else if (params.umi_dedup_tool == "umitools") { + BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME ( + ch_genome_bam.join(ch_genome_bam_index, by: [0]), + params.umitools_dedup_stats + ) + umi_dedup_genome = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.deduplog.collect{it[1]}) + } else { + error("Unknown umi_dedup_tool '${params.umi_dedup_tool}'") + } + ch_genome_bam = umi_dedup_genome.out.bam + ch_genome_bam_index = umi_dedup_genome.out.bai + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.stats.collect{it[1]}) + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.flagstat.collect{it[1]}) + ch_multiqc_files = ch_multiqc_files.mix(umi_dedup_genome.out.idxstats.collect{it[1]}) if (params.bam_csi_index) { - ch_genome_bam_index = BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.csi + ch_genome_bam_index = umi_dedup_genome.out.csi } - ch_versions = ch_versions.mix(BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME.out.versions) + ch_versions = ch_versions.mix(umi_dedup_genome.out.versions) } } diff --git a/workflows/rnaseq/nextflow.config b/workflows/rnaseq/nextflow.config index 9cbf0cd30..48245515b 100644 --- a/workflows/rnaseq/nextflow.config +++ b/workflows/rnaseq/nextflow.config @@ -20,6 +20,31 @@ includeConfig "../../subworkflows/nf-core/fastq_fastqc_umitools_fastp/nextflow.c includeConfig "../../subworkflows/nf-core/fastq_fastqc_umitools_trimgalore/nextflow.config" includeConfig
"../../subworkflows/nf-core/fastq_subsample_fq_salmon/nextflow.config" +def umi_dedup_args() { + if (params.umi_dedup_tool == "umicollapse") { + def algo = params.umitools_grouping_method + if (params.umitools_grouping_method == 'directional') { + algo = 'dir' + } else if (params.umitools_grouping_method == 'adjacency') { + algo = 'adj' + } else if (params.umitools_grouping_method == 'cluster') { + algo = 'cc' + } + return { [ + '--two-pass', + meta.single_end ? '' : '--paired --remove-unpaired --remove-chimeric', + params.umitools_grouping_method ? "--algo '${algo}'" : '', + params.umitools_umi_separator ? "--umi-sep '${params.umitools_umi_separator}'" : '' + ].join(' ').trim() } + } else { + return { [ + meta.single_end ? '' : '--unpaired-reads=discard --chimeric-pairs=discard', + params.umitools_grouping_method ? "--method='${params.umitools_grouping_method}'" : '', + params.umitools_umi_separator ? "--umi-separator='${params.umitools_umi_separator}'" : '' + ].join(' ').trim() } + } +} + // // STAR Salmon alignment options // @@ -133,12 +158,8 @@ if (!params.skip_alignment && params.aligner == 'star_salmon') { ] } - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME:UMITOOLS_DEDUP' { - ext.args = { [ - meta.single_end ? '' : '--unpaired-reads=discard --chimeric-pairs=discard', - params.umitools_grouping_method ? "--method='${params.umitools_grouping_method}'" : '', - params.umitools_umi_separator ? 
"--umi-separator='${params.umitools_umi_separator}'" : '' - ].join(' ').trim() } + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_TRANSCRIPTOME:UMI(COLLAPSE|TOOLS_DEDUP)' { + ext.args = umi_dedup_args() ext.prefix = { "${meta.id}.umi_dedup.transcriptome.sorted" } publishDir = [ [ @@ -160,7 +181,7 @@ if (!params.skip_alignment && params.aligner == 'star_salmon') { ] } - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME:SAMTOOLS_INDEX' { + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_TRANSCRIPTOME:SAMTOOLS_INDEX' { publishDir = [ path: { params.save_align_intermeds || params.save_umi_intermeds ? "${params.outdir}/${params.aligner}" : params.outdir }, mode: params.publish_dir_mode, @@ -169,7 +190,7 @@ if (!params.skip_alignment && params.aligner == 'star_salmon') { ] } - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_TRANSCRIPTOME:BAM_STATS_SAMTOOLS:.*' { + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_TRANSCRIPTOME:BAM_STATS_SAMTOOLS:.*' { ext.prefix = { "${meta.id}.umi_dedup.transcriptome.sorted.bam" } publishDir = [ path: { "${params.outdir}/${params.aligner}/samtools_stats" }, @@ -227,12 +248,8 @@ if (!params.skip_alignment) { if (params.with_umi && ['star_salmon','hisat2'].contains(params.aligner)) { process { - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME:UMITOOLS_DEDUP' { - ext.args = { [ - meta.single_end ? '' : '--unpaired-reads=discard --chimeric-pairs=discard', - params.umitools_grouping_method ? "--method='${params.umitools_grouping_method}'" : '', - params.umitools_umi_separator ? 
"--umi-separator='${params.umitools_umi_separator}'" : '' - ].join(' ').trim() } + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_GENOME:UMI(COLLAPSE|TOOLS_DEDUP)' { + ext.args = umi_dedup_args() ext.prefix = { "${meta.id}.umi_dedup.sorted" } publishDir = [ [ @@ -254,7 +271,7 @@ if (!params.skip_alignment) { ] } - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME:SAMTOOLS_INDEX' { + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_GENOME:SAMTOOLS_INDEX' { ext.args = { params.bam_csi_index ? '-c' : '' } ext.prefix = { "${meta.id}.umi_dedup.sorted" } publishDir = [ @@ -265,7 +282,7 @@ if (!params.skip_alignment) { ] } - withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMITOOLS_GENOME:BAM_STATS_SAMTOOLS:.*' { + withName: '.*:BAM_DEDUP_STATS_SAMTOOLS_UMI(COLLAPSE|TOOLS)_GENOME:BAM_STATS_SAMTOOLS:.*' { ext.prefix = { "${meta.id}.umi_dedup.sorted.bam" } publishDir = [ path: { "${params.outdir}/${params.aligner}/samtools_stats" },