nf-core/eager: Changelog

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[2.4.0] - 2021-09-14

`Added`

#317 Added bcftools stats for general genotyping statistics of VCF files
#651 - Adds removal of adapters specified in an AdapterRemoval adapter list file
#642 and #431 adds post-adapter removal barcode/fastq trimming
#769 - Adds lc_extrap mode to preseq (suggested by @roberta-davidson)

`Fixed`

Fixed some missing or incorrectly reported software versions
#771 Remove legacy code
Improved output documentation for MultiQC general stats table (thanks to @KathrinNaegele and @esalmela)
Improved output documentation for BowTie2 (thanks to @isinaltinkaya)
#612 Updated BAM trimming defaults to 0 to ensure no unwanted trimming when mixing half-UDG with no-UDG (thanks to @scarlhoff)
#722 Updated BWA mapping parameters to latest recommendations - primarily alnn back to 0.01 and alno to 2 as per Oliva et al. 2021 (10.1093/bib/bbab076)
Updated workflow diagrams to reflect latest functionality
#787 Adds memory specification flags for the GATK UnifiedGenotyper and HaplotypeCaller steps (thanks to @nylander)
Fixed issue where MultiVCFAnalyzer would not pick up newly generated VCF files, when specifying additional VCF files.
#790 Fixed kraken2 report file-name collision when sample names have . in them
#792 Fixed java error messages for AdapterRemovalFixPrefix being hidden in output
#794 Aligned default test profile with nf-core standards (test_tsv is now test)

`Dependencies`

Bumped python: 3.7.3 -> 3.9.4
Bumped markdown: 3.2.2 -> 3.3.4
Bumped pymdown-extensions: 7.1 -> 8.2
Bumped pygments: 2.6.1 -> 2.9.0
Bumped adapterremoval: 2.3.1 -> 2.3.2
Bumped picard: 2.22.9 -> 2.26.0
Bumped samtools: 1.9 -> 1.12
Bumped angsd: 0.933 -> 0.935
Bumped gatk4: 4.1.7.0 -> 4.2.0.0
Bumped multiqc: 1.10.1 -> 1.11
Bumped bedtools: 2.29.2 -> 2.30.0
Bumped libiconv: 1.15 -> 1.16
Bumped preseq: 2.0.3 -> 3.1.2
Bumped bamutil: 1.0.14 -> 1.0.15
Bumped pysam: 0.15.4 -> 0.16.0
Bumped kraken2: 2.1.1 -> 2.1.2
Bumped pandas: 1.0.4 -> 1.2.4
Bumped freebayes: 1.3.2 -> 1.3.5
Bumped biopython: 1.76 -> 1.79
Bumped xopen: 0.9.0 -> 1.1.0
Bumped bowtie2: 2.4.2 -> 2.4.4
Bumped mapdamage2: 2.2.0 -> 2.2.1
Bumped bbmap: 38.87 -> 38.92
Added bcftools: 1.12

`Deprecated`

[2.3.5] - 2021-06-03

`Added`

#722 - Adds bwa -o flag for more flexibility in bwa parameters
#736 - Add printing of multiqc run report location on successful completion
New logo that is more visible when a user is using darkmode on GitHub or nf-core website!

`Fixed`

#723 - Fixes empty fields in TSV resulting in uninformative error
Updated template to nf-core/tools 1.14
#688 - Clarified the pipeline is not just for humans and microbes, but also plants and animals, and also for modern DNA
#751 - Added missing label to mtnucratio
General code cleanup and standardisation of parameters with no default setting
#750 - Fixed piped commands requesting the same number of CPUs at each command step
#757 - Removed confusing 'Data Type' variable from MultiQC workflow summary (not consistent with TSV input)
#759 - Fixed malformed software scraping regex that resulted in N/A in MultiQC report
#761 - Fixed issues related to instability of samtools filtering related CI tests

`Dependencies`

`Deprecated`

[2.3.4] - 2021-05-05

`Added`

#729 - Added Bowtie2 flag --maxins for paired-end mapping in modern DNA contexts

`Fixed`

Corrected explanation of the "--min_adap_overlap" parameter for AdapterRemoval in the docs
#725 - bwa_index doc update
Re-adds gzip piping to AdapterRemovalFixPrefix to speed up process after reports of being very slow
Updated DamageProfiler citation from bioRxiv to publication

`Dependencies`

Removed pinning of tbb (upstream bug in bioconda fixed)
Bumped pigz to 2.6 to fix rare stall bug when compressing data after AdapterRemoval
Bumped Bowtie2 to 2.4.2 to fix issues with tbb version

`Deprecated`

[2.3.3] - 2021-04-08

`Added`

#349 - Added option enabling platypus formatted output of pmdtools misincorporation frequencies.

`Fixed`

#719 - Fix filename for bam output of mapdamage_rescaling
#707 - Fix typo in UnifiedGenotyper IndelRealigner command
Fixed some Java tools not following process memory specifications
Updated template to nf-core/tools 1.13.2
#711 - Fix conditional execution preventing MultiVCFAnalyzer from running
#714 - Fixes bug in nuc contamination by upgrading to latest MultiQC v1.10.1 bugfix release

`Dependencies`

`Deprecated`

[2.3.2] - 2021-03-16

`Added`

#687 - Adds Kraken2 unique kmer counting report
#676 - Refactor help message / summary message formatting to automatic versions using nf-core library
#682 - Add AdapterRemoval --qualitymax flag to allow FASTQ Phred score range max more than 41

`Fixed`

#666 - Fixed input file staging for print_nuclear_contamination
#631 - Update minimum Nextflow version to 20.07.1, due to unfortunate bug in Nextflow 20.04.1 causing eager to crash if patch pulled
Made MultiQC crash behaviour stricter when dealing with large datasets, as reported by @ashildv
#652 - Added note to documentation that when using --skip_collapse this will use paired-end alignment mode with mappers when using PE data
#626 - Add additional checks to ensure pipeline will give useful error if cells of a TSV column are empty
#673 - Fix Kraken database loading when loading from directory instead of compressed file
#688 - Allow pipeline to complete, even if Qualimap crashes due to an empty or corrupt BAM file for one sample/library
#683 - Sets --igenomes_ignore to true by default, as rarely used by users currently and makes resolving configs less complex
Added exit code 140 to re-tryable exit code list to account for certain scheduler wall-time limit fails
#672 - Removed java parameter from picard tools which could cause memory issues
#679 - Refactor within-process bash conditions to groovy/nextflow, due to incompatibility with some server environments
#690 - Fixed ANGSD output mode for beagle by setting -doMajorMinor 1 as default in that case
#693 - Fixed broken TSV input validation for the Colour Chemistry column
#695 - Fixed incorrect -profile order in tutorials (originally written reversed due to nextflow bug)
#653 - Fixed file collision errors with sexdeterrmine for two same-named libraries with different strandedness

`Dependencies`

Bumped MultiQC to 1.10 for improved functionality
Bumped HOPS to 0.35 for MultiQC 1.10 compatibility

`Deprecated`

[2.3.1] - 2021-01-14

`Added`

`Fixed`

#654 - Fixed some values in JSON schema (used in launch GUI) not passing validation checks during run
#655 - Updated read groups for all mappers to allow proper GATK validation
Fixed issue with Docker container not being pullable by Nextflow due to version-number inconsistencies

`Dependencies`

`Deprecated`

[2.3.0] - 2021-01-11 - "Aalen"

`Added`

#640 - Added a pre-metagenomic screening filtering of low-sequence complexity reads with bbduk
#583 - Added mapDamage2 rescaling of BAM files to remove damage
Updated usage (merging files) and workflow images reflecting new functionality.

`Fixed`

Removed leftover old DockerHub push CI commands.
#627 - Added de Barros Damgaard citation to README
#630 - Better handling of Qualimap memory requirements and error strategy.
Fixed some incomplete schema options to ensure users supply valid input values
#638 Fixed inverted circularfilter filtering (previously filtering would happen by default, not when requested by user as originally recorded in documentation)
DeDup: Fixed Null Pointer Bug in DeDup by updating to 0.12.8 version
#650 - Increased memory given to FastQC for larger files by making it multithreaded

`Dependencies`

Update: DeDup v0.12.7 to v0.12.8

`Deprecated`

[2.2.2] - 2020-12-09

`Added`

Added large scale 'stress-test' profile for AWS (using de Barros Damgaard et al. 2018's 137 ancient human genomes).
- This will now be run automatically for every release. All processed data will be available on the nf-core website: https://nf-co.re/eager/results
  - You can run this yourself using -profile test_full

`Fixed`

Fixed AWS full test profile.
#587 - Re-implemented AdapterRemovalFixPrefix for DeDup compatibility of including singletons
#602 - Added the newly available GATK 3.5 conda package.
#610 - Create bwa_index channel when specifying circularmapper as mapper
Updated template to nf-core/tools 1.12.1
General documentation improvements

`Deprecated`

Flag --gatk_ug_jar has now been removed as GATK 3.5 is now available within the nf-core/eager software environment.

[2.2.1] - 2020-10-20

`Fixed`

#591 - Fixed offset underlines in lane merging diagram in docs
#592 - Fixed issue where supplying Bowtie2 index reported missing bwamem_index error
#590 - Removed redundant dockstore.yml from root
#596 - Add workaround for issue regarding gzipped FASTAs and pre-built indices
#589 - Updated template to nf-core/tools 1.11
#582 - Clarify memory limit issue on FAQ

[2.2.0] - Ulm - 2020-10-20

`Added`

Major: Automated cloud tests with large-scale data on AWS
Major: Re-wrote input logic to accept a TSV 'map' file in addition to direct paths to FASTQ files
Major: Added JSON Schema, enabling web GUI for configuration of pipeline available here
Major: Lane and library merging implemented
- When using TSV input, one library with the multiple lanes will be merged together, before mapping
- Strip FASTQ will also produce a lane merged 'raw' but 'stripped' FASTQ file
- When using TSV input, one sample with multiple (same treatment) libraries will be merged together
- Important: direct FASTQ paths will not have this functionality. TSV is required.
#40 - Added the pileupCaller genotyper from sequenceTools
Added validation check and clearer error message when --fasta_index is provided and filepath does not end in .fai.
Improved error messages
Added ability for automated emails using mailutils to also send MultiQC reports
General documentation additions, cleaning, and updated figures with CC-BY license
Added large 'full size' dataset test-profiles for ancient fish and human contexts
#257 - Added the bowtie2 aligner as option for mapping, following Poullet and Orlando 2020 doi: 10.3389/fevo.2020.00105
#451 - Adds ANGSD genotype likelihood calculations as an alternative to typical 'genotypers'
#566 - Add tutorials on how to set up nf-core/eager for different contexts
Nuclear contamination results are now shown in the MultiQC report
Tutorial on how to use profiles for reproducible science (i.e. parameter sharing between different groups)
#522 - Added post-mapping length filter to assist in more realistic endogenous DNA calculations
#512 - Added flexible trimming of BAMs by library type. 'half' and 'none' UDG libraries can now be trimmed differentially within a single eager run.
Added a .dockstore.yml config file for automatic workflow registration with dockstore.org
Updated template to nf-core/tools 1.10.2
#544 - Add script to perform bam filtering on fragment length
#456 - Bumps the base (default) runtime of all processes to 4 hours, and set shorter time limits for test profiles (1 hour)
#552 - Adds optional creation of MALT SAM files alongside RMA6 files
Added eigenstrat snp coverage statistics to MultiQC report. Process results are published in genotyping/*_eigenstrat_coverage.txt.

`Fixed`

#368 - Fixed the profile test to contain a parameter for --paired_end
Mini bugfix for typo in line 1260+1261
#374 - Fixed output documentation rendering not containing images
#379 - Fixed insufficient memory requirements for FASTQC edge case
#390 - Renamed clipped/merged output directory to be more descriptive
#398 - Stopped incompatible FASTA indexes being accepted
#400 - Set correct recommended bwa mapping parameters from Schubert et al. 2012
#410 - Fixed nf-core/configs not being loaded properly
#473 - Fixed bug in sexdet_process on AWS
#444 - Provide option for preserving realigned bam + index
Fixed deduplication output logic. Will now pass along only the post-rmdup bams if duplicate removal is not skipped, instead of both the post-rmdup and pre-rmdup bams
#497 - Simplifies number of parameters required to run bam filtering
#501 - Adds additional validation checks for MALT/MaltExtract database input files
#508 - Made MarkDuplicates the default deduplication tool due to the narrower context specificity of DeDup
#516 - Made bedtools not report out of memory exit code when warning of inconsistent FASTA/Bed entry names
#504 - Removed uninformative sexdeterrmine-snps plot from MultiQC report.
Nuclear contamination is now reported with the correct library names.
#531 - Renamed 'FASTQ stripping' to 'host removal'
Merged all tutorials and FAQs into usage.md for display on nf-co.re
Corrected header of nuclear contamination table (nuclear_contamination.txt).
Fixed a bug with nSNPs definition in print_x_contamination.py. Number of SNPs now correctly reported
print_x_contamination.py now correctly converts all NA values to "N/A"
Increased amount of memory MultiQC by default uses, to account for very large nf-core/eager runs (e.g. >1000 samples)

`Dependencies`

Added sequenceTools (1.4.0.6) that adds the ability to do genotyping with the 'pileupCaller'
Latest version of DeDup (0.12.6) which now reports mapped reads after deduplication
#560 Latest version of DeDup (0.12.7), which now correctly reports deduplication statistics based on calculations of mapped reads only (prior denominator was total reads of BAM file)
Latest version of ANGSD (0.933) which doesn't seg fault when running contamination on BAMs with insufficient reads
Latest version of MultiQC (1.9) with support for lots of extra tools in the pipeline (MALT, SexDetERRmine, DamageProfiler, MultiVCFAnalyzer)
Latest versions of Pygments (2.6.1), Pymdown-Extensions (7.1) and Markdown (3.2.2) for documentation output
Latest version of Picard (2.22.9)
Latest version of GATK4 (4.1.7.0)
Latest version of sequenceTools (1.4.0.6)
Latest version of fastP (0.20.1)
Latest version of Kraken2 (2.0.9beta)
Latest version of FreeBayes (1.3.2)
Latest version of xopen (0.9.0)
Added Bowtie 2 (2.4.1)
Latest version of Sex.DetERRmine (1.1.2)
Latest version of endorS.py (0.4)

[2.1.0] - 2020-03-05 - "Ravensburg"

`Added`

Added support for automated tests using GitHub Actions, replacing Travis CI
#40, #231 - Added genotyping capability through GATK UnifiedGenotyper (v3.5), GATK HaplotypeCaller (v4.1) and FreeBayes
Added MultiVCFAnalyzer module
#240 - Added human sex determination module
#226 - Added --preserve5p function for AdapterRemoval
#212 - Added ability to use only merged reads downstream from AdapterRemoval
#265 - Adjusted full markdown linting in Travis CI
#247 - Added nuclear contamination with angsd
#258 - Added ability to report bedtools stats to features (e.g. depth/breadth of annotated genes)
#249 - Added metagenomic classification of unmapped reads with MALT and aDNA authentication with MaltExtract
#302 - Added mitochondrial to nuclear ratio calculation
#302 - Added VCF2Genome for consensus sequence generation
Fancy new logo from ZandraFagernas
#286 - Adds pipeline-specific profiles (loaded from nf-core configs)
#310 - Generalises base.config
#326 - Add Biopython and xopen dependencies
#336 - Change default Y-axis maximum value of DamageProfiler to 30% to match popular (but slower) mapDamage, and allow user to set their own value.
#352 - Add social preview image
#355 - Add Kraken2 metagenomics classifier
#90 - Added endogenous DNA calculator (original repository: https://github.com/aidaanva/endorS.py/)

`Fixed`

#227 - Large re-write of input/output process logic to allow maximum flexibility. Originally to address #227, but further expanded
Fixed Travis-Ci.org to Travis-Ci.com migration issues
#266 - Added sanity checks for input filetypes (i.e. only BAM files can be supplied if --bam)
#237 - Fixed and Updated script scrape_software_versions
#322 - Move extract map reads fastq compression to pigz
#327 - Speed up strip_input_fastq process and make it more robust
#342 - Updated to match nf-core tools 1.8 linting guidelines
#339 - Converted unnecessary zcat + gzip to just cat for a performance boost
#344 - Fixed pipeline still trying to run when using old nextflow version

`Dependencies`

adapterremoval=2.2.2 upgraded to 2.3.1
adapterremovalfixprefix=0.0.4 upgraded to 0.0.5
damageprofiler=0.4.3 upgraded to 0.4.9
angsd=0.923 upgraded to 0.931
gatk4=4.1.2.0 upgraded to 4.1.4.1
mtnucratio=0.5 upgraded to 0.6
conda-forge::markdown=3.1.1 upgraded to 3.2.1
bioconda::fastqc=0.11.8 upgraded to 0.11.9
bioconda::picard=2.21.4 upgraded to 2.22.0
bioconda::bedtools=2.29.0 upgraded to 2.29.2
pysam=0.15.3 upgraded to 0.15.4
conda-forge::pandas=1.0.0 upgraded to 1.0.1
bioconda::freebayes=1.3.1 upgraded to 1.3.2
conda-forge::biopython=1.75 upgraded to 1.76

[2.0.7] - 2019-06-10

`Added`

#189 - Outputting unmapped reads in a FASTQ file with the --strip_input_fastq flag
#186 - Make FastQC skipping possible
Merged in nf-core/tools release V1.6 template changes
A lot more automated tests using Travis CI
Don't ignore DamageProfiler errors any more
#220 - Added post-mapping filtering statistics module and corresponding MultiQC statistics #217

`Fixed`

#152 - DamageProfiler errors won't crash entire pipeline any more
#176 - Increase runtime for DamageProfiler on large reference genomes
#172 - DamageProfiler errors won't crash entire pipeline any more
#174 - Publish DeDup files properly
#196 - Fix reference issues
#196 - Fix issues with PE data being mapped incompletely
#200 - Fix minor issue with some typos
#210 - Fix PMDTools encoding issue from samtools calmd generated files by running through samtools view first
#221 - Fix BWA Index not being reused by multiple samples

`Dependencies`

Added DeDup v0.12.5 (json support)
Added mtnucratio v0.5 (json support)
Updated Picard 2.18.27 -> 2.20.2
Updated GATK 4.1.0.0 -> 4.1.2.0
Updated damageprofiler 0.4.4 -> 0.4.5
Updated r-rmarkdown 1.11 -> 1.12
Updated fastp 0.19.7 -> 0.20.0
Updated qualimap 2.2.2b -> 2.2.2c

[2.0.6] - 2019-03-05

`Added`

#152 - Clarified --complexity_filter flag to be specifically for poly G trimming.
#155 - Added Dedup log to output folders
#159 - Added Possibility to skip AdapterRemoval, skip merging, skip trimming fixing #64,#137 - thanks to @maxibor, @jfy133

`Fixed`

#151 - Fixed post-deduplication step errors
#147 - Fix Samtools Index for large references
#145 - Added Picard Memory Handling fix

`Dependencies`

Picard Tools 2.18.23 -> 2.18.27
GATK 4.0.12.0 -> 4.1.0.0
FastP 0.19.6 -> 0.19.7

[2.0.5] - 2019-01-28

`Added`

#127 - Added a second test case for testing the pipeline properly
#129 - Support BAM files as input format
#131 - Support different reference genome file extensions

`Fixed`

#128 - Fixed reference genome handling errors

`Dependencies`

Picard Tools 2.18.21 -> 2.18.23
R-Markdown 1.10 -> 1.11
FastP 0.19.5 -> 0.19.6

[2.0.4] - 2019-01-09

`Added`

#111 - Allow Zipped FastA reference input
#113 - All files are now staged via channels, which is considered best practice by Nextflow
#114 - Add proper runtime defaults for multiple processes
#118 - Add centralized configs handling
#115 - Add DamageProfiler MultiQC support
#122 - Add pulling from Dockerhub again

`Fixed`

#110 - Fix for MultiQC Missing Second FastQC report
#112 - Remove redundant UDG options

[2.0.3] - 2018-12-12

`Added`

#80 - BWA Index file handling
#77 - Lots of documentation updates by @jfy133
#81 - Renaming of certain BAM options
#92 - Complete restructure of BAM options

`Fixed`

#84 - Fix for Samtools index issues
#96 - Fix for MarkDuplicates issues found by @nilesh-tawari

`Other`

Added Slack button to repository readme

[2.0.2] - 2018-11-03

`Changed`

#70 - Uninitialized readPaths warning removed

`Added`

#73 - Travis CI Testing of Conda Environment added

`Fixed`

#72 - iconv Issue with R in conda environment

[2.0.1] - 2018-11-02

`Fixed`

#69 - FastQC issues with conda environments

[2.0.0] "Kaufbeuren" - 2018-10-17

Initial release of nf-core/eager:

`Added`

FastQC read quality control
(Optional) Read complexity filtering with FastP
Read merging and clipping using AdapterRemoval v2
Mapping using BWA / BWA Mem or CircularMapper
Library Complexity Estimation with Preseq
Conversion and Filtering of BAM files using Samtools
Damage assessment via DamageProfiler, additional filtering using PMDTools
Duplication removal via DeDup
BAM Clipping with BamUtil for UDGhalf protocols
QualiMap BAM quality control analysis

Furthermore, this already creates an interactive report using MultiQC, which will be upgraded in V2.1 "Ulm" to contain more aDNA specific metrics.
