All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Parsing of an insert that is split between the start and end of the assembly and is a reverse complement.
- Swap read count plot axis so Sample aliases are readable.
- Incorrectly running Insert QC and outputting Insert statistics when an insert was not present in the assembly.
- Updated Medaka to v2.0.0
- The default maximum allowed mismatches in the insert primers has changed from 3 to 2.
- `--override_basecaller_cfg` parameter for cases where automatic basecall model detection fails or users wish to override the automatic choice.
- `--medaka_model_path` parameter to provide a custom Medaka model. This is intended for users testing experimental Medaka models and will not be needed for general use.
- The now redundant `--basecaller_cfg` parameter as its value is now automatically detected from the input data on a per-sample basis.
- Min and max read length determined per sample based on `approx_size`
- Emit assembly quality stats as part of final workflow outputs
- Trim length parameter can be set to 0.
- Parameter to control number of mismatches allowed in cutsite analysis
- Regression causing incorrect raw read counts shown in "Read stats" section of the report.
- INDELS now called from assembly to reference alignment (`-m1` added to bcftools mpileup)
- Updated Medaka to v1.12.0.
- Updated EZCharts to v0.10.0.
- `--full_reference` parameter to accept a reference of the full construct. If provided, an additional construct QC section will be output in the report which will include reference coverage and percentage identity per sample.
- Additional per-sample output files if `--full_reference` is provided:
    - BAM: Reference aligned with the assembly in an indexed BAM.
    - Variant stats: BCF stats report with any variants found between reference and assembly.
    - Variant BCF: BCF file with all variants found between reference and assembly.
    - BAM stats: Stats report from alignment of provided reference with assembly.
- Optional linearisation efficiency section in the report, added when the `cut_site` column is supplied in the sample sheet.
- Support for `full_reference` and `insert_reference` columns in the sample sheet (to provide different references for individual samples). The MSA for insert analysis will group samples based on `insert_reference`.
- Workflow now additionally accepts BAM as input by using the `--bam` parameter.
- Update plannotate version to v1.2.2, which fixes an error that occurs when a feature contains a float.
- The `--medaka_model` parameter (since the appropriate Medaka model is now automatically determined from the input data). If the input data are lacking information on the model that was used to basecall them, the basecall model must be provided with `--basecaller_cfg`. Otherwise the workflow will fail.
- Reinstate Canu assembler alongside Flye.
- `--assembly_tool` parameter with options `canu` and `flye` (default: `flye`).
- `--client_fields` parameter to add extra info to the report
- Ensure repetitive regions are marked on the dot plot by reducing the threshold for suppressing repeats inside exact matches.
- `--large_construct` parameter for assembly of larger constructs including Bacterial Artificial Chromosomes (50,000-300,000 bps).
- Parameters `--min_barcode` and `--max_barcode`
- Default local executor CPU and RAM limits.
- Parameterised Flye meta option (`--non_uniform_coverage`) for non-uniform data and defaulted to false.
- Now handles sample aliases consisting only of numbers.
- Squashed assembly stats section downsampled plots.
- Log a warning when sample sheet approx size column is being used instead of approx size parameter.
- Deconcatenate only if the assembly is not of the approx. expected size.
- `.gbk` output file has the actual sequence for the origin
- Dotplot to visualize the repetitive regions in assemblies.
- If approximate size is <=3000 set Flye min overlap to 1000.
- Additionally output plannotate annotations as a GenBank (`.gbk`) file.
- Remove plasmid length column from plannotate feature table.
- Strand column of plannotate feature to use `+` and `-` notation.
- Default `--primers` parameter is now set to null.
- If `--host_filter` is provided, the fastcat stats will be included in the report.
- The report has been updated and re-ordered to improve usability.
- Default basecaller cfg is now `dna_r10.4.1_e8.2_400bps_sup@v4.2.0`.
- Docker will use an ARM platform image on appropriate devices.
- Updated basecaller cfg model options.
- Workflow will still output report if there are no assemblies.
- Documentation updated to include workflow steps.
- Full plasmid assembly mean quality table in report.
- Output a fastq of the final assembly.
- Insert reference, if provided, will now be used to variant call insert consensus with bcftools.
- Unused packages from the container.
- Enum choices are enumerated in the `--help` output
- Enum choices are enumerated as part of the error message when a user has selected an invalid choice
- Bumped minimum required Nextflow version to 22.10.8
- Updated GitHub issue templates to force capture of more information.
- Reference parameter changed to `--insert_reference`.
- Updated example command displayed when running `--help`
- Parameter `--approx_size_sheet` no longer accepted, instead use sample sheet with optional additional column `approx_size`.
- Any sample aliases that contain spaces will be replaced with underscores.
- Replaced `--threads` option in fastqingress with hardcoded values to remove warning about undefined `param.threads`
- Annotation output bed file has correct notation for strand.
- Configuration for running demo data in AWS
- Flye replaces canu as the assembler tool.
- Updated to Oxford Nanopore Technologies PLC. Public License.
- Amended raw QC stats to show data before filtering by assembly_size parameter.
- Bug where the workflow wouldn't run properly when `--approx_size_sheet` was used.
- Now uses new `fastq_ingress` implementation.
- Provide medaka model for each assembly to fix bug.
- Replace spaces with tabs in medaka model TSV to fix bug.
- Medaka models added to container
- `--basecall_cfg` is now used to determine suitable Medaka model, alternatively provide the name of a model with `--medaka_model` to override automatic selection.
- Updated description in manifest
- `-profile conda` is no longer supported, users should use `-profile standard` (Docker) or `-profile singularity` instead
- `nextflow run epi2me-labs/wf-clone-validation --version` will now print the workflow version number and exit
- Filter host step not outputting approx_size.
- Use groovy script to ping after workflow has run.
- Error handling for no annotations found for an assembly.
- Windows parameter so Canu can run on Windows
- Plannotate dictionary keys can contain any characters.
- Sanitize fastq intermittent null object error.
- Change params.threads to task.cpus
- Fastqingress metadata map.
- Sample status now collected from tuples.
- Plannotate read/write database requirement fix
- approx_size_sheet param instead of sample_sheet
- Set out_dir option type to ensure output is written to correct directory on Windows
- Better help text on CLI.
- Fix issue with S3 file inputs.
- Plannotate to version v1.2.0
- Param for fast option in Canu assembly
- New docs format
- Sample sheet encoding
- Min max barcodes integer types
- Moved bioinformatics from report to separate processes
- Ability to define approx_size of sequence per sample in sample_sheet
- Insert length to table
- Output annotation bed files per sample
- Update schema for epi2melabs compatibility
- Make use of the canu_useGrid parameter
- Singularity profile to config.
- Ping telemetry file.
- Handle more fastq input directory structures.
- db_directory description and explained in README
- db_directory param updated to match s3 folder name
- Use downsampled samples for polish assembly step.
- Option to add suffix to HTML report name.
- Error message if fastq input file evaluates to null.
- Default Primers parameter txt to tsv.
- Fastcat stats plots in tabs for pass and failed samples.
- Version and parameter tables.
- Per barcode number of reads.
- Insert sequences output.
- MSA of inserted sequences.
- Order samples lexicographically.
- Use Canu for assembly instead of Flye.
- Trim input sequences.
- Corrected number of input channels for host_reference process.
- Remove duplicate output files.
- Help message parameters reflect config.
- Plannotate for plasmid annotation and visualization.
- Per sample pass or fail error message in CSV.
- Plasmid annotation feature table output CSV.
- Updated project to use latest practices from wf-template.
- Incorrect specification of conda environment file in Nextflow config.
- Fix report naming to be consistent with other projects
- Optional --prefix flag for naming outputs
- --no-reconcile flag for a simpler and quicker overall pipeline
- simplified assembly outputs, to only emit the final polished assembly
- First release