Merge pull request #154 from genomic-medicine-sweden/dev
Release 2.2.0
Lucpen authored Aug 27, 2024
2 parents d34aa8f + 896292d commit 0e7c4c9
Showing 73 changed files with 6,095 additions and 1,790 deletions.
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,45 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 2.2.0 - TioDeNadal [2024-08-27]

### `Added`

- Fasta, gtf, vep cache and plugins can now be downloaded automatically by the pipeline if they are not provided by the user [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)
- Added `--gencode_annotation_version`, the version of the gencode reference to download if fasta or gtf is not provided [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)
- Added `--vep_refs_download`, a comma-separated file (csv) listing the vep references that should be downloaded (excluding gnomad ones), together with the switches `--skip_download_vep` for the vep reference download in general and `--skip_download_gnomad` for gnomad in particular [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)

### `Fixed`

- Input to BootstrapAnn is now supplied in a single channel. Previously they were supplied in separate channels, which could cause mix-ups if more than one sample was supplied [#151](https://github.com/genomic-medicine-sweden/tomte/pull/151)

### `Parameters`

| Old parameter | New parameter |
| ------------- | ------------------------------ |
| | `--gencode_annotation_version` |
| | `--vep_refs_download` |
| | `--skip_download_vep` |
| | `--skip_download_gnomad` |

> [!NOTE]
> Parameter has been updated if both old and new parameter information is present.
> Parameter has been added if just the new parameter information is present.
> Parameter has been removed if new parameter information isn't present.

### `Changed`

- Updated modules bcftools/annotate, bcftools/mpileup, bcftools/view, cat/fastq, ensemblvep/filtervep, fastp, fastqc, gatk4/haplotypecaller, gatk4/splitncigarreads, gunzip, multiqc, picard/collectrnaseqmetrics, samtools/index, star/align, star/genomegenerate, stringtie/stringtie, tabix/bgziptabix, tabix/tabix and untar [#153](https://github.com/genomic-medicine-sweden/tomte/pull/153)

| Tool | Old version | New version |
| ------------------------------- | ----------- | ----------- |
| gunzip | 20.04 | 22.04 |
| multiqc | 1.22.3 | 1.24.1 |
| picard/collectinsertsizemetrics | 3.1.1 | 3.2.0 |
| tabix/bgziptabix | 1.19.1 | 1.20 |
| tabix/tabix | 1.19.1 | 1.20 |
| untar | 20.04 | 22.04 |

## 2.1.0 - Elf [2024-06-26]

### `Added`
2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,5 +1,5 @@
report_comment: >
This report has been generated by the <a href="https://github.com/genomic-medicine-sweden/tomte/releases/tag/2.1.0" target="_blank">genomic-medicine-sweden/tomte</a>
This report has been generated by the <a href="https://github.com/genomic-medicine-sweden/tomte/releases/tag/2.2.0" target="_blank">genomic-medicine-sweden/tomte</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/genomic-medicine-sweden/tomte/blob/master/docs/output.md" target="_blank">documentation</a>.
report_section_order:
3 changes: 3 additions & 0 deletions conf/base.config
@@ -52,6 +52,9 @@ process {
    withLabel:process_high_memory {
        memory = { check_max( 200.GB * task.attempt, 'memory' ) }
    }
    withLabel:process_very_long {
        time = { check_max( 100.h * task.attempt, 'time' ) }
    }
    withLabel:error_ignore {
        errorStrategy = 'ignore'
    }
63 changes: 63 additions & 0 deletions conf/modules/download_references.config
@@ -0,0 +1,63 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/

//
// DOWNLOAD REFERENCES
//

process {

    withName: '.*DOWNLOAD_REFERENCES:FASTA_DOWNLOAD' {
        ext.when = { !params.fasta }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:GTF_DOWNLOAD' {
        ext.when = { !params.gtf }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:WGET_DOWNLOAD' {
        ext.when = { (!params.skip_download_vep && !params.vep_cache) }
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:VEP_GNOMAD_DOWNLOAD' {
        ext.when = { !params.skip_download_gnomad }
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:BUILD_VEP_CACHE' {
        ext.when = { (!params.skip_download_vep && !params.vep_cache) }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

}
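
Taken together, the `ext.when` closures in this file decide which download steps run. The shell sketch below restates that gating logic for illustration only. The function name `download_steps` is hypothetical, an empty string stands in for an unset Nextflow parameter, and the real evaluation happens inside Nextflow, not in shell.

```shell
#!/usr/bin/env bash
# Illustrative restatement of the ext.when conditions in download_references.config.
# Arguments: fasta, gtf, vep_cache ("" = not provided),
#            skip_download_vep, skip_download_gnomad ("true" or "false").
set -euo pipefail

download_steps() {
    local fasta="$1" gtf="$2" vep_cache="$3"
    local skip_download_vep="$4" skip_download_gnomad="$5"
    local steps=""

    # FASTA_DOWNLOAD / GTF_DOWNLOAD run only when the file was not supplied.
    if [[ -z "$fasta" ]]; then steps+="FASTA_DOWNLOAD "; fi
    if [[ -z "$gtf" ]]; then steps+="GTF_DOWNLOAD "; fi

    # WGET_DOWNLOAD needs the VEP download switched on and no cache supplied.
    if [[ "$skip_download_vep" != "true" && -z "$vep_cache" ]]; then
        steps+="WGET_DOWNLOAD "
    fi

    # VEP_GNOMAD_DOWNLOAD depends only on its own switch.
    if [[ "$skip_download_gnomad" != "true" ]]; then
        steps+="VEP_GNOMAD_DOWNLOAD "
    fi

    # BUILD_VEP_CACHE uses the same condition as WGET_DOWNLOAD.
    if [[ "$skip_download_vep" != "true" && -z "$vep_cache" ]]; then
        steps+="BUILD_VEP_CACHE "
    fi

    # Strip the trailing space before printing.
    echo "${steps% }"
}

# No references given, both skip switches left at true:
download_steps "" "" "" true true
# prints: FASTA_DOWNLOAD GTF_DOWNLOAD

# fasta and gtf given, VEP and gnomad downloads switched on:
download_steps ref.fa ann.gtf "" false false
# prints: WGET_DOWNLOAD VEP_GNOMAD_DOWNLOAD BUILD_VEP_CACHE
```

As written in the config, the gnomad step is gated only by `--skip_download_gnomad`, while the other two VEP steps are also skipped whenever a `--vep_cache` is supplied.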
6 changes: 3 additions & 3 deletions conf/modules/prepare_references.config
@@ -17,7 +17,7 @@
process {

withName: '.*PREPARE_REFERENCES:GUNZIP_FASTA' {
ext.when = {params.fasta.endsWith(".gz")}
ext.when = { params.fasta && params.fasta.endsWith(".gz") }
publishDir = [
path: { "${params.outdir}/references" },
mode: params.publish_dir_mode,
@@ -51,7 +51,7 @@ process {
}

withName: '.*PREPARE_REFERENCES:GUNZIP_GTF' {
ext.when = { params.gtf.endsWith(".gz") }
ext.when = { params.gtf && params.gtf.endsWith(".gz") }
publishDir = [
path: { "${params.outdir}/references" },
mode: params.publish_dir_mode,
@@ -134,7 +134,7 @@ process {
}

withName: '.*PREPARE_REFERENCES:UNTAR_VEP_CACHE' {
ext.when = { (params.vep_cache.endsWith("tar.gz")) }
ext.when = { (params.vep_cache && params.vep_cache.endsWith(".gz")) }
publishDir = [
enabled: false
]
60 changes: 35 additions & 25 deletions docs/usage.md
@@ -123,6 +123,8 @@ If you would like to see more examples of what a typical samplesheet looks like

In genomic-medicine-sweden/tomte, references can be supplied using parameters. We have also introduced the possibility of using the `--igenomes_base` parameter to point to a path where genome-specific reference files are placed (fasta, fai, gtf, star_index, salmon_index, subsample_bed). To make sure that the names of the reference files match those in your directory, check [igenomes.config](https://github.com/genomic-medicine-sweden/tomte/blob/master/conf/igenomes.config).

If no references are provided by the user, the pipeline will automatically download a fasta and a gtf file. The user can select the desired genome and gencode version using `--genome` and `--gencode_annotation_version`. To also download the vep cache and vep plugin references, set `--skip_download_vep false` and provide, via `--vep_refs_download`, a comma-separated file listing the plugin files to download; this file should NOT contain the path to the gnomad database. To also download the gnomad database, set `--skip_download_gnomad false`. Bear in mind that about 900GB of data will be downloaded, so storage space and time are needed. The data will then be processed and its size significantly reduced to under 40GB.
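
As a rough illustration of the paragraph above, the sketch below assembles (but does not run) a command line that leaves out `--fasta` and `--gtf` so they are downloaded, and switches on the VEP reference download. The parameter names are those listed in the 2.2.0 changelog. The values `samplesheet.csv`, `results`, `vep_to_download.csv`, `docker` and the gencode version `46` are placeholders assumed for the example, and the helper name `build_tomte_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a tomte command that relies on automatic reference download.
# All file names and the gencode version below are assumed placeholders.
set -euo pipefail

build_tomte_cmd() {
    local download_gnomad="$1"   # "yes" to also fetch the large gnomad database
    local cmd=(
        nextflow run genomic-medicine-sweden/tomte
        -r 2.2.0
        -profile docker
        --input samplesheet.csv
        --outdir results
        --genome GRCh38
        --gencode_annotation_version 46
        --skip_download_vep false
        --vep_refs_download vep_to_download.csv
    )
    # The gnomad download is opt-in because of its size.
    if [[ "$download_gnomad" == "yes" ]]; then
        cmd+=(--skip_download_gnomad false)
    fi
    # Join the array with spaces and print it instead of executing it.
    printf '%s\n' "${cmd[*]}"
}

build_tomte_cmd no
```

Printing the command first makes it easy to review before committing to a long download. Calling `build_tomte_cmd yes` appends `--skip_download_gnomad false`.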

Note that the pipeline is modular in architecture. It offers you the flexibility to choose between different tools. For example, you can call SNVs either with BCFtools or with GATK. You also have the option to turn off sections of the pipeline if you do not want to run them. For example, drop aberrant expression module can be turned off by setting `--skip_drop_ae true`. This flexibility means that in any given analysis run, a combination of tools included in the pipeline will not be executed. So the pipeline is written in a way that can account for these differences while working with reference parameters. If a tool is not going to be executed during the course of a run, parameters used only by that tool need not be provided. For example, if you are not running DROP aberrant splicing, you do not need to provide `--reference_drop_splice_folder`.

genomic-medicine-sweden/tomte consists of several tools used for various purposes. For convenience, we have grouped those tools under the following categories:
@@ -145,21 +147,24 @@ The mandatory and optional parameters for each category are tabulated below.

| Mandatory | Optional |
| --------- | ------------------------------ |
| fasta | fasta_fai<sup>1</sup> |
| gtf | sequence_dict<sup>1</sup> |
| | salmon_index<sup>1</sup> |
| | star_index<sup>1</sup> |
| | transcript_fasta<sup>1</sup> |
| | genome<sup>2</sup> |
| | platform<sup>3</sup> |
| | min_trimmed_length<sup>4</sup> |
| | star_two_pass_mode<sup>4</sup> |

<sup>1</sup> If the parameter is not provided by the user, it will be generated from the fasta and gtf files.<br />
<sup>2</sup> If it is not provided by the user, the default value is GRCh38.<br />
<sup>3</sup> If it is not provided by the user, the default value is illumina.<br />
<sup>4</sup> If it is not provided by the user, the default value is 40.<br />
<sup>5</sup> If it is not provided by the user, the default value is Basic.
| | fasta<sup>1</sup> |
| | gtf<sup>1</sup> |
| | fasta_fai<sup>2</sup> |
| | sequence_dict<sup>2</sup> |
| | salmon_index<sup>2</sup> |
| | star_index<sup>2</sup> |
| | transcript_fasta<sup>2</sup> |
| | genome<sup>3</sup> |
| | platform<sup>4</sup> |
| | min_trimmed_length<sup>5</sup> |
| | star_two_pass_mode<sup>6</sup> |

<sup>1</sup> If the parameter is not provided by the user, it will be downloaded.<br />
<sup>2</sup> If the parameter is not provided by the user, it will be generated from the fasta and gtf files.<br />
<sup>3</sup> If it is not provided by the user, the default value is GRCh38.<br />
<sup>4</sup> If it is not provided by the user, the default value is illumina.<br />
<sup>5</sup> If it is not provided by the user, the default value is 40.<br />
<sup>6</sup> If it is not provided by the user, the default value is Basic.

##### 2. Junction track and bigwig

@@ -191,16 +196,21 @@ The mandatory and optional parameters for each category are tabulated below.

#### 5. SNV annotation (ensembl VEP)

| Mandatory | Optional |
| ---------------------------- | -------------------------- |
| vep_plugin_files<sup>1</sup> | skip_vep<sup>2</sup> |
| | vep_cache<sup>3</sup> |
| | vep_cache_version |
| | gene_panel_clinical_filter |

<sup>1</sup> VEP caches can be downloaded [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache). VEP plugins may be installed in the cache directory, and the plugin pLI is mandatory to install. To supply files required by VEP plugins, use `vep_plugin_files` parameter. See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz).<br />
<sup>2</sup> If it is not provided by the user, the default value is false<br />
<sup>3</sup> If it is not provided by the user, the default value is 110, supported values are 107 and 110 <br />
| Mandatory | Optional |
| --------- | -------------------------------- |
| | skip_vep<sup>1</sup> |
| | vep_plugin_files<sup>2</sup> |
| | vep_cache<sup>2</sup> |
| | vep_cache_version<sup>3</sup> |
| | skip_download_vep<sup>4</sup> |
| | skip_download_gnomad<sup>4</sup> |
| | vep_refs_download |
| | gene_panel_clinical_filter |

<sup>1</sup> If it is not provided by the user, the default value is false<br />
<sup>2</sup> VEP cache and plugins can be automatically downloaded by the pipeline by setting `--skip_download_vep false` and `--skip_download_gnomad false`, and providing a csv with a list of files to download via `--vep_refs_download`, as done [here](https://github.com/genomic-medicine-sweden/tomte/blob/dev/test_data/vep_to_download.csv). VEP caches can also be downloaded [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache). VEP plugins may also be installed in the cache directory, and the plugin pLI is mandatory to install. To supply files required by VEP plugins, use the `vep_plugin_files` parameter. See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz).<br />
<sup>3</sup> If it is not provided by the user, the default value is 110; supported values are 107 and 110<br />
<sup>4</sup> If it is not provided by the user, the default value is true
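
Since the list passed to `--vep_refs_download` should not contain gnomad paths, a small pre-flight check can catch that mistake before a run. This is only a sketch: the function name `check_vep_refs_list` is hypothetical, and it makes no assumption about the column layout of the csv; it only searches the text for "gnomad".

```shell
#!/usr/bin/env bash
# Sketch: reject a --vep_refs_download list that mentions gnomad.
set -euo pipefail

check_vep_refs_list() {
    local list="$1"
    # The list must exist and be non-empty.
    if [[ ! -s "$list" ]]; then
        echo "missing or empty: $list" >&2
        return 1
    fi
    # Case-insensitive search; offending lines are echoed to stderr with line numbers.
    if grep -in 'gnomad' "$list" >&2; then
        echo "remove gnomad entries; use --skip_download_gnomad false instead" >&2
        return 1
    fi
    echo "ok: $list"
}

# Run the check only when a file is passed on the command line.
if [[ $# -gt 0 ]]; then
    check_vep_refs_list "$1"
fi
```

Saved under a name of your choosing, the script takes the csv path as its single argument and exits non-zero if the list is unusable.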

#### 6. Stringtie & gffcompare
