Release 2.2.0 #154

Merged
merged 36 commits into from
Aug 27, 2024
Changes from 32 commits
Commits
00e5eb3
Merge pull request #143 from genomic-medicine-sweden/master
Lucpen Jun 28, 2024
2d90721
feat added possibility to download refs
Aug 13, 2024
43ea83a
linitng
Aug 13, 2024
05ebdaf
fix linting
Aug 13, 2024
50c7b5d
fix linting
Aug 13, 2024
3e5e6d8
fix channels
Aug 14, 2024
0508398
Update CHANGELOG.md
Lucpen Aug 14, 2024
3271cd5
fix wget_download
Aug 14, 2024
946f311
fix bootstrapann
Lucpen Aug 14, 2024
72cfeab
fix bootstrapann
Lucpen Aug 14, 2024
9e4decc
Update CHANGELOG.md
Lucpen Aug 14, 2024
096719d
Update CHANGELOG.md
Lucpen Aug 14, 2024
3ec8f26
Update CHANGELOG.md
Lucpen Aug 14, 2024
a353c5c
Update CHANGELOG.md
Lucpen Aug 14, 2024
407e782
Merge pull request #151 from genomic-medicine-sweden/fix_bootstrapann
Lucpen Aug 15, 2024
5b48060
Merge branch 'dev' into download_references2
Lucpen Aug 15, 2024
bfd8376
Apply suggestions from code review
Lucpen Aug 15, 2024
3898fd5
Apply suggestions from code review
Lucpen Aug 15, 2024
cc675f4
Apply suggestions from code review
Lucpen Aug 15, 2024
24b3c5b
fix modules/local/wget_download.nf
Aug 19, 2024
e510080
fix output vep_to_download
Aug 20, 2024
dd83bdc
fix build_vep_cache
Aug 21, 2024
c49d4c3
avoid publishing of wget and gnomad_download modules
Aug 21, 2024
173cb2b
avoid publishing of wget and gnomad_download modules
Aug 21, 2024
f87da99
Merge pull request #149 from genomic-medicine-sweden/download_referen…
Lucpen Aug 23, 2024
c412eb4
update modules
Aug 23, 2024
d4ac03b
bump to version 2.20
Aug 23, 2024
376ce25
updated changelog
Aug 23, 2024
3742b1a
Update CHANGELOG.md
Lucpen Aug 23, 2024
b5ad8a1
Merge pull request #153 from genomic-medicine-sweden/prepare_release_…
Lucpen Aug 23, 2024
0a5055f
Update CHANGELOG.md
Lucpen Aug 26, 2024
93ffd4a
Apply suggestions from code review
Lucpen Aug 26, 2024
20580e5
Apply suggestions from code review
Lucpen Aug 26, 2024
20f9d4b
Apply suggestions from code review
Lucpen Aug 26, 2024
5816b1a
Update nextflow_schema.json
Lucpen Aug 26, 2024
896292d
Update CHANGELOG.md
Lucpen Aug 27, 2024
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,45 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 2.2.0 - TioDeNadal [2024-08-26]

### `Added`

- Fasta, gtf, vep cache and plugins can now be downloaded automatically by the pipeline if they are not provided by the user [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)
- Added `--gencode_annotation_version`, the version of the gencode reference to download if fasta or gtf is not provided [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)
- Added `--vep_refs_download`, a comma-separated file (csv) listing the vep references that should be downloaded (excluding gnomad ones), along with the switches `--skip_download_vep` for the vep reference download in general and `--skip_download_gnomad` for gnomad in particular [#149](https://github.com/genomic-medicine-sweden/tomte/pull/149)

### `Fixed`

- Inputs to BootstrapAnn are now supplied in a single channel. Previously they were supplied in separate channels, which could cause mix-ups if more than one sample was supplied [#151](https://github.com/genomic-medicine-sweden/tomte/pull/151)

### `Parameters`

| Old parameter | New parameter |
| ------------- | ------------------------------ |
| | `--gencode_annotation_version` |
| | `--vep_refs_download` |
| | `--skip_download_vep` |
| | `--skip_download_gnomad` |

> [!NOTE]
> Parameter has been updated if both old and new parameter information is present.
> Parameter has been added if just the new parameter information is present.
> Parameter has been removed if new parameter information isn't present.

### `Changed`

- Updated modules bcftools/annotate, bcftools/mpileup, bcftools/view, cat/fastq, ensemblvep/filtervep, fastp, fastqc, gatk4/haplotypecaller, gatk4/splitncigarreads, gunzip, multiqc, picard/collectrnaseqmetrics, samtools/index, star/align, star/genomegenerate, stringtie/stringtie, tabix/bgziptabix, tabix/tabix and untar [#153](https://github.com/genomic-medicine-sweden/tomte/pull/153)

| Tool | Old version | New version |
| ------------------------------- | ----------- | ----------- |
| gunzip | 20.04 | 22.04 |
| multiqc | 1.22.3 | 1.24.1 |
| picard/collectinsertsizemetrics | 3.1.1 | 3.2.0 |
| tabix/bgziptabix | 1.19.1 | 1.20 |
| tabix/tabix | 1.19.1 | 1.20 |
| untar | 20.04 | 22.04 |

## 2.1.0 - Elf [2024-06-26]

### `Added`
2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,5 +1,5 @@
 report_comment: >
-  This report has been generated by the <a href="https://github.com/genomic-medicine-sweden/tomte/releases/tag/2.1.0" target="_blank">genomic-medicine-sweden/tomte</a>
+  This report has been generated by the <a href="https://github.com/genomic-medicine-sweden/tomte/releases/tag/2.2.0" target="_blank">genomic-medicine-sweden/tomte</a>
   analysis pipeline. For information about how to interpret these results, please see the
   <a href="https://github.com/genomic-medicine-sweden/tomte/blob/master/docs/output.md" target="_blank">documentation</a>.
 report_section_order:
3 changes: 3 additions & 0 deletions conf/base.config
@@ -52,6 +52,9 @@ process {
     withLabel:process_high_memory {
         memory = { check_max( 200.GB * task.attempt, 'memory' ) }
     }
+    withLabel:process_very_long {
+        time = { check_max( 100.h * task.attempt, 'time' ) }
+    }
     withLabel:error_ignore {
         errorStrategy = 'ignore'
     }
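
As a rough illustration of how this label could be tuned (an assumption, not part of this PR): a site-specific config loaded with `-c` might override the time request for every process labelled `process_very_long`. The file name and the 48-hour value below are hypothetical.

```groovy
// custom.config (hypothetical file): override the time request for
// processes carrying the process_very_long label added above.
process {
    withLabel:process_very_long {
        time = 48.h   // illustrative value only; pick one that suits your scheduler
    }
}
```
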
63 changes: 63 additions & 0 deletions conf/modules/download_references.config
@@ -0,0 +1,63 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Available keys to override module options:
        ext.args   = Additional arguments appended to command in module.
        ext.args2  = Second set of arguments appended to command in module (multi-tool modules).
        ext.args3  = Third set of arguments appended to command in module (multi-tool modules).
        ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/

//
// DOWNLOAD REFERENCES
//

process {

    withName: '.*DOWNLOAD_REFERENCES:FASTA_DOWNLOAD' {
        ext.when = { !params.fasta }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:GTF_DOWNLOAD' {
        ext.when = { !params.gtf }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:WGET_DOWNLOAD' {
        ext.when = { (!params.skip_download_vep && !params.vep_cache) }
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:VEP_GNOMAD_DOWNLOAD' {
        ext.when = { !params.skip_download_gnomad }
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*DOWNLOAD_REFERENCES:BUILD_VEP_CACHE' {
        ext.when = { (!params.skip_download_vep && !params.vep_cache) }
        publishDir = [
            path: { "${params.outdir}/references" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_reference
        ]
    }

}
6 changes: 3 additions & 3 deletions conf/modules/prepare_references.config
@@ -17,7 +17,7 @@
 process {

     withName: '.*PREPARE_REFERENCES:GUNZIP_FASTA' {
-        ext.when = {params.fasta.endsWith(".gz")}
+        ext.when = { params.fasta && params.fasta.endsWith(".gz") }
         publishDir = [
             path: { "${params.outdir}/references" },
             mode: params.publish_dir_mode,
@@ -51,7 +51,7 @@ process {
     }

     withName: '.*PREPARE_REFERENCES:GUNZIP_GTF' {
-        ext.when = { params.gtf.endsWith(".gz") }
+        ext.when = { params.gtf && params.gtf.endsWith(".gz") }
         publishDir = [
             path: { "${params.outdir}/references" },
             mode: params.publish_dir_mode,
@@ -134,7 +134,7 @@ process {
     }

     withName: '.*PREPARE_REFERENCES:UNTAR_VEP_CACHE' {
-        ext.when = { (params.vep_cache.endsWith("tar.gz")) }
+        ext.when = { (params.vep_cache && params.vep_cache.endsWith(".gz")) }
         publishDir = [
             enabled: false
         ]
56 changes: 31 additions & 25 deletions docs/usage.md
@@ -123,6 +123,8 @@ If you would like to see more examples of what a typical samplesheet looks like

In genomic-medicine-sweden/tomte, references can be supplied using parameters. We have also introduced the possibility of using the `--igenomes_base` parameter to point to a path where genome-specific reference files are placed (fasta, fai, gtf, star_index, salmon_index, subsample_bed). To make sure that the names of the reference files match those in your directory, check [igenomes.config](https://github.com/genomic-medicine-sweden/tomte/blob/master/conf/igenomes.config).

If the user does not provide references, the pipeline automatically downloads a fasta and a gtf file. The desired genome and gencode version can be selected with `--genome` and `--gencode_annotation_version`. To also download the vep cache and vep plugin references, set `--skip_download_vep false` and provide, via `--vep_refs_download`, a comma-separated file listing the plugins to download; this file should NOT contain the path to the gnomad database. To also download the gnomad database, set `--skip_download_gnomad false`. Bear in mind that about 900GB of data will be downloaded, so sufficient storage space and time are needed. The data will then be processed and its size significantly reduced to under 40GB.
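
The download options described above could be collected in a small config file. This is only a sketch: the file name, the csv path and the gencode version value are hypothetical, and only the parameter names come from this release.

```groovy
// custom.config (hypothetical file) enabling automatic reference download.
params {
    genome                     = 'GRCh38'   // documented default
    gencode_annotation_version = 46         // illustrative value, not a documented default
    skip_download_vep          = false      // download vep cache and plugin references
    vep_refs_download          = '/path/to/vep_to_download.csv'  // hypothetical path
    skip_download_gnomad       = true       // avoid the roughly 900GB gnomad download
}
```

Such a file would be passed to Nextflow with `-c custom.config`, alongside the usual input and output options.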

Note that the pipeline is modular in architecture. It offers you the flexibility to choose between different tools. For example, you can call SNVs either with BCFtools or with GATK. You also have the option to turn off sections of the pipeline if you do not want to run them. For example, drop aberrant expression module can be turned off by setting `--skip_drop_ae true`. This flexibility means that in any given analysis run, a combination of tools included in the pipeline will not be executed. So the pipeline is written in a way that can account for these differences while working with reference parameters. If a tool is not going to be executed during the course of a run, parameters used only by that tool need not be provided. For example, if you are not running DROP aberrant splicing, you do not need to provide `--reference_drop_splice_folder`.

genomic-medicine-sweden/tomte consists of several tools used for various purposes. For convenience, we have grouped those tools under the following categories:
@@ -145,21 +147,24 @@ The mandatory and optional parameters for each category are tabulated below.

| Mandatory | Optional |
| --------- | ------------------------------ |
-| fasta     | fasta_fai<sup>1</sup>          |
-| gtf       | sequence_dict<sup>1</sup>      |
-|           | salmon_index<sup>1</sup>       |
-|           | star_index<sup>1</sup>         |
-|           | transcript_fasta<sup>1</sup>   |
-|           | genome<sup>2</sup>             |
-|           | platform<sup>3</sup>           |
-|           | min_trimmed_length<sup>4</sup> |
-|           | star_two_pass_mode<sup>4</sup> |
-
-<sup>1</sup> If the parameter is not provided by the user, it will be generated from the fasta and gtf files.<br />
-<sup>2</sup> If it is not provided by the user, the default value is GRCh38.<br />
-<sup>3</sup> If it is not provided by the user, the default value is illumina.<br />
-<sup>4</sup> If it is not provided by the user, the default value is 40.<br />
-<sup>5</sup> If it is not provided by the user, the default value is Basic.
+|           | fasta<sup>1</sup>              |
+|           | gtf<sup>1</sup>                |
+|           | fasta_fai<sup>2</sup>          |
+|           | sequence_dict<sup>2</sup>      |
+|           | salmon_index<sup>2</sup>       |
+|           | star_index<sup>2</sup>         |
+|           | transcript_fasta<sup>2</sup>   |
+|           | genome<sup>3</sup>             |
+|           | platform<sup>4</sup>           |
+|           | min_trimmed_length<sup>5</sup> |
+|           | star_two_pass_mode<sup>6</sup> |
+
+<sup>1</sup> If the parameter is not provided by the user, it will be downloaded.<br />
+<sup>2</sup> If the parameter is not provided by the user, it will be generated from the fasta and gtf files.<br />
+<sup>3</sup> If it is not provided by the user, the default value is GRCh38.<br />
+<sup>4</sup> If it is not provided by the user, the default value is illumina.<br />
+<sup>5</sup> If it is not provided by the user, the default value is 40.<br />
+<sup>6</sup> If it is not provided by the user, the default value is Basic.

##### 2. Junction track and bigwig

@@ -191,16 +196,17 @@ The mandatory and optional parameters for each category are tabulated below.

#### 5. SNV annotation (ensembl VEP)

-| Mandatory                    | Optional                   |
-| ---------------------------- | -------------------------- |
-| vep_plugin_files<sup>1</sup> | skip_vep<sup>2</sup>       |
-|                              | vep_cache<sup>3</sup>      |
-|                              | vep_cache_version          |
-|                              | gene_panel_clinical_filter |
-
-<sup>1</sup> VEP caches can be downloaded [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache). VEP plugins may be installed in the cache directory, and the plugin pLI is mandatory to install. To supply files required by VEP plugins, use `vep_plugin_files` parameter. See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz).<br />
-<sup>2</sup> If it is not provided by the user, the default value is false<br />
-<sup>3</sup> If it is not provided by the user, the default value is 110, supported values are 107 and 110 <br />
+| Mandatory | Optional                      |
+| --------- | ----------------------------- |
+|           | skip_vep<sup>1</sup>          |
+|           | vep_plugin_files<sup>2</sup>  |
+|           | vep_cache<sup>2</sup>         |
+|           | vep_cache_version<sup>3</sup> |
+|           | gene_panel_clinical_filter    |
+
+<sup>1</sup> If it is not provided by the user, the default value is false.<br />
+<sup>2</sup> VEP cache and plugins can be downloaded automatically by the pipeline by setting `--skip_download_vep false` and `--skip_download_gnomad false` and providing, via `--vep_refs_download`, a csv listing the files to download, as done [here](https://github.com/genomic-medicine-sweden/tomte/blob/dev/test_data/vep_to_download.csv). VEP caches can also be downloaded [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache). VEP plugins may also be installed in the cache directory, and the plugin pLI is mandatory to install. To supply files required by VEP plugins, use the `vep_plugin_files` parameter. See example cache [here](https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/vep_cache_and_plugins.tar.gz).<br />
+<sup>3</sup> If it is not provided by the user, the default value is 110; supported values are 107 and 110.

#### 6. Stringtie & gffcompare
