Back to dev #2

Merged
merged 12 commits into from
Feb 16, 2024
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) Kübra Narcı
+Copyright (c) kuebra.narci@dkfz.de

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
24 changes: 16 additions & 8 deletions README.md
@@ -4,6 +4,7 @@
<img alt="nf-core/variantbenchmarking" src="docs/images/nf-core-variantbenchmarking_logo_light.png">
</picture>
</h1>

[![GitHub Actions CI Status](https://github.com/nf-core/variantbenchmarking/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/variantbenchmarking/actions?query=workflow%3A%22nf-core+CI%22)
[![GitHub Actions Linting Status](https://github.com/nf-core/variantbenchmarking/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/variantbenchmarking/actions?query=workflow%3A%22nf-core+linting%22)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/variantbenchmarking/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)

@@ -13,7 +14,7 @@
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/variantbenchmarking)

-[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23variantbenchmarking-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/variantbenchmarking)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
+[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23benchmark-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/variantbenchmarking)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)

## Introduction

@@ -29,14 +30,22 @@
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Standardization of SVs in test VCF files
+2. Normalization of SVs in test VCF files
+3. Normalization of SVs in truth VCF files
+4. SV stats and histograms
+5. Germline benchmarking of SVs
+6. Somatic benchmarking of SVs
+7. Final report and comparisons

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

+Supported SV callers: Manta, SVaba, Dragen, Delly, Lumpy ..
+Available Truth samples: HG002, SEQC2

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):

@@ -45,12 +54,11 @@ First, prepare a samplesheet with your input data that looks as follows:
`samplesheet.csv`:

```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+caller,test_vcf
+caller1,test1.vcf.gz
+caller2,test2.vcf
```

-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).

-->

Now, you can run the pipeline using:
@@ -78,7 +86,7 @@ For more details about the output files and reports, please refer to the

## Credits

-nf-core/variantbenchmarking was originally written by Kübra Narcı.
+nf-core/variantbenchmarking was originally written by kuebra.narci@dkfz.de.

We thank the following people for their extensive assistance in the development of this pipeline:

2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,5 +1,5 @@
report_comment: >
-This report has been generated by the <a href="https://github.com/nf-core/variantbenchmarking/tree/dev" target="_blank">nf-core/variantbenchmarking</a>
+This report has been generated by the <a href="https://github.com/nf-core/variantbenchmarking/releases/tag/dev" target="_blank">nf-core/variantbenchmarking</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/variantbenchmarking/dev/docs/output" target="_blank">documentation</a>.
report_section_order:
6 changes: 3 additions & 3 deletions assets/samplesheet.csv
@@ -1,3 +1,3 @@
-sample,fastq_1,fastq_2
-SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
-SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
+sample,test_vcf,truth_vcf,caller,type
+HG002,"/Users/w620-admin/Desktop/nf-core/dataset/hg37/Broad_svaba_05052017/HG002.svaba.germline.sv.convBNDtoDEL.vcf","/Users/w620-admin/Desktop/nf-core/dataset/hg37/GIAB_Assemblytics_structural_variants_only_160618/hg002.Assemblytics_structural_variants.sorted.vcf.gz",svaba,sv
+HG003,"/Users/w620-admin/Desktop/nf-core/dataset/hg37/Broad_svaba_05052017/HG003.svaba.germline.sv.convBNDtoDEL.vcf","/Users/w620-admin/Desktop/nf-core/dataset/hg37/GIAB_Assemblytics_structural_variants_only_160618/hg003.Assemblytics_structural_variants.sorted.vcf.gz",svaba,sv
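The new assets/samplesheet.csv replaces the FASTQ columns with a five-column layout (sample, test_vcf, truth_vcf, caller, type) and quotes the paths, while the schema in this PR only requires test_vcf and caller. nf-core pipelines typically validate the sheet against schema_input.json rather than parsing it by hand, so the Python sketch below is only an illustration: it reads a sheet in this layout, checks for the schema's required columns, and flags which test VCFs are bgzipped. The paths in SHEET are hypothetical placeholders.

```python
import csv
import io

# Hypothetical two-row excerpt in the new five-column layout (paths are placeholders).
SHEET = """sample,test_vcf,truth_vcf,caller,type
HG002,"/data/HG002.svaba.vcf","/data/hg002.truth.sorted.vcf.gz",svaba,sv
HG003,"/data/HG003.svaba.vcf","/data/hg003.truth.sorted.vcf.gz",svaba,sv
"""

# Columns listed under "required" in the new assets/schema_input.json.
REQUIRED = ("test_vcf", "caller")


def read_samplesheet(text):
    """Parse a samplesheet into dicts; the csv module strips the quotes around paths."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing required columns: {missing}")
    rows = []
    for row in reader:
        # Record whether the test VCF is bgzipped (.vcf.gz) or plain text (.vcf).
        row["test_is_gz"] = row["test_vcf"].endswith(".vcf.gz")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_samplesheet(SHEET):
        print(row["sample"], row["caller"], row["test_is_gz"])
```

Using `csv.DictReader` keys each value by its header name, so the sketch works whether a sheet lists `test_vcf,caller` (as the per-truth sheets do) or `caller,test_vcf` (as the README example does).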
4 changes: 4 additions & 0 deletions assets/samplesheet_HG002.csv
@@ -0,0 +1,4 @@
test_vcf,caller
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_delly_SV_hg19.vcf.gz",delly
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_lumpy_SV_hg19.vcf.gz",lumpy
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_manta_SV_hg19_genotype.vcf",manta
5 changes: 5 additions & 0 deletions assets/samplesheet_HG002_hg19.csv
@@ -0,0 +1,5 @@
test_vcf,caller
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_DRAGEN_SV_hg19.vcf.gz",dragen
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_delly_SV_hg19.vcf.gz",delly
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_lumpy_SV_hg19.vcf.gz",lumpy
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/dragen_paper/HG002_manta_SV_hg19_genotype.vcf",manta
5 changes: 5 additions & 0 deletions assets/samplesheet_HG002_hg38.csv
@@ -0,0 +1,5 @@
test_vcf,caller
"/Users/w620-admin/Desktop/nf-core/dataset/hg38/GIAB_GRCh38_SVs_06252018/ajtrio.lumpy.svtyper.HG002.md.sorted.recal.vcf.gz",lumpy
"/Users/w620-admin/Desktop/nf-core/dataset/hg38/GIAB_GRCh38_SVs_06252018/manta.HG002.vcf.gz",manta
"/Users/w620-admin/Desktop/nf-core/dataset/hg37/Ashkenazim_unnanotated/Ashkenazim_HG002.filtered.sv.vcf.gz",merged

3 changes: 3 additions & 0 deletions assets/samplesheet_SEQC2.csv
@@ -0,0 +1,3 @@
test_vcf,caller
"/Users/w620-admin/Desktop/nf-core/dataset/hg38/SEQC_somatic_mutation_truth/test/WGS.bwa.dedup-IL_T_1_vs_IL_N_1-Strelka.indel.vcf.gz",strelka
"/Users/w620-admin/Desktop/nf-core/dataset/hg38/SEQC_somatic_mutation_truth/test/WGS.bwa.dedup-IL_T_1_vs_IL_N_1-MuTect2.vcf.gz",mutect2
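The per-truth samplesheets above share a two-column `test_vcf,caller` layout. This PR adds svync standardization configs only for delly, gridss and manta (assets/svync/default.yaml is added empty), so callers listed here such as dragen, lumpy, merged, strelka and mutect2 have no dedicated config yet. The Python sketch below makes that cross-check explicit. The caller columns are copied from the sheets, the `x` paths are placeholders, and the idea that a config is looked up by caller name is an assumption this diff does not confirm.

```python
import csv
import io

# svync configs with content in this PR (assets/svync/<caller>.yaml); default.yaml is empty.
SVYNC_CONFIGS = {"delly", "gridss", "manta"}

# Caller columns copied from the samplesheets; "x" is a placeholder path.
SHEETS = {
    "samplesheet_HG002_hg19.csv": "test_vcf,caller\nx,dragen\nx,delly\nx,lumpy\nx,manta\n",
    "samplesheet_HG002_hg38.csv": "test_vcf,caller\nx,lumpy\nx,manta\nx,merged\n",
    "samplesheet_SEQC2.csv": "test_vcf,caller\nx,strelka\nx,mutect2\n",
}


def callers_without_config(sheet_text, configs=SVYNC_CONFIGS):
    """Return callers in a test_vcf,caller sheet that have no dedicated svync YAML."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    callers = [row["caller"] for row in reader]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [c for c in dict.fromkeys(callers) if c not in configs]


if __name__ == "__main__":
    for name, text in SHEETS.items():
        print(name, callers_without_config(text))
```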
55 changes: 21 additions & 34 deletions assets/schema_input.json
@@ -1,36 +1,23 @@
{
-    "$schema": "http://json-schema.org/draft-07/schema",
-    "$id": "https://raw.githubusercontent.com/nf-core/variantbenchmarking/master/assets/schema_input.json",
-    "title": "nf-core/variantbenchmarking pipeline - params.input schema",
-    "description": "Schema for the file provided with params.input",
-    "type": "array",
-    "items": {
-        "type": "object",
-        "properties": {
-            "sample": {
-                "type": "string",
-                "pattern": "^\\S+$",
-                "errorMessage": "Sample name must be provided and cannot contain spaces"
-            },
-            "fastq_1": {
-                "type": "string",
-                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
-                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
-            },
-            "fastq_2": {
-                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
-                "anyOf": [
-                    {
-                        "type": "string",
-                        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
-                    },
-                    {
-                        "type": "string",
-                        "maxLength": 0
-                    }
-                ]
-            }
-        },
-        "required": ["sample", "fastq_1"]
-    }
+    "$schema": "http://json-schema.org/draft-07/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/variantbenchmarking/master/assets/schema_input.json",
+    "title": "nf-core/variantbenchmarking pipeline - params.input schema",
+    "description": "Schema for the file provided with params.input",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "test_vcf": {
+                "type": "string",
+                "pattern": "",
+                "errorMessage": "Test VCF must be provided, cannot contain spaces and must have extension '.vcf.gz'"
+            },
+            "caller": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Name of the variant caller used to generate test file"
+            }
+        },
+        "required": ["test_vcf","caller"]
+    }
}
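In the new schema, test_vcf carries `"pattern": ""`. A JSON Schema `pattern` is an unanchored regular-expression search, so the empty pattern matches every string and the '.vcf.gz' requirement stated in its errorMessage is not enforced; several samplesheets in this PR also point at uncompressed `.vcf` files. The Python sketch below mirrors the two property rules with `re.search`. STRICT_TEST_VCF_PATTERN is a hypothetical alternative for comparison, not part of the PR, and this is not how nf-core's schema validation is implemented.

```python
import re

# Patterns as committed in the new assets/schema_input.json.
TEST_VCF_PATTERN = ""
CALLER_PATTERN = r"^\S+$"

# Hypothetical stricter pattern (not in the PR): no whitespace, .vcf or .vcf.gz extension.
STRICT_TEST_VCF_PATTERN = r"^\S+\.vcf(\.gz)?$"

TEST_VCF_ERROR = "Test VCF must be provided, cannot contain spaces and must have extension '.vcf.gz'"
CALLER_ERROR = "Name of the variant caller used to generate test file"


def validate_row(row, test_vcf_pattern=TEST_VCF_PATTERN):
    """Return the error messages one samplesheet row would trigger.

    JSON Schema "pattern" is an unanchored search, so re.search is the
    closest Python equivalent. Empty cells are treated as missing, which is
    stricter than JSON Schema "required" (that only checks key presence).
    """
    errors = []
    for field in ("test_vcf", "caller"):
        if not row.get(field):
            errors.append(f"missing required field: {field}")
    test_vcf = row.get("test_vcf")
    if test_vcf and not re.search(test_vcf_pattern, test_vcf):
        errors.append(TEST_VCF_ERROR)
    caller = row.get("caller")
    if caller and not re.search(CALLER_PATTERN, caller):
        errors.append(CALLER_ERROR)
    return errors


if __name__ == "__main__":
    row = {"test_vcf": "calls with spaces.txt", "caller": "delly"}
    print(validate_row(row))                           # committed schema: no errors
    print(validate_row(row, STRICT_TEST_VCF_PATTERN))  # stricter pattern: one error
```

Accepting an optional `.gz` in the hypothetical pattern keeps the plain `.vcf` entries in the HG002 and samplesheet.csv sheets valid; requiring `.vcf.gz` outright, as the errorMessage states, would reject them.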
Empty file added: assets/svync/default.yaml
69 changes: 69 additions & 0 deletions assets/svync/delly.yaml
@@ -0,0 +1,69 @@
id: delly_$INFO/SVTYPE
alt:
  BND: TRA
info:
  CALLER:
    value: delly
    number: 1
    type: string
    description: The caller used to determine this variant
  SVLEN:
    value: ~sub:$INFO/END,$POS
    number: 1
    type: integer
    description: The length of the structural variant
    alts:
      DEL: -~sub:$INFO/END,$POS
      INS: $INFO/SVLEN
      TRA: 1
  CIEND:
    value: $INFO/CIEND
    number: 2
    type: integer
    description: PE confidence interval around END
  CIPOS:
    value: $INFO/CIPOS
    number: 2
    type: integer
    description: PE confidence interval around POS
  SVTYPE:
    value: $INFO/SVTYPE
    number: 1
    type: string
    description: Type of structural variant
  CHR2:
    value:
    number: 1
    type: string
    description: Chromosome for second position
    alts:
      TRA: $INFO/CHR2
  END:
    value: $INFO/END
    number: 1
    type: integer
    description: End position of the structural variant
    alts:
      TRA: $INFO/POS2
  IMPRECISE:
    value: $INFO/IMPRECISE
    number: 0
    type: flag
    description: Imprecise structural variation
format:
  GT:
    value: $FORMAT/GT
    number: 1
    type: string
    description: Genotype
  PE:
    value: $FORMAT/DR,$FORMAT/DV
    number: 2
    type: integer
    description: Paired-read support for the ref and alt alleles in the order listed
  SR:
    value: $FORMAT/RR,$FORMAT/RV
    number: 2
    type: integer
    description: Split-read support for the ref and alt alleles in the order listed

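delly.yaml derives SVLEN from END and POS instead of copying it, gives deletions a negative length, and relabels BND calls as TRA. The Python sketch below spells out what those rules compute. It assumes svync's `~sub:A,B` means A minus B, that the leading `-` on the DEL rule negates the result, and that the top-level `alt` mapping is applied before the per-type `alts` are looked up; it illustrates the config and is not svync's implementation.

```python
def delly_svlen(pos, info, svtype):
    """Hypothetical re-implementation of the SVLEN rules in delly.yaml.

    Assumes "~sub:A,B" evaluates to A - B, that a leading "-" negates the
    result, and that the per-type "alts" override the default "value".
    """
    # Top-level "alt" mapping in delly.yaml: BND calls are relabelled TRA.
    svtype = {"BND": "TRA"}.get(svtype, svtype)
    if svtype == "DEL":
        return -(info["END"] - pos)      # DEL: -~sub:$INFO/END,$POS
    if svtype == "INS":
        return info["SVLEN"]             # INS: $INFO/SVLEN
    if svtype == "TRA":
        return 1                         # TRA: 1
    return info["END"] - pos             # default value: ~sub:$INFO/END,$POS


if __name__ == "__main__":
    print(delly_svlen(1000, {"END": 1500}, "DEL"))   # -500
    print(delly_svlen(1000, {"END": 1500}, "DUP"))   # 500
```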
65 changes: 65 additions & 0 deletions assets/svync/gridss.yaml
@@ -0,0 +1,65 @@
id: gridss_$INFO/SVTYPE
info:
  CALLER:
    value: gridss
    number: 1
    type: string
    description: The caller used to determine this variant
  SVLEN:
    value: ~sub:$INFO/END,$POS
    number: 1
    type: integer
    description: The length of the structural variant
    alts:
      BND:
      TRA: 0
      DEL: -~sub:$INFO/END,$POS
  CIEND:
    value: $INFO/CIRPOS
    number: 2
    type: integer
    description: PE confidence interval around END
  CIPOS:
    value: $INFO/CIPOS
    number: 2
    type: integer
    description: PE confidence interval around POS
  SVTYPE:
    value: $INFO/SVTYPE
    number: 1
    type: string
    description: Type of structural variant
  CHR2:
    value:
    number: 1
    type: string
    description: Chromosome for second position
    alts:
      TRA: $INFO/CHR2
  END:
    value: $INFO/END
    number: 1
    type: integer
    description: End position of the structural variant
  IMPRECISE:
    value: $INFO/IMPRECISE
    number: 0
    type: flag
    description: Imprecise structural variation
format:
  GT:
    value: $FORMAT/GT
    number: 1
    type: string
    description: Genotype
  PE:
    value: $FORMAT/REFPAIR,$FORMAT/RP
    number: 2
    type: integer
    description: Paired-read support for the ref and alt alleles in the order listed
  SR:
    value: .,$FORMAT/SR
    number: 2
    type: integer
    description: Split-read support for the ref and alt alleles in the order listed

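Unlike delly.yaml, which fills both halves of SR from RR and RV, gridss.yaml writes a literal `.` for the reference split-read count, so the standardized SR for GRIDSS calls carries only the alt-allele value. It also fills CIEND from `$INFO/CIRPOS`. The Python sketch below shows the PE/SR pairs the config implies for one sample; the input numbers are made up, and rendering `.` as a missing VCF value is an assumption about svync's output.

```python
def gridss_support(fmt):
    """Hypothetical sketch of the PE/SR FORMAT rules in gridss.yaml.

    fmt maps GRIDSS FORMAT keys to integers, e.g. {"REFPAIR": 12, "RP": 7, "SR": 3}.
    Returns (ref, alt) pairs; None stands for a value the config leaves as ".".
    """
    pe = (fmt.get("REFPAIR"), fmt.get("RP"))   # PE: $FORMAT/REFPAIR,$FORMAT/RP
    sr = (None, fmt.get("SR"))                 # SR: .,$FORMAT/SR (no ref count)
    return {"PE": pe, "SR": sr}


def to_vcf_value(pair):
    """Render a Number=2 integer pair as a VCF FORMAT string, using '.' for missing."""
    return ",".join("." if v is None else str(v) for v in pair)


if __name__ == "__main__":
    support = gridss_support({"REFPAIR": 12, "RP": 7, "SR": 3})
    print(to_vcf_value(support["PE"]), to_vcf_value(support["SR"]))  # 12,7 .,3
```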
66 changes: 66 additions & 0 deletions assets/svync/manta.yaml
@@ -0,0 +1,66 @@
id: manta_$INFO/SVTYPE
info:
  CALLER:
    value: manta
    number: 1
    type: string
    description: The caller used to determine this variant
  SVLEN:
    value: $INFO/SVLEN
    number: 1
    type: integer
    description: The length of the structural variant
    alts:
      INS: ~sum:~len:LEFT_SVINSSEQ,~len:RIGHT_SVINSSEQ
      TRA: 1
  CIEND:
    value: $INFO/CIEND
    number: 2
    type: integer
    description: PE confidence interval around END
  CIPOS:
    value: $INFO/CIPOS
    number: 2
    type: integer
    description: PE confidence interval around POS
  SVTYPE:
    value: $INFO/SVTYPE
    number: 1
    type: string
    description: Type of structural variant
  CHR2:
    value:
    number: 1
    type: string
    description: Chromosome for second position
    alts:
      TRA: $INFO/CHR2
  END:
    value: $INFO/END
    number: 1
    type: integer
    description: End position of the structural variant
    alts:
      TRA: $INFO/POS2
  IMPRECISE:
    value: $INFO/IMPRECISE
    number: 0
    type: flag
    description: Imprecise structural variation
format:
  GT:
    value: $FORMAT/GT
    number: 1
    type: string
    description: Genotype
  PE:
    value: $FORMAT/PR
    number: 2
    type: integer
    description: Paired-read support for the ref and alt alleles in the order listed
  SR:
    value: $FORMAT/SR
    number: 2
    type: integer
    description: Split-read support for the ref and alt alleles in the order listed

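manta.yaml keeps Manta's own SVLEN except for insertions, where it rebuilds the length as the summed lengths of LEFT_SVINSSEQ and RIGHT_SVINSSEQ. Two details may be worth checking against the svync documentation: these references lack the `$INFO/` prefix used everywhere else in the file, and Manta's VCF header describes these fields as the known left and right sides of an insertion of unknown length, so fully assembled insertions may not carry them. The Python sketch below shows the arithmetic the rule implies, under the assumptions stated in its docstring.

```python
def manta_svlen(info, svtype):
    """Hypothetical sketch of the SVLEN rules in manta.yaml.

    Assumes "~len:X" is the string length of X and "~sum:a,b" adds its
    arguments. A missing LEFT_SVINSSEQ/RIGHT_SVINSSEQ is treated as empty
    here, which is an assumption, not documented svync behaviour.
    """
    if svtype == "INS":
        left = info.get("LEFT_SVINSSEQ", "")
        right = info.get("RIGHT_SVINSSEQ", "")
        return len(left) + len(right)   # INS: ~sum:~len:LEFT_SVINSSEQ,~len:RIGHT_SVINSSEQ
    if svtype == "TRA":
        return 1                        # TRA: 1
    return info["SVLEN"]                # default value: $INFO/SVLEN


if __name__ == "__main__":
    print(manta_svlen({"LEFT_SVINSSEQ": "ACGTA", "RIGHT_SVINSSEQ": "GGC"}, "INS"))  # 8
```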