Merge pull request #4 from dbmi-bgm/v2
v2
phil-grayson authored Sep 23, 2021
2 parents b08bea3 + 132025b commit 58d260d
Showing 55 changed files with 1,256 additions and 182 deletions.
1 change: 1 addition & 0 deletions PIPELINE
@@ -0,0 +1 @@
+cnv
27 changes: 26 additions & 1 deletion README.md
@@ -2,7 +2,8 @@

 * This repo contains CGAP SV Pipeline components
   * CWL
-  * Docker sources - `cgap/cgap-manta:v1` for Manta and `cgap/cnv:v1` for annotation and filtering
+  * Public Docker sources - `cgap/cgap-manta:v2` for Manta and `cgap/cnv:v2` for annotation and filtering
+  * Private ECR sources created dynamically at deployment with `post_patch_to_portal.py`
   * Example Tibanna input jsons for individual steps
   * CGAP Portal Workflows and Metaworkflow

@@ -15,12 +16,36 @@ python post_patch_to_portal.py [--ff-env=<env_name>] [--del-prev-version]
[--skip-software]
[--skip-file-format] [--skip-file-reference]
[--skip-workflow] [--skip-metaworkflow]
[--skip-cwl] [--skip-ecr] [--cwl-bucket=<cwl_s3_bucket>]
[--account=<account_num>] [--region=<region>]
[--ugrp-unrelated] [--ignore-key-conflict]
# env_name : fourfront-cgapwolf (default), fourfront-cgap
# cwl_s3_bucket : '' (default); provide s3 cwl bucket name, required for cwl and workflow steps
# account_num : '' (default); provide aws account number, required for cwl, workflow, and ecr steps
# region : '' (default); provide aws account region, required for cwl, workflow, and ecr steps
```

### Version updates

#### v2

* The pipeline now runs on private ECR images, which are created from our public Docker images
* Various updates throughout the CGAP SV Pipeline. The current pipeline is outlined below, with updates indicated. A new version of ``granite`` (v0.1.13) is used for steps 2-13 of the pipeline.
  * Step 1. Manta-based calling of SVs (**Update**: Manta now uses the `callRegions` flag instead of `regions`. We no longer use get_contigs.py from Parliament2 and have removed the Parliament2 GitHub repo from the `cgap-manta:v2` Dockerfile)
  * Step 2. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)
  * Step 3. VEP/sansa annotation (**Update**: VEP now includes the `canonical` flag to identify the canonical transcript for each gene)
  * Step 4. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)
  * Step 5. Annotation filtering and SV type selection
  * Step 6. 20 Unrelated filtering (**Update**: New 20 unrelated reference file resulting from UGRP samples re-mapped with v24 of the CGAP Pipeline including alt index)
  * Step 7. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)
  * Step 8. Cytoband annotation adds the cytoband for each breakpoint (Cyto1 and Cyto2) (**New Step**)
  * Step 9. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)
  * Step 10. Length filtering
  * Step 11. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)
  * Step 12. Annotation cleaning to produce a VCF file that loads quickly in the HiGlass genome browser
  * Step 13. Granite SVqcVCF is used to count DEL and DUP variants and provide a total number of DEL and DUP variants in each sample (**New Step**)

#### v1

* Created entirely new pipeline - CGAP Structural Variant (SV) Pipeline
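
The v2 notes say the CWL files now point at private ECR images, and the CWL hunks in this commit carry `ACCOUNT/<image>:VERSION` placeholders in `dockerPull`. A minimal Python sketch of how such placeholders could be resolved at deployment is given here. It is not taken from `post_patch_to_portal.py`: the function name `resolve_docker_pull`, the regular expression, and the example account number are assumptions for illustration. The registry hostname follows the standard ECR form `<account>.dkr.ecr.<region>.amazonaws.com`.

```python
import re


def resolve_docker_pull(cwl_text: str, account: str, region: str, version: str) -> str:
    """Replace ACCOUNT/<image>:VERSION placeholders with a private ECR image URI.

    Hypothetical helper: the real deployment script may work differently.
    """
    registry = f"{account}.dkr.ecr.{region}.amazonaws.com"
    pattern = re.compile(r"(dockerPull:\s*)ACCOUNT/([\w.-]+):VERSION")
    # Keep the "dockerPull: " prefix and the image name, swap the placeholders.
    return pattern.sub(
        lambda m: f"{m.group(1)}{registry}/{m.group(2)}:{version}", cwl_text
    )


example = (
    "hints:\n"
    "  - class: DockerRequirement\n"
    "    dockerPull: ACCOUNT/cnv:VERSION\n"
)
# The account number and region below are made-up example values.
resolved = resolve_docker_pull(example, "123456789012", "us-east-1", "v2")
print(resolved)
```

The `--account` and `--region` options in the usage text above would supply the two values that form the registry hostname, which is consistent with the README describing them as required for the cwl, workflow, and ecr steps.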
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-v1
+v2
2 changes: 1 addition & 1 deletion cwl/20_unrelated_SV_filter.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [python3, /usr/local/bin/20_unrelated_SV_filter.py]
2 changes: 1 addition & 1 deletion cwl/SV_annotation_VCF_cleaner.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [python3, /usr/local/bin/SV_annotation_VCF_cleaner.py]
45 changes: 45 additions & 0 deletions cwl/SV_cytoband.cwl
@@ -0,0 +1,45 @@
+#!/usr/bin/env cwl-runner
+
+cwlVersion: v1.0
+
+class: CommandLineTool
+
+requirements:
+  - class: InlineJavascriptRequirement
+
+hints:
+  - class: DockerRequirement
+    dockerPull: ACCOUNT/cnv:VERSION
+
+baseCommand: [python3, /usr/local/bin/SV_cytoband.py]
+
+inputs:
+  - id: input
+    type: File
+    inputBinding:
+      prefix: -i
+    doc: expect the path to the vcf file
+
+  - id: outputfile
+    type: string
+    default: "output.vcf"
+    inputBinding:
+      prefix: -o
+    doc: name of the output file
+
+  - id: cytoband
+    type: File
+    inputBinding:
+      prefix: -c
+    doc: expect the path to the cytoband reference file
+
+outputs:
+  - id: output
+    type: File
+    outputBinding:
+      glob: $(inputs.outputfile + ".gz")
+    secondaryFiles:
+      - .tbi
+
+doc: |
+  run SV_cytoband.py to add cytoband annotations for each SV breakpoint
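
The tool above takes a VCF and a cytoband reference file and annotates each SV breakpoint with its cytoband. A rough Python sketch of that kind of lookup is given here. It is not the contents of `SV_cytoband.py`: the function names, the assumption that the reference rows are 0-based half-open intervals (as in UCSC-style cytoband tables), the assumption that `Cyto1` refers to the start breakpoint and `Cyto2` to the end breakpoint, and the example rows are all illustrative.

```python
import bisect


def build_band_index(rows):
    """Group (chrom, start, end, band) rows by chromosome, sorted by start."""
    by_chrom = {}
    for chrom, start, end, band in rows:
        by_chrom.setdefault(chrom, []).append((start, end, band))
    index = {}
    for chrom, bands in by_chrom.items():
        bands.sort()
        # Store the start coordinates separately so bisect can search them.
        index[chrom] = ([b[0] for b in bands], bands)
    return index


def lookup_band(index, chrom, pos):
    """Return the band containing a 1-based VCF position, or None."""
    if chrom not in index:
        return None
    starts, bands = index[chrom]
    pos0 = pos - 1  # assumed conversion: 1-based VCF POS to 0-based interval
    i = bisect.bisect_right(starts, pos0) - 1
    if i >= 0 and bands[i][0] <= pos0 < bands[i][1]:
        return bands[i][2]
    return None


def annotate_breakpoints(index, chrom, pos, end):
    """Assumed mapping: Cyto1 for the start breakpoint, Cyto2 for the end."""
    return {
        "Cyto1": lookup_band(index, chrom, pos),
        "Cyto2": lookup_band(index, chrom, end),
    }


# Illustrative rows only; a real run would read the cytoband reference file.
rows = [("chr1", 0, 2300000, "p36.33"), ("chr1", 2300000, 5300000, "p36.32")]
index = build_band_index(rows)
annotation = annotate_breakpoints(index, "chr1", 1000000, 3000000)
print(annotation)
```

Keeping a sorted list of starts per chromosome makes each lookup a binary search, which matters little for a handful of SVs but keeps the sketch usable on larger call sets.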
2 changes: 1 addition & 1 deletion cwl/SV_length_filter.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [python3, /usr/local/bin/SV_length_filter.py]
2 changes: 1 addition & 1 deletion cwl/SV_type_selector.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [python3, /usr/local/bin/SV_type_selector.py]
2 changes: 1 addition & 1 deletion cwl/combine_sansa_vep.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [combine_sansa_and_VEP_vcf.py]
43 changes: 43 additions & 0 deletions cwl/granite-SVqcVCF.cwl
@@ -0,0 +1,43 @@
+#!/usr/bin/env cwl-runner
+
+cwlVersion: v1.0
+
+class: CommandLineTool
+
+requirements:
+  - class: InlineJavascriptRequirement
+
+hints:
+  - class: DockerRequirement
+    dockerPull: ACCOUNT/cnv:VERSION
+
+baseCommand: [granite, SVqcVCF]
+
+inputs:
+  - id: input_vcf
+    type: File
+    inputBinding:
+      prefix: -i
+    doc: expect the path to the vcf gz file
+
+  - id: outputfile
+    type: string
+    default: "output.json"
+    inputBinding:
+      prefix: -o
+    doc: name of the output file
+
+  - id: samples
+    type: string[]
+    inputBinding:
+      prefix: --samples
+    doc: samples to collect metrics for
+
+outputs:
+  - id: qc_json
+    type: File
+    outputBinding:
+      glob: $(inputs.outputfile)
+
+doc: |
+  run granite SVqcVCF
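
The README describes this step as counting DEL and DUP variants and reporting a total per sample. A small Python sketch of that kind of tally is given here to make the idea concrete. It is not granite's implementation and does not reproduce its JSON layout: the function name `count_del_dup`, the rule that a sample carries a variant when its genotype has any allele other than `0` or `.`, and the toy VCF records are assumptions. It also assumes `SVTYPE` is present in INFO and that `GT` is the first FORMAT field.

```python
import json


def count_del_dup(vcf_lines, samples):
    """Count DEL and DUP records carried by each requested sample."""
    counts = {s: {"DEL": 0, "DUP": 0, "total": 0} for s in samples}
    column_of = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample IDs) to its index.
            column_of = {name: i for i, name in enumerate(fields)}
            continue
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        svtype = info.get("SVTYPE")
        if svtype not in ("DEL", "DUP"):
            continue
        for s in samples:
            genotype = fields[column_of[s]].split(":")[0]
            alleles = genotype.replace("|", "/").split("/")
            # Assumed rule: any non-reference, non-missing allele counts.
            if any(a not in ("0", ".") for a in alleles):
                counts[s][svtype] += 1
                counts[s]["total"] += 1
    return counts


# Toy records for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=500\tGT\t0/1\t0/0",
    "chr1\t900\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=1500\tGT\t1/1\t0/1",
    "chr2\t50\t.\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=90\tGT\t0/1\t0/1",
]
result = count_del_dup(vcf, ["S1", "S2"])
print(json.dumps(result, indent=2))
```

Running the same tally after each filtering step, as the v2 pipeline does with SVqcVCF, makes it easy to see how many DEL and DUP calls each step removes.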
2 changes: 1 addition & 1 deletion cwl/granite-blackList_SV.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [granite, blackList]
2 changes: 1 addition & 1 deletion cwl/granite-geneList_SV.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [granite, geneList]
2 changes: 1 addition & 1 deletion cwl/granite-whiteList_SV.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [granite, whiteList]
12 changes: 10 additions & 2 deletions cwl/manta.cwl
@@ -24,6 +24,14 @@ inputs:
       position: 2
     secondaryFiles:
       - .fai
+  callRegions:
+    type: File
+    inputBinding:
+      prefix: -r
+      separate: false
+      position: 3
+    secondaryFiles:
+      - .tbi
 outputs:
   result:
     type: File
@@ -33,8 +41,8 @@ outputs:
     type: File
     outputBinding:
       glob: variants.vcf.gz

 hints:
-  - dockerPull: cgap/cgap-manta:v1
+  - dockerPull: ACCOUNT/manta:VERSION
     class: DockerRequirement
 class: CommandLineTool
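
The new `callRegions` input uses `prefix: -r` with `separate: false`. In CWL, `separate: false` means the prefix and the value are joined into a single command-line argument, whereas the default (`separate: true`) passes them as two arguments. The short Python sketch below mimics that rule. The function `bind_argument` and the file name are hypothetical, and the sketch covers only this one aspect of CWL input binding.

```python
def bind_argument(prefix, value, separate=True):
    """Mimic how a CWL inputBinding turns a prefix and value into argv items."""
    if separate:
        return [prefix, value]  # e.g. ["-r", "regions.bed.gz"]
    return [prefix + value]  # e.g. ["-rregions.bed.gz"]


# Hypothetical file name standing in for the staged callRegions bed file.
joined = bind_argument("-r", "regions.bed.gz", separate=False)
split = bind_argument("-r", "regions.bed.gz")
print(joined, split)
```

Whether the wrapped command expects the joined or the split form depends on how the script inside the Manta image parses its options, which this diff does not show.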
2 changes: 1 addition & 1 deletion cwl/sansa.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [sansa.sh]
2 changes: 1 addition & 1 deletion cwl/vcf-integrity-check-manta.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cgap-manta:v1
+    dockerPull: ACCOUNT/manta:VERSION

 baseCommand: [vcf-integrity-check.sh]
2 changes: 1 addition & 1 deletion cwl/vcf-integrity-check.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [vcf-integrity-check.sh]
2 changes: 1 addition & 1 deletion cwl/vep-annot_SV.cwl
@@ -9,7 +9,7 @@ requirements:

 hints:
   - class: DockerRequirement
-    dockerPull: cgap/cnv:v1
+    dockerPull: ACCOUNT/cnv:VERSION

 baseCommand: [vep-annot_SV.sh]
52 changes: 52 additions & 0 deletions cwl/workflow_SV_cytoband_plus_vcf-integrity-check.cwl
@@ -0,0 +1,52 @@
+cwlVersion: v1.0
+
+class: Workflow
+
+requirements:
+  MultipleInputFeatureRequirement: {}
+
+inputs:
+  - id: input_vcf
+    type: File
+    doc: expect the path to the sample vcf gz file
+
+  - id: output_vcf
+    type: string
+    default: "output.vcf"
+    doc: base name of output vcf gz file
+
+  - id: cytoband
+    type: File
+    doc: expect the path to the cytoband reference file
+
+outputs:
+  cytoband_SV_vcf:
+    type: File
+    outputSource: SV_cytoband/output
+
+  vcf-check:
+    type: File
+    outputSource: integrity-check/output
+
+steps:
+  SV_cytoband:
+    run: SV_cytoband.cwl
+    in:
+      input:
+        source: input_vcf
+      outputfile:
+        source: output_vcf
+      cytoband:
+        source: cytoband
+    out: [output]
+
+  integrity-check:
+    run: vcf-integrity-check.cwl
+    in:
+      input:
+        source: SV_cytoband/output
+    out: [output]
+
+doc: |
+  run SV_cytoband.py to add cytoband annotations for each SV breakpoint |
+  run an integrity check on the output vcf gz
34 changes: 34 additions & 0 deletions cwl/workflow_granite-SVqcVCF.cwl
@@ -0,0 +1,34 @@
+cwlVersion: v1.0
+
+class: Workflow
+
+requirements:
+  MultipleInputFeatureRequirement: {}
+
+inputs:
+  - id: input_vcf
+    type: File
+    doc: expect the path to the vcf gz file
+
+  - id: samples
+    type: string[]
+    doc: samples to collect metrics for
+
+outputs:
+  qc_json:
+    type: File
+    outputSource: granite-SVqcVCF/qc_json
+
+steps:
+  granite-SVqcVCF:
+    run: granite-SVqcVCF.cwl
+    in:
+      input_vcf:
+        source: input_vcf
+      samples:
+        source: samples
+    out: [qc_json]
+
+
+doc: |
+  run granite SVqcVCF
8 changes: 8 additions & 0 deletions cwl/workflow_manta_integrity-check.cwl
@@ -21,6 +21,12 @@ inputs:
       - .fai
     doc: expect the path to the reference fasta file

+  - id: callRegions
+    type: File
+    secondaryFiles:
+      - .tbi
+    doc: expect the path to the bed file for callRegions
+
 outputs:
   final_zip:
     type: File
@@ -42,6 +48,8 @@ steps:
         source: input_bams
       ref_fasta:
         source: ref_fasta
+      callRegions:
+        source: callRegions

     out: [result, variants]