1kg_sv_comparison

Scripts utilized to prep data for comparison of calls from the svtools pipeline to calls from the 1000 Genomes Project. This code is specific for the 1000 Genomes Phase 3 Integrated SV Map currently found here: ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/integrated_sv_map/ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.vcf.gz

Dependencies

A Docker container containing all dependencies is available from Docker Hub as halllab/1kg_sv_comparison and a Dockerfile is contained within this repo.

Alternatively, the following are pre-requisites:

curl
Python (2.7)
PyVCF (0.6.8)
pysam (0.8.4)
bgzip (any version)
bcftools (>=1.2)
svtools (>=0.3.1)
perl (>=0.5.10)
make (if you use the Makefile)

1000 Genome Calls Preparation

A Makefile is provide to run these steps. If the dependencies are met then you can run by typing make all.

Download the calls

curl -O ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase3/integrated_sv_map/ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.vcf.gz

Convert to svtools/lumpy-sv compatible SVTYPE fields.

python 1kg_phase3_to_svtools_compat_svtype.py \
   ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.vcf.gz \
   | bgzip -c > ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.vcf.gz

Remove MEI insertions (relative to the reference)

These aren't compatible with svtools as they lack an END INFO field and the pipeline doesn't detect these.

bcftools view -e 'SVTYPE=="ALU" || SVTYPE=="LINE1" || SVTYPE=="SVA"' \
   ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.vcf.gz - \
   | bgzip -c > ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.no_ins_of_mei.vcf.gz

Remove SV calls only detected by read depth based algorithms.

python remove_read_depth_only.py \
   ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.no_ins_of_mei.vcf.gz \
   | bgzip -c \
   > ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.non_rd_only.no_ins_of_mei.vcf.gz

Comparison of SV VCF to 1000 Genomes

A wrapper script is provided to aid in the comparison of multiple samples.

Inputs

The 1000 Genomes VCF prepared above.
An evaluation VCF containing samples overlapping with the 1000 Genomes cohort.
A 2-column, tab-separated file where the first column is the sample name in the evaluation VCF and the second column is the sample name in the 1000 Genomes VCF. For example:
```
H_IJ-NA19239-NA19239_B9	NA19239
```

Invocation

The script is called like:

bash 1kg_sens_multi.sh \
   ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.svtools_types.non_rd_only.no_ins_of_mei.vcf.gz \
   cohort.sv.gt.cn.pruned.reclassed.filtered.nosec.vcf.gz \
   sample_name_in_1kg.map

Results

In the current working directory, a sub-directory for each 1000 Genomes sample is created. Inside each sub-directory will be three files.

A BEDPE of the 1000 Genomes call for the sample.
A BEDPE of the evaluation calls for the sample.

A file of calculated sensitivities which has structure as shown below:

Unfiltered Total Calls	9586
Unfiltered Overlapping 1KG	1641
Unfiltered Fraction Overlapping 1KG	0.171187
Unfiltered 1KG Sensitivity	0.888949
Filtered Total Calls	3916
Filtered Overlapping 1KG	1559
Filtered Fraction Overlapping 1KG	0.39811
Filtered 1KG Sensitivity	0.844529

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.gitignore		.gitignore
1kg_phase3_to_svtools_compat_svtype.py		1kg_phase3_to_svtools_compat_svtype.py
1kg_sens.sh		1kg_sens.sh
1kg_sens_multi.sh		1kg_sens_multi.sh
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
filter_based_on_strand.py		filter_based_on_strand.py
remove_read_depth_only.py		remove_read_depth_only.py
test_filter_based_on_strand.py		test_filter_based_on_strand.py
test_remove_read_depth_only.py		test_remove_read_depth_only.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1kg_sv_comparison

Dependencies

1000 Genome Calls Preparation

Comparison of SV VCF to 1000 Genomes

Inputs

Invocation

Results

About

Releases

Packages

Languages

License

hall-lab/1kg_sv_comparison

Folders and files

Latest commit

History

Repository files navigation

1kg_sv_comparison

Dependencies

1000 Genome Calls Preparation

Comparison of SV VCF to 1000 Genomes

Inputs

Invocation

Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages