Python-based command execution of exome sequencing analysis on the Stanford genomics cluster
python exome_file_command.py XXXXX_merged.bam
(This pipeline accepts only BWA-aligned BAM files.)
- Sort the BWA-aligned BAM file - tool used is Picard - function is SortSam
- Reorder the sorted BAM file to match the contig order of the hg19 reference - Picard function is ReorderSam
- Mark duplicates in the reordered BAM - Picard function is MarkDuplicates
- Build the BAM index of the deduplicated BAM - Picard function is BuildBamIndex
- Recalibrate base quality scores of the deduplicated BAM file - GATK function is BaseRecalibrator
- Output the recalibrated reads - GATK function is PrintReads
- Call genotypes directly to a VCF, or to a g.vcf file when there are many (>30) samples - GATK function is HaplotypeCaller
- Use scripts VQSR_S1 to VQSR_S4 for variant filtration following GATK best practices
- Perform variant evaluation - expected Ti/Tv ratio for whole exome is > 2.5
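As a rough illustration of the Ti/Tv check in the last step, the ratio can be computed from (REF, ALT) pairs of single-nucleotide variants. This is a minimal sketch and not the evaluation code used by this pipeline; the function name `titv_ratio` and the input format are assumptions made for the example.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def titv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Hypothetical helper: records that are not simple A/C/G/T substitutions
    (indels, multi-base alleles, N bases, ref == alt) are skipped.
    """
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv


# Three transitions and one transversion; the indel record is ignored.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
ratio = titv_ratio(example)
```

A ratio computed this way over the filtered exome call set can then be compared against the > 2.5 expectation stated above.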
Dependencies required: picard-tools/2.14, gatk/3.7, hg19.fasta, Mills_and_1000G_gold_standard.indels.hg19.sites.vcf, db138 (dbSNP build 138) resource bundle from the GATK best practices pipeline
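The Picard and GATK steps listed above could be assembled as command lines roughly as follows. This is a sketch under assumptions, not the contents of exome_file_command.py: the jar paths, the intermediate file names, the `dbsnp_138.hg19.vcf` file name, and the function `build_commands` are all hypothetical. The options shown follow the Picard 2.x `KEY=value` style and the GATK 3.x `-T` style. Nothing is executed; the function only returns argument lists.

```python
import os


def build_commands(bam,
                   ref="hg19.fasta",
                   mills="Mills_and_1000G_gold_standard.indels.hg19.sites.vcf",
                   dbsnp="dbsnp_138.hg19.vcf",      # assumed file name for db138
                   picard_jar="picard.jar",         # assumed jar location
                   gatk_jar="GenomeAnalysisTK.jar",  # assumed jar location
                   gvcf=False):
    """Return the pipeline steps, in order, as a list of argument lists."""
    stem = os.path.basename(bam)
    if stem.endswith(".bam"):
        stem = stem[:-4]

    # Hypothetical intermediate file names derived from the input name.
    sorted_bam = stem + ".sorted.bam"
    reordered_bam = stem + ".reordered.bam"
    dedup_bam = stem + ".dedup.bam"
    metrics = stem + ".dedup_metrics.txt"
    recal_table = stem + ".recal.table"
    recal_bam = stem + ".recal.bam"

    picard = ["java", "-jar", picard_jar]
    gatk = ["java", "-jar", gatk_jar]

    # Direct genotype calls, or a per-sample g.vcf for later joint genotyping.
    call = gatk + ["-T", "HaplotypeCaller", "-R", ref, "-I", recal_bam]
    if gvcf:
        call += ["--emitRefConfidence", "GVCF", "-o", stem + ".g.vcf"]
    else:
        call += ["-o", stem + ".vcf"]

    return [
        picard + ["SortSam", "I=" + bam, "O=" + sorted_bam,
                  "SORT_ORDER=coordinate"],
        picard + ["ReorderSam", "I=" + sorted_bam, "O=" + reordered_bam,
                  "R=" + ref],
        picard + ["MarkDuplicates", "I=" + reordered_bam, "O=" + dedup_bam,
                  "M=" + metrics],
        picard + ["BuildBamIndex", "I=" + dedup_bam],
        gatk + ["-T", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
                "-knownSites", dbsnp, "-knownSites", mills,
                "-o", recal_table],
        gatk + ["-T", "PrintReads", "-R", ref, "-I", dedup_bam,
                "-BQSR", recal_table, "-o", recal_bam],
        call,
    ]


commands = build_commands("XXXXX_merged.bam")
gvcf_commands = build_commands("XXXXX_merged.bam", gvcf=True)
```

Returning argument lists rather than one shell string keeps each step easy to inspect and to hand to a job submission wrapper.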
ONLY TO BE USED ON A SUN GRID ENGINE JOB SUBMISSION CLUSTER WITH QSUB
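For orientation, wrapping one step in a Sun Grid Engine job script and submitting it with qsub might look like the sketch below. The helper names `make_job_script` and `qsub_command`, the job names, and the script file name are assumptions for illustration; the `#$ -N`, `#$ -cwd`, `#$ -S`, and `#$ -hold_jid` directives are standard SGE options. The sketch only builds the script text and the qsub argument list and does not submit anything.

```python
import shlex


def make_job_script(name, command, hold_jid=None):
    """Return the text of an SGE job script that runs one command.

    name     -- job name passed to SGE via '#$ -N'
    command  -- argument list for the step to run
    hold_jid -- optional name of a job that must finish first
    """
    lines = [
        "#!/bin/bash",
        "#$ -N " + name,
        "#$ -cwd",            # run in the submission directory
        "#$ -S /bin/bash",    # interpret the script with bash
    ]
    if hold_jid:
        # Make this job wait until the named job has completed.
        lines.append("#$ -hold_jid " + hold_jid)
    lines.append(" ".join(shlex.quote(arg) for arg in command))
    return "\n".join(lines) + "\n"


def qsub_command(script_path):
    """Return the argument list that would submit the script."""
    return ["qsub", script_path]


script = make_job_script(
    "markdup",
    ["java", "-jar", "picard.jar", "MarkDuplicates",
     "I=in.bam", "O=out.bam", "M=metrics.txt"],
    hold_jid="reorder",
)
submit = qsub_command("markdup.sh")
```

Chaining steps with `-hold_jid` is one way to keep the order of the pipeline on the cluster, since each job starts only after the previous one has finished.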