You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Exception: Command line bcftools index -t /tmp/tmpIg13Dm.vcf.gz got return code 255.STDOUT: STDERR: [E::hts_idx_push] unsorted positions on sequence #5: 38598232 followed by 38598227index: failed to create index for "/tmp/tmpIg13Dm.vcf.gz"
Error running BCFTOOLS; please check your file for compatibility issues issues using vcfcheck
The pre-processing of the truth VCF (with tons of logging lines as noted in #129), but fails during the processing of the input VCF. The sites in question (chr5:38598232, chr5:38598227) are correctly ordered in the query VCF before processing, and are 2 deletions. It seems like at some point during the pre-processing the sites are becoming disordered, possibly as a result of the ref alleles overlapping. Without including any additional VCF content, the 2 lines as they appear in the sorted, indexed, input VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr5 38598227 rs68118711 TTTTTTATTTATTTATTTATTTTTTTTTTTTTTC T
chr5 38598232 rs373607072;rs71963527 TATTTATTTA T
If this is a result of decomposition, could this be resolved by adding in a bcftools sort call prior to the index?
This issue appears to be the same as #109, but a fix was proposed for that issue on a branch that no longer exists, and the issue is closed.
The text was updated successfully, but these errors were encountered:
MattWellie
changed the title
ProProcessing creates un-ordered VCF
PreProcessing creates un-ordered VCF
Aug 15, 2022
I've solved this problem myself by updating the bcftools version within the image
The issue is that the BCFtools 1.4.1 bundled as an external tool here seems to have a dumb BED file filtering process, where each region in the BED file is processed in turn, and variants are appended to the output. Relevant regions from my BED file are:
chr5 38598232 38598244
chr5 38598259 38602497
In this example both variants overlap with the first interval, and the first (sorted by POS) ends in the second interval so it is being added twice: (example salvaged from the temp VCF generated by bcftools.py's preProcessVCF)
chr5 38598227 rs68118711 TTTTTTATTTATTTATTTATTTTTTTTTTTTTTC T ...
chr5 38598232 rs373607072;rs71963527 TATTTATTTA T ...
chr5 38598227 rs68118711 TTTTTTATTTATTTATTTATTTTTTTTTTTTTTC T ...
PLEASE PLEASE PLEASE update the bundled tool binaries in this repository, the SAMtools/BCFtools are 5 years old now
Error message:
This is caused by running the following command
The pre-processing of the truth VCF (with tons of logging lines as noted in #129), but fails during the processing of the input VCF. The sites in question (chr5:38598232, chr5:38598227) are correctly ordered in the query VCF before processing, and are 2 deletions. It seems like at some point during the pre-processing the sites are becoming disordered, possibly as a result of the ref alleles overlapping. Without including any additional VCF content, the 2 lines as they appear in the sorted, indexed, input VCF:
If this is a result of decomposition, could this be resolved by adding in a
bcftools sort
call prior to the index?This issue appears to be the same as #109, but a fix was proposed for that issue on a branch that no longer exists, and the issue is closed.
The text was updated successfully, but these errors were encountered: