Hi,

ariba ran into a strange issue while running on the virulencefinder (vf) database:

```
[E::hts_idx_push] Unsorted positions on sequence # 1: 109 followed by 11
OSError: building of index for /scratch/shadow/tmpr7wt7j_c/ariba_virulencefinder/ariba_virulencefinder/read_store.gz failed
```
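The htslib message says that, within one sequence of the file being indexed, position 109 was followed by 11. Tabix-style indexing needs rows grouped by sequence name and ordered numerically by position, and "109 before 11" is exactly the order you get when the position column is compared as text. A minimal sketch of that check, not ariba's code: it assumes a hypothetical read-store layout where column 1 is the cluster name and column 2 an integer position, and `tabix_sort_key` / `first_unsorted` are illustrative names.

```python
from __future__ import annotations


def tabix_sort_key(row: list[str]) -> tuple[str, int]:
    # Hypothetical layout: row[0] = cluster name, row[1] = integer position.
    # Comparing the position as an int puts 11 before 109; an empty name ("")
    # is still a valid key and simply sorts first.
    return (row[0], int(row[1]))


def first_unsorted(rows: list[list[str]]) -> str | None:
    """Describe the first violation of tabix ordering, or return None.

    Rows for one name must be contiguous, and positions within a name
    must not decrease.
    """
    seen: set[str] = set()
    prev_name: str | None = None
    prev_pos = 0
    for row in rows:
        name, pos = row[0], int(row[1])
        if name != prev_name:
            if name in seen:
                return f"rows for {name!r} are not contiguous"
            seen.add(name)
        elif pos < prev_pos:
            return f"unsorted positions on {name!r}: {prev_pos} followed by {pos}"
        prev_name, prev_pos = name, pos
    return None


# Two rows with an empty cluster name, as in the failing read store.
rows = [["", "11", "read_b"], ["", "109", "read_a"]]
text_sorted = sorted(rows, key=lambda r: (r[0], r[1]))  # "109" < "11" as text
print(first_unsorted(text_sorted))  # prints: unsorted positions on '': 109 followed by 11
print(first_unsorted(sorted(rows, key=tabix_sort_key)))  # prints: None
```

Sorting numerically makes the file indexable, but the rows still carry an empty cluster name, so the missing cluster assignment has to surface somewhere later.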
I figured this was because read_store.gz was sorted incorrectly, since one of the genes has no cluster information. I changed read_store.py to sort correctly even when the cluster information is missing, but then it failed at a later step:

```
_init_and_run_clusters reference_names=self.cluster_ids[cluster_name],
KeyError: ''
```

Obviously, because the cluster name was missing. :)
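`KeyError: ''` is a plain dictionary miss: judging by the traceback, `self.cluster_ids` has no entry for the empty string, so looking it up with an empty cluster name fails. A hedged sketch of a guarded lookup (`references_for_clusters` is a hypothetical helper, not part of ariba) that logs and skips names without an entry. This only avoids the crash; reads belonging to the unclustered genes would still be ignored.

```python
from __future__ import annotations

import logging


def references_for_clusters(
    cluster_ids: dict[str, list[str]], cluster_names: list[str]
) -> dict[str, list[str]]:
    """Map each cluster name to its reference names, skipping unknown names.

    Hypothetical helper: a direct cluster_ids[name] lookup raises
    KeyError: '' for an empty name; .get() lets us log and skip it instead.
    """
    found: dict[str, list[str]] = {}
    for name in cluster_names:
        refs = cluster_ids.get(name)
        if refs is None:
            logging.warning("no cluster entry for %r; skipping its reads", name)
            continue
        found[name] = refs
    return found


cluster_ids = {"cluster_1": ["refA", "refB"]}
# prints {'cluster_1': ['refA', 'refB']} and logs a warning for ''
print(references_for_clusters(cluster_ids, ["cluster_1", ""]))
```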
Then I started digging around and made this small test:
```
mkdir vftest
cd vftest
ariba getref virulencefinder out.virulencefinder
ariba prepareref -f out.virulencefinder.fa -m out.virulencefinder.tsv ./test
cd test
cat 02.cdhit.clusters.tsv | awk '{$1="";print}' | tr " " "\n" | sort | uniq > cluster_file
grep ">" 02.cdhit.all.fa | sed 's/>//g' | sort > all_file
wc -l all_file
wc -l cluster_file
diff cluster_file all_file
```

Output of the last three lines:

```
5558 all_file
5554 cluster_file
1d0
<
718a718
> csnA_4_KJ922517
973a974
> eltIIAB_c8_1_AASRQF010000005
4943a4945
> stx2_122_CP022279_122
5082a5085
> stx2b_O128_24196_97_95_AJ567995_95
5157a5161
> stx2h_O102_STEC299_122_CP022279_122
```

cluster_file contains one empty line at the beginning (the `1d0` / `<` hunk in the diff). Without it, cluster_file has 5553 names against 5558 sequences in all_file, and the diff lists the five sequences that are in 02.cdhit.all.fa but in no cluster.
So the problem is that one or more of those five genes (in my case stx2h_O102_STEC299_122_CP022279_122) can be found in my sequencing reads but are not part of any cluster. When read_store is built, those reads get no cluster name, which makes the script fail.
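The same membership check can be done in Python without the empty line the awk/tr pipeline introduces (blanking `$1` makes awk rebuild each line starting with a space, which `tr` turns into an empty line). A minimal sketch, assuming, as the awk command does, that column 1 of 02.cdhit.clusters.tsv is the cluster name and the remaining tab-separated columns are member sequence names; `unclustered` and its helpers are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def clustered_names(clusters_tsv: Path) -> set[str]:
    """Every sequence name listed in a cluster (columns 2..n of each line)."""
    names: set[str] = set()
    with clusters_tsv.open() as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            # Column 1 is assumed to be the cluster name, so skip it; also
            # drop empty fields so no blank "name" is ever recorded.
            names.update(f for f in fields[1:] if f)
    return names


def fasta_names(fasta: Path) -> set[str]:
    """Header text after '>' for every record, like the grep/sed step."""
    with fasta.open() as fh:
        return {line[1:].strip() for line in fh if line.startswith(">")}


def unclustered(fasta: Path, clusters_tsv: Path) -> list[str]:
    """Sequences in the FASTA that belong to no cluster, sorted."""
    return sorted(fasta_names(fasta) - clustered_names(clusters_tsv))
```

Run from the prepareref output directory as `unclustered(Path("02.cdhit.all.fa"), Path("02.cdhit.clusters.tsv"))`; on this database it would be expected to list the same five names as the diff.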