-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up WFA when two sequences differ greatly in length #13
Comments
Is this in the context of global or semi-global alignment (I suppose global given that you're aligning between anchors)? Could you explain the code you added a bit more? It looks quadratic to me but surely that's just my misunderstanding. Is the following correct: |
It is global.
Function
Let q=max(x,o1+e1,o2+e2). Then PS:
Almost correct, except that the "whenever" part – my implementation only calls the function when s%64==0 as most of time this function has no effect. |
probably this has no effect in practice. See also smarco/WFA2-lib#13
Sorry for the late reply. I think I understand the situation you are presenting. In the case of the WFA2-lib, it calls the function wavefront_compute_trim_ends() every time a new wavefront is computed. This should have a similar effect as your
But it clearly isn't. So I must have made an implementation mistake here.
Right. In some extreme corner cases, like this one, the WFA doesn't have the upper-hand.
Although the time measurement is too small to make an assessment, this case invites to make a fine profile of this case and see what is going on. I'm leaving this open until I can address this case. |
When patching gaps between two anchors, we sometimes need to align two sequences of vastly different lengths. Suppose we are aligning a 10bp sequence against a 100kb sequence. The active band [lo,hi] in the original WFA will grow from 1 to 100010 in size. The total number of iterations is about 1000102/2. Based on the running time of WFA2-lib, I guess WFA2-lib has a similar behavior. This is not necessary given that a wavefront can only consist of 10 cells in this example.
Generally, a key observation is that the active band is determined by the cells in the current stripe (the wfalm terminology). Sometimes we can decrease "hi" or increase "lo" when the wavefront hits the end of the query or the target. I added a hack to miniwfa to achieve that. For 10bp vs 100kb, WFA2-lib takes 44 sec while the modified miniwfa takes 0.2 sec. ksw2 only takes 0.01s mainly because WFA doesn't have an advantage on such examples and partly because my hack is inefficient. I think there should be a faster and cleaner solution but I haven't found that.
The text was updated successfully, but these errors were encountered: