Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up WFA when two sequences differ greatly in length #13

Open
lh3 opened this issue Apr 12, 2022 · 3 comments
Open

Speed up WFA when two sequences differ greatly in length #13

lh3 opened this issue Apr 12, 2022 · 3 comments
Labels
bug Something isn't working discussion Discussion feature Feature request

Comments

@lh3
Copy link

lh3 commented Apr 12, 2022

When patching gaps between two anchors, we sometimes need to align two sequences of vastly different lengths. Suppose we are aligning a 10bp sequence against a 100kb sequence. The active band [lo,hi] in the original WFA will grow from 1 to 100010 in size. The total number of iterations is about 1000102/2. Based on the running time of WFA2-lib, I guess WFA2-lib has a similar behavior. This is not necessary given that a wavefront can only consist of 10 cells in this example.

Generally, a key observation is that the active band is determined by the cells in the current stripe (the wfalm terminology). Sometimes we can decrease "hi" or increase "lo" when the wavefront hits the end of the query or the target. I added a hack to miniwfa to achieve that. For 10bp vs 100kb, WFA2-lib takes 44 sec while the modified miniwfa takes 0.2 sec. ksw2 only takes 0.01s mainly because WFA doesn't have an advantage on such examples and partly because my hack is inefficient. I think there should be a faster and cleaner solution but I haven't found that.

@RagnarGrootKoerkamp
Copy link
Contributor

RagnarGrootKoerkamp commented Apr 12, 2022

Is this in the context of global or semi-global alignment (I suppose global given that you're aligning between anchors)?

Could you explain the code you added a bit more? It looks quadratic to me but surely that's just my misunderstanding. Is the following correct:
WFA keeps diagonals that already reached the end active, and (tries to) update them on every increment of s. Your patch decreases the band whenever this happens, so that only diagonals that haven't reached the end yet are updated.

@lh3
Copy link
Author

lh3 commented Apr 12, 2022

Is this in the context of global or semi-global alignment (I suppose global given that you're aligning between anchors)?

It is global.

Could you explain the code you added a bit more?

Function wf_stripe_shrink() finds the smallest lower bound and the largest upper bound among cells in the current stripe. This function is called every 64 cycles.

It looks quadratic to me

Let q=max(x,o1+e1,o2+e2). Then wf->n = q+1. Perhaps it is safer to swap the order of for (j) and for (d). In that case, the time complexity of calling the function once is O(qw+q) where w is the change of the bandwidth. Most of time, w=0. When w>0, the following iterations will be faster. With the current loop, the time complexity can be O(qw+q2) in the worst case, I think.

PS:

WFA keeps diagonals that already reached the end active, and (tries to) update them on every increment of s. Your patch decreases the band whenever this happens, so that only diagonals that haven't reached the end yet are updated.

Almost correct, except that the "whenever" part – my implementation only calls the function when s%64==0 as most of time this function has no effect.

lh3 added a commit to lh3/miniwfa that referenced this issue Apr 12, 2022
probably this has no effect in practice. See also smarco/WFA2-lib#13
@smarco
Copy link
Owner

smarco commented Apr 17, 2022

Sorry for the late reply.

I think I understand the situation you are presenting. In the case of the WFA2-lib, it calls the function wavefront_compute_trim_ends() every time a new wavefront is computed.

This should have a similar effect as your wf_stripe_shrink() (?)

For 10bp vs 100kb, WFA2-lib takes 44 sec while the modified miniwfa takes 0.2 sec

But it clearly isn't. So I must have made an implementation mistake here.

mainly because WFA doesn't have an advantage on such examples

Right. In some extreme corner cases, like this one, the WFA doesn't have the upper-hand.

the modified miniwfa takes 0.2 sec. ksw2 only takes 0.01s

Although the time measurement is too small to make an assessment, this case invites to make a fine profile of this case and see what is going on.

I'm leaving this open until I can address this case.
Thanks for the feedback.

@smarco smarco added bug Something isn't working feature Feature request discussion Discussion labels Apr 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working discussion Discussion feature Feature request
Projects
None yet
Development

No branches or pull requests

3 participants