Motif runs in clipped regions of raw ONT reads may contain homopolymer errors, where the number of consecutive repeated bases is incorrect. Currently teloclip has an option, --noPoly, which compresses homopolymer runs in both the motif and the read before looking for matches. However, this is likely to reduce specificity and increase false positive matches.
Proposal: Replace homopolymer compression with a regex-based fuzzy search method.
Steps:
Convert the input motif to a regex in which every run of more than one identical base allows a length range of +/- 1 (e.g. GGG becomes G{2,4}).
Search the given sequence with the pattern.
Return the count of non-overlapping matches.
Add noisy (verbose) reporting of motif counts per readname in the L/R end of contigname.
Add a warning that the deprecated --noPoly option will be removed in a future major release.
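The first three steps could be sketched roughly as follows. This is only an illustration of the idea, not teloclip's actual code: the function names `motif_to_fuzzy_regex` and `count_fuzzy_matches` are hypothetical, and the sketch assumes the motif and the read are upper-case strings of plain A/C/G/T.

```python
import re
from itertools import groupby


def motif_to_fuzzy_regex(motif: str) -> str:
    """Build a regex from a motif, letting each run of >1 base vary by +/- 1.

    Hypothetical helper: single bases are kept exact, while a run of n
    identical bases (n > 1) becomes base{n-1,n+1}.
    """
    parts = []
    for base, run in groupby(motif):
        n = len(list(run))
        if n > 1:
            parts.append(f"{re.escape(base)}{{{n - 1},{n + 1}}}")
        else:
            parts.append(re.escape(base))
    return "".join(parts)


def count_fuzzy_matches(motif: str, sequence: str) -> int:
    """Count non-overlapping fuzzy matches of the motif in the sequence."""
    pattern = re.compile(motif_to_fuzzy_regex(motif))
    # finditer scans left to right and never returns overlapping matches.
    return sum(1 for _ in pattern.finditer(sequence))


if __name__ == "__main__":
    print(motif_to_fuzzy_regex("TTAGGG"))  # T{1,3}AG{2,4}
    # Three repeats: one exact, one with a short G run, one with long T and G runs.
    print(count_fuzzy_matches("TTAGGG", "TTAGGGTTAGGTTTAGGGG"))  # 3
```

Unlike full homopolymer compression, a run of three G's here can only match two to four G's, so a single G or a long poly-G stretch would not count as a hit. Whether +/- 1 is the right tolerance for every run length is a design choice left open above.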
Should default behaviour be exact match or fuzzy search?