Feature: Align and Extend #16

Open
Adamtaranto opened this issue Nov 1, 2023 · 5 comments

@Adamtaranto
Owner

Existing modules:

  • Base Teloclip extracts any reads that are soft-clipped at contig ends (optionally checking for telomeric motifs)
  • Teloclip-extract: bins reads from teloclip into output files per contig end (left / right for each contig)

New module:

  • Teloclip-extend: Extend a contig with the soft-clipped overhang of a single aligned read or contig.

Tasks:

  • Write function that takes a single aligned read and extends a contig with the overhanging (soft-clipped) segment of the alignment.
  • Handle cases where alignment is clipped at both ends of contig
  • Update argparser to use submodule keywords [filter, extract, extend]
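A minimal sketch of the first two tasks, for discussion only. The names `parse_cigar` and `extend_contig` are hypothetical and not existing teloclip functions. It assumes the read sequence is given as in the SAM SEQ field, that `ref_start` is the 0-based leftmost aligned position, and that soft-clipped bases can be projected onto the contig without gaps. The `end` argument picks which clip to use when a read is clipped at both ends.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def extend_contig(contig, read_seq, ref_start, cigar, end):
    """Return the contig extended by the read's soft-clipped overhang.

    Hypothetical helper: only the clipped bases that project past the
    contig boundary are added (ungapped projection is assumed).
    """
    # Hard-clipped bases are absent from SEQ, so ignore them.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        raise ValueError("empty CIGAR")

    if end == "left":
        clip_len, op = ops[0]
        if op != "S":
            raise ValueError("read is not soft-clipped on the left")
        # Clipped bases that fall before contig position 0.
        overhang = clip_len - ref_start
        if overhang <= 0:
            raise ValueError("soft clip does not overhang the contig start")
        return read_seq[:overhang] + contig

    if end == "right":
        clip_len, op = ops[-1]
        if op != "S":
            raise ValueError("read is not soft-clipped on the right")
        # Reference-consuming operations give the aligned span on the contig.
        ref_len = sum(n for n, o in ops if o in "MDN=X")
        ref_end = ref_start + ref_len  # 0-based, exclusive
        overhang = clip_len - (len(contig) - ref_end)
        if overhang <= 0:
            raise ValueError("soft clip does not overhang the contig end")
        return contig + read_seq[len(read_seq) - overhang:]

    raise ValueError("end must be 'left' or 'right'")
```

Example: `extend_contig("ACGTACGT", "ACGTTTAGGG", 4, "4M6S", "right")` appends the six clipped bases to the right end of the contig.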
@Adamtaranto Adamtaranto added the enhancement New feature or request label Nov 1, 2023
@Adamtaranto Adamtaranto self-assigned this Nov 1, 2023
@Adamtaranto
Owner Author

Proposed module names:

teloclip filter
teloclip extract
teloclip extend
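
A rough sketch of how these sub-commands could be wired up with `argparse` sub-parsers. `build_parser` is a hypothetical name, and the per-command options are left out because they are not decided yet.

```python
import argparse


def build_parser():
    """Build a parser with filter / extract / extend sub-commands (sketch only)."""
    parser = argparse.ArgumentParser(prog="teloclip")
    subparsers = parser.add_subparsers(dest="command", required=True)

    subparsers.add_parser(
        "filter",
        help="Extract reads that are soft-clipped at contig ends.",
    )
    subparsers.add_parser(
        "extract",
        help="Bin overhang reads into output files per contig end.",
    )
    subparsers.add_parser(
        "extend",
        help="Extend a contig with the soft-clipped overhang of an aligned read.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Selected sub-command: {args.command}")
```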

@Adamtaranto
Owner Author

#10 requests automation of the contig extension process. I think it is generally unwise to blindly accept that overhang alignments are "real" without inspecting them first.

Need to balance convenience against the risk of enabling errors.

Could provide a tutorial on manual curation: select the best overhang-read (i.e. confident anchor region, unique alignment, many reads agree) and then extend the contig with teloclip extend.

Alternatively, could provide an extend-now-ask-questions-later option whereby we extend contigs using the longest available overhang and then suggest validation checks, e.g. align all overhang-reads back to the extended contig and look for agreement between reads.
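
A crude sketch of the longest-overhang idea with a cheap agreement check. This is not a real realignment: it assumes every overhang sequence starts at the same contig end and compares bases position by position without gaps, so indel-rich long reads will score badly. `pick_longest` and `prefix_identity` are hypothetical names.

```python
def pick_longest(overhangs):
    """Return the longest overhang sequence from a list of strings."""
    if not overhangs:
        raise ValueError("no overhangs supplied")
    return max(overhangs, key=len)


def prefix_identity(candidate, other):
    """Fraction of matching bases over the shared (ungapped) prefix."""
    shared = min(len(candidate), len(other))
    if shared == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(candidate, other))
    return matches / shared


def agreement_report(overhangs):
    """Pick the longest overhang and score every other read against it."""
    best = pick_longest(overhangs)
    scores = [prefix_identity(best, seq) for seq in overhangs if seq is not best]
    return best, scores
```

Low scores across most reads would be a signal to inspect that contig end by hand before accepting the extension.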

@Adamtaranto
Owner Author

Add output option for extract to yield MAF or MSA of overhang-reads that can be viewed in terminal or externally.

@Adamtaranto
Owner Author

Note: Log total bases extended and bases per contig end. Useful for reporting results.
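
One possible shape for this bookkeeping, as a sketch. `ExtensionTally` is a hypothetical class name, and the tab-separated report layout is only an assumption about what would be handy for reporting.

```python
from collections import defaultdict


class ExtensionTally:
    """Track bases added per (contig, end) and overall."""

    def __init__(self):
        self._bases = defaultdict(int)

    def record(self, contig, end, n_bases):
        """Add n_bases to the tally for one contig end ('left' or 'right')."""
        if end not in ("left", "right"):
            raise ValueError("end must be 'left' or 'right'")
        self._bases[(contig, end)] += n_bases

    def total(self):
        """Total bases added across all contig ends."""
        return sum(self._bases.values())

    def report_lines(self):
        """One line per contig end, followed by a total line."""
        lines = [
            f"{contig}\t{end}\t{n}"
            for (contig, end), n in sorted(self._bases.items())
        ]
        lines.append(f"total\t-\t{self.total()}")
        return lines
```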

@Adamtaranto Adamtaranto added this to the v0.1.0 Release milestone Nov 12, 2023
@Adamtaranto
Owner Author

Option: Output BED file coords for extended sequence segments for review.
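
A sketch of how the BED records could be derived. BED intervals are 0-based and half-open, and the coordinates here are on the extended contig, so a left extension shifts the right-hand segment. `extension_bed_lines` and the name-column labels are hypothetical.

```python
def extension_bed_lines(contig, original_len, left_ext=0, right_ext=0):
    """Return BED lines (chrom, start, end, name) for the added segments.

    Coordinates refer to the contig after extension.
    """
    lines = []
    if left_ext > 0:
        lines.append(f"{contig}\t0\t{left_ext}\t{contig}_left_extension")
    if right_ext > 0:
        # Original sequence now occupies [left_ext, left_ext + original_len).
        start = left_ext + original_len
        end = start + right_ext
        lines.append(f"{contig}\t{start}\t{end}\t{contig}_right_extension")
    return lines
```

For example, a 100 bp contig extended by 5 bp on the left and 10 bp on the right would yield intervals 0-5 and 105-115.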
