
Filter out indels that are actually STRs, as well as duplicate STRs. #227

Merged (12 commits, Jun 14, 2024)

@hopedisastro commented on Jun 10, 2024

This script filters raw associaTR outputs that contain both eSTR and eSNP results (i.e. it assumes that dataframe_concatenator.py has already been run).
Specifically, it removes indels that actually represent STRs and removes duplicate eSTRs, retaining only one eSTR per duplicate set.
Both filters are needed to improve the accuracy of fine-mapping.
Indels are considered STRs (and therefore removed) if:

  • they overlap an STR region, as specified by an eSTR in the associaTR output; and
  • the indel sequence (or its reverse complement) is a whole or partial copy of at least one cyclical representation of the STR motif.
    For example, a 'TAT' insertion overlapping an 'ATT' STR would be considered an STR, as 'TAT' is a cyclical representation of 'ATT'.
    Similarly, a 'GC' insertion overlapping a 'CAG' STR would be considered an STR, as 'GC' is a partial copy of 'GCA', a cyclical representation of 'CAG'.
    Note that impure indels are conservatively not considered STRs. For example, a 'GCCGCA' insertion overlapping a 'GCC' STR would not be considered an STR.
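
The two criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script merged in this PR. The function names (`indel_sequence`, `overlaps`, `matches_motif`, `is_str_indel`) are hypothetical. The sketch also makes several assumptions that the PR does not state:

  • coordinates are 1-based and inclusive;
  • REF/ALT alleles are VCF-style, and the shorter allele is a prefix of the longer one (a single shared anchor base);
  • a "whole copy" may span more than one repeat unit;
  • "(reverse complement)" means both the forward sequence and its reverse complement are checked.

```python
"""Minimal sketch of the STR-indel test (not the PR's actual code).

Assumptions: 1-based inclusive coordinates; VCF-style REF/ALT alleles whose
shorter allele is a prefix of the longer one (shared anchor base); a
'whole copy' may span more than one repeat unit.
"""

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def indel_sequence(ref: str, alt: str) -> str:
    """Return the inserted (or deleted) bases, dropping the shared anchor prefix."""
    longer, shorter = (alt, ref) if len(alt) > len(ref) else (ref, alt)
    if len(longer) == len(shorter) or not longer.startswith(shorter):
        raise ValueError(f"not a simple anchored indel: {ref}>{alt}")
    return longer[len(shorter):]


def overlaps(pos: int, ref: str, str_start: int, str_end: int) -> bool:
    """True if the REF span [pos, pos + len(ref) - 1] intersects [str_start, str_end]."""
    return pos <= str_end and str_start <= pos + len(ref) - 1


def matches_motif(seq: str, motif: str) -> bool:
    """True if seq is a whole or partial copy of some cyclical representation of motif.

    Every rotation of the motif, repeated, is a substring of the motif tiled
    enough times, so a single substring search covers all rotations.
    """
    if not seq or not motif:
        return False
    copies = len(seq) // len(motif) + 2  # enough tiles for any starting offset
    return seq in motif * copies


def is_str_indel(pos, ref, alt, str_start, str_end, motif) -> bool:
    """Apply both criteria: overlap with the STR, then a forward or reverse-complement motif match."""
    if not overlaps(pos, ref, str_start, str_end):
        return False
    seq = indel_sequence(ref, alt)
    return matches_motif(seq, motif) or matches_motif(reverse_complement(seq), motif)


if __name__ == "__main__":
    print(is_str_indel(100, "A", "ATAT", 95, 120, "ATT"))     # 'TAT' vs 'ATT' -> True
    print(is_str_indel(100, "A", "AGC", 95, 120, "CAG"))      # 'GC' vs 'CAG' -> True
    print(is_str_indel(100, "A", "AGCCGCA", 95, 120, "GCC"))  # impure 'GCCGCA' -> False
```

Tiling the motif keeps the check to one substring test. A pure insertion such as 'GCCGCC' would match 'GCC', but the impure 'GCCGCA' never appears in a run of 'GCC' copies. It is therefore rejected, which is consistent with the conservative rule above.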

This script also removes duplicate eSTRs (those sharing the same coordinates and motif), retaining only the eSTR with the lowest p-value from each duplicate set.
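
A minimal sketch of this de-duplication step is shown below, again not the PR's actual code. It assumes each result row is a plain dict from a single associaTR results table. The field names `chr`, `start`, `end`, `motif` and `pval`, and the function name `deduplicate_estrs`, are hypothetical stand-ins for whatever the real columns and helpers are called.

```python
"""Minimal sketch of eSTR de-duplication (not the PR's actual code).

Assumptions: each row is a dict from one associaTR results table; the field
names "chr", "start", "end", "motif" and "pval" are hypothetical.
"""
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def deduplicate_estrs(
    rows: Iterable[Row],
    key_fields: Sequence[str] = ("chr", "start", "end", "motif"),
    pval_field: str = "pval",
) -> List[Row]:
    """Keep one row per (coordinates, motif) key: the one with the lowest p-value.

    Ties keep the first row seen; output order follows each key's first appearance.
    """
    best: Dict[Tuple[object, ...], Row] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in best or row[pval_field] < best[key][pval_field]:
            best[key] = row
    return list(best.values())


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        {"chr": "chr1", "start": 100, "end": 120, "motif": "CAG", "pval": 0.03},
        {"chr": "chr1", "start": 100, "end": 120, "motif": "CAG", "pval": 0.001},
        {"chr": "chr1", "start": 200, "end": 230, "motif": "AT", "pval": 0.5},
    ]
    for row in deduplicate_estrs(rows):
        print(row["chr"], row["start"], row["motif"], row["pval"])
```

If the results are held in a pandas DataFrame, a similar result can probably be obtained with `df.sort_values("pval").drop_duplicates(subset=["chr", "start", "end", "motif"], keep="first")`, using the same hypothetical column names. Unlike the sketch, that version reorders the rows by p-value.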

@hopedisastro requested a review from @silkm on Jun 12, 2024 22:51

@silkm left a comment

Hi Hope, I've gone over this as best I can and all looks good.

@hopedisastro merged commit 724d283 into main on Jun 14, 2024
3 checks passed
@hopedisastro deleted the remove-STR-indels branch on Jun 14, 2024 01:59