Filter out indels that are actually STRs, as well as duplicate STRs. #227

hopedisastro · 2024-06-10T02:57:44Z

This script filters raw associaTR outputs that contain both eSTR and eSNP results (i.e. we assume that dataframe_concatenator.py has been run).
Specifically, it removes indels that actually represent STRs and removes duplicate eSTRs (retaining only one eSTR per duplicate set).
This is necessary to improve the accuracy of fine-mapping.
Indels are considered STRs (and subsequently removed) if both of the following hold:

1. they overlap with an STR region, specified by an eSTR in the associaTR output; and
2. the sequence of the indel (or its reverse complement) is a whole/partial copy of at least one cyclical representation of the STR motif.
For example, a 'TAT' insertion overlapping an 'ATT' STR would be considered an STR, as 'TAT' is a cyclical representation of 'ATT'.
Similarly, a 'GC' insertion overlapping with a 'CAG' STR would be considered an STR, as 'GC' is a partial copy of 'GCA', a cyclical representation of 'CAG'.
Note that impure indels are conservatively not considered STRs. For example, a 'GCCGCA' insertion overlapping a 'GCC' STR would not be considered an STR.
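
The motif rule above can be sketched as follows. This is a minimal illustration, not the code in remove_STR_indels.py: the function names and the `allow_partial` flag are hypothetical, and the overlap check against the STR region is left out. Note that the final commit in this PR is titled "whole copies only", so the merged script may be stricter than this description; `allow_partial=False` approximates that behaviour.

```python
# Hypothetical helper names; not taken from remove_STR_indels.py.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def cyclic_rotations(motif: str) -> set[str]:
    """All cyclical representations of a motif, e.g. 'ATT' -> {'ATT', 'TTA', 'TAT'}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def is_copy_of(seq: str, rotation: str, allow_partial: bool = True) -> bool:
    """True if seq is made of whole copies of rotation, optionally ending in a partial copy."""
    if not seq:
        return False
    k = len(rotation)
    if not allow_partial and len(seq) % k != 0:
        return False
    # Every base must match the rotation repeated end to end (pure repeats only).
    return all(base == rotation[i % k] for i, base in enumerate(seq))


def indel_matches_motif(indel_seq: str, motif: str, allow_partial: bool = True) -> bool:
    """Check the indel and its reverse complement against every rotation of the motif."""
    candidates = (indel_seq, reverse_complement(indel_seq))
    return any(
        is_copy_of(candidate, rotation, allow_partial)
        for candidate in candidates
        for rotation in cyclic_rotations(motif)
    )
```

With these definitions, `indel_matches_motif("TAT", "ATT")` and `indel_matches_motif("GC", "CAG")` are true, while the impure `indel_matches_motif("GCCGCA", "GCC")` is false, matching the three examples in the description.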

This script additionally removes duplicate eSTRs (defined by sharing the same coordinates and motif), retaining only one eSTR per duplicate set (chosen based on having the lowest p-value).
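
The duplicate-removal step can be sketched like this. It is a plain-Python illustration under assumed field names (`chrom`, `pos`, `end`, `motif`, `pval` are hypothetical and may not match the associaTR output columns); the actual script may well do this with a dataframe instead.

```python
def drop_duplicate_estrs(rows: list[dict]) -> list[dict]:
    """Keep one eSTR per (coordinates, motif) group: the one with the lowest p-value."""
    best: dict[tuple, dict] = {}
    for row in rows:
        # Duplicates are defined as sharing the same coordinates and motif.
        key = (row["chrom"], row["pos"], row["end"], row["motif"])
        if key not in best or row["pval"] < best[key]["pval"]:
            best[key] = row
    # Dict insertion order keeps groups in order of first appearance.
    return list(best.values())


# Illustrative records only; these values are made up for the example.
example_rows = [
    {"chrom": "chr1", "pos": 100, "end": 120, "motif": "CAG", "pval": 0.04},
    {"chrom": "chr1", "pos": 100, "end": 120, "motif": "CAG", "pval": 0.001},
    {"chrom": "chr1", "pos": 100, "end": 120, "motif": "ATT", "pval": 0.2},
]
deduplicated = drop_duplicate_estrs(example_rows)
```

Ties on p-value are resolved here by keeping the first record seen; the description does not say how the script handles ties.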

str/associatr/fine-mapping/remove_STR_indels.py

silkm

Hi Hope, I've gone over this as best I can and all looks good.

hopedisastro added 9 commits June 10, 2024 12:57

first attempt

d98dec1

click.command

baef3b6

reduce logging statements

da440e4

Update remove_STR_indels.py

3acce44

reduce memory usage

cd00563

Update remove_STR_indels.py

8701e86

trying to stop the warning check

d13734b

update docu

f454d9d

black

f873437

hopedisastro requested a review from MattWellie June 10, 2024 03:29

hopedisastro commented Jun 10, 2024

str/associatr/fine-mapping/remove_STR_indels.py (outdated; resolved)

hopedisastro added 2 commits June 10, 2024 14:12

write to analysis

836b927

move the folder into str directory

6448394

hopedisastro requested a review from silkm June 12, 2024 22:51

silkm approved these changes Jun 13, 2024

whole copies only

98a3791

hopedisastro merged commit 724d283 into main Jun 14, 2024
3 checks passed

hopedisastro deleted the remove-STR-indels branch June 14, 2024 01:59
