Best FORMAT field for filtering SVTyper Output #95

Open
andrewSharo opened this issue Sep 7, 2018 · 2 comments

@andrewSharo

Hi Dave,
I'm running SVTyper on Lumpy output for hg38 whole-genome samples at 50x depth, and I'm interested in finding deletions and other SVs with a low false-positive rate. The purpose is personal genome diagnostics. SVTyper reports about 10,000 SVs with a 0/1 or 1/1 genotype. Which fields do you recommend for further filtering? SVTyper gives a number of helpful values (RO, AO, QR, QA, AB, etc.), but I'm not sure which of them is best suited to filtering. I would also like to find SVs that are supported by coverage changes.
Best, Andrew

@ernfrid
Contributor

ernfrid commented Sep 10, 2018

My experience is mostly with large cohorts rather than single samples. Perhaps @brentp has some advice on what the Quinlan lab is doing for individuals or small cohorts. One recommendation I can definitely make is to use the smoove wrapper for Lumpy and svtyper. It reduces the false-positive rate significantly by performing stringent filtering on the inputs to Lumpy.

For large cohorts, we typically filter on mean sample quality (SQ). I would expect filtering on SQ to work for an individual as well (a cutoff of ~100 might be a good place to start, but you'll likely need to tune it a bit). Additionally, since you're primarily interested in coverage changes, I would annotate your candidate SVs with the copy number of the call region. We've typically been doing this by running cnvnator and then annotating the copy number using the svtools copynumber command. See the copynumber annotation section of the svtools tutorial. Brent also recently released a new tool called duphold that I haven't tried out, but it may be useful to you.
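As a rough illustration of the SQ cutoff suggested above, here is a minimal Python sketch that keeps only records whose per-sample SQ meets a threshold. It assumes a single-sample, tab-delimited VCF in which SQ appears as a key in the FORMAT column; the function names `sample_field` and `filter_by_sq` and the default of 100 are illustrative only, and the cutoff will likely need tuning for your data.

```python
def sample_field(vcf_line, key, sample_index=0):
    """Return the FORMAT value `key` for one sample as a float, or None if absent."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")               # column 9 lists the FORMAT keys
    values = cols[9 + sample_index].split(":")  # sample columns start at column 10
    if key not in fmt_keys:
        return None
    idx = fmt_keys.index(key)
    if idx >= len(values) or values[idx] in (".", ""):
        return None
    return float(values[idx])


def filter_by_sq(lines, min_sq=100.0, sample_index=0):
    """Keep header lines plus records whose SQ is at least `min_sq` (assumed cutoff)."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # pass header lines through unchanged
            continue
        sq = sample_field(line, "SQ", sample_index)
        if sq is not None and sq >= min_sq:
            kept.append(line)
    return kept
```

For real data a VCF library would handle compressed input and edge cases better; the plain-text parsing here is only meant to make the filtering step concrete.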

@brentp
Contributor

brentp commented Sep 10, 2018

thanks for mentioning smoove and duphold @ernfrid

smoove has an annotate sub-command that will help you prioritize variants. It adds an SHQ (smoove het-quality) to the FORMAT fields and MSHQ (mean ...) to the INFO. You can see more about this in the README.
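To make the prioritization step concrete, here is a small Python sketch that sorts records by the MSHQ value in the INFO column. It assumes a tab-delimited VCF already annotated by smoove, and it assumes that a higher MSHQ indicates a better call (check the README for the actual scale). The helper names `info_value` and `rank_by_mshq` are hypothetical.

```python
def info_value(vcf_line, key):
    """Return the raw string value of `key` from the INFO column, or None if absent."""
    cols = vcf_line.rstrip("\n").split("\t")
    for entry in cols[7].split(";"):  # column 8 is INFO, entries look like KEY=VALUE
        name, sep, value = entry.partition("=")
        if name == key and sep:
            return value
    return None


def rank_by_mshq(lines):
    """Sort non-header records by MSHQ, highest first (assumes higher is better)."""
    records = [line for line in lines if not line.startswith("#")]

    def score(line):
        raw = info_value(line, "MSHQ")
        try:
            return float(raw)
        except (TypeError, ValueError):
            return float("-inf")  # records without a usable MSHQ sort last

    return sorted(records, key=score, reverse=True)
```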

In the next month there should be more improvements to smoove that reduce false positives by incorporating duphold and a few other tricks.
