
Graphtyper flag may need to be added if working with Nanopore data otherwise empty merged vcf file #70

Open
masudermann opened this issue May 2, 2024 · 4 comments

Comments

@masudermann
Contributor

Description of the bug

In discussion with Alex, Upasana, and Fernanda, we realized that we likely need to include the flag --no_filter_on_proper_pairs when using the graphtyper genotype command with the 150-bp trimmed long reads.

(Thanks to Alex for pointing us to these additional parameters, which are listed under graphtyper genotype --advanced --help.)

We found that without this flag, merging the individual VCF files with the graphtyper vcf_concatenate command produces an empty final output file: no variants are called.
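A quick way to confirm whether a merged file is affected is to count its variant records, since an "empty" VCF still contains its header. Below is a minimal Python sketch, assuming the output is either a plain VCF or a bgzipped .vcf.gz; the function name is hypothetical and not part of graphtyper or the pipeline.

```python
import gzip
from pathlib import Path


def count_vcf_records(path):
    """Count data (non-header) lines in a VCF or bgzipped VCF (.vcf.gz)."""
    path = Path(path)
    # bgzip output is a series of gzip members, which gzip.open reads transparently
    opener = gzip.open if path.suffix == ".gz" else open
    n = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta-information (##) and column-header (#CHROM) lines, plus blanks
            if line.startswith("#") or not line.strip():
                continue
            n += 1
    return n


if __name__ == "__main__":
    import sys

    # Usage: python count_vcf_records.py merged.vcf.gz [more.vcf ...]
    for vcf in sys.argv[1:]:
        n = count_vcf_records(vcf)
        status = "EMPTY (no variant records)" if n == 0 else f"{n} records"
        print(f"{vcf}: {status}")
```

A merged file that reports zero records while the per-region inputs do contain records would match the behavior described above.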

Command used and terminal output

No response

Relevant files

No response

System information

No response

@masudermann masudermann changed the title from "Graphtyper parameter may need to be added if working with Nanopore data otherwise empty merged vcf file" to "Graphtyper flag may need to be added if working with Nanopore data otherwise empty merged vcf file" May 2, 2024
@zachary-foster
Contributor

I just checked and reproduced this behavior in the pipeline. When only Nanopore samples are used, there are no variants without --no_filter_on_proper_pairs, but there are variants with it. Does anyone know of a reason this flag should not be included when there are no Nanopore samples? It's easy to add it all the time; adding it only when Nanopore samples are included is a bit more work, but probably still not bad.
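To make the two options concrete, here is a minimal Python sketch of building the graphtyper genotype argument list either way. The Sample record, its platform field, and the function name are hypothetical stand-ins for however the pipeline tracks sequencing technology; the other graphtyper arguments (reference, BAM list, region, and so on) are left to the caller rather than guessed at.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical sample record; field names are illustrative, not the pipeline's schema
    name: str
    platform: str  # e.g. "illumina" or "nanopore"


PROPER_PAIR_FLAG = "--no_filter_on_proper_pairs"


def graphtyper_genotype_cmd(base_args, samples, always_on=True):
    """Return a `graphtyper genotype` argument list.

    base_args: reference, BAM list, region, etc., supplied by the caller.
    always_on=True adds the flag unconditionally; always_on=False adds it
    only when at least one sample is Nanopore.
    """
    cmd = ["graphtyper", "genotype", *base_args]
    has_nanopore = any(s.platform.lower() == "nanopore" for s in samples)
    if always_on or has_nanopore:
        cmd.append(PROPER_PAIR_FLAG)
    return cmd
```

The conditional version only needs a per-sample platform label to be available at the genotyping step; the always-on version needs nothing extra.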

@masudermann
Contributor Author

That is a good question and something I wondered too.

I'm looking into what happens when I call variants with and without the flag, for a small dataset of short read samples.

@masudermann
Contributor Author

masudermann commented May 3, 2024

I ran a quick experiment with 6 short-read P. ramorum samples, running graphtyper identically except for whether the flag was included. I then filtered to SNPs only, as the pipeline does.

The pairwise SNP differences between samples are very similar between the two runs, but not identical.

For each sample pair, the run with the flag identifies between 15 and 30 more SNP differences.

Here is the matrix when I don't include the flag:
[Screenshot: pairwise SNP difference matrix without --no_filter_on_proper_pairs]

Here is the matrix when I do include the flag:
[Screenshot: pairwise SNP difference matrix with --no_filter_on_proper_pairs]
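For anyone repeating this comparison on other datasets, here is a minimal Python sketch of computing a pairwise SNP difference matrix and the per-pair change between two runs. It assumes genotypes have already been parsed into a {sample: {site: genotype}} mapping, and it counts a difference only at sites called in both samples; the pipeline's own distance calculation may handle missing data differently. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_snp_differences(calls):
    """calls: {sample: {site: genotype}}. Returns {(a, b): difference count}.

    Assumption: a difference is a site called in both samples with different
    genotypes; sites missing from either sample are skipped.
    """
    diffs = {}
    for a, b in combinations(sorted(calls), 2):
        shared = calls[a].keys() & calls[b].keys()
        diffs[(a, b)] = sum(1 for site in shared if calls[a][site] != calls[b][site])
    return diffs


def matrix_delta(with_flag, without_flag):
    """Per-pair change in SNP differences: run with the flag minus run without."""
    return {pair: with_flag[pair] - without_flag[pair] for pair in with_flag}
```

Running matrix_delta on the two matrices gives the per-pair increase directly, which makes a check like "between 15 and 30 more differences per pair" easy to repeat on other datasets.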

@zachary-foster
Contributor

Nice! Those look very similar. If that is representative of most datasets, then I think we can just leave this flag on all the time for now.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants