Popscle demuxlet vs freemuxlet output stability. #41
Comments
Have you compared the VCFs generated by freemuxlet with the original
genotypes? Is it possible that the data contain a lot of ambient mRNA, so
that the ambient RNA was represented as one cluster? freemuxlet tries to
avoid such a case, but it may be imperfect.
Thanks,
Hyun.
-----------------------------------------------------
Hyun Min Kang, Ph.D.
Associate Professor of Biostatistics
University of Michigan, Ann Arbor
Email : hmkang@umich.edu
On Mon, Feb 15, 2021 at 5:23 PM xmignot wrote:
Hi,
I'm trying to demultiplex the sequence results of a series of 10x
experiments (both 3' and 5' chemistry). I started by using demuxlet (we
have GWAS data available for the samples), but also ran freemuxlet using
1000 genomes VCF filtered as described in the tutorial as a reference. We
additionally have multiseq results (a more involved demultiplexing protocol
that I'm treating as ground truth) on just the 3' data. I'm a little
concerned about the results from freemuxlet, as they appear to map very
noisily to the demuxlet/multiseq sample ids. I built a mapping of consensus
SNG barcodes between each protocol, and while demuxlet maps very cleanly to
the multiseq labels in the 3' data, for both the 3' and the 5' data the
freemuxlet clusters are distributed across lots of sample ids.
As an example, here are some rows from each mapping:
[demuxlet to multiseq]
109D12: ['109D12: 0.9238', '61C07: 0.0092', '119A02: 0.0074', '119A04: 0.0067', '113E02: 0.006']
...
[freemuxlet to multiseq]
6: ['61C04: 0.2754', '119A03: 0.2748', '119A02: 0.1642', '61D08: 0.1314', '119B12: 0.0609']
Do you have any advice on how to debug this or insights into what could be
going on? I haven't tried passing the variant gwas positions used in
demuxlet to freemuxlet as a reference, but I imagine this should give more
consistent results. However, I want to be able to use the 1000 genomes
variants as it seems this would be another way to independently validate
the demultiplexed barcodes - additionally I've been advised they are
probably more effective for freemuxlet.
Thanks!
How would you recommend comparing those generated VCF files to our genotype data? If one of the clusters is ambient mRNA wouldn't you expect to see just one cluster mapping very noisily and then all of the others mapping fairly well to particular sample ids? Or maybe a much higher fraction of DBL assignments?
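One quick way to look at the DBL question raised above is to tabulate droplet types from the freemuxlet per-barcode output. The following is a minimal sketch, not part of the original thread: it assumes a tab-separated table with a `DROPLET.TYPE` column holding values such as `SNG`, `DBL`, and `AMB`; the column name, the toy barcodes, and the helper name `droplet_type_fractions` are assumptions for illustration and should be checked against your own output file.

```python
import csv
import io
from collections import Counter


def droplet_type_fractions(handle, column="DROPLET.TYPE"):
    """Return the fraction of barcodes per droplet type.

    `column` is an assumed column name; adjust it to match your file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    return {droplet_type: n / total for droplet_type, n in counts.items()}


# Toy table standing in for a per-barcode output file (made-up values).
toy_table = (
    "BARCODE\tDROPLET.TYPE\tBEST.GUESS\n"
    "AAAC-1\tSNG\t0,0\n"
    "AAAG-1\tSNG\t1,1\n"
    "AACT-1\tDBL\t0,1\n"
    "AAGG-1\tAMB\t1,1\n"
)

fractions = droplet_type_fractions(io.StringIO(toy_table))
for droplet_type, fraction in sorted(fractions.items()):
    print(f"{droplet_type}: {fraction:.2f}")
```

Comparing these fractions between the demuxlet and freemuxlet runs on the same barcodes may show whether one tool is calling noticeably more doublets or ambiguous droplets.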
I would check the genotype concordance on overlapping variants first. It is
a bit tricky to achieve though. It is hard to figure out what the problem
is without knowing the nature of data, populations, the degree of
multiplexing, etc.
Thanks,
Hyun.
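The concordance check suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not the maintainer's method: genotypes are assumed to be already loaded into dictionaries keyed by `(chrom, pos)` with alternate-allele counts (0, 1, 2) as values, the REF/ALT alleles are assumed to agree between the two call sets, and the function names and toy data are hypothetical. Because freemuxlet clusters are anonymous, each cluster is compared against every genotyped sample and the best match is reported.

```python
def concordance(gt_a, gt_b):
    """Fraction of overlapping sites where two genotype dicts agree.

    Returns (rate, n_shared); rate is None when no sites overlap.
    """
    shared = [site for site in gt_a if site in gt_b]
    if not shared:
        return None, 0
    matches = sum(1 for site in shared if gt_a[site] == gt_b[site])
    return matches / len(shared), len(shared)


def best_matches(clusters, samples):
    """For each cluster, find the sample with the highest concordance."""
    best = {}
    for cluster_id, cluster_gt in clusters.items():
        scored = []
        for sample_id, sample_gt in samples.items():
            rate, n_sites = concordance(cluster_gt, sample_gt)
            if rate is not None:
                scored.append((sample_id, rate, n_sites))
        if scored:
            best[cluster_id] = max(scored, key=lambda item: item[1])
    return best


# Toy genotypes (made-up values): (chrom, pos) -> alternate-allele count.
clusters = {
    "CLUST0": {("1", 100): 0, ("1", 200): 1, ("2", 50): 2},
    "CLUST1": {("1", 100): 2, ("1", 200): 1, ("2", 50): 0},
}
samples = {
    "109D12": {("1", 100): 0, ("1", 200): 1, ("2", 50): 2, ("3", 10): 1},
    "61C07": {("1", 100): 2, ("1", 200): 0, ("2", 50): 0},
}

best = best_matches(clusters, samples)
for cluster_id, (sample_id, rate, n_sites) in best.items():
    print(f"{cluster_id} -> {sample_id}: {rate:.3f} over {n_sites} sites")
```

With real data the number of overlapping sites matters as much as the rate: a cluster that matches no sample well, even over many shared sites, would be a candidate for the ambient-RNA explanation.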
On Mon, Feb 15, 2021 at 6:17 PM xmignot wrote:
It's possible that this is the case but I'm using the
filtered_feature_matrix 10x output barcodes so there should already be some
degree of QC - I'm wondering if because this noisiness showed up in both
the 5' and 3' data this indicates the problem is more likely to be related
to the reference VCF file?
Thanks for the prompt reply! I appreciate the help -
Xavier
Alright, I will start with that! Thanks for the pointers.
Hi,
I'm trying to demultiplex the sequence results of a series of 10x experiments (both 3' and 5' chemistry). I started by using demuxlet (we have gwas data available for the samples), but also ran freemuxlet using 1000 genomes VCF filtered as described in the tutorial as a reference. We additionally have multiseq results (a more involved demultiplexing protocol that I'm treating as ground truth) on just the 3' data. I'm a little concerned about the results from freemuxlet, as they appear to map very noisily to the demuxlet/multiseq sample ids.
I built a mapping of consensus SNG barcodes between each protocol, and while demuxlet maps very cleanly to the multiseq labels in the 3' data, for both the 3' and the 5' data the freemuxlet clusters are distributed across lots of sample ids.
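As an example, here are some rows from each mapping:
[demuxlet to multiseq]
109D12: ['109D12: 0.9238', '61C07: 0.0092', '119A02: 0.0074', '119A04: 0.0067', '113E02: 0.006']
...
[freemuxlet to multiseq]
6: ['61C04: 0.2754', '119A03: 0.2748', '119A02: 0.1642', '61D08: 0.1314', '119B12: 0.0609']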
Do you have any advice on how to debug this or insights into what could be going on? I haven't tried passing the variant gwas positions used in demuxlet to freemuxlet as a reference, but I imagine this should give more consistent results. However, I want to be able to use the 1000 genomes variants as it seems this would be another way to independently validate the demultiplexed barcodes - additionally I've been advised they are probably more effective for freemuxlet.
Thanks!
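For readers who want to reproduce the kind of barcode mapping described in this post, here is a minimal sketch. It is not the poster's actual script: it assumes each protocol's singlet calls are available as a dictionary from barcode to label, and the function name `label_mapping` and the toy barcodes are hypothetical. For every label from the first protocol it reports the fraction of shared barcodes assigned to each label from the second.

```python
from collections import Counter, defaultdict


def label_mapping(labels_a, labels_b, top=5):
    """Map each label in labels_a to its most common labels in labels_b.

    Both arguments are dicts of barcode -> label (singlets only).
    Returns label_a -> list of (label_b, fraction), highest fraction first.
    """
    counts = defaultdict(Counter)
    for barcode, label_a in labels_a.items():
        label_b = labels_b.get(barcode)
        if label_b is not None:  # only barcodes called by both protocols
            counts[label_a][label_b] += 1
    mapping = {}
    for label_a, counter in counts.items():
        total = sum(counter.values())
        mapping[label_a] = [
            (label_b, round(n / total, 4)) for label_b, n in counter.most_common(top)
        ]
    return mapping


# Toy singlet calls (made-up barcodes and assignments).
freemuxlet = {"AAAC-1": "0", "AAAG-1": "0", "AACT-1": "1", "AAGG-1": "1"}
multiseq = {"AAAC-1": "109D12", "AAAG-1": "109D12", "AACT-1": "61C07", "AAGG-1": "109D12"}

mapping = label_mapping(freemuxlet, multiseq)
for cluster, hits in mapping.items():
    print(cluster, hits)
```

A clean demultiplexing result should put most of the mass for each cluster on a single sample id, as in the demuxlet row above; mass spread evenly over several ids, as in the freemuxlet row, is the pattern being asked about.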