Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to input paired design pt.2 #5

Open
MingL196 opened this issue Jan 23, 2019 · 2 comments
Open

How to input paired design pt.2 #5

MingL196 opened this issue Jan 23, 2019 · 2 comments

Comments

@MingL196
Copy link

(Note: I also put the question under #3 "How to input paired design", but realized that I couldn't reopen the issue)

Hi,

I am a bit confused about using adjust.var to specify a paired design experiment.

In our experiment, we have 19 case-control pairs. Each case control pair has similar ages. For example:

paired_subject sample group (case/control)
1 1001 1
1 1002 0
2 1003 1
2 1004 0

We would like to set up pairwise comparisons:

subject_1 sample_1 group_1 compare_to subject_2 sample_2 group_2
1 1001 1 <-> 1 1002 0
2 1003 1 <-> 2 1004 0

You previously recommended that I add the pair variable by using the option adjust.var="paired_subject".

In your paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4705678/ , you wrote that:
"In MethylAction, the first stage performs the negative binomial test from DESeq (33) for each pairwise comparison."

However, your first stage function (analysis.r, line 72) doesn't use the variable adjust.var at all, and as such wouldn't have the pair data to perform pairwise comparisons. In fact, adjust.var seems to only be used as a covariate in stage 2 testing.

  1. Is adjust.var using the "paired_subject" variable as covariate or as pairing data?
    I.e. in mixed model terms, is adjust.var specifying a fixed effect or a random effect?

  2. How does Methylaction distinguish between covariate and pairing data?


By the way, should the variable
counts
in lines 508-513 of analysis.r be
counts[rows,,drop=F]
?


@jeffbhasin
Copy link
Owner

Hello,
Thanks for your continued interest in MethylAction. Pairwise comparisons in Stage 1 testing as mentioned in the paper means pairwise tests between each of the groups. If there are only two groups, then there is only one pairwise comparison (A vs B). If there are three groups, then there would be 3 pairwise comparisons of these groups (A vs B, A vs C, and B vs C). And so on if more groups. Each group must have 2 or more samples, or else there is no variance for us to perform a statistical test. From the second table in your message, it looks like the desire is to compare samples in a pairwise fashion, but this cannot be done with MethylAction (there must be n > 1 in each group).

What MethylAction can do is a two-group comparison of the "control" versus "case" group. To increase statistical power, it can leverage the fact that the samples are paired (from same subject but different condition) and this is what we mean by "paired design". We enable this to be done as the DEseq documentation recommended to us at the time - by adding the pairing variable as an adjustment to the GLM.

You are correct that the adjustment variable isn't used in Stage 1 testing. It is only possible to use this during Stage 2 when we use the GLM. The approach of our method is to use Stage 1 for narrowing down of the genome, so we use the more efficient exact test from DEseq, which I believe does not support the covariates because it does not use a GLM. Stage 1 is just used to capture regions, and we are more interested in leveraging the adjustment for Stage 2, after we have done that initial discovery and now are going to use that p-value to make an inference (with respect to error control by the permutation test).

So to answer your question, the program as we wrote it only addresses adjustment in Stage 2, and we believed this to be adequate given how our method works in multiple stages.

The "adjust.var" can be used either for a covariate or for a pairing variable, but not for both or for multiple covariates simultaneously. The underlying statistical test is provided by DEseq, and we simply followed that documentation for guidance on how to handle pairs and adjustments. I would refer to the DEseq paper and documentation for your question about fixed vs random effect, as I do not know the answer. As the MethylAction code's functions are written, it only supports a single covariate or pairing variable, but the underlying GLM could have multiple covariates. There would just need to be a vectorization of the "adjust.var" and some code to pass this through into creating the formula with each variable to the GLM. This improvement we have not implemented, however, since we only had data with a single covariate or pairing variable.

The GLM in stage 2 I believe can take a custom formula, so it should be possible to extend the GLM or do something specific to handle your pairing differently than the covariate way it currently supports via "adjust.var" by modifying our code and the wrapper function's options. We did not expose every possible way of forming the GLM in the interest of simplicity.

Jeff

@jeffbhasin
Copy link
Owner

I also looked at lines 508-513. I believe that "if" statement was to prevent some bugs where a chunk had only a single row. I'm not sure how often it is invoked, and may never be invoked. I only looked by eye, without testing myself, but I do think you are correct that the "counts" should be subset by "rows". As it is written, I'm not sure that the if statement would ever evaluate to be true, so that code may not even be necessary or executed. If you'd like to investigate further, please make a pull request if you propose changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants