Restrict MCMC to sets with at least one observed element #3

alyst · 2014-10-24T17:36:43Z

The PR allows MCMC active/inactive set flag sampling only to those sets that have at least one observed element; the sets without observed elements are fixed in inactive state.

There are 2 advantages:

the parameter space of set states is dramatically reduced (for most of the real data sets); the MCMC sampler can explore the shrunk space much more effectively
previously the parameter p had to be constrained to small values to avoid degenerated solutions (p > 0.5 and many sets without observed elements are active); by disabling a priori unrelated sets we exclude the degenerated solutions, and p constraints could be relaxed

It contains two commits of Pull request #3.

sba1 · 2014-10-25T12:04:20Z

I committed the two fixs of the pull request. For the remaining one, it is a great addition (if I remember correctly, Ontologizer also does something similar). But I'm wondering whether it is possible to implement the removing of the sets in a upper layer (e.g., in the R code) and make this available as an option. It is a change in the (documented) model and thus it should be switchable, which also helps documentation. Having this is the C code is also okay, but it should be switchable by the caller.

alyst · 2014-10-25T12:18:01Z

The "disabled" sets still contribute to the posterior probability via log(1-p)*n_disabled term. I think if we would apply MGSA as-is to the subset of "enabled" sets, the results would be different.
Of course, it's possible to provide just the number of the "disabled" sets to the C-routine, but I've also though about allowing sets weighting, and then the least intrusive way to implement weighting without loosing the performance is to have all the sets supplied into mgsa.c.

The flag to enable/disable this optimization would be easy to implement.

sba1 · 2014-10-25T13:01:19Z

Yes, the results would be different hence I think that it would be useful to switch this on/off. I missed that you still account for them in the score.

Still I think that if you won't allow these sets to become active one imposes an additional constraint to the model (you impose another prior over the active terms hence the marginalized posteriors are different), in particular for fixed alpha/beta/p. The previous model allowed that unrelated terms could be switched on, making the marginalized probability usually lower (but more spread across all terms, not only terms that contains observations) to account for the ambiguity. Excluding them from to be chosen (this is my understanding of your change) will have an impact on this. I haven't really looked into this since a while, may be I'm missing something and thus I can be wrong of course.

I agree however, that users probably don't expect that terms come up that have no annotations. However, if this happens, this could also be an indication that the experiment was not specific enough. So I opt to have a switchable option and also to have an explanation for users.

Regarding weighted sets in general, it really would be great to have such a feature. It has been on my TODO list since a long time.

Hyperparameters were not updated anymore after half the burn-in stage

Consists of 3 commits. Commit information: Commit id: 032e785 Merge branch 'alyst-fixes' It contains two commits of Pull request #3. Committed by: Sebastian Bauer Author Name: Sebastian Bauer Commit date: 2014-10-25 13:58:37 +0200 Author date: 2014-10-25 13:58:37 +0200 Commit id: 4287f56 use log1p() where possible log1p(-x) should be more accurate and faster than log(1-x) Committed by: Alexey Stukalov Author Name: Alexey Stukalov Commit date: 2014-10-24 19:06:07 +0200 Author date: 2014-10-23 15:28:46 +0200 Commit id: ba756a9 fix parameter sampling Committed by: Alexey Stukalov Author Name: Alexey Stukalov Commit date: 2014-10-23 02:09:01 +0200 Author date: 2014-10-23 02:05:37 +0200 git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/mgsa@96027 bc3139a8-67e5-0310-9ffc-ced21a209358

sba1 added a commit that referenced this pull request Oct 25, 2014

Merge branch 'alyst-fixes'

032e785

It contains two commits of Pull request #3.

knokknok referenced this pull request in knokknok/mgsa-bioc Feb 17, 2015

Update mgsa.c

f93fef5

Hyperparameters were not updated anymore after half the burn-in stage

restrict MCMC to sets that have observed elements

da13705

alyst force-pushed the ignore_unaffected_sets branch from a11172f to da13705 Compare January 28, 2016 11:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restrict MCMC to sets with at least one observed element #3

Restrict MCMC to sets with at least one observed element #3

alyst commented Oct 24, 2014

sba1 commented Oct 25, 2014

alyst commented Oct 25, 2014

sba1 commented Oct 25, 2014

Restrict MCMC to sets with at least one observed element #3

Are you sure you want to change the base?

Restrict MCMC to sets with at least one observed element #3

Conversation

alyst commented Oct 24, 2014

sba1 commented Oct 25, 2014

alyst commented Oct 25, 2014

sba1 commented Oct 25, 2014