Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict MCMC to sets with at least one observed element #3

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alyst
Copy link
Contributor

@alyst alyst commented Oct 24, 2014

The PR allows MCMC active/inactive set flag sampling only to those sets that have at least one observed element; the sets without observed elements are fixed in inactive state.

There are 2 advantages:

  • the parameter space of set states is dramatically reduced (for most of the real data sets); the MCMC sampler can explore the shrunk space much more effectively
  • previously the parameter p had to be constrained to small values to avoid degenerated solutions (p > 0.5 and many sets without observed elements are active); by disabling a priori unrelated sets we exclude the degenerated solutions, and p constraints could be relaxed

sba1 added a commit that referenced this pull request Oct 25, 2014
It contains two commits of Pull request #3.
@sba1
Copy link
Owner

sba1 commented Oct 25, 2014

I committed the two fixs of the pull request. For the remaining one, it is a great addition (if I remember correctly, Ontologizer also does something similar). But I'm wondering whether it is possible to implement the removing of the sets in a upper layer (e.g., in the R code) and make this available as an option. It is a change in the (documented) model and thus it should be switchable, which also helps documentation. Having this is the C code is also okay, but it should be switchable by the caller.

@alyst
Copy link
Contributor Author

alyst commented Oct 25, 2014

The "disabled" sets still contribute to the posterior probability via log(1-p)*n_disabled term. I think if we would apply MGSA as-is to the subset of "enabled" sets, the results would be different.
Of course, it's possible to provide just the number of the "disabled" sets to the C-routine, but I've also though about allowing sets weighting, and then the least intrusive way to implement weighting without loosing the performance is to have all the sets supplied into mgsa.c.

The flag to enable/disable this optimization would be easy to implement.

@sba1
Copy link
Owner

sba1 commented Oct 25, 2014

Yes, the results would be different hence I think that it would be useful to switch this on/off. I missed that you still account for them in the score.

Still I think that if you won't allow these sets to become active one imposes an additional constraint to the model (you impose another prior over the active terms hence the marginalized posteriors are different), in particular for fixed alpha/beta/p. The previous model allowed that unrelated terms could be switched on, making the marginalized probability usually lower (but more spread across all terms, not only terms that contains observations) to account for the ambiguity. Excluding them from to be chosen (this is my understanding of your change) will have an impact on this. I haven't really looked into this since a while, may be I'm missing something and thus I can be wrong of course.

I agree however, that users probably don't expect that terms come up that have no annotations. However, if this happens, this could also be an indication that the experiment was not specific enough. So I opt to have a switchable option and also to have an explanation for users.

Regarding weighted sets in general, it really would be great to have such a feature. It has been on my TODO list since a long time.

knokknok referenced this pull request in knokknok/mgsa-bioc Feb 17, 2015
Hyperparameters were not updated anymore after half the burn-in stage
sba1 added a commit that referenced this pull request Mar 30, 2016
Consists of 3 commits.

Commit information:

Commit id: 032e785

    Merge branch 'alyst-fixes'

    It contains two commits of Pull request #3.
    
Committed by: Sebastian Bauer
Author Name: Sebastian Bauer
Commit date: 2014-10-25 13:58:37 +0200
Author date: 2014-10-25 13:58:37 +0200

Commit id: 4287f56

    use log1p() where possible

    log1p(-x) should be more accurate and faster than log(1-x)
    
Committed by: Alexey Stukalov
Author Name: Alexey Stukalov
Commit date: 2014-10-24 19:06:07 +0200
Author date: 2014-10-23 15:28:46 +0200

Commit id: ba756a9

    fix parameter sampling

Committed by: Alexey Stukalov
Author Name: Alexey Stukalov
Commit date: 2014-10-23 02:09:01 +0200
Author date: 2014-10-23 02:05:37 +0200


git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/mgsa@96027 bc3139a8-67e5-0310-9ffc-ced21a209358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants