Test the impact and how the parameter num_hits works #344

Open
ypriverol opened this issue Jan 16, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request high-priority

Comments

@ypriverol
Member

Description of feature

It would be good to test the impact of the num_hits parameter across multiple datasets. The idea is to see how this parameter affects the identification step and the quantification results.

@ypriverol ypriverol added the enhancement New feature or request label Jan 16, 2024
@ypriverol ypriverol added documentation Improvements or additions to documentation high-priority release 1.3 labels Jan 16, 2024
@daichengxin
Collaborator

The LFQ dataset PXD001819 and the TMT dataset PXD007683 were tested with different num_hits values (1, 2 and 3).

LFQ results: As num_hits increased, the number of PSMs reported by the search engines increased, but the distribution of search engine scores showed no obvious change. Both target and decoy PSMs increased significantly for Comet and MSGF, but the additional PSMs mostly had worse PEP scores. So the final results did not improve when increasing num_hits; performance even dropped a little.

[5 images]

TMT results: consistent with the LFQ results.
[5 images]
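
One way to check the observation that the extra PSMs mostly carry worse PEP scores is to summarize the PSMs by rank. The following is only a minimal sketch: the record layout `(spectrum_id, rank, is_decoy, pep)`, the function name `summarize_by_rank` and all values are made up for illustration and do not come from the tested datasets or from the pipeline.

```python
from collections import defaultdict
from statistics import median

# Hypothetical PSM records: (spectrum_id, rank, is_decoy, pep).
# The values are invented for illustration only.
psms = [
    ("s1", 1, False, 0.01),
    ("s1", 2, True, 0.80),
    ("s2", 1, False, 0.05),
    ("s2", 2, False, 0.60),
    ("s3", 1, True, 0.90),
    ("s3", 2, True, 0.95),
]


def summarize_by_rank(psms):
    """Return {rank: (n_target, n_decoy, median_pep)} for the given PSMs."""
    grouped = defaultdict(list)
    for _, rank, is_decoy, pep in psms:
        grouped[rank].append((is_decoy, pep))

    summary = {}
    for rank, rows in sorted(grouped.items()):
        n_decoy = sum(1 for is_decoy, _ in rows if is_decoy)
        n_target = len(rows) - n_decoy
        summary[rank] = (n_target, n_decoy, median(pep for _, pep in rows))
    return summary


summary = summarize_by_rank(psms)
for rank, (n_target, n_decoy, med_pep) in summary.items():
    print(f"rank {rank}: targets={n_target} decoys={n_decoy} median PEP={med_pep}")
```

If the rank 2 and rank 3 rows show more decoys and a clearly higher median PEP than rank 1, that would match the description above.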

@jpfeuffer
Collaborator

If you are using multiple hits, you probably want more sophisticated consensus scoring, e.g. PEPMatrix, which takes into account the similarities of the top_hits across SEs and allows some kind of reweighting based on the number of times a sequence "scaffold" was identified across multiple engines.
No guarantees that it gets better though 😁
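
To illustrate the reweighting idea only, here is a toy sketch. It is not PEPMatrix: it matches sequences exactly instead of scoring their similarity, and it combines PEPs by a plain product, which is an assumption made for brevity and not a calibrated probability. The engine names, sequences, PEP values and the function `consensus_scores` are all hypothetical.

```python
from math import prod

# Hypothetical hits for ONE spectrum, per engine: {engine: {sequence: pep}}.
# With num_hits=2, each engine reports two candidate sequences.
hits = {
    "comet": {"PEPTIDEK": 0.20, "PEPTLDEK": 0.30},
    "msgf": {"PEPTLDEK": 0.25, "AAAAK": 0.40},
}


def consensus_scores(hits_by_engine):
    """Toy consensus: multiply the PEPs of every engine that reported a sequence.

    An engine that did not report the sequence contributes 1.0 (no evidence),
    so sequences supported by several engines end up with lower combined values.
    """
    sequences = {seq for engine_hits in hits_by_engine.values() for seq in engine_hits}
    return {
        seq: prod(engine_hits.get(seq, 1.0) for engine_hits in hits_by_engine.values())
        for seq in sequences
    }


scores = consensus_scores(hits)
best = min(scores, key=scores.get)  # lowest combined value wins
print(best, scores[best])
```

In this made-up case the second-ranked Comet hit wins because both engines report it, which is the kind of effect that only becomes possible with num_hits greater than 1.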

@jpfeuffer
Collaborator

It could also be used during feature linking, but we do not have an algorithm for that yet. So no short-term improvements are possible there.

@jpfeuffer
Collaborator

One thing that surprises me a bit is that it gets worse. If we are only taking the best PSM per spectrum, nothing should change by adding second-best hits.
So maybe somewhere we are using more than just the best hit. If you upload a very small experiment, I can check it when I find time.
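
The expectation can be sketched as follows. This assumes each PSM's score does not depend on which other hits were reported, which may not hold if scores are re-estimated from all reported PSMs. The function `best_hit_per_spectrum` and the records are hypothetical and not taken from the pipeline.

```python
def best_hit_per_spectrum(psms):
    """Keep the lowest-PEP PSM per spectrum.

    psms: iterable of (spectrum_id, sequence, pep) tuples.
    Returns {spectrum_id: (sequence, pep)}.
    """
    best = {}
    for spectrum_id, sequence, pep in psms:
        if spectrum_id not in best or pep < best[spectrum_id][1]:
            best[spectrum_id] = (sequence, pep)
    return best


# Made-up results for num_hits=1 ...
top1 = [("s1", "PEPTIDEK", 0.01), ("s2", "ELVISK", 0.05)]
# ... and for num_hits=2, where each spectrum gains a worse second hit.
top2 = top1 + [("s1", "PEPTLDEK", 0.40), ("s2", "ELVLSK", 0.70)]

# Under the stated assumption, the best-hit selection is unchanged.
assert best_hit_per_spectrum(top1) == best_hit_per_spectrum(top2)
```

If the real results differ between the two settings, either the scores of the top hits change when more hits are reported, or some step consumes more than the best hit.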
