Parsing mzIdentML is whitespace sensitive #493
Yes, you are indeed correct: our mzid parsing does seem to be formatting-specific. I guess we never considered that anyone would want to write an mzid file without any formatting, as that makes it nearly impossible for humans to read. I could look into adapting it, but it's probably better that I prioritize the pin file import instead?
No, I'm afraid this is not currently supported. It has been discussed, but we concluded that it would require too many changes to the underlying code to be worth the effort, at least with our current limited resources.
Technically, XML is supposed to be whitespace-agnostic (except where it isn't), and I would assume that mzIdentML files follow that (given that the PSI Validator accepts unformatted mzid files). I can't imagine too many people prefer reading mzIdentML over tsv/csv/etc! Obviously this is not a pressing issue for me, but I figured I would document it in case of future bugs.
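As a quick illustration of both halves of that statement, the following minimal Python sketch (standard-library xml.etree.ElementTree; the Modification snippet is a hand-written example, not copied from the linked files) parses an indented and a single-line version of the same element. The indentation does show up as whitespace-only text nodes (the "except where it isn't" part), but once those are ignored, the elements, attributes, and children are identical. The helper name structure is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written example Modification element (Carbamidomethyl at position 3),
# once indented across several lines and once written on a single line.
FORMATTED = """<Modification location="3" monoisotopicMassDelta="57.021464">
    <cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>
</Modification>"""
UNFORMATTED = ('<Modification location="3" monoisotopicMassDelta="57.021464">'
               '<cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>'
               '</Modification>')


def structure(elem):
    """Return (tag, attributes, stripped text, children) recursively,
    dropping whitespace-only text so that indentation does not matter."""
    text = (elem.text or "").strip()
    return (elem.tag, dict(elem.attrib), text,
            [structure(child) for child in elem])


formatted = ET.fromstring(FORMATTED)
unformatted = ET.fromstring(UNFORMATTED)

# The raw text nodes differ: indentation appears as whitespace-only text...
print(repr(formatted.text), repr(unformatted.text))
# ...but once that whitespace is ignored, the two parses are identical.
print(structure(formatted) == structure(unformatted))
```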
Absolutely. I should be ready very soon.
"Should" is the keyword there. ;) But yes, this is clearly something that ought to be fixed in our homemade mzid parser. We wrote our own parser because the available ones, at least at the time, were all too slow and used too much memory. Our parser reads through the file only once, extracts only the elements we need, and ignores everything else. I will try to find the time to look into improving it later.
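The single-pass, extract-only-what-you-need design described above does not have to depend on line layout. Here is a minimal Python sketch of that idea, built on the standard library's ElementTree.iterparse. It is not PeptideShaker's actual Java parser, and the function names and the trimmed XML fragment are hypothetical. It reacts to start/end parser events, keeps only Modification data, and clears everything else as it goes, so the same input yields the same result whether or not it is indented.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the namespace: '{http://...}Modification' -> 'Modification'."""
    return tag.rsplit("}", 1)[-1]


def read_modifications(source):
    """Stream an mzIdentML file once, yielding only Modification data.

    Yields (location, monoisotopicMassDelta, [cvParam names]) per element.
    It reacts to parser events rather than lines, so indentation is irrelevant.
    """
    depth = 0  # > 0 while we are inside a Modification element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "Modification":
                depth += 1
            continue
        # "end": this element and all of its children have been parsed.
        if name == "Modification":
            depth -= 1
            location = elem.get("location")
            delta = elem.get("monoisotopicMassDelta")
            params = [child.get("name") for child in elem
                      if local_name(child.tag) == "cvParam"]
            yield (int(location) if location is not None else None,
                   float(delta) if delta is not None else None,
                   params)
        if depth == 0:
            elem.clear()  # discard data we no longer need to limit memory use


# Hypothetical, heavily trimmed fragment (not a schema-complete mzIdentML file).
DOC = (b'<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">'
       b'<SequenceCollection><Peptide id="pep1">'
       b'<PeptideSequence>PEPCTIDE</PeptideSequence>'
       b'<Modification location="4" monoisotopicMassDelta="57.021464">'
       b'<cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>'
       b'</Modification></Peptide></SequenceCollection></MzIdentML>')

print(list(read_modifications(io.BytesIO(DOC))))
# [(4, 57.021464, ['Carbamidomethyl'])]
```

Note that elem.clear() leaves emptied elements attached to their parents, so memory still grows slightly with file size; this is a sketch, not a tuned implementation.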
Hi,
Since we were discussing the integration of Sage (compomics/searchgui#334), I wrote an mzIdentML module to write its results, as I wanted to play around with PeptideShaker a bit more.
Unfortunately, parsing of the Modification element (and possibly other elements) appears to be whitespace-dependent. The XML library I am using to write the mzIdentML files (serializing from Rust structs) does not currently support indentation or other whitespace formatting.
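Until the parser handles unformatted input, one possible stopgap on the writer side is to re-indent the generated file as a post-processing step, much like running an external formatter. A minimal Python sketch using the standard library's ElementTree.indent (Python 3.9+); the function name is hypothetical, and it assumes the mzIdentML 1.1 namespace (adjust MZID_NS for 1.2 files):

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the mzIdentML 1.1 namespace; adjust for 1.2.
MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def pretty_print_mzid(xml_data, namespace=MZID_NS):
    """Return the document (str or bytes in) re-indented, one element per line."""
    # Keep mzIdentML as the default namespace; otherwise ElementTree would
    # invent an "ns0:" prefix for every element on output.
    ET.register_namespace("", namespace)
    root = ET.fromstring(xml_data)
    ET.indent(root, space="    ")  # available since Python 3.9
    return ET.tostring(root, encoding="unicode")
```

The returned string has no XML declaration, so prepend one (e.g. <?xml version="1.0" encoding="UTF-8"?>) when writing it back to disk. The sketch loads the whole document into memory and drops XML comments, which is fine for a test file but worth keeping in mind for large searches.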
I have included links to two minimal examples of the same mzid file (which passes the PSI Validator): one formatted by an external tool, which loads in PeptideShaker fine, and the unformatted version, which throws the error below:
Formatted mzid: https://gist.github.com/lazear/c7bc428bd7e5227d85a7b5745085c346
Unformatted mzid: https://gist.github.com/lazear/7dd0403d2df1c3f7dd2f0d08c91302f8
Notably, changing any Modification entry in the working file to a single line is sufficient to reproduce the issue.
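Since collapsing a single Modification entry onto one line is enough to trigger the error, a regression fixture can be generated from the working file. A minimal Python sketch, assuming Modification elements are written without a namespace prefix; the function name is hypothetical, and this regex edit is only meant for building test inputs, not as a general XML rewriter:

```python
import re

# Matches one whole <Modification ...>...</Modification> element; \b keeps it
# from matching <ModificationParams>, and DOTALL lets it span several lines.
MODIFICATION = re.compile(r"<Modification\b.*?</Modification>", re.DOTALL)


def collapse_modifications(xml_text):
    """Put every Modification element on one line by deleting the
    whitespace-only runs between its tags (text content is untouched)."""
    return MODIFICATION.sub(
        lambda m: re.sub(r">\s+<", "><", m.group(0)), xml_text)
```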
The spectrum file is "b1906_293T_proteinID_01A_QE3_122212.raw" from http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD001468
Error message:
Also, while I'm here: is there a way to completely turn off all of PeptideShaker's filters and validation features? I would love to be able to use it purely as a GUI/PSM visualizer that blindly trusts what is in the mzIdentML file. I understand if this doesn't align with the goals of the project, though.