encoding DIA results in mzTab 1.0 #182

Open
ypriverol opened this issue May 4, 2020 · 11 comments

@ypriverol
Contributor

@andrewrobertjones :

We have some users that want to export DIA results into mzTab 1.0. The proposal is to use optional columns to highlight the information from the spectral library, including:

  • array of intensities
  • array of masses
  • array of ion annotations

These will be encoded as optional columns in the format (one column per value type), and the values will be arrays. I have opened an issue in the spectral library format repository to find out how they plan to encode ion annotations. This representation can also be used for spectral library search.

HUPO-PSI/mzSpecLib#20
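
A minimal sketch of what array-valued optional columns in a PSM row could look like. The column names (`opt_global_library_...`), the comma separator inside a cell, and the example peak values are all assumptions made for illustration; none of them was agreed in this thread.

```python
def encode_array(values, sep=","):
    """Join values into a single mzTab cell (the separator is an assumed convention)."""
    return sep.join(str(v) for v in values)


def psm_optional_columns(masses, intensities, annotations):
    """Build hypothetical optional columns, one per array, for one PSM row."""
    return {
        "opt_global_library_mass_array": encode_array(masses),
        "opt_global_library_intensity_array": encode_array(intensities),
        "opt_global_library_ion_annotation_array": encode_array(annotations),
    }


# Illustrative values only, not taken from a real library.
cols = psm_optional_columns([175.119, 262.151], [1000.0, 350.5], ["y1", "y2"])

# mzTab is tab-separated: these cells would be appended to the PSH header and the PSM row.
header_cells = "\t".join(cols.keys())
row_cells = "\t".join(cols.values())
print(header_cells)
print(row_cells)
```

One column per array keeps each cell a flat list, at the cost of relying on position to match a mass to its intensity and annotation.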

@timosachsenberg
Contributor

Do you have some details and examples, e.g. for the PRT and PEP sections?

@ypriverol
Contributor Author

These changes are mostly for the PSM section. The other two sections remain the same.

@ypriverol
Contributor Author

@bittremieux, can you give your input here? I remember that a long time ago we had a discussion about how to encode spectral library results in mzTab.

@ypriverol
Contributor Author

Another option is to add a reference to a spectral library result file that contains this information. Then you would have only one CV param, reference_to_spectrum_library, which would contain the index (or ID?) of the spectrum in the library. We can start by accepting MSP for now, but in the future we can require mzSpecLib. What do you think, @edeutsch @andrewrobertjones?

@edeutsch

edeutsch commented May 4, 2020

I think this is the better option myself. But note that MSP doesn't really have a spectrum ID. It only has a spectrum name, which might be quite long and is not guaranteed to be unique.
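
To make the uniqueness concern concrete, here is a rough sketch that lists the `Name:` entries of an MSP file in order and flags repeated names. It assumes each library entry starts with a `Name:` line; the position in the returned list (0-based here, which is an arbitrary choice) would then serve as the numeric index. The MSP snippet is made up for illustration.

```python
from collections import Counter


def msp_spectrum_names(lines):
    """Return spectrum names in file order; the list position acts as the index."""
    return [
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("name:")
    ]


# Made-up MSP content with a repeated spectrum name.
msp_text = """Name: PEPTIDEK/2
Num peaks: 1
175.119\t1000.0

Name: PEPTIDEK/3
Num peaks: 1
262.151\t350.5

Name: PEPTIDEK/2
Num peaks: 1
175.119\t800.0
"""

names = msp_spectrum_names(msp_text.splitlines())
duplicates = [name for name, count in Counter(names).items() if count > 1]

for index, name in enumerate(names):
    print(index, name)
print("non-unique names:", duplicates)
```

A positional index is always unique within one file, but it only stays valid as long as the library file itself does not change.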

@ypriverol
Contributor Author

We will use the file name for now as an index.

@bittremieux

When was that? I don't recall the actual discussion, but in general using mzTab for spectral library results shouldn't be too hard. I'm already doing it with the ANN-SoLo output. The main thing is how to refer to the spectra in the library, so in the accession column I'm storing numeric indexes of the library spectra.

@ypriverol
Contributor Author

@bittremieux, does numeric index mean the index of the spectrum in the file?

@timosachsenberg
Contributor

Would it be more convenient to store the protein accession in the accession column and have another column for the reference to the spectral library?

@andrewrobertjones
Contributor

I would keep everything about the PSM line the same as for a sequence database search. The accession column is needed for linking to the protein table, so you can't re-use that. Just add an opt_global CV param with a reference to the external spectral library ID, using different CV terms if there are different external format ID types to cover.
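
A small sketch of that suggestion, assuming the mzTab 1.0 naming pattern for CV-param optional columns (`opt_global_cv_{accession}_{name}`, with spaces replaced by underscores). The accession `MS:0000000` and the term name are placeholders, since no CV term was chosen in this thread, and the PSM values are illustrative.

```python
import re


def opt_global_cv_column(accession, name):
    """Build an optional column name from a CV accession and term name."""
    # Replace anything outside the characters assumed to be allowed with '_'.
    safe_name = re.sub(r"[^A-Za-z0-9_\-\[\]:]", "_", name)
    return f"opt_global_cv_{accession}_{safe_name}"


# Placeholder accession and term name (hypothetical, not a real CV entry).
library_ref_column = opt_global_cv_column("MS:0000000", "spectral library spectrum index")

# The accession column keeps its usual meaning (protein accession, or 'null'
# when the library has none); the library reference lives in its own column.
psm_row = {
    "sequence": "PEPTIDEK",
    "accession": "null",
    library_ref_column: "42",
}

print("\t".join(psm_row.keys()))
print("\t".join(psm_row.values()))
```

If several external ID types need to be covered, each would get its own CV term and therefore its own column.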

@bittremieux

@ypriverol Yes, the index of the spectra in the spectral library.

@timosachsenberg Often there's no protein accession information because spectral libraries are inherently spectrum-based, in contrast to FASTA files which start from the whole protein sequences.

It makes sense not to misuse this column, though, as Andy says. I'll have to change it in my ANN-SoLo mzTab export.
