[RecordFormat] more clear statement on PK$ANNOTATION and cleanup of existing records #301

Open
stanstrup opened this issue Mar 24, 2021 · 1 comment


stanstrup commented Mar 24, 2021

Hi,

I was looking into the proper way to annotate fragments and losses, e.g. "[M+H-NH3]+".
The specifications say "Contributors freely define the record format by using appropriate terms.", which led me to expect a list of allowed terms, but I could not find one. Are the allowed terms simply those used in the examples?

I looked through the current DB on GitHub, and the most common column name seems to be "type". A few records use "ion". The only records that use "annotation", as the specs suggest, put an m/z value in that column.




Isotopes

Then I looked for isotope annotation, and the specs seem to suggest using the same field for isotopes and for fragment/adduct annotation. The example:

PK$ANNOTATION: m/z formula annotation exact_mass error(ppm) 
  167.08947 C9H12O2N [M+1]+(13C) 167.08961 0.81
  168.08681 C9H12O2N [M+1]+(13C, 15N) 168.08664 1.04
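As a side note on machine-readability, here is a minimal sketch of reading the example above into rows. It assumes the header line names the columns and that only the "annotation" column may contain spaces (as in "(13C, 15N)"). The function name `parse_annotation` is hypothetical and is not part of any MassBank tooling.

```python
# The example block from the record format, verbatim.
RECORD = """\
PK$ANNOTATION: m/z formula annotation exact_mass error(ppm)
  167.08947 C9H12O2N [M+1]+(13C) 167.08961 0.81
  168.08681 C9H12O2N [M+1]+(13C, 15N) 168.08664 1.04
"""


def parse_annotation(block):
    """Split a PK$ANNOTATION block into one dict per peak.

    Assumption: only the 'annotation' column may contain spaces, so the
    columns before and after it are taken from the ends of each line.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    header = lines[0].split(":", 1)[1].split()
    free = header.index("annotation")   # index of the free-text column
    tail = len(header) - free - 1       # number of columns after it
    rows = []
    for ln in lines[1:]:
        parts = ln.split()
        middle = " ".join(parts[free:len(parts) - tail])
        values = parts[:free] + [middle] + parts[len(parts) - tail:]
        rows.append(dict(zip(header, values)))
    return rows


rows = parse_annotation(RECORD)
for row in rows:
    print(row["m/z"], row["annotation"])
```

A plain whitespace split would break the second row into six fields, which is one practical argument for a stricter definition of the annotation column.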

Some things here confuse me:

  1. For the +2 peak, simulations suggest that the contribution is about 50/50 from (13C, 18O) and (13C, 13C), with very little from (13C, 15N). Does it make sense to specify the isotopes at all? Wouldn't it make more sense to simply use [M] and [M+1] for the isotope specification? This leads to the next question.
  2. In my opinion, M+1 is confusing here. The peaks in the example are the [M+H]+ ions of the [M] and [M+1] isotopes. Would a format like [M+H]+([M]) and [M+H]+([M+1]) make sense? That is closer to what CAMERA does.
  3. Would it make more sense to have a separate annotation field for isotopes?
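
The point in item 1 can be checked with a rough back-of-the-envelope calculation. The sketch below is not the simulation referred to above. It assumes rounded textbook natural abundances, uses the ion formula C9H12O2N from the example, and takes only a handful of isotope combinations as candidates for the peak two nominal mass units above the monoisotopic one. All names in it are made up for illustration.

```python
from math import comb

# Approximate natural abundances (fractions); rounded textbook values.
LIGHT = {"C": 0.9893, "H": 0.999885, "N": 0.99636, "O": 0.99757}
HEAVY = {
    "13C": ("C", 0.0107),
    "2H": ("H", 0.000115),
    "15N": ("N", 0.00364),
    "17O": ("O", 0.00038),
    "18O": ("O", 0.00205),
}
FORMULA = {"C": 9, "H": 12, "O": 2, "N": 1}  # ion formula from the example


def relative_intensity(substitutions, formula=FORMULA):
    """Intensity of one isotopologue relative to the monoisotopic peak."""
    rel = 1.0
    used = {}
    for isotope, k in substitutions.items():
        element, p_heavy = HEAVY[isotope]
        n = formula[element] - used.get(element, 0)  # atoms still unsubstituted
        rel *= comb(n, k) * (p_heavy / LIGHT[element]) ** k
        used[element] = used.get(element, 0) + k
    return rel


# Candidate compositions two nominal mass units above the monoisotopic peak.
# The list is not exhaustive; rarer combinations are left out.
CANDIDATES = {
    "13C2": {"13C": 2},
    "18O": {"18O": 1},
    "13C+15N": {"13C": 1, "15N": 1},
    "13C+2H": {"13C": 1, "2H": 1},
    "13C+17O": {"13C": 1, "17O": 1},
}

intensities = {name: relative_intensity(s) for name, s in CANDIDATES.items()}
total = sum(intensities.values())
shares = {name: value / total for name, value in intensities.items()}
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {share:6.1%}")
```

Under these assumptions the 13C2 and 18O compositions come out roughly equal and together dominate the peak, while 13C+15N contributes only a few percent.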
Contributor

meowcat commented Mar 24, 2021

Note that the HUPO-PSI people have been discussing a peak annotation format for a while:
HUPO-PSI/mzSpecLib#23
https://docs.google.com/document/d/1yEUNG4Ump6vnbMDs4iV4s3XISflmOkRAyqUuutcCG2w

Their current proposal is a NIST-like format, encoded in a regex:

^(?:(?<analyte_reference>[^/\s]+)@)?(?:(?:(?<series>[axbycz]\.?)(?<ordinal>\d+))|(?<series_internal>[m](?<internal_start>\d+):(?<internal_end>\d+))|(?<precursor>p)|(:?I(?<immonium>[ARNDCEQGHKMFPSTWYVIL])(?:\[(?<immonium_modification>(?:[^\]]+))\])?)|(?<reporter>r(?:(?:\[(?<reporter_label>[^\]]+)\])))|(?:f\{(?<formula>[A-Za-z0-9]+)\})|(?:_(?<external_ion>[^\s,/]+)))(?<neutral_losses>(?:[+-]\d*(?:(?:[A-Z][A-Za-z0-9]*)|(?:\[(?:(?:[A-Za-z0-9:\.]+))\])))+)?(?:(?<isotope>[+-]\d*)i)?(?:\^(?<charge>[+-]?\d+))?(?:\[M(?<adducts>(:?[+-]\d*[A-Z][A-Za-z0-9]*)+)\])?(?:/(?<mass_error>[+-]?\d+(?:\.\d+)?)(?<mass_error_unit>ppm)?)?(?:\*(?<confidence>\d*(?:\.\d+)?))?
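
To try the pattern out, here is a minimal Python sketch. Python's `re` module does not accept the `(?<name>...)` group syntax, so the sketch rewrites it to `(?P<name>...)`, which is safe here only because the pattern contains no lookbehind assertions. The helper `parse_peak_annotation` and the test strings are made up to exercise the regex; they are not taken from the HUPO-PSI document.

```python
import re

# The regex quoted above, split across lines only for readability.
PSI_PATTERN = (
    r"^(?:(?<analyte_reference>[^/\s]+)@)?"
    r"(?:(?:(?<series>[axbycz]\.?)(?<ordinal>\d+))"
    r"|(?<series_internal>[m](?<internal_start>\d+):(?<internal_end>\d+))"
    r"|(?<precursor>p)"
    r"|(:?I(?<immonium>[ARNDCEQGHKMFPSTWYVIL])"
    r"(?:\[(?<immonium_modification>(?:[^\]]+))\])?)"
    r"|(?<reporter>r(?:(?:\[(?<reporter_label>[^\]]+)\])))"
    r"|(?:f\{(?<formula>[A-Za-z0-9]+)\})"
    r"|(?:_(?<external_ion>[^\s,/]+)))"
    r"(?<neutral_losses>(?:[+-]\d*(?:(?:[A-Z][A-Za-z0-9]*)"
    r"|(?:\[(?:(?:[A-Za-z0-9:\.]+))\])))+)?"
    r"(?:(?<isotope>[+-]\d*)i)?"
    r"(?:\^(?<charge>[+-]?\d+))?"
    r"(?:\[M(?<adducts>(:?[+-]\d*[A-Z][A-Za-z0-9]*)+)\])?"
    r"(?:/(?<mass_error>[+-]?\d+(?:\.\d+)?)(?<mass_error_unit>ppm)?)?"
    r"(?:\*(?<confidence>\d*(?:\.\d+)?))?"
)

# Convert named groups to Python's (?P<name>...) spelling.
ANNOTATION_RE = re.compile(PSI_PATTERN.replace("(?<", "(?P<"))


def parse_peak_annotation(text):
    """Return the named groups that matched, or None if there is no full match."""
    match = ANNOTATION_RE.fullmatch(text)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}


print(parse_peak_annotation("y4-H2O^2/1.2ppm"))
print(parse_peak_annotation("p-NH3"))
```

The second string is the closest analogue in this notation to the "[M+H-NH3]+" loss asked about at the top of the issue: the precursor `p` with a neutral loss of NH3.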

A (currently still open) pull request for an annotation parser:
HUPO-PSI/mzSpecLib#28

I had proposed a less "encoded" and more easily machine-readable alternative (see HUPO-PSI/mzSpecLib#23 (comment)). It was received somewhat favorably but does not seem to have gone any further.

@tsufz tsufz changed the title more clear statement on PK$ANNOTATION and cleanup of existing records [RecordFormat] more clear statement on PK$ANNOTATION and cleanup of existing records Mar 24, 2021