Questions regarding lactylation modifications in DIANN version 1.9.1 windows #1248

Open
ZHBHSMILE opened this issue Nov 7, 2024 · 0 comments

Hi Vadim,

Thank you, as always, for developing such an excellent tool.

I am currently analyzing lactylation and have encountered an issue in DIANN version 1.9.1:

My Questions

  1. Lactylation modifications are frequently detected only once, at the end of Modified.Sequence, as shown in the attached image. Our goal is to exclude sequences where lactylation appears solely at the end of the sequence, because lactylation should not occur at the position where trypsin cleaves. In the advanced options window of the GUI, is there a command to exclude sequences where lactylation appears only at the end?
    [attached image]

  2. Expected Count for Lactylation: What is the typical number of Modified.Sequence entries expected to contain lactylation? Could you confirm whether the counts below are within an expected range?
    The total unique Modified.Sequence count I obtained is 172,893, which seems unusually high. I suspect there may be an error in my commands:

library(dplyr)    # provides %>%
library(stringr)  # provides str_detect()
data <- read.delim("2024_11_4_SP_Homo_sapiens_report.pr_matrix.tsv", header = TRUE, check.names = FALSE)
s <- data$Modified.Sequence %>% unique()         # Total unique sequences: 172,893
s1 <- data$Modified.Sequence[str_detect(data$Modified.Sequence, "2114")]  # Sequences with lactylation: 153,072
s2 <- s1[(!str_detect(s1, ".*2114.*2114.*")) & str_detect(s1, ".*2114\\)$")] %>% length()  # Lactylation only at sequence end: 34,482

command in log.tsv:

diann.exe --f E:\project\12F236DP\F236DP\F236DP-N1.raw  --f E:\project\12F236DP\F236DP\F236DP-N2.raw  --f E:\project\12F236DP\F236DP\F236DP-N3.raw  --f E:\project\12F236DP\F236DP\F236DP-N4.raw  --f E:\project\12F236DP\F236DP\F236DP-N5.raw  --f E:\project\12F236DP\F236DP\F236DP-N6.raw  --f E:\project\12F236DP\F236DP\F236DP-N7.raw  --f E:\project\12F236DP\F236DP\F236DP-N8.raw  --f E:\project\12F236DP\F236DP\F236DP-N9.raw  --f E:\project\12F236DP\F236DP\F236DP-N10.raw  --f E:\project\12F236DP\F236DP\F236DP-N11.raw  --f E:\project\12F236DP\F236DP\F236DP-N12.raw  --f E:\project\12F236DP\F236DP\F236DP-N13.raw  --f E:\project\12F236DP\F236DP\F236DP-N14.raw  --f E:\project\12F236DP\F236DP\F236DP-N15.raw  --f E:\project\12F236DP\F236DP\F236DP-T1.raw  --f E:\project\12F236DP\F236DP\F236DP-T2.raw  --f E:\project\12F236DP\F236DP\F236DP-T3.raw  --f E:\project\12F236DP\F236DP\F236DP-T4.raw  --f E:\project\12F236DP\F236DP\F236DP-T5.raw  --f E:\project\12F236DP\F236DP\F236DP-T6.raw  --f E:\project\12F236DP\F236DP\F236DP-T7.raw  --f E:\project\12F236DP\F236DP\F236DP-T8.raw  --f E:\project\12F236DP\F236DP\F236DP-T9.raw  --f E:\project\12F236DP\F236DP\F236DP-T10.raw  --f E:\project\12F236DP\F236DP\F236DP-T11.raw  --f E:\project\12F236DP\F236DP\F236DP-T12.raw  --f E:\project\12F236DP\F236DP\F236DP-T13.raw  --f E:\project\12F236DP\F236DP\F236DP-T14.raw  --f E:\project\12F236DP\F236DP\F236DP-T15.raw  --lib  --threads 30 --verbose 1 --out E:\project\12F236DP\F236DP_search2\2024_11_4_SP_Homo_sapiens_report.tsv --qvalue 0.01 --matrices --out-lib E:\project\12F236DP\F236DP_search2\SP_Homo_sapiens_report-lib.parquet --gen-spec-lib --predictor --fasta E:\project\12F236DP\F236DP\20220122SP_Homo_sapiens.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --cut K*,R* --missed-cleavages 2 --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n 
--use-quant --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling --var-mod UniMod:2114,72.021129,K  --var-mods 4 
Could you provide guidance on how to configure these options to achieve the correct results?
  3. Manual Deletion of Modified.Sequence Entries: If the GUI does not support this exclusion, would manually deleting Modified.Sequence entries with lactylation at the end help ensure accuracy in the pr_matrix.tsv file?

  4. Impact on pg_matrix.tsv: If we manually remove these sequences from pr_matrix.tsv, will this affect the pg_matrix.tsv file? Is it necessary to preserve the raw pg_matrix.tsv without any modifications?
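
To make the manual deletion asked about above concrete, this is roughly the filter I have in mind. It is only a sketch: `is_cterm_only_lactyl` is my own hypothetical helper (not a DIA-NN option), the toy sequences are made up, and I am assuming the modification is written as `(UniMod:2114)` in Modified.Sequence. It uses base R only and keeps peptides that carry a second, internal lactylation site.

```r
# Hypothetical helper (not part of DIA-NN): TRUE where the only UniMod:2114
# site in a Modified.Sequence sits on the last residue of the peptide.
is_cterm_only_lactyl <- function(mod_seq, tag = "(UniMod:2114)") {
  # Count tag occurrences from how much shorter the string gets once they are removed
  stripped <- gsub(tag, "", mod_seq, fixed = TRUE)
  n_sites <- (nchar(mod_seq) - nchar(stripped)) / nchar(tag)
  n_sites == 1 & endsWith(mod_seq, tag)
}

# Toy rows standing in for the real pr_matrix table (made-up sequences)
toy <- data.frame(
  Modified.Sequence = c(
    "AAK(UniMod:2114)LLR",              # internal site only: keep
    "PEPTIDEK(UniMod:2114)",            # single site at the end: drop
    "AK(UniMod:2114)DEK(UniMod:2114)",  # internal + end: keep
    "PEPTIDER"                          # unmodified: keep
  ),
  stringsAsFactors = FALSE
)

drop_row <- is_cterm_only_lactyl(toy$Modified.Sequence)
kept <- toy[!drop_row, , drop = FALSE]
print(kept$Modified.Sequence)
```

On the real data the same logical vector would be computed from `data$Modified.Sequence` and used to subset `data` before writing a filtered copy, leaving the original file untouched.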

Thank you very much for your guidance.

Best wishes,
zplv
