Lower than expected number of proteins in spectral library #282
Unanswered
silasmellor
-
Hi Silas, Most likely the FASTA file is not being read correctly. I guess it's not in UniProt format? Best,
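One way to check this guess is to count how many header lines in the FASTA file follow the UniProt layout (`>sp|ACCESSION|ENTRY_NAME ...` or `>tr|ACCESSION|ENTRY_NAME ...`). The sketch below is only an illustration: the function name `count_uniprot_headers` and the regular expression are assumptions made for this example, and the pattern is a simplified approximation of the UniProt header convention, not the rule DIA-NN itself applies.

```python
import re
from typing import Iterable, Tuple

# Simplified approximation of a UniProt-style header:
#   >sp|ACCESSION|ENTRY_NAME description   (or >tr|... for TrEMBL)
UNIPROT_HEADER = re.compile(r"^>(sp|tr)\|([^|\s]+)\|(\S+)")


def count_uniprot_headers(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_headers, headers_matching_the_uniprot_pattern)."""
    total = 0
    matching = 0
    for line in lines:
        if line.startswith(">"):
            total += 1
            if UNIPROT_HEADER.match(line):
                matching += 1
    return total, matching
```

To use it, open the FASTA file and pass the file object, for example `with open(path) as handle: print(count_uniprot_headers(handle))`. If the second number is far below the first, the headers are not in UniProt format, which would be consistent with the explanation above.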
-
Hi, Let me preface this by saying I am still fairly new to untargeted proteomics, so bear with me.
I have tried to use DIA-NN to analyze a dataset of timsTOF diaPASEF data. So far I have tried a few things. I started with the option to generate an in-silico spectral library from the FASTA file. This gives me a fairly low number of proteins in the library (around 3300), whereas the FASTA file contains about 36000 protein sequences.
Next I generated a spectral library from DDA runs (also timsTOF) acquired on the same samples. This was done using FragPipe and produced a spectral library of about 10800 proteins. When I use this library in DIA-NN for the DIA files, I again see only a low number of proteins after the program loads the FASTA file.
Am I missing something? Attached is the first part of the log for reference.
Any help much appreciated,
Best,
Silas
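If the headers turn out not to be in UniProt format, one possible workaround is to rewrite them into a UniProt-like layout before giving the file to the search software. The sketch below is a hypothetical helper (`to_uniprot_style` is a name made up for this example). It assumes the first whitespace-delimited token of each header is a unique protein identifier and reuses it as both accession and entry name. Whether DIA-NN then reads the file as intended is not confirmed in this thread, so treat it as something to test, not a known fix.

```python
from typing import Iterable, Iterator


def to_uniprot_style(lines: Iterable[str], db: str = "tr") -> Iterator[str]:
    """Yield FASTA lines with each header rewritten as '>db|ID|ID description'.

    Assumption: the first token after '>' is a unique protein identifier.
    Only apply this to files whose headers are NOT already UniProt-style,
    because any '|' inside the identifier is replaced with '_'.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith(">"):
            yield line  # sequence line, passed through unchanged
            continue
        parts = line[1:].split(None, 1)
        if not parts:
            yield line  # empty header, nothing to rewrite
            continue
        identifier = parts[0].replace("|", "_")
        description = parts[1] if len(parts) > 1 else ""
        yield f">{db}|{identifier}|{identifier} {description}".rstrip()
```

Writing the result to a new file (one yielded line per output line) keeps the original FASTA untouched, so the two can be compared in separate runs.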
diann.exe --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S1_A1_1_754.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S2_A2_1_755.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S3_A3_1_756.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S4_A4_1_757.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S5_A5_1_758.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S6_A6_1_759.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S7_A7_1_760.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S8_A8_1_761.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S9_A9_1_762.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S10_A10_1_763.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S11_A11_1_764.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S12_A12_1_765.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S13_B1_1_766.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S14_B2_1_767.d
" --f "D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S15_B3_1_768.d
" --lib "D:\Fragpipe\DDA-inflata_spec-lib\library.tsv" --threads 6 --verbose 1 --out "D:\R0270\DIA-NN\220117 Inflata test\report.tsv" --qvalue 0.01 --matrices --temp "D:\R0270\DIA-NN\220117 Inflata test" --reannotate --fasta "D:\R0270\Petunia fasta files\Petunia_inflata_v1.0.1_proteins.fasta" --met-excision --cut K*,R* --missed-cleavages 2 --min-pep-len 7 --max-pep-len 52 --min-pr-mz 400 --max-pr-mz 1201 --min-pr-charge 2 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,n --monitor-mod UniMod:1 --use-quant --double-search --no-prot-inf --reanalyse --smart-profiling --peak-center
DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 14:55:31
Current date and time: Mon Jan 17 21:57:15 2022
CPU: GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 12
Thread number set to 6
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Library precursors will be reannotated using the FASTA database
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R*
Maximum number of missed cleavages set to 2
Min peptide length set to 7
Max peptide length set to 52
Min precursor m/z set to 400
Max precursor m/z set to 1201
Min precursor charge set to 2
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 5
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Existing .quant files will be used
Neural networks will be used for peak selection
Protein inference will not be performed
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
The following variable modifications will be scored: UniMod:1
WARNING: double-pass mode is incompatible with PTM scoring, turned off
15 files will be processed
[0:00] Loading spectral library D:\Fragpipe\DDA-inflata_spec-lib\library.tsv
[0:06] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[0:06] Spectral library loaded: 10803 protein isoforms, 10803 protein groups and 95435 precursors in 80400 elution groups.
[0:06] Loading FASTA D:\R0270\Petunia fasta files\Petunia_inflata_v1.0.1_proteins.fasta
[22:13] Reannotating library precursors with information from the FASTA database
[22:14] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[22:14] 95435 precursors generated
[22:14] Protein names missing for some isoforms
[22:14] Gene names missing for some isoforms
[22:14] Library contains 2496 proteins, and 2496 genes
[22:14] Initialising library
[22:15] Saving the library to D:\Fragpipe\DDA-inflata_spec-lib\library.tsv.speclib
[22:15] First pass: generating a spectral library from DIA data
[22:15] File #1/15
[22:15] Loading run D:\R0270\rawDIA\20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S1_A1_1_754.d
For most diaPASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to 10 ppm.
[25:01] 91284 library precursors are potentially detectable
[25:01] Processing...
[25:28] RT window set to 2.43923
[25:28] Ion mobility window set to 0.04
[25:28] Peak width: 6.268
[25:28] Scan window radius set to 13
[25:28] Recommended MS1 mass accuracy setting: 12.246 ppm
[25:53] Optimised mass accuracy: 15.0884 ppm
[27:01] Removing low confidence identifications
[27:01] Searching PTM decoys
[27:01] Removing interfering precursors
[27:07] Training neural networks: 86868 targets, 88387 decoys
[27:17] Number of IDs at 0.01 FDR: 57527
[27:18] Calculating protein q-values
[27:18] Number of genes identified at 1% FDR: 2217 (precursor-level), 2181 (protein-level) (inference performed using proteotypic peptides only)
[27:18] Quantification
[27:20] Precursors with monitored PTMs at 1% FDR: 272 out of 306
[27:20] Unmodified precursors with monitored PTM sites at 1% FDR: 243 out of 273
[27:23] Quantification information saved to D:\R0270\DIA-NN\220117 Inflata test/D__R0270_rawDIA_20211021_TIMS5_PRInLC1_PRI_P0096_R0270_120min_DIA_S1_A1_1_754_d.quant.
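The drop described in the question is visible in the log above: the library is loaded with 10803 protein isoforms, but after reannotation against the FASTA database the log reports only 2496 proteins. A small sketch for pulling these two numbers out of a saved log is shown below, which can help when comparing runs with different FASTA files. The function name `protein_counts` and the dictionary keys are made up for this example, and the regular expressions are written against the exact wording of the log lines in this post, so they may need adjusting for other DIA-NN versions.

```python
import re
from typing import Dict, Iterable

# Patterns copied from the wording of the log lines shown in this post.
LOADED = re.compile(
    r"Spectral library loaded: (\d+) protein isoforms, "
    r"(\d+) protein groups and (\d+) precursors"
)
AFTER_FASTA = re.compile(r"Library contains (\d+) proteins, and (\d+) genes")


def protein_counts(log_lines: Iterable[str]) -> Dict[str, int]:
    """Extract protein counts before and after FASTA reannotation."""
    counts: Dict[str, int] = {}
    for line in log_lines:
        loaded = LOADED.search(line)
        if loaded:
            counts["library_isoforms"] = int(loaded.group(1))
            counts["library_precursors"] = int(loaded.group(3))
        after = AFTER_FASTA.search(line)
        if after:
            counts["proteins_after_fasta"] = int(after.group(1))
    return counts
```

Applied to the log above, this gives 10803 isoforms before and 2496 proteins after loading the FASTA file, so only about 23% of the library proteins survive reannotation.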