Formatting of fasta file for library-free search/library generation #881
-
Hi, I am running DIA-NN for the first time. So far it has worked and I got output. However, it seems that the program has difficulty parsing the description lines from the FASTA file correctly. In some cases there are no description lines; should I introduce an arbitrary one? Furthermore, the description lines that are present are split into separate fragments. All of this causes problems in the TSV output.
Replies: 2 comments 2 replies
-
Hi Sebastian, It works with the UniProt format. For all other formats it will still parse the sequence IDs correctly, and you can then augment the DIA-NN report with descriptions etc. by loading the FASTA file with an R or Python package and matching the sequence IDs reported by DIA-NN to those in the loaded FASTA. Best, Vadim
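For illustration, the matching step could look roughly like the sketch below. It uses only the Python standard library. Two things are assumptions and not confirmed in this thread: that the sequence ID is the first whitespace-delimited token of each FASTA header, and that the report is a tab-separated file with a semicolon-separated `Protein.Ids` column. The function names `read_fasta_descriptions` and `annotate_report` are made up for this example.

```python
import csv
import io


def read_fasta_descriptions(handle):
    """Map sequence ID -> description taken from the FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>',
    and the rest of the header is the description (possibly empty).
    """
    descriptions = {}
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].strip().split(None, 1)
            if not parts:
                continue  # header with no ID at all, skip it
            descriptions[parts[0]] = parts[1] if len(parts) > 1 else ""
    return descriptions


def annotate_report(report_handle, descriptions, id_column="Protein.Ids"):
    """Yield report rows with an extra 'Description' column.

    Assumption: id_column holds one or more IDs separated by ';'.
    """
    reader = csv.DictReader(report_handle, delimiter="\t")
    for row in reader:
        ids = row[id_column].split(";")
        found = (descriptions.get(i, "") for i in ids)
        row["Description"] = "; ".join(d for d in found if d)
        yield row


# Tiny in-memory demo; replace the StringIO objects with open(...) on real files.
fasta = io.StringIO(">prot1 kinase A\nMKT\n>prot2\nMAA\n")
report = io.StringIO(
    "Protein.Ids\tPrecursor.Id\n"
    "prot1\tPEPTIDEK2\n"
    "prot1;prot2\tAAAK2\n"
)
descs = read_fasta_descriptions(fasta)
rows = list(annotate_report(report, descs))
```

Entries without a description line simply get an empty `Description`, so there is no need to invent an arbitrary one in the FASTA.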
-
Hi, thanks for the input. I constructed a fake UniProt FASTA using a Perl script, and it now works well enough for me to use. Sorry, but I have another question: my database contains 49,589 entries, but when I run the spectral library generation it contains only 49,377 proteins. How can this discrepancy be explained, and how can I check which proteins are "missing"? Best and thanks again
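One way to list the "missing" entries is a plain set comparison between the IDs in the FASTA and the protein IDs that appear in the library. The sketch below is only that, a sketch: it assumes the ID is the first token of each header, and it assumes you have already collected the library's protein IDs into a Python set (how to export them is not covered here). It also reports IDs that occur more than once in the FASTA, since repeated IDs would make the entry count larger than the number of distinct proteins. Whether that is the cause in this case is a guess, not something confirmed in this thread. The names `fasta_ids` and `compare_ids` are hypothetical.

```python
import io
from collections import Counter


def fasta_ids(handle):
    """Return all header IDs (first token after '>') in file order."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if parts:
                ids.append(parts[0])
    return ids


def compare_ids(fasta_id_list, library_ids):
    """Return (IDs absent from the library, IDs duplicated in the FASTA)."""
    counts = Counter(fasta_id_list)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(fasta_id_list) - set(library_ids))
    return missing, duplicated


# Tiny in-memory demo; library_ids would come from your own library export.
fasta = io.StringIO(">p1 a\nMK\n>p2 b\nAA\n>p2 b copy\nAA\n>p3\nGG\n")
library_ids = {"p1", "p2"}
missing, duplicated = compare_ids(fasta_ids(fasta), library_ids)
print(missing)     # ['p3']
print(duplicated)  # ['p2']
```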