Formatting of fasta file for library-free search/library generation #881

SebastianHoernstein · 2023-12-12T08:19:14Z

Hi,

I am running DIA-NN for the first time. So far it has worked and I got output. However, the program seems to have difficulty parsing the description lines from the FASTA file correctly. Some entries have no description line at all; should I add an arbitrary one? In addition, the description lines that are present get split into separate fragments. All of this causes problems in the TSV output.
Is there a recommended format for separating the gene identifier from the description?
Thanks in advance.

Answered by vdemichev

vdemichev (Maintainer) · 2023-12-14T09:33:40Z

Hi Sebastian,

It works with the UniProt format. For all other formats DIA-NN will still parse sequence IDs correctly, and you can then augment the DIA-NN report with descriptions etc. by loading the FASTA file with an R or Python package and matching the sequence IDs reported by DIA-NN to those in the loaded FASTA.

Best,
Vadim
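
The matching step described above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not DIA-NN's own logic: it assumes the sequence ID is the first whitespace-delimited token of each FASTA header, that the report is tab-separated, and that the ID column is called `Protein.Ids` with multiple IDs joined by `;`. Check these against your own files. The IDs and descriptions in the example are made up.

```python
import csv
import io

def read_fasta_descriptions(handle):
    """Map each sequence ID to the remainder of its FASTA header line."""
    descriptions = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # bare '>' with no ID
        # Assumption: the ID is the first whitespace-delimited token.
        descriptions[parts[0]] = parts[1] if len(parts) > 1 else ""
    return descriptions

def annotate_report(report_handle, descriptions, id_column="Protein.Ids"):
    """Yield report rows with an added 'Description' column."""
    reader = csv.DictReader(report_handle, delimiter="\t")
    for row in reader:
        ids = row[id_column].split(";")
        # Entries without a description contribute an empty string.
        row["Description"] = ";".join(descriptions.get(i, "") for i in ids)
        yield row

# Tiny in-memory example with made-up IDs.
fasta = io.StringIO(">geneA.1 putative kinase\nMKT\n>geneB.1\nMAA\n")
report = io.StringIO("Protein.Ids\tPrecursor.Id\ngeneA.1;geneB.1\tPEPTIDEK2\n")
descriptions = read_fasta_descriptions(fasta)
rows = list(annotate_report(report, descriptions))
```

For real files, replace the `io.StringIO` objects with `open(path, newline="")` handles.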

SebastianHoernstein (Author) · 2023-12-14T09:46:53Z

Hi,

Thanks for the input. I constructed a fake UniProt FASTA using a Perl script, and it now works well enough for me to use. Sorry, but I have another question: my database contains 49,589 entries, but the generated spectral library contains only 49,377 proteins. How can this discrepancy be explained, and how can I check which proteins are "missing"?

Best and thanks again
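
For readers who want to do the same header conversion, here is a minimal Python sketch (the original was a Perl script that is not shown in this thread). It rewrites a plain header into the general UniProt layout `>db|UniqueIdentifier|EntryName ProteinName OS=... GN=...`. The function name, the `_UNKN` suffix, the placeholder description and the default organism are all hypothetical choices for illustration; whether DIA-NN accepts every such header has not been verified here.

```python
def to_uniprot_style(header, organism="Unknown organism", db="tr"):
    """Rewrite a plain FASTA header into a UniProt-like header line."""
    parts = header.lstrip(">").strip().split(None, 1)
    seq_id = parts[0]
    # Entries without a description get a placeholder (an arbitrary choice).
    desc = parts[1] if len(parts) > 1 else "Uncharacterized protein"
    # The sequence ID is reused as accession, entry-name stem and gene name.
    return f">{db}|{seq_id}|{seq_id}_UNKN {desc} OS={organism} GN={seq_id}"

# Made-up example headers.
with_desc = to_uniprot_style(">geneA.1 putative kinase")
without_desc = to_uniprot_style(">geneB.1")
```

IDs that themselves contain `|` would need to be sanitized first, since that character separates the UniProt header fields.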

2 replies

vdemichev Dec 14, 2023
Maintainer

Most likely some proteins yield no peptides that satisfy the digestion requirements. You can export the .predicted.speclib to .tsv to see what is actually being generated (it might be a very large .tsv file, though).
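
One way to get a first idea of which proteins fall out is a simple in-silico digest. The sketch below is only an approximation and not DIA-NN's implementation: it assumes cleavage after every K or R (no proline rule), at most 1 missed cleavage, and a peptide length window of 7 to 30 residues. Replace these with the settings actually used in your run. The sequences in the example are made up.

```python
import re

def tryptic_peptides(sequence, missed_cleavages=1):
    """Return peptides from cleaving after K/R, allowing missed cleavages."""
    # Zero-width split after each K or R (requires Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides

def proteins_without_peptides(proteins, min_len=7, max_len=30):
    """List IDs of proteins with no peptide inside the length window."""
    return [
        pid for pid, seq in proteins.items()
        if not any(min_len <= len(p) <= max_len for p in tryptic_peptides(seq))
    ]

# Made-up sequences: 'short' yields only peptides shorter than 7 residues.
proteins = {"long": "MAAAAAAAKGGGGGGGGR", "short": "MKRK"}
missing = proteins_without_peptides(proteins)
```

The result can then be compared with the set of protein IDs found in the exported library .tsv to confirm which entries are absent.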

SebastianHoernstein Dec 14, 2023
Author

ok thanks, will try this.

best

Sebastian
