Robert J. Gifford edited this page Jun 24, 2024 · 4 revisions

In database-integrated genome-screening (DIGS), the output of similarity search-based genome 'screens' is captured in a relational database. This makes it possible to run automated screens on a large scale, and to interrogate and manipulate the output data using structured query language (SQL).
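To illustrate why capturing screening output in a relational database is useful, here is a minimal sketch using Python's built-in `sqlite3` module. The `extracted` table and its columns are a hypothetical, simplified schema, not the DIGS tool's actual database layout. The query shows the kind of summary (hits per organism and assigned reference, above a score cutoff) that becomes a single SQL statement once results are stored in tables.

```python
import sqlite3

# Hypothetical, simplified table for screening output; the real DIGS
# schema may differ.
SCHEMA = """
CREATE TABLE extracted (
    record_id     INTEGER PRIMARY KEY,
    organism      TEXT,
    target_name   TEXT,
    scaffold      TEXT,
    subject_start INTEGER,
    subject_end   INTEGER,
    assigned_name TEXT,
    bitscore      REAL
)
"""


def load_hits(conn, rows):
    """Insert screening hits (7-tuples in the column order below)."""
    conn.executemany(
        "INSERT INTO extracted (organism, target_name, scaffold, subject_start, "
        "subject_end, assigned_name, bitscore) VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def count_by_assignment(conn, min_bitscore):
    """Count hits per organism and assigned reference at or above a bitscore cutoff."""
    query = (
        "SELECT organism, assigned_name, COUNT(*) FROM extracted "
        "WHERE bitscore >= ? GROUP BY organism, assigned_name "
        "ORDER BY organism, assigned_name"
    )
    return conn.execute(query, (min_bitscore,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    load_hits(conn, [
        ("Homo_sapiens", "genome_v1", "chr1", 100, 900, "X", 250.0),
        ("Homo_sapiens", "genome_v1", "chr2", 500, 1300, "Y", 180.0),
        ("Homo_sapiens", "genome_v1", "chr3", 40, 300, "X", 60.0),
        ("Mus_musculus", "genome_v2", "chr7", 10, 800, "X", 210.0),
    ])
    print(count_by_assignment(conn, 100.0))
```

Here the low-scoring `chr3` hit is excluded by the cutoff, so each organism/assignment pair is counted once.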

The Database-Integrated Genome-Screening (DIGS) tool aims to provide a robust and extensible framework for systematic, BLAST-based in silico screens of molecular sequence databases and for interrogating the resulting data.

The DIGS tool uses the basic local alignment search tool (BLAST) to perform sequence similarity searches, in two rounds. In the first round (forward BLAST), query sequences selected from the reference sequence library (the 'probes') are used to search the target databases. In the second round (reverse BLAST), the sequence 'hits' identified by screening are extracted from the target genomes and assigned a genotype by BLAST comparison against the reference sequence set.
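As a rough illustration of the extraction step, the sketch below parses forward-BLAST results in the standard BLAST+ tabular format (`-outfmt 6`, default 12 columns), keeps hits at or below an e-value cutoff, and pulls each hit region out of an in-memory genome, reverse-complementing hits on the minus strand (where `sstart` > `send`). The function names, the dictionary-based genome and the cutoff value are assumptions for illustration; this is not how the DIGS tool itself implements extraction.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")

# Translation table for complementing nucleotides (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_tabular(text):
    """Parse -outfmt 6 lines into dicts, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


def forward_hits(text, max_evalue=1e-5):
    """Keep forward-BLAST hits at or below a (hypothetical) e-value cutoff."""
    return [h for h in parse_tabular(text) if h["evalue"] <= max_evalue]


def extract_hit(genome, hit):
    """Return the hit region (1-based, inclusive coordinates) on the hit's strand."""
    seq = genome[hit["sseqid"]]
    start, end = hit["sstart"], hit["send"]
    if start <= end:
        return seq[start - 1:end]
    # Minus-strand hit: slice the region, then reverse-complement it.
    return seq[end - 1:start][::-1].translate(COMPLEMENT)


if __name__ == "__main__":
    genome = {"scaf1": "AAAACCGTTT"}  # toy target sequence
    rows = [
        ["probeX", "scaf1", "100.0", "3", "0", "0", "1", "3", "5", "7", "1e-10", "20.1"],
        ["probeX", "scaf1", "100.0", "3", "0", "0", "1", "3", "7", "5", "1e-10", "20.1"],
        ["probeX", "scaf1", "90.0", "3", "1", "0", "1", "3", "1", "3", "0.5", "8.0"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    for hit in forward_hits(text):
        print(hit["sstart"], hit["send"], extract_hit(genome, hit))
```

With the toy data, the weak third hit is filtered out, the plus-strand hit yields `CCG`, and the minus-strand hit over the same region yields its reverse complement, `CGG`.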

The reverse BLAST step is included because probe sequences often cross-match to a wide range of homologous sequences in the initial BLAST screen. For example, consider a gene that has two paralogs, 'X' and 'Y'. Screening with a probe of type X may yield hits to both X and Y. Comparing each hit to a library of representative reference sequences in the second BLAST round gives users an efficient way to determine which hits are more similar to X and which are more similar to Y.
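The reverse-BLAST assignment can be sketched as a simple best-hit rule: each extracted hit is labelled with the reference sequence that gives it the highest bitscore. This is a simplification under stated assumptions. The function names and the `label_of` naming convention (reference IDs of the form `X_ref1`) are hypothetical, and the DIGS tool's own assignment logic may apply additional criteria.

```python
def label_of(ref_id):
    """Hypothetical naming convention: '<label>_<accession>', e.g. 'X_ref1' -> 'X'."""
    return ref_id.split("_", 1)[0]


def assign_by_best_hit(reverse_rows):
    """Assign each extracted hit the reference sequence with the highest bitscore.

    reverse_rows: iterable of (hit_id, reference_id, bitscore) tuples, e.g. the
    qseqid, sseqid and bitscore columns of reverse-BLAST tabular output.
    Returns {hit_id: (reference_id, label, bitscore)}. On ties, the first
    reference seen is kept.
    """
    best = {}
    for hit_id, ref_id, bitscore in reverse_rows:
        if hit_id not in best or bitscore > best[hit_id][1]:
            best[hit_id] = (ref_id, bitscore)
    return {hit: (ref, label_of(ref), score) for hit, (ref, score) in best.items()}


if __name__ == "__main__":
    # Toy reverse-BLAST results: each hit compared to X and Y references.
    rows = [
        ("hit1", "X_ref1", 310.0),
        ("hit1", "Y_ref1", 255.0),
        ("hit2", "X_ref2", 190.0),
        ("hit2", "Y_ref2", 280.0),
    ]
    for hit, (ref, label, score) in assign_by_best_hit(rows).items():
        print(hit, label, ref, score)
```

With this toy data, `hit1` is assigned to X and `hit2` to Y, even though both could have been recovered by a single X probe in the forward round.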