Robert J. Gifford edited this page Jun 23, 2024 · 7 revisions

Molecular sequence data are highly information-rich, and are now being generated much faster than they can be analysed. Consequently, the immense quantities of genome data accumulating in public databases consist largely of DNA sequences that are at best incompletely understood.

Systematic, sequence similarity search-based genome screening is a powerful approach for exploring this 'dark genome' in silico. This approach extends the basic sequence similarity search by:

  1. Performing multiple searches systematically, involving various query sequences and/or target databases.
  2. Classifying “hits” (matching sequences) via comparison to a reference sequence library curated by the investigator.
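The two steps above can be sketched in a few lines of Python. This is a toy illustration only, not the DIGS Tool's implementation: the `similarity` function is a k-mer overlap score standing in for a real similarity search, whole target sequences are treated as hits (a real search reports matching subsequences), and all names, sequences and the threshold are hypothetical.

```python
from itertools import product


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (toy stand-in for a search score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def screen(queries, targets, references, threshold=0.3):
    """Search every query against every target, then classify each hit.

    Step 1: systematic search over all (query, target) pairs.
    Step 2: assign each hit to the most similar reference sequence.
    """
    hits = []
    for (q_name, q_seq), (t_name, t_seq) in product(queries.items(), targets.items()):
        if similarity(q_seq, t_seq) < threshold:
            continue  # not a hit for this query
        assigned = max(references, key=lambda name: similarity(t_seq, references[name]))
        hits.append({"query": q_name, "target": t_name, "assigned": assigned})
    return hits


if __name__ == "__main__":
    # Hypothetical example data.
    queries = {"probe_A": "ATGGCCATTGTAATGGGCC"}
    targets = {"contig_1": "ATGGCCATTGTAATGGGCCGC", "contig_2": "TTTTTTTTTTTTTTTTTT"}
    references = {"family_X": "ATGGCCATTGTAATGGGCCGCTGA", "family_Y": "CCCCCGGGGGAAAAATTTTT"}
    for hit in screen(queries, targets, references):
        print(hit)
```

Keeping the classification step separate from the search step means the reference library can be revised and hits reclassified without repeating the searches.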

Database-integrated genome screening (DIGS) is a form of systematic genome screening in which a sequence similarity search-based screening pipeline is linked to a relational database management system (RDBMS). This provides a robust foundation for implementing large-scale, automated screens, and enables a 'database querying' approach to investigating screening output. In addition, it provides all the benefits of an RDBMS with respect to features such as data recoverability, multi-user support and network access.
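As a rough illustration of the 'database querying' approach, the sketch below stores a few classified hits in a relational table and summarises them with SQL. It makes several assumptions: Python's built-in `sqlite3` module is used as a lightweight stand-in for an RDBMS, and the `hits` table, its columns and the example rows are hypothetical rather than the schema of any real screening database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a hypothetical table of hits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hits (
               id INTEGER PRIMARY KEY,
               organism TEXT,
               assigned_name TEXT,
               bitscore REAL
           )"""
    )
    rows = [  # made-up example data
        ("species_a", "family_X", 250.0),
        ("species_a", "family_X", 90.0),
        ("species_a", "family_Y", 40.0),
        ("species_b", "family_X", 300.0),
    ]
    conn.executemany(
        "INSERT INTO hits (organism, assigned_name, bitscore) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def count_hits(conn, min_bitscore):
    """Count hits per organism and assigned reference above a score cutoff."""
    cursor = conn.execute(
        """SELECT organism, assigned_name, COUNT(*)
           FROM hits
           WHERE bitscore >= ?
           GROUP BY organism, assigned_name
           ORDER BY organism, assigned_name""",
        (min_bitscore,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in count_hits(conn, min_bitscore=50):
        print(row)
```

Because the screening output sits in a table, questions such as "how many strong hits per species and family?" become a single query instead of a custom parsing script.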

The DIGS Tool is a software framework for implementing DIGS on UNIX/Linux and macOS platforms. The program is accessible through a text-based console interface. It uses the BLAST+ program suite to perform similarity search-based screening, and the MySQL RDBMS to capture screen output.
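To give a feel for what a BLAST+-based search step involves, here is a minimal sketch that assembles a BLAST+ command line and parses one line of its tabular output (`-outfmt 6`, whose default columns are the twelve listed below). How the DIGS Tool itself invokes BLAST+ is not described here, so the function names, file names and E-value cutoff are assumptions made purely for illustration; the command is built but not executed.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def build_blast_command(program, query_file, database, evalue="1e-5"):
    """Return an argument list for a BLAST+ search (hypothetical helper).

    The list could be passed to subprocess.run() on a machine with BLAST+ installed.
    """
    return [
        program,              # e.g. "blastn" or "tblastn"
        "-query", query_file,
        "-db", database,
        "-evalue", evalue,
        "-outfmt", "6",       # tab-separated output, one hit per line
    ]


def parse_outfmt6_line(line):
    """Convert one tabular output line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(OUTFMT6_FIELDS):
        raise ValueError("expected 12 tab-separated columns")
    record = {}
    for field, value in zip(OUTFMT6_FIELDS, values):
        if field in INT_FIELDS:
            record[field] = int(value)
        elif field in FLOAT_FIELDS:
            record[field] = float(value)
        else:
            record[field] = value
    return record


if __name__ == "__main__":
    # File and database names are placeholders.
    print(" ".join(build_blast_command("tblastn", "probe.faa", "genome_db")))
    example = "probe_A\tcontig_1\t98.5\t200\t3\t0\t1\t200\t1000\t1599\t1e-50\t400"
    print(parse_outfmt6_line(example))
```

Records parsed this way map naturally onto rows of a database table, which is the kind of hand-off between search output and RDBMS that the DIGS approach relies on.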