Releases: bbuchfink/diamond

DIAMOND v2.1.0

25 Jan 21:12
  • Added the cluster workflow to cluster protein sequences.
  • Added the realign workflow to generate clustering output.
  • Added the recluster workflow to correct errors in clusterings.
  • Added the reassign workflow to reassign cluster members to their closest centroid.
  • Added the option -M/--memory-limit to set a memory limit for clustering workflows.
  • Added the --approx-id option to filter alignments by approximate sequence identity and to set an approximate sequence identity threshold for clustering.
  • Added the --member-cover option to set the coverage threshold of the cluster member sequence.
  • Added the --cluster-steps option to set steps for cascaded clustering.
  • Added the --clusters option to specify clustering input file.
  • The blastx mode will now mask any open reading frame below the minimum required length as specified by --min-orf.
  • The blastx mode will only count unmasked letters towards the block size.
  • Fixed a bug that caused an error when using the global ranking mode.
  • Added the fast mode as the first round in iterative searches.
  • Fixed a bug that caused the program not to function on systems without support for SSE4.1.
  • Improved multi-threaded load balancing of gapped extension computations.
  • Improved performance of seed extension stage when HSP filter settings are used.
  • Added the option --soft-masking with possible values 0 and tantan to permit soft-masking using the tantan algorithm.
  • Fixed a bug that could cause an inflate error in multiprocessing mode.
  • Added the option --swipe to compute full Smith Waterman alignments of all queries against all targets.
  • Added the sensitivity mode --faster.
  • Added the output fields approx_pident and corrected_bitscore to the tabular format.
  • Added the --lin-stage1 option to linearize comparisons in the seeding stage by only considering hits against the longest query sequence for identical seeds (only supported when compiled with -DEXTRA=ON).
  • Added the --kmer-ranking option to rank sequences when --lin-stage1 is used (only supported when compiled with -DKEEP_TARGET_ID=ON).
  • Added the option --no-block-size-limit to deactivate upper limits for the block size when the --memory-limit option is used.
  • Added the greedy-vertex-cover workflow to compute clustering based on alignments.
  • Added the --edge-format option to set edge format for greedy vertex cover.
  • Added the --edges option to set input file for greedy vertex cover.
  • Added the --centroid-out option to output centroid sequences for greedy vertex cover.
  • Added the --unaligned-targets option to generate an output file of unaligned targets.
  • Fixed an issue that caused compilation to fail with the Intel Compiler.
  • Fixed an issue that could cause a segmentation fault in rare cases.
  • The --header option can now be used with the parameter simple to enable simple headers for the tabular format, or without a parameter to enable headers for the clustering format.
  • Added the option --mp-self to optimize self-alignment in multiprocessing mode.
  • Added the option --query-or-subject-cover to report alignments if the query or the subject cover (or both) are above the given threshold.
  • Removed support for the --comp-based-stats 2 option (now equivalent to --comp-based-stats 3).
  • Removed hit culling in case of overlapping target ranges in frameshift alignment mode.
  • Added the option --anchored-swipe to activate anchored SWIPE extension.
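
As a rough illustration of what a greedy vertex cover clustering does with alignment edges, here is a minimal Python sketch. It is not DIAMOND's implementation: the function name `greedy_vertex_cover`, the (centroid candidate, member) edge tuples, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import defaultdict


def greedy_vertex_cover(edges):
    """Cluster nodes given (centroid_candidate, member) edges.

    Repeatedly picks the unassigned node that covers the most unassigned
    nodes, makes it a centroid, and assigns those nodes to it.
    Hypothetical helper for illustration only.
    """
    covers = defaultdict(set)
    nodes = set()
    for rep, member in edges:
        nodes.update((rep, member))
        if rep != member:
            covers[rep].add(member)

    unassigned = set(nodes)
    clusters = {}
    while unassigned:
        # Largest coverage first; ties are broken by name (an assumption)
        # so that the result is deterministic.
        centroid = min(
            unassigned, key=lambda n: (-len(covers[n] & unassigned), n)
        )
        members = (covers[centroid] & unassigned) | {centroid}
        clusters[centroid] = sorted(members)
        unassigned -= members
    return clusters


# Made-up edges: A covers B, C and D; B covers A and C; E covers D.
edges = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("A", "D"), ("E", "D")]
clusters = greedy_vertex_cover(edges)
print(clusters)
```

In this toy input, A covers the most sequences and becomes the first centroid; E is left as a singleton because its only neighbor was already assigned.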

DIAMOND v2.0.15

21 Apr 14:33
  • Fixed a bug (present since v2.0.12) that caused the diamond view workflow to report a zero bit score for all alignments.

DIAMOND v2.0.14

13 Jan 11:54
  • Fixed a compiler error on Linux systems that do not define _SC_LEVEL3_CACHE_SIZE.
  • Fixed an error when using --unal 1 with the cigar output field.
  • Fixed an illegal instruction error on systems that did not support AVX2.
  • Fixed a bug (present since v2.0.12) that could cause an error or suboptimal alignments when HSP filter settings were used.

DIAMOND v2.0.13

25 Oct 08:50
  • Fixed a bug that caused invalid bit scores in frameshift alignment mode.

DIAMOND v2.0.12

06 Oct 12:16
  • Fixed an error when using HSP filter settings together with a BLAST database.
  • Optimized the performance of alignment traceback.
  • A non-default setting of --max-hsps will now recompute a full-matrix Smith Waterman alignment with the ranges of the known HSPs masked in the target.
  • A non-default setting for --max-hsps can now be used together with --ext full.
  • The sensitivity levels used for iterated searches can now be manually set by using a space-separated list after the --iterate option.
  • Seeds are masked based on complexity instead of frequency by default.
  • Added the option --seed-cut to set a complexity cutoff for indexed seeds.
  • Added the option --freq-masking to enable masking seeds based on frequency.
  • The fast, default, mid-sensitive and sensitive modes will by default softmask a fixed set of highly abundant sequence motifs.
  • Added the option --motif-masking (0,1) to enable or disable motif masking.
  • Added the option --masking seg to enable SEG masking of target sequences (BLAST default) instead of tantan masking.
  • Fixed a bug that caused the full_sseq output field to contain invalid information or to produce an error when using a BLAST database.
  • Changed composition based statistics to use BLOSUM62 background frequencies.
  • Fixed the zstd dependency in the Dockerfile.
  • Added support for gap letters in BLAST databases.
  • Fixed a bug that caused the --custom-matrix option not to function correctly.
  • Changed the overlap for merging adjoining bands to >0.0.
  • Changed the chaining stage to use more moderate filtering of HSPs.

DIAMOND v2.0.11

05 Jul 09:12
  • Fixed a bug that could cause invalid output when using --masking 0 combined with the global ranking mode.
  • Enabled lazy repeat masking in the query-indexed and contiguous seed modes when using global ranking.
  • Added detection of cache size to auto-enable query-indexed mode.

DIAMOND v2.0.10

30 Jun 13:26
1126b71
  • Using BLAST databases now requires a preprocessing step using the command prepdb. The command line is: diamond prepdb -d /path/to/database. This call runs quickly and will write some small auxiliary files into the database directory.
  • Improved performance of searching small query files.
  • Added the "iterative" search mode (option --iterate) to search the query dataset with increasing sensitivity, only searching queries at the target sensitivity that do not produce a significant alignment at a lower sensitivity search. For example, using --sensitive --iterate will first search the query file at default sensitivity, and then search again in --sensitive mode all query sequences that failed to align in the first round.
  • Added the "global ranking" mode (option -g) to set a limit on the number of Smith Waterman extensions per query, with the target sequences ranked by their ungapped extension scores.
  • Added the --fast sensitivity mode that is faster and less sensitive than the default mode.
  • Reduced the time for loading target sequences from BLAST databases.
  • Added the contiguous-seed mode (option --algo ctg) to improve performance for small query files.
  • Added support for using --comp-based-stats (3,4) in combination with --ext full.
  • Fixed a bug that could cause a Traceback error when using --comp-based-stats (3,4) in rare cases.
  • Changed the full_sseq output field to always contain unmasked sequences.
  • Fixed an issue that could cause target output order to be nondeterministic in case of identically scoring hits.
  • Added support for reading zstd-compressed input files (auto-detected) and writing zstd-compressed output files (option --compress zstd) (requires compilation using cmake -DWITH_ZSTD=ON).
  • Compilation with BLAST database support requires the zstd library.
  • Added an error message when reading protein sequences from FASTA files that contain only DNA letters (can be disabled using --ignore-warnings).

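The control flow of the iterative search mode described above can be sketched in a few lines of Python. This is only a conceptual outline, not DIAMOND's code: `iterative_search`, the `search` callback, and the toy `needs` table are hypothetical names introduced for the example.

```python
def iterative_search(queries, modes, search):
    """Run `search` at each mode in turn, retrying only unaligned queries.

    `search(batch, mode)` is a hypothetical callback that returns a dict
    mapping each query with a significant alignment to its result.
    """
    hits = {}
    remaining = list(queries)
    for mode in modes:
        if not remaining:
            break
        found = search(remaining, mode)
        hits.update(found)
        # Only queries without a significant hit go on to the next round.
        remaining = [q for q in remaining if q not in found]
    return hits, remaining


# Toy stand-in for a real search: each query aligns only at or above
# a made-up sensitivity level.
LEVEL = {"default": 0, "sensitive": 1}
needs = {"q1": 0, "q2": 1, "q3": 2}


def toy_search(batch, mode):
    return {q: mode for q in batch if needs[q] <= LEVEL[mode]}


hits, unaligned = iterative_search(
    ["q1", "q2", "q3"], ["default", "sensitive"], toy_search
)
print(hits, unaligned)
```

Here q1 is resolved in the first round, q2 only in the sensitive round, and q3 remains unaligned after both rounds.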
DIAMOND v2.0.9

12 Apr 08:37
cd16d51
  • Reduced the memory use of database building with taxonomy mapping.
  • Removed the limitation of sequence accession length.
  • Fixed a bug that could prevent BLAST databases from functioning correctly.
  • Added support for using BLAST alias databases (created by blastdb_aliastool).
  • Reduced the memory use of the seed hit sorting stage.
  • Improved the consistency of results when running in query-indexed mode (--algo 1).
  • Added the option --skip-missing-seqids to ignore cases of missing sequences in the database when using the --seqidlist option.
  • The --min-orf parameter now defaults to 1 in frameshift alignment mode.
  • Added support for using BLAST databases to the Docker container.

DIAMOND v2.0.8

10 Mar 09:40
  • Added support for directly using BLAST database files instead of the Diamond-formatted .dmnd database files. This feature is not yet available through all release channels. It can currently be accessed by downloading the GitHub release version or by compiling from source. Taxonomy features are not yet supported for BLAST databases.
  • Added the option --seqidlist to filter the database by sequence accession (only supported for BLAST databases).
  • Fixed a bug that caused the --dbsize option not to function correctly.
  • Added the command makeidx and the option --target-indexed, which provide an optimization specialized for small databases (<10 Mb). See: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#small-database-optimization
  • Added the option --mp-recover to recover aborted runs in multiprocessing mode.

DIAMOND v2.0.7

12 Feb 12:35
  • Added support for computing full-matrix instead of banded Smith Waterman extensions (command line option --ext full).
  • Added support for the new prot.accession2taxid.FULL.gz taxonomy mapping file from NCBI.
  • Added the option --gapped-filter-evalue to set the e-value threshold of the gapped filter heuristic.
  • The scores of the mask letter are now set according to BLAST rules when a compositionally adjusted matrix is used.
  • Changed formatting of e-values to print two decimals instead of one.
  • Added the output field qseq_translated to print the translation of the aligned part of the query sequence.
  • Added support for providing two input files to --query/-q when running alignment in blastx mode.
  • Added the output field full_qseq_mate to print the sequence of the query's mate (enabled when using two query files in blastx mode).
  • Fixed a bug that could cause a crash in blastx mode for very long queries.
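
The change in e-value formatting can be illustrated with Python's scientific-notation format specifiers. This is an assumption about the general shape of the output only; DIAMOND's own formatting code is not shown here and may differ in details such as exponent padding. The e-value below is made up.

```python
# A made-up e-value, formatted with one and with two decimals.
evalue = 3.14159e-25

one_decimal = format(evalue, ".1e")   # older style: one decimal
two_decimals = format(evalue, ".2e")  # newer style: two decimals

print(one_decimal)
print(two_decimals)
```

Scripts that compare e-values as strings, rather than parsing them as numbers, may need adjusting after such a change.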