Releases: bbuchfink/diamond
DIAMOND v2.1.0
- Added the `cluster` workflow to cluster protein sequences.
- Added the `realign` workflow to generate clustering output.
- Added the `recluster` workflow to correct errors in clusterings.
- Added the `reassign` workflow to reassign cluster members to their closest centroid.
- Added the option `-M/--memory-limit` to set a memory limit for clustering workflows.
- Added the `--approx-id` option to filter alignments by approximate sequence identity and to set an approximate sequence identity threshold for clustering.
- Added the `--member-cover` option to set the coverage threshold of the cluster member sequence.
- Added the `--cluster-steps` option to set steps for cascaded clustering.
- Added the `--clusters` option to specify clustering input file.
- The `blastx` mode will now mask any open reading frame below the minimum required length as specified by `--min-orf`.
- The `blastx` mode will only count unmasked letters towards the block size.
- Fixed a bug that caused an error when using the global ranking mode.
- Added the fast mode as the first round in iterative searches.
- Fixed a bug that caused the program not to function on systems without support for SSE4.1.
- Improved multi-threaded load balancing of gapped extension computations.
- Improved performance of seed extension stage when HSP filter settings are used.
- Added the option `--soft-masking` with possible values `0` and `tantan` to permit soft-masking using the tantan algorithm.
- Fixed a bug that could cause an `inflate error` in multiprocessing mode.
- Added the option `--swipe` to compute full Smith Waterman alignments of all queries against all targets.
- Added the sensitivity mode `--faster`.
- Added the output fields `approx_pident` and `corrected_bitscore` to the tabular format.
- Added the `--lin-stage1` option to linearize comparisons in the seeding stage by only considering hits against the longest query sequence for identical seeds (only supported when compiled with `-DEXTRA=ON`).
- Added the `--kmer-ranking` option to rank sequences when `--lin-stage1` is used (only supported when compiled with `-DKEEP_TARGET_ID=ON`).
- Added the option `--no-block-size-limit` to deactivate upper limits for the block size when the `--memory-limit` option is used.
- Added the `greedy-vertex-cover` workflow to compute clustering based on alignments.
- Added the `--edge-format` option to set edge format for greedy vertex cover.
- Added the `--edges` option to set input file for greedy vertex cover.
- Added the `--centroid-out` option to output centroid sequences for greedy vertex cover.
- Added the `--unaligned-targets` option to generate an output file of unaligned targets.
- Fixed an issue that caused compilation to fail with the Intel Compiler.
- Fixed an issue that could cause a segmentation fault in rare cases.
- The `--header` option can now be used with the parameter `simple` to enable simple headers for the tabular format, or without a parameter to enable headers for the clustering format.
- Added the option `--mp-self` to optimize self-alignment in multiprocessing mode.
- Added the option `--query-or-subject-cover` to report alignments if the query or the subject cover (or both) are above the given threshold.
- Removed support for the `--comp-based-stats 2` option (now equivalent to `--comp-based-stats 3`).
- Removed hit culling in case of overlapping target ranges in frameshift alignment mode.
- Added the option `--anchored-swipe` to activate anchored SWIPE extension.
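
To illustrate the idea behind the `greedy-vertex-cover` workflow listed above, here is a minimal Python sketch of a textbook greedy cover heuristic on an alignment graph. It is not DIAMOND's implementation: the function name `greedy_vertex_cover`, the undirected edge representation, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_vertex_cover(nodes, edges):
    """Map every sequence to a centroid with a textbook greedy heuristic.

    nodes: iterable of sequence ids.
    edges: iterable of (a, b) pairs meaning a and b align well enough to be
           clustered together (treated as undirected; this is an assumption).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    unassigned = set(nodes)
    assignment = {}
    while unassigned:
        # Pick the unassigned node that covers the most unassigned neighbors.
        # Iterating in sorted order makes ties resolve to the smallest id.
        centroid = max(sorted(unassigned),
                       key=lambda n: len(neighbors[n] & unassigned))
        members = (neighbors[centroid] & unassigned) | {centroid}
        for member in members:
            assignment[member] = centroid
        unassigned -= members
    return assignment


# Toy graph: A, B and C are mutually similar; D and E are similar.
clusters = greedy_vertex_cover(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")],
)
print(sorted(clusters.items()))
# [('A', 'A'), ('B', 'A'), ('C', 'A'), ('D', 'D'), ('E', 'D')]
```

Each round removes a centroid together with everything it covers, so every sequence ends up in exactly one cluster.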
DIAMOND v2.0.15
- Fixed a bug (present since v2.0.12) that caused the `diamond view` workflow to report a zero bit score for all alignments.
DIAMOND v2.0.14
- Fixed a compiler error on Linux systems that do not define `_SC_LEVEL3_CACHE_SIZE`.
- Fixed an error when using `--unal 1` with the `cigar` output field.
- Fixed an `illegal instruction` error on systems that did not support AVX2.
- Fixed a bug (present since v2.0.12) that could cause an error or suboptimal alignments when HSP filter settings were used.
DIAMOND v2.0.13
- Fixed a bug that caused invalid bit scores in frameshift alignment mode.
DIAMOND v2.0.12
- Fixed an error when using HSP filter settings together with a BLAST database.
- Optimized the performance of alignment traceback.
- A non-default setting of `--max-hsps` will now recompute a full-matrix Smith Waterman alignment with the ranges of the known HSPs masked in the target.
- A non-default setting for `--max-hsps` can now be used together with `--ext full`.
- The sensitivity levels used for iterated searches can now be manually set by using a space-separated list after the `--iterate` option.
- Seeds are masked based on complexity instead of frequency by default.
- Added the option `--seed-cut` to set a complexity cutoff for indexed seeds.
- Added the option `--freq-masking` to enable masking seeds based on frequency.
- The fast, default, mid-sensitive and sensitive modes will by default softmask a fixed set of highly abundant sequence motifs.
- Added the option `--motif-masking (0,1)` to enable or disable motif masking.
- Added the option `--masking seg` to enable SEG masking of target sequences (BLAST default) instead of tantan masking.
- Fixed a bug that caused the `full_sseq` output field to contain invalid information or to produce an error when using a BLAST database.
- Changed composition based statistics to use BLOSUM62 background frequencies.
- Fixed the zstd dependency in the Dockerfile.
- Added support for gap letters in BLAST databases.
- Fixed a bug that caused the `--custom-matrix` option not to function correctly.
- Changed the overlap for merging adjoining bands to >0.0.
- Use more moderate filtering of HSPs in the chaining stage.
DIAMOND v2.0.11
- Fixed a bug that could cause invalid output when using `--masking 0` combined with the global ranking mode.
- Enabled lazy repeat masking in the query-indexed and contiguous seed modes when using global ranking.
- Added detection of cache size to auto-enable query-indexed mode.
DIAMOND v2.0.10
- Using BLAST databases now requires a preprocessing step using the command `prepdb`. The command line is: `diamond prepdb -d /path/to/database`. This call runs quickly and will write some small auxiliary files into the database directory.
- Improved performance of searching small query files.
- Added the "iterative" search mode (option `--iterate`) to search the query dataset with increasing sensitivity, only searching queries at the target sensitivity that do not produce a significant alignment at a lower sensitivity search. For example, using `--sensitive --iterate` will first search the query file at default sensitivity, and then search again in `--sensitive` mode all query sequences that failed to align in the first round.
- Added the "global ranking" mode (option `-g`) to set a limit on the number of Smith Waterman extensions per query, with the target sequences ranked by their ungapped extension scores.
- Added the `--fast` sensitivity mode that is faster and less sensitive than the default mode.
- Reduced the time for loading target sequences from BLAST databases.
- Added the contiguous-seed mode (option `--algo ctg`) to improve performance for small query files.
- Added support for using `--comp-based-stats (3,4)` in combination with `--ext full`.
- Fixed a bug that could cause a `Traceback error` when using `--comp-based-stats (3,4)` in rare cases.
- Changed the `full_sseq` output field to always contain unmasked sequences.
- Fixed an issue that could cause target output order to be nondeterministic in case of identically scoring hits.
- Added support for reading zstd-compressed input files (auto-detected) and writing zstd-compressed output files (option `--compress zstd`) (requires compilation using `cmake -DWITH_ZSTD=ON`).
- Compilation with BLAST database support requires the zstd library.
- Added error message when reading protein sequences from FASTA files that only contain DNA letters (can be disabled using `--ignore-warnings`).
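
The control flow of the "iterative" search mode described in this release can be sketched in Python as follows. This is only an illustration of the idea (re-search at higher sensitivity only the queries that did not align yet), not DIAMOND's code: `iterative_search`, the `search` callback, and the toy hit table are hypothetical stand-ins for real search rounds.

```python
def iterative_search(queries, search, levels):
    """Search queries at increasing sensitivity levels.

    search(query_ids, level) is a hypothetical callback that returns the
    ids of the queries with a significant alignment at that level.
    Returns (aligned_at, unaligned): the level at which each query first
    aligned, and the queries that never aligned.
    """
    remaining = list(queries)
    aligned_at = {}
    for level in levels:
        if not remaining:
            break  # nothing left to search at higher sensitivity
        aligned = set(search(remaining, level))
        for query in remaining:
            if query in aligned:
                aligned_at[query] = level
        # Only the still-unaligned queries go on to the next round.
        remaining = [q for q in remaining if q not in aligned]
    return aligned_at, remaining


# Toy stand-in for a search: which queries align at which level.
TOY_HITS = {"default": {"q1", "q3"}, "sensitive": {"q1", "q2", "q3"}}


def toy_search(query_ids, level):
    return {q for q in query_ids if q in TOY_HITS[level]}


aligned_at, unaligned = iterative_search(
    ["q1", "q2", "q3", "q4"], toy_search, ["default", "sensitive"]
)
print(aligned_at)  # {'q1': 'default', 'q3': 'default', 'q2': 'sensitive'}
print(unaligned)   # ['q4']
```

In the toy run, only `q2` and `q4` reach the second round, which is where the time saving of the iterative mode would come from.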
DIAMOND v2.0.9
- Reduced the memory use of database building with taxonomy mapping.
- Removed the limitation of sequence accession length.
- Fixed a bug that could cause using a BLAST database not to function correctly.
- Added support for using BLAST alias databases (created by `blastdb_aliastool`).
- Reduced the memory use of the seed hit sorting stage.
- Improved the consistency of results when running in query-indexed mode (`--algo 1`).
- Added the option `--skip-missing-seqids` to ignore cases of missing sequences in the database when using the `--seqidlist` option.
- The `--min-orf` parameter now defaults to 1 in frameshift alignment mode.
- Added support for using BLAST databases to the Docker container.
DIAMOND v2.0.8
- Added support for directly using BLAST database files instead of the Diamond-formatted `.dmnd` database files. This feature is not yet available through all release channels. It can currently be accessed by downloading the GitHub release version or by compiling from source. Taxonomy features are not yet supported for BLAST databases.
- Added the option `--seqidlist` to filter the database by sequence accession (only supported for BLAST databases).
- Fixed a bug that caused the `--dbsize` option not to function correctly.
- Added the command `makeidx` and the option `--target-indexed` that provide an optimisation specialized for small databases (<10 Mb). (see: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#small-database-optimization)
- Added the option `--mp-recover` to recover aborted runs in multiprocessing mode.
DIAMOND v2.0.7
- Added support for computing full-matrix instead of banded Smith Waterman extensions (command line option `--ext full`).
- Added support for the new `prot.accession2taxid.FULL.gz` taxonomy mapping file from NCBI.
- Added the option `--gapped-filter-evalue` to set the e-value threshold of the gapped filter heuristic.
- Added setting the scores of the mask letter according to BLAST rules when a compositionally adjusted matrix is used.
- Changed formatting of e-values to print two decimals instead of one.
- Added the output field `qseq_translated` to print the translation of the aligned part of the query sequence.
- Added support for providing two input files to `--query/-q` when running alignment in blastx mode.
- Added the output field `full_qseq_mate` to print the sequence of the query's mate (enabled when using two query files in blastx mode).
- Fixed a bug that could cause a crash in blastx mode for very long queries.
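
As a small illustration of the e-value formatting change noted in this release (two decimals instead of one), the Python sketch below formats the same value both ways. It assumes C-style scientific notation; the helper `format_evalue` is hypothetical and DIAMOND's exact output format may differ in detail.

```python
def format_evalue(evalue, decimals=2):
    """Format an e-value in scientific notation with a fixed number of decimals."""
    return f"{evalue:.{decimals}e}"


print(format_evalue(3.14159e-25, decimals=1))  # 3.1e-25 (one decimal)
print(format_evalue(3.14159e-25))              # 3.14e-25 (two decimals)
```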