Releases: lh3/minimap2
Minimap2-2.18 (r1015)
This release fixes multiple rare bugs in minimap2 and adds additional
functionality to paftools.js.
Changes to minimap2:
- Bugfix: a rare segfault caused by an off-by-one error (#489)
- Bugfix: minimap2 segfaulted due to an uninitialized variable (#622 and #625).
- Bugfix: minimap2 parsed spaces as field separators in BED (#721). This led
  to issues when the BED name column contained spaces.
- Bugfix: minimap2 --split-prefix did not work with long reference names
  (#394).
- Bugfix: option --junc-bonus didn't work (#513)
- Bugfix: minimap2 didn't return 1 on I/O errors (#532)
- Bugfix: the de:f tag (sequence divergence) could be negative if there were
  ambiguous bases
- Bugfix: fixed two undefined behaviors caused by calling memcpy() on
  zero-length blocks (#443)
- Bugfix: there were duplicated SAM @SQ lines if option --split-prefix is in
  use (#400 and #527)
- Bugfix: option -K had to be smaller than 2 billion (#491). This was caused
  by a 32-bit integer overflow.
- Improvement: optionally compile against SIMDe (#597). Minimap2 should work
  with IBM POWER CPUs, though this has not been tested. To compile with SIMDe,
  please use make -f Makefile.simde.
- Improvement: more informative error message for I/O errors (#454) and for
  FASTQ parsing errors (#510)
- Improvement: abort given a malformed RG line (#541)
- Improvement: better formula to estimate the dv:f tag (approximate sequence
  divergence). See DOI:10.1101/2021.01.15.426881.
- New feature: added the --mask-len option to fine control the removal of
  redundant hits (#659). The default behavior is unchanged.
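The BED fix (#721) is easy to reproduce outside minimap2. The Python sketch below is not minimap2's parser; the record and the helper name split_bed_line are made up for illustration. It only shows why splitting on any whitespace breaks a BED line whose name column contains a space, while splitting on tabs alone does not.

```python
def split_bed_line(line, tabs_only=True):
    """Split one BED record into fields.

    tabs_only=True  treats only TAB as a separator (the behavior after the fix).
    tabs_only=False splits on any whitespace (the behavior behind #721).
    """
    line = line.rstrip("\n")
    return line.split("\t") if tabs_only else line.split()


# Hypothetical BED6 record whose name column ("gene A") contains a space.
record = "chr1\t100\t200\tgene A\t0\t+\n"

print(split_bed_line(record, tabs_only=False))  # the name is split into two fields
print(split_bed_line(record, tabs_only=True))   # the name stays in one field
```

With whitespace splitting, every column after the name shifts by one, so the score and strand columns are read from the wrong fields.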
Changes to mappy:
- Bugfix: mappy caused segmentation fault if the reference index is not
  present (#413).
- Bugfix: fixed a memory leak via 238b6bb
- Change: always require Cython to compile the mappy module (#723). Older
  mappy packages at PyPI bundled the C source code generated by Cython such
  that end users did not need to install Cython to compile mappy. However, as
  Python 3.9 is breaking backward compatibility, older mappy does not work
  with Python 3.9 anymore. We have to add this Cython dependency as a
  workaround.
Changes to paftools.js:
- Bugfix: the "part10-" line from asmgene was wrong (#581)
- Improvement: compatibility with GTF files from GenBank (#422)
- New feature: asmgene also checks missing multi-copy genes
- New feature: added the misjoin command to evaluate large-scale misjoins and
  megabase-long inversions.
Despite the many bug fixes and minor improvements, the core algorithm stays
the same. This version of minimap2 produces nearly identical alignments to
v2.17 except in very rare corner cases.
Now unimap is recommended over minimap2 for aligning long contigs against a
reference genome. It often takes less wall-clock time and is much more
sensitive to long insertions and deletions.
(2.18: 9 April 2021, r1015)
Minimap2-2.17 (r941)
Changes since the last release:
- Fixed flawed CIGARs like 5I6D7I (#392).
- Bugfix: TLEN should be 0 when either end is unmapped (#373 and #365).
- Bugfix: mappy is unable to write index (#372).
- Added option --junc-bed to load known gene annotations in the BED12
  format. Minimap2 prefers annotated junctions over novel junctions (#197 and
  #348). GTF can be converted to BED12 with paftools.js gff2bed.
- Added option --sam-hit-only to suppress unmapped hits in SAM (#377).
- Added preset splice:hq for high-quality CCS or mRNA sequences. It applies
  better scoring and improves the sensitivity to small exons. This preset may
  introduce false small introns, but the overall accuracy should be higher.
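To check existing alignments for the CIGAR problem fixed in #392, a small scan may help. The sketch below assumes that "flawed" means a run of adjacent gap operators in which the same operator (I or D) occurs more than once, as in 5I6D7I; that reading and the function names are assumptions, not taken from the minimap2 source.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, operator) pairs, or raise on a malformed string."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # The pattern silently skips unknown characters, so verify a full round trip.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def has_redundant_gap_run(cigar):
    """True if I or D occurs twice within one uninterrupted run of gap operators."""
    seen = set()
    for _, op in parse_cigar(cigar):
        if op in "ID":
            if op in seen:
                return True
            seen.add(op)
        else:
            seen.clear()  # any non-gap operator ends the current gap run
    return False


print(has_redundant_gap_run("10M5I6D7I20M"))  # gap run I, D, I repeats I
print(has_redundant_gap_run("10M5I6D20M"))    # gap run I, D has no repeat
```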
This version produces nearly identical alignments to v2.16, except for CIGARs
affected by the bug mentioned above.
(2.17: 5 May 2019, r941)
Minimap2-2.16 (r922)
This release is 50% faster for mapping ultra-long nanopore reads at comparable
accuracy. For short-read mapping, long-read overlapping and ordinary long-read
mapping, the performance and accuracy remain similar. This speedup is achieved
with a new heuristic to limit the number of chaining iterations (#324). Users
can disable the heuristic by setting the new option --max-chain-iter to a
huge number.
Other changes to minimap2:
- Implemented option --paf-no-hit to output unmapped query sequences in PAF.
  The strand and reference name columns are both * at an unmapped line. This
  option was available as a hidden option in earlier versions of minimap2 but
  had a different 2-column output format instead of PAF.
- Fixed a bug that leads to wrongly calculated de tags when ambiguous bases
  are involved (#309). This bug only affects v2.15.
- Fixed a bug when parsing command-line option --splice (#344). This bug was
  introduced in v2.13.
- Fixed two division-by-zero cases (#326). They don't affect final alignments
  because the results of the divisions are not used in either case.
- Added an option -o to output alignments to a specified file. It is still
  recommended to use UNIX pipes for on-the-fly conversion or compression.
- Output a new rl tag to give the length of query regions harboring
  repetitive seeds.
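Downstream scripts that read --paf-no-hit output need to tell unmapped lines apart from real hits. A minimal Python sketch follows, relying only on the statement above that the strand and reference name columns (columns 5 and 6 of PAF) are both *. The example records are hypothetical, and the zeros in the remaining columns of the unmapped record are placeholders rather than documented output.

```python
def is_unmapped_paf(line):
    """True if a PAF line has '*' in both the strand and reference name columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("a PAF line needs at least 6 tab-separated columns")
    return fields[4] == "*" and fields[5] == "*"


# Hypothetical records for illustration only.
mapped = "read1\t1000\t10\t990\t+\tchr1\t50000\t2000\t2980\t950\t985\t60"
unmapped = "read2\t800\t0\t0\t*\t*\t0\t0\t0\t0\t0\t0"

for rec in (mapped, unmapped):
    name = rec.split("\t", 1)[0]
    print(name, "unmapped" if is_unmapped_paf(rec) else "mapped")
```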
Changes to paftools.js:
- Added a new option to convert the MD tag to the long form of the cs tag.
Changes to mappy:
- Added the mappy.Aligner.seq_names method to return sequence names (#312).
For NA12878 ultra-long reads, this release changes the alignments of <0.1% of
reads in comparison to v2.15. All these reads have highly fragmented alignments
and are likely to be problematic anyway. For shorter or well aligned reads,
this release should produce mostly identical alignments to v2.15.
(2.16: 28 February 2019, r922)
Minimap2-2.15 (r905)
Changes to minimap2:
- Fixed a rare segmentation fault when option -H is in use (#307). This may
  happen when there are very long homopolymers towards the 5'-end of a read.
- Fixed wrong CIGARs when option --eqx is used (#266).
- Fixed a typo in the base encoding table (#264). This should have no
  practical effect.
- Fixed a typo in the example code (#265).
- Improved the C++ compatibility by removing "register" (#261). However,
  minimap2 still can't be compiled in the pedantic C++ mode (#306).
- Output a new "de" tag for gap-compressed sequence divergence.
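As a rough illustration of what "gap-compressed" means for the de tag, the Python sketch below computes a divergence from a CIGAR written with the =/X operators. It assumes that each insertion or deletion run counts as a single difference regardless of its length, and it ignores ambiguous bases, which minimap2 treats separately. The function name and the example CIGAR are made up, and the value is not guaranteed to match minimap2's output exactly.

```python
import re


def gap_compressed_divergence(cigar):
    """Divergence where every indel run counts as one event (assumed definition).

    Expects a CIGAR that uses '=' and 'X' for matches and mismatches;
    other operators (clipping, introns, plain 'M') are ignored here.
    """
    matches = mismatches = gap_opens = 0
    for n, op in re.findall(r"(\d+)([=XID])", cigar):
        if op == "=":
            matches += int(n)
        elif op == "X":
            mismatches += int(n)
        else:  # 'I' or 'D': one event per run, whatever its length
            gap_opens += 1
    events = matches + mismatches + gap_opens
    if events == 0:
        raise ValueError("no aligned bases or gaps found in CIGAR")
    return 1.0 - matches / events


# 98 matches, 2 mismatches and one 10-bp insertion: 3 differences in 101 events.
print(round(gap_compressed_divergence("90=2X10I8="), 4))
```

Under this definition a single 10-bp insertion weighs the same as a 1-bp insertion, so long indels do not dominate the divergence estimate.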
Changes to paftools.js:
- Added "asmgene" to evaluate the completeness of an assembly by measuring the
  uniquely mapped single-copy genes. This command borrows the idea from BUSCO.
- Added "vcfpair" to call a phased VCF from phased whole-genome assemblies. An
  earlier version of this script is used to produce the ground truth for the
  syndip benchmark [PMID:30013044].
This release produces identical alignment coordinates and CIGARs in comparison
to v2.14. Users are advised to upgrade due to the several bug fixes.
(2.15: 10 January 2019, r905)
Minimap2-2.14 (r883)
Notable changes:
- Fixed a bug that made minimap2 abort when --eqx was used together with --MD
  or --cs (#257).
- Added --cap-sw-mem to cap the size of DP matrices (#259). Base alignment may
  take a lot of memory in the splicing mode. This may lead to issues when we
  run minimap2 on a cluster with a hard memory limit. The new option avoids
  unlimited memory usage at the cost of missing a few long introns.
- Conforming to C99 and C11 when possible (#261).
This release occasionally produces base alignments different from v2.13. The
overall alignment accuracy remains similar.
(2.14: 5 November 2018, r883)
Minimap2-2.13 (r850)
Changes to minimap2:
- Fixed wrongly formatted SAM when -L is in use (#231 and #233).
- Fixed an integer overflow in rare cases.
- Added --hard-mask-level to fine control split alignments (#244).
- Made --MD work with spliced alignment (#139).
- Replaced musl's getopt with ketopt for portability.
- Log peak memory usage on exit.
This release should produce alignments identical to v2.12 and v2.11. Since this release, the bioconda minimap2 recipe has been updated to install k8 and paftools.js along with minimap2.
(2.13: 11 October 2018, r850)
Minimap2-2.12 (r827)
Changes to minimap2:
- Added option --split-prefix to write proper alignments (correct mapping
  quality and clustered query sequences) given a multi-part index (#141 and
  #189; mostly by @hasindu2008).
- Fixed a memory leak when option -y is in use.
Changes to mappy:
- Allow mappy to index a single sequence, to add extra flags and to change the
  scoring system.
Minimap2 should produce alignments identical to v2.11.
(2.12: 6 August 2018, r827)
Minimap2-2.11 (r797)
Changes to minimap2:
- Improved alignment accuracy in low-complexity regions for SV calling. Thanks
  to @armintoepfer for multiple offline examples.
- Added option --eqx to encode sequence match/mismatch with the =/X CIGAR
  operators (#156, #157 and #175).
- When compiled with VC++, minimap2 generated wrong alignments due to a
  comparison between a signed integer and an unsigned integer (#184). Also
  fixed warnings reported by "clang -Wextra".
- Fixed incorrect anchor filtering due to a missing 64- to 32-bit cast.
- Fixed incorrect mapping quality for inversions (#148).
- Fixed incorrect alignment involving ambiguous bases (#155).
- Fixed incorrect presets: option -r 2000 is intended to be used with
  ava-ont, not ava-pb. The bug was introduced in 2.10.
- Fixed a bug when --for-only/--rev-only is used together with --sr or
  --heap-sort=yes (#166).
- Fixed option -Y that was not working in the previous releases.
- Added option --lj-min-ratio to fine control the alignment of long gaps
  found by the "long-join" heuristic (#128).
- Exposed mm_idx_is_idx, mm_idx_load and mm_idx_dump C APIs (#177).
  Also fixed a bug when indexing without reference names (this feature is not
  exposed to the command line).
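For readers unfamiliar with the =/X operators that --eqx emits, the Python sketch below rewrites the M operators of a CIGAR as runs of = (match) and X (mismatch) by comparing the two sequences. This is an illustration of the encoding, not minimap2's implementation: the function name to_eqx is made up, bases are compared by simple case-insensitive equality, and ambiguous bases get no special treatment.

```python
import re
from itertools import groupby


def to_eqx(cigar, query, ref):
    """Rewrite M operators as =/X runs.

    query: the read sequence, including any soft-clipped bases.
    ref:   the reference segment starting at the alignment start.
    """
    out = []
    qi = ri = 0  # current offsets in query and ref
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op == "M":
            flags = [query[qi + k].upper() == ref[ri + k].upper() for k in range(n)]
            # Collapse consecutive equal flags into one operator each.
            for is_match, run in groupby(flags):
                out.append(f"{len(list(run))}{'=' if is_match else 'X'}")
            qi += n
            ri += n
        else:
            out.append(f"{n}{op}")
            if op in "=X":
                qi += n
                ri += n
            elif op in "IS":   # consume query only
                qi += n
            elif op in "DN":   # consume reference only
                ri += n
            # H and P consume neither sequence
    return "".join(out)


# Hypothetical example: one mismatch in the first block, then a 2-bp insertion.
print(to_eqx("4M2I2M", "ACGTTTGA", "ACCTGA"))
```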
Changes to mappy:
Changes to paftools:
- Don't crash when there is no "cg" tag (#153).
- Fixed wrong coverage report by "paftools.js call" (#145).
This version may produce slightly different base-level alignment. The overall
alignment statistics should remain similar.
(2.11: 20 June 2018, r797)
Minimap2-2.10 (r761)
Changes to minimap2:
- Optionally output the MD tag for compatibility with existing tools (#63,
  #118 and #137).
- Use SSE compiler flags more precisely to prevent compiling errors on certain
  machines (#127).
- Added option --min-occ-floor to set a minimum occurrence threshold. Presets
  intended for assembly-to-reference alignment set this option to 100. This
  option alleviates issues with regions having high copy numbers (#107).
- Exit with non-zero code on file writing errors (e.g. disk full; #103 and
  #132).
- Added option -y to copy FASTA/FASTQ comments in query sequences to the
  output (#136).
- Added the asm20 preset for alignments between genomes at 5-10% sequence
  divergence.
- Changed the band-width in the ava-ont preset from 500 to 2000. Oxford
  Nanopore reads may contain long deletion sequencing errors that break
  chaining.
Changes to mappy, the Python binding:
- Fixed a typo in Align.seq() (#126).
Changes to paftools.js, the companion script:
- Command sam2paf now converts the MD tag to cs.
- Support VCF output for assembly-to-reference variant calling (#109).
This version should produce identical alignment for read overlapping, RNA-seq
read mapping, and genomic read mapping. We have also added a cookbook to show
the various uses of minimap2 on real datasets. Please see cookbook.md in the
minimap2 source code directory.
(2.10: 27 March 2018, r761)
Minimap2-2.9 (r720)
This release fixed multiple minor bugs.
- Fixed two bugs that led to incorrect inversion alignment. Also improved the
  sensitivity to small inversions by using double Z-drop cutoff (#112).
- Fixed an issue that may leave the end of a query sequence unmapped (#104).
- Added a mappy API to retrieve sequences from the index (#126) and to reverse
  complement DNA sequences. Fixed a bug where the best_n parameter did not
  work (#117).
- Avoided segmentation fault given incorrect FASTQ input (#111).
- Combined all auxiliary javascripts to paftools.js. Fixed several bugs in
  these scripts at the same time.
(2.9: 24 February 2018, r720)