VectorizedKmers.jl is a Julia package primarily designed for fast
This data structure can be used to quickly approximate distances between sequences. Notably, the squared Euclidean distance between K-mer count vectors was used to approximate edit distance in this paper. The dot product has also proven useful as a measure of correlation between sequences.
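To make the two comparisons concrete, here is a minimal sketch in plain Julia that does not use the package at all. The helper `naive_kmer_counts` and the functions `squared_euclidean` and `dot_product` are hypothetical names introduced only for this illustration. The sketch assumes an uppercase DNA string over `A`, `C`, `G`, `T`.

```julia
# Hypothetical helper (not part of VectorizedKmers.jl): count the k-mers of a
# DNA string into a dense vector of length 4^k by reading each k-mer as a
# base-4 integer.
function naive_kmer_counts(seq::AbstractString, k::Integer)
    code = Dict('A' => 0, 'C' => 1, 'G' => 2, 'T' => 3)
    counts = zeros(Int, 4^k)
    for i in 1:(length(seq) - k + 1)
        idx = 0
        for j in i:(i + k - 1)
            idx = 4 * idx + code[seq[j]]
        end
        counts[idx + 1] += 1  # +1 because Julia arrays are 1-based
    end
    return counts
end

# Squared Euclidean distance and dot product between two count vectors.
squared_euclidean(u, v) = sum(abs2, u .- v)
dot_product(u, v) = sum(u .* v)

a = naive_kmer_counts("AACCGGTT", 2)
b = naive_kmer_counts("AACCGGTA", 2)  # differs from the first in the last base

d = squared_euclidean(a, b)  # 2: only the counts for TT and TA differ
s = dot_product(a, b)        # 6: six 2-mers are shared
```

Both comparisons cost one pass over vectors of length 4^k, independent of sequence length once the counts exist, which is why they can serve as cheap stand-ins for alignment-based distances.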
julia> using VectorizedKmers, BioSequences
julia> kmer_array = count_kmers(dna"AACCGGTT", 2)
KmerArray{4, 2, Int64, Matrix{Int64}} with size (4, 4)
julia> kmer_array |> values
4×4 Matrix{Int64}:
1 0 0 0
1 1 0 0
0 1 1 0
0 0 1 1
julia> kmer_array[dna"AC"]
1
julia> kmer_array[dna"CA"]
0
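The matrix printed above can be reproduced by hand, which may help when reading the indices. The following is a sketch in plain Julia under one assumption: that the second base of each 2-mer selects the row and the first base selects the column, with `A`, `C`, `G`, `T` mapped to 1 through 4. That layout is inferred from the output shown here, not taken from the package's source, and `naive_2mer_matrix` is a hypothetical name.

```julia
# Hypothetical helper, not the package's implementation: count 2-mers of a DNA
# string into a 4x4 matrix where the row is the second base and the column is
# the first base.
function naive_2mer_matrix(seq::AbstractString)
    code = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)
    counts = zeros(Int, 4, 4)
    for i in 1:(length(seq) - 1)
        first_base = code[seq[i]]
        second_base = code[seq[i + 1]]
        counts[second_base, first_base] += 1
    end
    return counts
end

m = naive_2mer_matrix("AACCGGTT")

ac = m[2, 1]  # count of "AC": row C, column A
ca = m[1, 2]  # count of "CA": row A, column C
```

Under this assumed layout, `m` matches the 4×4 matrix above, `ac` is 1, and `ca` is 0, which agrees with the two lookups in the REPL session.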
The main downside of counting