SKA

Introduction

SKA (Split Kmer Analysis) is a toolkit for prokaryotic (and any other small, haploid) DNA sequence analysis using split kmers. A split kmer is a pair of kmers in a DNA sequence that are separated by a single base. Split kmers allow rapid comparison and alignment of small genomes, and are particularly suited to surveillance or outbreak investigation. SKA can produce split kmer files from fasta format assemblies or directly from fastq format read sequences, cluster them, align them with or without a reference sequence, and provide various comparison and summary statistics. Currently all testing has been carried out on high-quality Illumina read data, so results for other platforms may vary.
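To make the split kmer idea concrete, here is a minimal Python sketch. It is an illustration only and not SKA's implementation: the function names, the flank length `k=3` and the toy sequences are all assumptions made for this example, and it ignores details such as reverse complements, quality filtering and ambiguous bases. Each split kmer is stored as a pair of flanking kmers mapped to the base found between them, so two sequences that share the same flanks but differ in the middle base reveal a single-base difference.

```python
def split_kmers(sequence, k=3):
    """Map each (left_kmer, right_kmer) pair to the set of middle bases seen between them.

    Hypothetical helper for illustration; k is the length of each flanking kmer.
    """
    sequence = sequence.upper()
    span = 2 * k + 1  # left kmer + middle base + right kmer
    result = {}
    for i in range(len(sequence) - span + 1):
        left = sequence[i:i + k]
        middle = sequence[i + k]
        right = sequence[i + k + 1:i + span]
        result.setdefault((left, right), set()).add(middle)
    return result


def middle_base_differences(seq_a, seq_b, k=3):
    """Return flanking pairs present in both sequences whose middle bases differ."""
    a = split_kmers(seq_a, k)
    b = split_kmers(seq_b, k)
    return {
        flanks: (a[flanks], b[flanks])
        for flanks in a.keys() & b.keys()
        if a[flanks] != b[flanks]
    }


# Toy sequences (made up for this example) that differ at a single position.
reference = "ACGTACGGTCA"
variant = "ACGTATGGTCA"

for (left, right), (bases_a, bases_b) in middle_base_differences(reference, variant).items():
    print(f"{left}-{right}: {sorted(bases_a)} vs {sorted(bases_b)}")
```

In this toy case only the split kmer whose flanks sit on either side of the changed position is shared between the two sequences, which is why the variant shows up as a difference in the middle base.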

Installation

Installation using Conda

conda install -c bioconda ska

Many thanks to John Lees for creating this Conda recipe!

Installation using Homebrew

brew install brewsci/bio/ska

Many thanks to Torsten Seemann for creating this Brew formula!

Installation from source

SKA can be installed from source by cloning this repository:

git clone https://github.com/simonrharris/SKA

Alternatively, download and unpack the latest release.

Then navigate into the SKA directory and run make:

cd SKA
make

The executable will be compiled into a directory named bin. You can either add this bin directory to your PATH or move the executable into a directory that is already on your PATH. Alternatively, running

sudo make install

will move the executable to /usr/local/bin.

Requirements

SKA requires only GNU make and a version of g++ that supports C++11.

Usage

ska <subcommand>

Subcommands:
align		Reference-free alignment from a set of split kmer files
alleles		Create a merged split kmer file for all sequences in one or
		more multifasta files
annotate	Locate/annotate split kmers in a reference fasta/gff file
compare		Print comparison statistics for a query split kmer file
		against a set of subject split kmer files
distance	Pairwise distance calculation and clustering from split kmer
		files
fasta		Create a split kmer file from fasta file(s)
fastq		Create a split kmer file from fastq file(s)
humanise	Print kmers from a split kmer file in human readable format
info		Print some information about one or more kmer files
map		Align split kmer file(s) against a reference fasta file
merge		Merge split kmer file(s) into one file
summary		Print split kmer file summary statistics
type		Type split kmer files using a set of allele files
unique		Output kmers unique to a set of split kmer files
version		Print the version and citation for ska
weed		Weed kmers from a split kmer file

Please read the SKA wiki page for full usage instructions.

License

SKA is free software, licensed under MIT.

Citation

SKA is currently only available as a preprint, so for now, if you use it, please cite: Harris SR. 2018. SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology. bioRxiv 453142 doi: https://doi.org/10.1101/453142