Installation

Karolis Ramanauskas edited this page Aug 4, 2023 · 17 revisions

System Requirements

Before installing Kakapo, ensure that your system meets the following requirements:

  • Operating System: Linux, macOS, or Windows with Windows Subsystem for Linux installed
  • Python: 3.8 or later
  • RAM: Minimum 16 GB (32 GB or more recommended)
  • Disk Space: Raw read files are downloaded and processed. Although they are stored in compressed form, you may need hundreds of gigabytes of free disk space, depending on the number of samples in your analysis.
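
As a rough aid for the disk space requirement above, the sketch below reports the free space on the filesystem that holds a given directory. The helper name free_gb and the choice of the current directory are assumptions made for illustration; check whichever directory you plan to use for your analysis.

```shell
# Print the free space, in whole gigabytes, on the filesystem holding a directory.
# free_gb is a hypothetical helper, not part of kakapo.
free_gb() {
  # df -P gives stable POSIX columns; -k reports sizes in 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

dir="."   # assumption: the analysis will run in the current directory
echo "Free space under $dir: $(free_gb "$dir") GB"
```

Compare the reported number against the size of the data set you expect to download.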

Kakapo was designed for, and should work on, machines running macOS or Linux, including Windows Subsystem for Linux. If you choose to run Kakapo on Windows Subsystem for Linux, I suggest using the latest Ubuntu distribution available in the Microsoft Store.

Kakapo supports Python 3 and will not work with Python 2. It is installed with the pip command given in the installation steps below. If you have both Python 2 and Python 3 on your system, or if you are not certain, make sure you have a Python 3 version of pip by running the command below:

pip -V

This should print the version of pip you have and the Python version it uses. If the output lists Python 3.8 or higher, you are set.

pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

Otherwise you may try:

pip3 -V

If the pip3 command works but pip does not, replace pip with pip3 in the installation steps.

If neither command works, you may not have pip installed. Follow the steps described here to install it. Alternatively, one easy way to get pip is to install Conda or Miniconda.
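
The pip checks above can be combined into one short script. This is a minimal sketch, assuming pip prints its Python version in the "(python X.Y)" form shown in the sample output; the function names pip_python_version and python_version_ok are made up for this example.

```shell
# Extract the Python version from `pip -V` output read on standard input.
# Expects a line like: pip 22.0.2 from /path/to/pip (python 3.10)
pip_python_version() {
  sed -n 's/.*(python \([0-9][0-9]*\.[0-9][0-9]*\)).*/\1/p'
}

# Succeed if the given X.Y version is 3.8 or later.
python_version_ok() {
  major=${1%%.*}
  minor=${1#*.}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 8 ]; }
}

# Report on both command names, since either one may be the Python 3 pip.
for cmd in pip pip3; do
  if command -v "$cmd" >/dev/null 2>&1; then
    ver=$("$cmd" -V | pip_python_version)
    if [ -n "$ver" ] && python_version_ok "$ver"; then
      echo "$cmd uses Python $ver: OK"
    else
      echo "$cmd uses Python ${ver:-unknown}: too old or not recognized"
    fi
  else
    echo "$cmd: not found"
  fi
done
```

Use whichever command is reported as OK in the installation steps.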

Dependencies

Kakapo can download most of the required dependencies on its own. However, you will have to install Java (for Trimmomatic) and Perl (for Rcorrector) before running kakapo. Additionally, gzip is required, but it will probably be installed on your system already. I highly recommend installing pigz as well, as it will speed up compression steps by utilizing multiple CPU threads.

On an Ubuntu system these dependencies can be installed by running:

sudo apt update
sudo apt install pigz default-jre perl
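
To confirm that these prerequisites are visible on your PATH, on Ubuntu or any other system, a check along the following lines may help. It is only a sketch: missing_tools is a hypothetical helper, and it tests only whether each command can be found, not which version is installed.

```shell
# Print the names of any given commands that are not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# java, perl and gzip are required; pigz is optional but recommended.
missing=$(missing_tools java perl gzip pigz)
if [ -z "$missing" ]; then
  echo "All prerequisites found."
else
  echo "Not found on PATH:"
  echo "$missing"
fi
```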

Installation Steps

  1. Open a terminal on your system.

  2. Install Kakapo directly from the GitHub repository by executing the following command:

    pip install --user --upgrade git+https://github.com/karolisr/kakapo

    The same command will also work to upgrade kakapo to the latest version in the future.

  3. Check that kakapo was installed and that it is visible to the system by running:

    kakapo

    You should see the kakapo version and other information printed to the screen:

    Kakapo version: 0.9.6
    Python version: 3.10.12
    Operating system: Ubuntu 22.04
    System info: 32 physical and 64 logical cores, 503.71 GB RAM (x86_64)
    
    Configuration file was not provided. Nothing to do.
    
    usage: kakapo --cfg project_configuration_file --ss search_strategies_file
    
    options:
      --cfg path           Path to a kakapo project configuration file.
      --ss path            Path to a kakapo search strategies file.
      --ncpu count         Number of CPUs to use.
      --stop-after-filter  Stop kakapo after Kraken2/Bowtie2 filtering step.
      --force-deps         Force the use of kakapo-installed dependencies,
                           even if they are already available on the system.
      --install-deps       Install kakapo dependencies and quit.
      --dnld-kraken-dbs    Download Kraken2 databases and quit.
      --clean-data-dir     Remove cached NCBI taxonomy data and all software
                           dependencies downloaded by kakapo.
      -v, --version        Print kakapo version.
      -h, --help           Print kakapo help information.
    
  4. Kakapo is now installed, but not quite ready to use. Kakapo can install additional dependencies on its own. The dependencies will be installed to the ${HOME}/.local/share/kakapo/dependencies directory and will not interfere with or overwrite any of the software on your system. If you would like to install only those dependencies that cannot be found on your system, run this command:

    kakapo --install-deps

    For reproducibility, you may want kakapo to install all of the dependencies, including those that you may already have:

    kakapo --force-deps --install-deps

    Note that the --force-deps option can also be used when running kakapo later, to ensure that kakapo uses only the dependencies it installed itself. You can rerun the kakapo --install-deps command at any point, with or without the --force-deps option, to see which programs (their versions and paths) kakapo "sees" and will use. For example, on my system I see this output:

    gzip is available: gzip 1.10 /usr/bin/gzip
    pigz is available: pigz 2.6 /usr/bin/pigz
    Seqtk is available: 1.3-r117-dirty /home/karolis/.local/share/kakapo/dependencies/seqtk-master/seqtk
    Trimmomatic is available: 0.39 /home/karolis/.local/share/kakapo/dependencies/Trimmomatic-0.39/trimmomatic-0.39.jar
    fasterq-dump is available: 2.11.3 /home/karolis/.local/share/kakapo/dependencies/sratoolkit.2.11.3-ubuntu64/bin/fasterq-dump
    makeblastdb is available: 2.12.0 /usr/bin/makeblastdb
    blastn is available: 2.12.0 /usr/bin/blastn
    tblastn is available: 2.12.0 /usr/bin/tblastn
    Vsearch is available: 2.21.1 /usr/bin/vsearch
    SPAdes is available: 3.15.4 /home/karolis/.local/share/kakapo/dependencies/SPAdes-3.15.4-Linux/bin/spades.py
    bowtie2 is available: 2.4.4 /usr/bin/bowtie2
    bowtie2-build is available: 2.4.4 /usr/bin/bowtie2-build
    Rcorrector is available: 1.0.5 /home/karolis/.local/share/kakapo/dependencies/Rcorrector-master/run_rcorrector.pl
    kraken2 is available: 2.1.2 /home/karolis/.local/share/kakapo/dependencies/kraken2-master/bin/kraken2
    kraken2-build is available: 2.1.2 /home/karolis/.local/share/kakapo/dependencies/kraken2-master/bin/kraken2-build
    kakapolib is available: /home/karolis/SyncThing/python/kakapo/kakapo/utils/c/lib/kakapolib_linux_x86_64.so
    

    Important: Make sure that a version is printed for fasterq-dump, which is part of SRA-Toolkit. The first time any SRA-Toolkit program is executed on your system, it expects a configuration file to exist. Kakapo tries to create one, but this can sometimes fail. If the version is not being printed, copy the parent path listed next to fasterq-dump, append vdb-config --interactive, and run the resulting command. On my system, based on the output above, I would copy /home/karolis/.local/share/kakapo/dependencies/sratoolkit.2.11.3-ubuntu64/bin/ and append vdb-config --interactive:

    /home/karolis/.local/share/kakapo/dependencies/sratoolkit.2.11.3-ubuntu64/bin/vdb-config --interactive

    You should see a blue configuration screen appear. Press Tab then Return to exit.

    Rerun the kakapo --install-deps command and see if the version is being printed now.

  5. Kakapo is now installed and ready to use... hopefully. If you experience any problems installing dependencies, please open an issue on GitHub so I can investigate the problem.
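
The path manipulation described in step 4 for the fasterq-dump fix can also be done with dirname rather than by hand. This is a minimal sketch: vdb_config_command is a hypothetical helper, and the path is the one from the sample output in step 4, so substitute the path printed on your own system.

```shell
# Build the vdb-config command from the fasterq-dump path that kakapo reports.
vdb_config_command() {
  # dirname strips the trailing /fasterq-dump, leaving the SRA-Toolkit bin directory.
  printf '%s/vdb-config --interactive\n' "$(dirname "$1")"
}

# Path copied from the sample `kakapo --install-deps` output; yours will differ.
fasterq_dump=/home/karolis/.local/share/kakapo/dependencies/sratoolkit.2.11.3-ubuntu64/bin/fasterq-dump
vdb_config_command "$fasterq_dump"
```

The script only prints the command; copy the printed line into your terminal to open the configuration screen.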
