---
title: "Setup"
---
::: {.callout-tip}
If you are attending one of our workshops, we will provide a training environment with all of the required software and data.
If you want to set up your own computer to run the analyses demonstrated in this course, you can follow the instructions below.
:::
We use tabsets to provide instructions for all three major operating systems. However, we advise you to use a Linux system wherever possible, as our training environment is built on Linux.

We will perform a fresh installation of conda using the Miniconda installer.
:::{.callout-note}
If you already have Miniconda or Anaconda installed and just want to upgrade, you do not need to make a fresh installation. Instead, use `conda update` to update your existing version of conda:

```bash
conda update conda
```

After updating conda, proceed to the steps below that add the channels and install mamba into the base environment from the conda-forge channel.
:::
::::: {.panel-tabset group="os"}
### Windows

Follow this link to install Miniconda and this link to install mamba on your Windows system.
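### macOS

Open a terminal and follow these instructions:

- Navigate to your home directory:

  ```bash
  cd ~
  ```

- Download the Miniconda3 installer for macOS by running the command below (macOS does not include `wget` by default; if it is missing, use `curl -O` followed by the same URL instead):

  ```bash
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
  ```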
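  :::{.callout-note}
  If your Mac has an Apple M1 processor, download this installer instead, and use its file name (`Miniconda3-latest-MacOSX-arm64.sh`) in the steps that follow:

  ```bash
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
  ```
  :::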
- Run the installation script you just downloaded:

  ```bash
  bash Miniconda3-latest-MacOSX-x86_64.sh
  ```

- Follow the installation instructions, accepting the default options (answering 'yes' to any questions).
  - If you are unsure about any setting, accept the defaults. You can change them later.
- To make the changes take effect, close and then re-open your terminal window.
- Test your installation:
  - In your terminal window, run the command:

    ```bash
    conda list
    ```

  - A list of installed packages appears if conda has been installed correctly.
- If the installation was successful, remove the installation script, as it is no longer needed:

  ```bash
  rm Miniconda3-latest-MacOSX-x86_64.sh
  ```
- Run the following commands to add channels:

  ```bash
  conda config --add channels defaults
  conda config --add channels bioconda
  conda config --add channels conda-forge
  conda config --set channel_priority strict
  ```

  This adds the bioconda and conda-forge channels (sources of software useful for bioinformatics and data science applications) alongside the defaults channel, with strict channel priority (conda-forge first, then bioconda, then defaults).

- Install mamba into the base environment from the conda-forge channel with the command below:

  ```bash
  conda install mamba -n base -c conda-forge
  ```

- Run this command to initialise mamba for your shell, then close and re-open your terminal window:

  ```bash
  mamba init
  ```
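
If you are not sure which macOS installer to download, the choice can be scripted. The function below is a minimal sketch, not part of the course material: the function name `miniconda_mac_installer` is hypothetical, and it assumes that `uname -m` reports `arm64` on Apple M1 Macs and `x86_64` on Intel Macs (a terminal running under Rosetta also reports `x86_64`, so check how your terminal is launched if the result looks wrong).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): pick the Miniconda3 macOS installer that
# matches this Mac's CPU. `uname -m` prints "arm64" on Apple Silicon (M1)
# Macs and "x86_64" on Intel Macs.
miniconda_mac_installer() {
  local arch="${1:-$(uname -m)}"   # an optional argument overrides detection
  case "$arch" in
    arm64)  echo "Miniconda3-latest-MacOSX-arm64.sh" ;;
    x86_64) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;
    *)      echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

# Usage (curl is preinstalled on macOS):
#   installer="$(miniconda_mac_installer)" &&
#     curl -O "https://repo.anaconda.com/miniconda/$installer" &&
#     bash "$installer"
```

The function only prints a file name, so you can inspect its output before downloading anything.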
### Linux

Open a terminal and follow these instructions:

- Navigate to your home directory:

  ```bash
  cd ~
  ```

- Download the Miniconda3 installer for Linux by running:

  ```bash
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  ```
- Run the installation script you just downloaded:

  ```bash
  bash Miniconda3-latest-Linux-x86_64.sh
  ```

- Follow the installation instructions, accepting the default options (answering 'yes' to any questions).
  - If you are unsure about any setting, accept the defaults. You can change them later.
- To make the changes take effect, close and then re-open your terminal window.
- Test your installation:
  - In your terminal window, run the command:

    ```bash
    conda list
    ```

  - A list of installed packages appears if conda has been installed correctly.
- If the installation was successful, remove the installation script, as it is no longer needed:

  ```bash
  rm Miniconda3-latest-Linux-x86_64.sh
  ```
- Run the following commands to add channels:

  ```bash
  conda config --add channels defaults
  conda config --add channels bioconda
  conda config --add channels conda-forge
  conda config --set channel_priority strict
  ```

  This adds the bioconda and conda-forge channels (sources of software useful for bioinformatics and data science applications) alongside the defaults channel, with strict channel priority (conda-forge first, then bioconda, then defaults).

- Install mamba into the base environment from the conda-forge channel with the command below:

  ```bash
  conda install mamba -n base -c conda-forge
  ```

- Run this command to initialise mamba for your shell, then close and re-open your terminal window:

  ```bash
  mamba init
  ```
:::::
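
On macOS or Linux, you may want to confirm that the `conda config` commands above produced the intended channel order. Each `--add` places a channel at the top of the list, so the expected order is conda-forge, bioconda, defaults. The function below is a minimal sketch: its name is hypothetical, and it assumes the YAML-style list that `conda config --show channels` prints. If you had other channels configured beforehand, they will also appear and this strict check will report them.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `conda config --show channels` output on
# stdin and check that the channels are conda-forge, bioconda, defaults
# (highest priority first), as set by the commands above.
channels_in_expected_order() {
  local got
  # Keep only the "  - name" list lines and join the names with spaces
  got="$(sed -n 's/^[[:space:]]*-[[:space:]]*//p' | tr '\n' ' ')"
  if [ "$got" = "conda-forge bioconda defaults " ]; then
    echo "channels OK: $got"
  else
    echo "unexpected channel order: $got" >&2
    return 1
  fi
}

# Usage:
#   conda config --show channels | channels_in_expected_order
```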
## Creating conda environments for the workshop
:::::{.panel-tabset group="os"}
Content will soon be uploaded.
:::{.callout}
Open a terminal, make sure you are in the conda base environment and run this command to install all the required packages and their dependencies:

```bash
mamba create -n qc fastq-scan=1.0.0 fastqc=0.11.9 fastp=0.23.2 kraken2=2.1.2 bracken=2.7 multiqc=1.13a
```

This creates an environment called `qc` with the specified package versions and their dependencies.

NB. The tools fastq-scan and bracken run Python scripts that require the Python libraries pandas, json and glob. The json and glob modules are part of the Python standard library, so only pandas needs to be installed. Use the command below to install it in the `qc` environment:

```bash
conda install pandas=1.4.3 -n qc -c conda-forge
```

We will activate and use this environment in Chapter 4 (Sequencing Quality Control).
:::
:::{.callout}
Run this command to install all the required packages and their dependencies:

```bash
mamba create -n mapping bwa=0.7.17 samtools=1.15 bcftools=1.14 pysam=0.16.0.1 biopython=1.78
```

This creates an environment called `mapping` with the specified package versions and their dependencies.

We will activate and use this environment in Chapter 5 (Short Read Mapping).
:::
:::{.callout}
## Installing required packages for Assembly and Annotation

NB. For the Assembly and Annotation module, we will create three different environments, because the conda recipes of these tools conflict and it would be difficult to get all of them working in a single environment. We will therefore create each environment separately, with the following names:

```bash
mamba create -n shovill -c bioconda shovill=1.1.0
mamba create -n quast -c bioconda quast=5.2.0
mamba create -n bakta -c bioconda bakta=1.6.1
```

We will activate and use these environments in Chapter 6 (Assembly and Annotation).
:::
:::{.callout}
Run this command to install all the required packages and their dependencies:

```bash
mamba create -n phylogenetics -c bioconda iqtree=2.2.0.3 snp-sites=2.5.1
```

This creates an environment called `phylogenetics` with the specified package versions and their dependencies.

We will activate and use this environment in Chapter 10 (Introduction to Phylogenetics).
:::
:::{.callout}
NB. For genotyping and AMR prediction, we will create five different environments, because some tools require specific versions of Python and other related packages, so they cannot all be installed in a single environment. We will therefore create each environment separately, with the following names:

- mlst
- seroba
- spoligotyping
- tbprofiler
- ariba

Run the following commands to create each environment and install all the required packages and their dependencies:

```bash
# mlst
mamba create -n mlst mlst=2.22.1
# seroba
mamba create -n seroba seroba=1.0.2
# spoligotyping
mamba create -n spoligotyping spotyping=2.1
# tbprofiler
mamba create -n tbprofiler tb-profiler=4.1.1
# ariba
mamba create -n ariba ariba=2.14.6
```

These commands create the environments `mlst`, `seroba`, `spoligotyping`, `tbprofiler` and `ariba` with the specified package versions and their dependencies.

We will activate and use these environments in Chapter 11 (Bacterial Genotyping and Drug Resistance Prediction).
:::
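
Because the five genotyping environments follow the same pattern, the commands can also be generated from a single list. The script below is a minimal sketch, not the course's own method: the array and function names are hypothetical, the name/version pairs are copied from the commands above, and it adds the `-y` flag so that `mamba create` does not stop at each confirmation prompt. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): create the five genotyping environments in a loop.
# Each entry is "environment_name:package=version", taken from the commands above.
GENOTYPING_ENVS=(
  "mlst:mlst=2.22.1"
  "seroba:seroba=1.0.2"
  "spoligotyping:spotyping=2.1"
  "tbprofiler:tb-profiler=4.1.1"
  "ariba:ariba=2.14.6"
)

create_genotyping_envs() {
  local mode="${1:-dry-run}" entry name spec
  for entry in "${GENOTYPING_ENVS[@]}"; do
    name="${entry%%:*}"  # part before the first ":" is the environment name
    spec="${entry#*:}"   # part after it is the package=version spec
    if [ "$mode" = "run" ]; then
      # -y answers "yes" to the confirmation prompt
      mamba create -y -n "$name" "$spec" || return 1
    else
      # Default: only print the commands so that you can review them first
      echo "mamba create -y -n $name $spec"
    fi
  done
}

# Usage:
#   create_genotyping_envs        # print the commands
#   create_genotyping_envs run    # actually create the environments
```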
:::{.callout-note}
As you may see, every tool is installed with a version number appended to its name, in the format `tool=version_number`. This allows us to install the exact versions of the tools used in the training.
For your personal use, if you wish to use the latest versions of these tools, omit the `=version_number` part and conda will install the most recent version that is compatible with the other packages in the environment.
:::
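
If you want unpinned versions of a whole environment, you can strip the pins from the package list instead of editing each one by hand. The function below is a minimal sketch: `unpin` and the environment name `qc-latest` are hypothetical, and it simply removes everything from the first `=` onwards in each package spec.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): turn specs such as "fastqc=0.11.9" into bare
# package names, so that the solver chooses recent compatible versions.
unpin() {
  local spec
  for spec in "$@"; do
    printf '%s\n' "${spec%%=*}"  # drop everything from the first "=" onwards
  done
}

# Usage (hypothetical environment name "qc-latest"):
#   mamba create -n qc-latest $(unpin fastq-scan=1.0.0 fastqc=0.11.9 fastp=0.23.2 \
#     kraken2=2.1.2 bracken=2.7 multiqc=1.13a)
```

Keep in mind that unpinned tools may behave differently from the versions used in the training material.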
:::{.callout}
Open a terminal, make sure you are in the conda base environment and run this command to install all the required packages and their dependencies:

```bash
mamba create -n qc fastq-scan=1.0.0 fastqc=0.11.9 fastp=0.23.2 kraken2=2.1.2 bracken=2.7 multiqc=1.13a
```

This creates an environment called `qc` with the specified package versions and their dependencies.

NB. The tools fastq-scan and bracken run Python scripts that require the Python libraries pandas, json and glob. The json and glob modules are part of the Python standard library, so only pandas needs to be installed. Use the command below to install it in the `qc` environment:

```bash
conda install pandas -n qc -c conda-forge
```

We will activate and use this environment in Chapter 4 (Sequencing Quality Control).
:::
:::{.callout}
Run this command to install all the required packages and their dependencies:

```bash
mamba create -n mapping bwa=0.7.17 samtools=1.15 bcftools=1.14 pysam=0.16.0.1 biopython=1.78
```

This creates an environment called `mapping` with the specified package versions and their dependencies.

NB. The step that creates the pseudogenomes runs Python scripts that require some additional Python libraries. Use the command below to install pandas in the `mapping` environment:

```bash
conda install pandas -n mapping -c conda-forge
```

We will activate and use this environment in Chapter 5 (Short Read Mapping).
:::
:::{.callout}
NB. For genotyping and AMR prediction, we will create five different environments, because some tools require specific versions of Python and other related packages, so they cannot all be installed in a single environment. We will therefore create each environment separately, with the following names:

- mlst
- seroba
- spoligotyping
- tbprofiler
- ariba

Run the following commands to create each environment and install all the required packages and their dependencies:

```bash
# mlst
mamba create -n mlst mlst=2.22.1
# seroba
mamba create -n seroba seroba=1.0.2
# spoligotyping
mamba create -n spoligotyping spotyping=2.1
# tbprofiler
mamba create -n tbprofiler tb-profiler=4.1.1
# ariba
mamba create -n ariba ariba=2.14.6
```

These commands create the environments `mlst`, `seroba`, `spoligotyping`, `tbprofiler` and `ariba` with the specified package versions and their dependencies.

We will activate and use these environments in Chapter 11 (Bacterial Genotyping and Drug Resistance Prediction).
:::
:::{.callout-note}
As you may see, every tool is installed with a version number appended to its name, in the format `tool=version_number`. This allows us to install the exact versions of the tools used in the training.
For your personal use, if you wish to use the latest versions of these tools, omit the `=version_number` part and conda will install the most recent version that is compatible with the other packages in the environment.
:::
:::::
::::: {.panel-tabset group="os"}
:::{.callout}
## minikraken2 database

Download the Kraken database "minikraken2_v1_8GB" into the `database` directory:

```bash
wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v1_8GB_201904.tgz
```

Uncompress the database:

```bash
tar xvfz minikraken2_v1_8GB_201904.tgz
```

If the name of the uncompressed database directory differs from the one used in the workshop, rename it to match the workshop code:

```bash
mv <unzipped_database_name> minikraken2_v1_8GB
```

You can now remove the downloaded archive, as it is no longer needed:

```bash
rm minikraken2_v1_8GB_201904.tgz
```
:::
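
Before moving on, you can check that the uncompressed database looks complete. A Kraken 2 database directory contains the files `hash.k2d`, `opts.k2d` and `taxo.k2d`; the function below is a minimal sketch (its name is hypothetical) that only checks that these three files exist and are not empty, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that a Kraken 2 database directory
# contains the three core database files and that none of them is empty.
check_kraken2_db() {
  local db_dir="$1" f missing=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [ ! -s "$db_dir/$f" ]; then   # -s: file exists and is not empty
      echo "missing or empty: $db_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from your database directory:
#   check_kraken2_db minikraken2_v1_8GB && echo "database looks complete"
```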
:::{.callout}
Download the Bakta database "db.tar.gz" into the `database` directory and uncompress it.

:::{.callout-note}
If you have the `bakta` environment activated, you can instead download the database with:

```bash
bakta_db download --output <output-path>
```

If you use this option, you do not need to perform the AMRFinderPlus step below, as the AMR database is included automatically.
:::

```bash
wget https://bakta-db.s3.computational.bio.uni-giessen.de/db.tar.gz
```

or

```bash
wget https://zenodo.org/record/7025248/files/db.tar.gz
```

Uncompress the database:

```bash
tar -xzf db.tar.gz
```

Rename the database to match the workshop code:

```bash
mv db bakta_db
```

Delete the compressed file after uncompressing it:

```bash
rm db.tar.gz
```

Download the AMRFinderPlus database:

```bash
amrfinder_update --force_update --database bakta_db/amrfinderplus-db/
```

To update an existing Bakta database:

```bash
bakta_db update --db <existing-db-path> [--tmp-dir <tmp-directory>]
```
:::
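
Since two download locations are given for the Bakta database, you may want to fall back to the second one automatically if the first is unavailable. The function below is a minimal sketch with a hypothetical name: it tries each URL in turn with `wget -O`, deletes any partial file left by a failed attempt, and stops at the first successful download.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): download a file from the first mirror that
# works. Usage: download_first_working <output-file> <url> [<url> ...]
download_first_working() {
  local out="$1" url
  shift
  for url in "$@"; do
    if wget -O "$out" "$url"; then
      return 0
    fi
    rm -f "$out"   # remove the partial or empty file from a failed attempt
  done
  echo "all mirrors failed for $out" >&2
  return 1
}

# Usage with the two Bakta database locations listed above:
#   download_first_working db.tar.gz \
#     https://bakta-db.s3.computational.bio.uni-giessen.de/db.tar.gz \
#     https://zenodo.org/record/7025248/files/db.tar.gz
```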
:::{.callout}
## seroba database

For git users, navigate to your `database` directory and clone the git repository:

```bash
git clone https://github.com/sanger-pathogens/seroba.git
```

Copy the database from `seroba/` to your `database` directory (this should be your current directory):

```bash
cp -r seroba/database .
```

Delete the git repository to clean up your system (`-f` avoids a prompt for each of git's write-protected object files):

```bash
rm -rf seroba
```

Still in your `database` directory, rename the database to match the workshop code:

```bash
mv database seroba_db
```
:::
:::::
::: {.panel-tabset group="os"}
### Windows

You can use Singularity from the Windows Subsystem for Linux (see @wsl).
Once you have set up WSL, you can follow the instructions for Linux.

### macOS

Singularity is not available for macOS.

### Linux

These instructions are for Ubuntu or Debian-based distributions^[See the Singularity documentation page for other distributions.].

```bash
sudo apt update && sudo apt upgrade && sudo apt install runc
CODENAME=$(lsb_release -c | sed 's/Codename:\t//')
wget -O singularity.deb https://github.com/sylabs/singularity/releases/download/v3.10.2/singularity-ce_3.10.2-${CODENAME}_amd64.deb
sudo dpkg -i singularity.deb
rm singularity.deb
```
:::
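
If `lsb_release` is not installed on your system, the release codename used in the download URL can also be read from `/etc/os-release`, which defines `VERSION_CODENAME` on Ubuntu and recent Debian releases. The function below is a minimal sketch with a hypothetical name. Note that a `.deb` package only exists for some codenames, so check the Singularity release page if the download fails.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the release codename (e.g. "jammy")
# from an os-release file, as an alternative to parsing `lsb_release -c`.
release_codename() {
  local os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so that its variables do not leak out
  ( unset VERSION_CODENAME; . "$os_release" && printf '%s\n' "${VERSION_CODENAME:-}" )
}

# Usage:
#   CODENAME="$(release_codename)"
#   echo "$CODENAME"
```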
::: {.panel-tabset group="os"}
### Windows

- Go to the Visual Studio Code download page and download the installer for your operating system. Double-click the downloaded file to install the software, accepting all the default options.
- After completing the installation, go to your Windows Menu, search for "Visual Studio Code" and launch the application.
- Go to "File > Preferences > Settings", then select "Text Editor > Files" in the drop-down menu on the left. Scroll down to the section named "EOL" and choose `\n` (this ensures that the files you edit on Windows are compatible with the Linux operating system).

### macOS

- Go to the Visual Studio Code download page and download the installer for Mac.
- Go to the Downloads folder and double-click the file you just downloaded to extract the application. Drag and drop the "Visual Studio Code" file into your "Applications" folder.
- You can now open the installed application to check that it was installed successfully (the first time you launch the application, you will get a warning that it was downloaded from the internet; you can go ahead and click "Open").

### Linux

- Go to the Visual Studio Code download page and download the installer for your Linux distribution. Install the package using your system's installer.
:::
::: {.panel-tabset group="os"}
### Windows

Download and install all of these using the default options:

### macOS

Download and install all of these using the default options:

### Linux

- Go to the R installation folder and look at the instructions for your distribution.
- Download the RStudio installer for your distribution and install it using your package manager.
:::