bactofidia

Primary repo and development on https://gitlab.com/aschuerch/bactofidia.git

Basic microbial WGS analysis pipeline

bactofidia is a bacterial assembly and basic analysis pipeline built on Snakemake and bioconda. It is currently written for paired-end Illumina data with read lengths of 250 or 150 bp. To ensure reproducibility, the pipeline creates virtual software environments that pin the software versions used for the analysis.

Dependencies

bactofidia runs under bash and relies on software available from bioconda and a (mini)conda installation. If conda is not present, the script will suggest a miniconda installation.

Usage

Clone this repository with

git clone https://gitlab.com/aschuerch/bactofidia.git bactofidia_[myproject]

where [myproject] is the name of your project.

Copy or symlink your paired-end read sequencing files (Sample1_R1.fastq.gz, Sample1_R2.fastq.gz, Sample2_R1.fastq.gz and Sample2_R2.fastq.gz) to the bactofidia_[myproject] directory. After successful execution of the pipeline, these files will be removed from this directory, so make sure they are also stored elsewhere.
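One way to follow the advice above is to symlink the reads rather than copy them, so the originals stay where they are. The following is a minimal sketch, not part of bactofidia: the function name link_reads and both directory arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Symlink every *.fastq.gz file from a source directory into a project
# directory. link_reads and its arguments are illustrative names only.
link_reads() {
    local src_dir="$1"
    local project_dir="$2"
    local f
    for f in "$src_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        # Use an absolute target so the link resolves from any directory.
        ln -s "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$project_dir/"
    done
}

# Example call (paths are placeholders):
# link_reads /path/to/raw_reads bactofidia_myproject
```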

Everything before the first underscore in a file name is taken as the sample name, so avoid additional underscores in sample names.
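The effect of the underscore rule can be illustrated with plain shell parameter expansion. This is only a sketch of the naming convention described above, not the code bactofidia itself uses; sample_name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the part of a file name that precedes the first underscore.
# sample_name is an illustrative helper, not part of the pipeline.
sample_name() {
    local file
    file=$(basename "$1")
    # %%_* strips the longest suffix that starts with an underscore,
    # i.e. everything from the first underscore onwards.
    printf '%s\n' "${file%%_*}"
}

# sample_name Sample1_R1.fastq.gz    prints: Sample1
# sample_name My_Sample_R1.fastq.gz  prints: My
```

The second call shows why extra underscores are a problem: under this rule, My_Sample would be shortened to My.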

Run the pipeline with

./bactofidia.sh Sample1_R1.fastq.gz Sample1_R2.fastq.gz Sample2_R1.fastq.gz Sample2_R2.fastq.gz

or

./bactofidia.sh ALL

The pipeline takes Illumina paired-end sequencing reads as compressed sequencing files (.fastq.gz) which must be present in the same directory from where the script is run.
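Because the pipeline expects paired-end data, it can help to confirm that every forward read file has a mate before starting a run. The check below is a sketch that assumes the _R1/_R2 naming shown in the examples above; check_pairs is a hypothetical helper and is not shipped with bactofidia.

```shell
#!/usr/bin/env bash
# Report every *_R1.fastq.gz file in a directory that lacks its
# *_R2.fastq.gz mate. Returns non-zero if any mate is missing.
# check_pairs is an illustrative helper, not part of the pipeline.
check_pairs() {
    local dir="${1:-.}"
    local missing=0
    local r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # no R1 files at all
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_pairs . && ./bactofidia.sh ALL
```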

The config.yaml or config_miseq.yaml files in the config/ directory can be adjusted for parameters of the different tools.

The different versions of the packages that are run are defined in the envs/ directory.

The first time bactofidia is run, it generates all virtual environments, which can take considerable time depending on the speed of your internet connection. Do not interrupt this process!

Debugging and testing

For debugging or testing purposes, the pipeline itself can be dry-run with

./dryrun_bactofidia.sh

The pipeline takes compressed sequencing files (.fastq.gz) which must be present in the same directory from where the script is called.

Test the whole pipeline with:

ln -s test/Test*gz .
./bactofidia.sh Test_R1.fastq.gz Test_R2.fastq.gz

This will run the pipeline on the included test files.

Analysis steps

Currently it runs:

quality check before trimming using fastqc
trimming with trimgalore
assembly with spades
mlst with mlst
resistance gene determination with abricate using the resfinder database
annotation (general, not genus-specific) with prokka
quality assessment of assembly with quast
coverage estimation with bbmap2
summarizing report with multiqc

Running only

./bactofidia.sh

will give an explanation of the (limited) options.

Output

The output can be found in the 'results' directory, which contains the following files produced by the different tools:

results/
├── assembly_graphs
│   ├── ... # contains the assembly graphs of each scaffold in gfa format
│
├── config
│   ├── ... # contains all configuration definitions, e.g. parameter choices
│ 
├── envs
│   ├── ... # contains all environment definitions, e.g. used versions of programs
│ 
├── scaffolds
│   ├── Test.fna   # scaffolds
│   
└── stats
    ├── annotated  # results of PROKKA
    │   ├── Test
    │      ├── Test.err
    │      ├── Test.faa
    │      ├── Test.ffn
    │      ├── Test.fna
    │      ├── Test.fsa
    │      ├── Test.gbk
    │      ├── Test.gff
    │      ├── Test.log
    │      ├── Test.sqn
    │      ├── Test.tbl
    │      ├── Test.tsv
    │      └── Test.txt
    │   
    ├── CoverageStatistics_summary.tsv # Summary of coverage for all samples
    ├── Extra
    │   ├── CoverageStatistics_Test.txt # Coverage details on each sample
    │ 
    ├── MLST.tsv # MLST results in table format
    ├── MultiQC_report_data.zip # Contains data to generate the MultiQC report
    ├── MultiQC_report.html # Check this file first for overview of quality
    └── ResFinder.tsv # Resistance genes per sample in table format

Opening the MultiQC_report.html in a browser is a good starting point to judge quality of data and assembly.
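A quick way to confirm that a run produced the summary files listed in the tree above is to test for them from the shell. This is a sketch only: check_results is a hypothetical helper, and the file list is taken from the directory layout shown in this section.

```shell
#!/usr/bin/env bash
# Verify that the main summary files exist and are non-empty.
# check_results is an illustrative helper, not part of the pipeline.
check_results() {
    local results="${1:-results}"
    local status=0
    local f
    for f in \
        stats/MultiQC_report.html \
        stats/MLST.tsv \
        stats/ResFinder.tsv \
        stats/CoverageStatistics_summary.tsv
    do
        if [ ! -s "$results/$f" ]; then
            echo "missing or empty: $results/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: check_results results && echo "summary files present"
```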

Adjusting command line parameters

Command line parameters for the different tools can be adjusted in the config.yaml or config_miseq.yaml file in the config/ directory, or in the Snakefile.assembly directly. For many cases, the default parameters should be sufficient.

Using different package versions

Package versions can be adjusted in envs/*yaml. The packages are in different files, mainly due to different dependencies such as python 2 / python 3. Please visit bioconda for available packages.
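Before changing a version, it can be useful to see what is currently pinned. The sketch below assumes the files in envs/ follow the usual conda environment layout, with dependencies written as "- package=version"; that layout and the helper name list_pins are assumptions, not something this README specifies.

```shell
#!/usr/bin/env bash
# Print every "- package=version" line found in the environment files,
# prefixed with the name of the file it came from.
# list_pins is an illustrative helper, not part of the pipeline.
list_pins() {
    local dir="${1:-envs}"
    grep -H -E '^[[:space:]]*-[[:space:]]*[^[:space:]]+=[^[:space:]]+' "$dir"/*yaml
}

# Example: list_pins envs
```

Lines without an equals sign, such as channel names, are not printed.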

Adding other tools

For further customization, see the Snakemake documentation.

Troubleshooting

Sometimes the snakemake workflow will give you an error due to an already running instance. In this case, unlock the snakemake instance with

./unlock_bactofidia.sh

The pipeline can be tested with

./dryrun_bactofidia.sh

To re-run parts of the pipeline, try

./rerun_incomplete_bactofidia.sh
