
Installation

DominikS edited this page Sep 16, 2023 · 18 revisions


Download XspecT

First, download the XspecT package. Make sure enough disk space is available (~1.4 GB).
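The free-space requirement can be checked before downloading. The following is a minimal sketch using only the Python standard library; the function name `has_enough_space` is hypothetical, and the threshold assumes that "~1.4 GB" means roughly 1.4 billion bytes.

```python
import shutil

# Approximate size quoted above; decimal gigabytes are an assumption.
REQUIRED_BYTES = 1_400_000_000


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes
```

Calling `has_enough_space("path/to/download/folder")` in the target folder before downloading gives a quick yes/no answer.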

Installation requirements

XspecT requires a current 64-bit Python version (the module list below was compiled for Python 3.11) and a number of Python modules (listed below).
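Whether the active interpreter meets these requirements can be checked from Python itself. This is a small sketch, not part of XspecT; the helper names are hypothetical, and treating 3.11 as the minimum version is an assumption based on the module list.

```python
import struct
import sys


def is_64bit_python():
    """Return True if the running interpreter is a 64-bit build."""
    # The size of a C pointer is 8 bytes on 64-bit builds.
    return struct.calcsize("P") * 8 == 64


def python_version_ok(minimum=(3, 11)):
    """Return True if the interpreter is at least `minimum` (assumed: 3.11)."""
    return sys.version_info[:2] >= minimum
```

Both functions should return True before you continue with the installation.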

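On Linux, you also need the Python development package (python-dev), for example on Debian/Ubuntu:

sudo apt install python3.11-dev

On all platforms, install the required Python modules with:

pip install -r requirements.txt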

Python modules used (Python 3.11):

  • Flask
  • Flask-Bcrypt
  • Flask-Login
  • Flask-WTF
  • WTForms
  • Werkzeug
  • bcrypt
  • biopython
  • bitarray
  • mmh3
  • numpy
  • pandas
  • requests
  • scikit-learn
  • Psutil
  • Matplotlib
  • Pympler
  • H5py
  • Bio
  • wheel
  • ncbi-datasets-pylib
  • seaborn
  • pymmh3
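After running pip, it can be useful to confirm that the modules are actually installed. The sketch below uses the standard library's `importlib.metadata`; the function name `missing_distributions` is hypothetical. Note that it expects distribution names as used by pip (for example `biopython`), not import names (for example `Bio`).

```python
from importlib import metadata


def missing_distributions(names):
    """Return the names from `names` that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_distributions(["Flask", "numpy", "biopython"])` returns an empty list when all three are installed.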

How-To-Run: Local-Deployment

Webapp

Run the following commands in a console. A browser window opens automatically once the application has fully loaded.

macOS/Linux:

export FLASK_APP=flaskr
export FLASK_ENV=development
python app.py

Windows cmd:

set FLASK_APP=flaskr
set FLASK_ENV=development
python app.py
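The same start-up can be scripted in a platform-independent way. The following sketch sets the two environment variables from the commands above and starts app.py with the current interpreter; the function names `flask_env` and `run_webapp` are hypothetical and not part of XspecT.

```python
import os
import subprocess
import sys


def flask_env(base=None):
    """Return a copy of the environment with the two Flask variables set."""
    env = dict(os.environ if base is None else base)
    env["FLASK_APP"] = "flaskr"
    env["FLASK_ENV"] = "development"
    return env


def run_webapp(app_dir="."):
    """Start app.py in `app_dir`; blocks until the server process exits."""
    result = subprocess.run([sys.executable, "app.py"], cwd=app_dir, env=flask_env())
    return result.returncode
```

Calling `run_webapp("path/to/XspecT")` should behave like the three console commands, assuming app.py is located in that folder.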

How to use the XspecT command line interface:

Run XspecT_mini.py and pass the configuration you want as command-line arguments, for example:

python XspecT_mini.py Genus XspecT ClAssT Oxa Fastq 100000 Metagenome "path/to/your/input-set"

Important:

  • If you use reads, the number of reads must be specified directly after the file type
  • The path to your dataset must be the last argument
  • All commands are explained in XspecT_mini Commands/.md
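The two ordering rules above can be checked before the command is run. This is a rough sketch with a hypothetical helper, `check_mini_args`; it assumes that `Fastq` is the file type for reads, which is taken from the example command only, and it does not know the full list of valid XspecT_mini options.

```python
# Assumption: "Fastq" is the read file type, as in the example command.
READ_FILE_TYPES = {"Fastq"}


def check_mini_args(args):
    """Return a list of problems found in an XspecT_mini argument list."""
    if not args:
        return ["no arguments given"]
    problems = []
    options = args[:-1]  # everything except the trailing dataset path
    for i, token in enumerate(options):
        if token in READ_FILE_TYPES:
            has_count = i + 1 < len(options) and options[i + 1].isdigit()
            if not has_count:
                problems.append(f"{token} must be followed directly by the number of reads")
    if args[-1] in READ_FILE_TYPES or args[-1].isdigit():
        problems.append("the last argument should be the path to the dataset")
    return problems
```

For the example command, the argument list `["Genus", "XspecT", "ClAssT", "Oxa", "Fastq", "100000", "Metagenome", "path/to/your/input-set"]` produces no problems.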

Add new genera

New genera can be added to XspecT so that additional bacteria can be classified.

Automatic via NCBI

Use the following script to install new Bloom Filters for the desired genus:

python XspecT_trainer.py genus mode
  • genus: The genus name
  • mode: Must be 1 for the automatic installation

This will download up to 8 assemblies (4 for Bloom Filter training and 4 for SVM training if available) from the NCBI database and train new Bloom Filters and the SVM automatically.

Using custom data

It's also possible to use your own datasets to train new Bloom Filters. Simply use the following command:

python XspecT_trainer.py genus mode path_to_bf_files path_to_svm_files
  • genus: How your dataset will be named
  • mode: Must be 2 for the custom installation
  • path_to_bf_files: Filepath to your Bloom Filter training data
  • path_to_svm_files: Filepath to your SVM training data
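If training is started from another script, the argument list for both modes can be assembled programmatically. The sketch below only builds the command following the two call patterns shown above (mode 1 with genus only, mode 2 with two additional paths); `trainer_command` is a hypothetical helper, and the genus name in the usage note is a placeholder.

```python
import sys


def trainer_command(genus, mode, bf_path=None, svm_path=None):
    """Build the XspecT_trainer.py command line for mode 1 or mode 2."""
    base = [sys.executable, "XspecT_trainer.py", genus]
    if mode == 1:
        # Automatic installation via NCBI: no data paths are passed.
        if bf_path is not None or svm_path is not None:
            raise ValueError("mode 1 does not take data paths")
        return base + ["1"]
    if mode == 2:
        # Custom installation: both training data paths are required.
        if bf_path is None or svm_path is None:
            raise ValueError("mode 2 needs both bf_path and svm_path")
        return base + ["2", str(bf_path), str(svm_path)]
    raise ValueError("mode must be 1 or 2")
```

The resulting list can be passed to `subprocess.run`, for example `subprocess.run(trainer_command("Examplegenus", 1), check=True)`.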

Custom dataset guidelines

Bloom Filter training data:

  • must be genome assemblies in .fna/.fa/.fasta format
  • (ideally) 4 assemblies for each species concatenated into one file
  • Species name must be the filename

SVM training data:

  • must be genome assemblies in .fna/.fa/.fasta format
  • (ideally) 4 assemblies for each species concatenated into one file
  • (ideally) not the same assemblies that were used for Bloom Filter training
  • filename must be of the following format: ID_speciesname.fasta/.fa/.fna
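File names for the SVM training data can be checked against the ID_speciesname pattern before training. The sketch below is an illustration only: `parse_svm_filename` is a hypothetical helper, and splitting at the first underscore assumes that the ID itself contains no underscore, which the guidelines above do not specify.

```python
from pathlib import Path

FASTA_EXTENSIONS = {".fna", ".fa", ".fasta"}


def parse_svm_filename(filename):
    """Return (ID, species name) for a valid file name, otherwise None."""
    path = Path(filename)
    if path.suffix not in FASTA_EXTENSIONS:
        return None
    # Assumption: the ID is everything before the first underscore.
    assembly_id, separator, species = path.stem.partition("_")
    if not separator or not assembly_id or not species:
        return None
    return assembly_id, species
```

For example, `parse_svm_filename("id1_speciesA.fasta")` yields `("id1", "speciesA")`, while a file without an ID prefix or with another extension yields `None`.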