Installation
First, download the XspecT package. Make sure enough disk space is available (~1.4 GB).
XspecT requires a current 64-bit version of Python (the commands and module list below target Python 3.11) and a set of Python modules (see below).
On Linux, you also need the python-dev package:
sudo apt install python3.11-dev
Then install the required modules:
pip install -r requirements.txt
Modules used (Python 3.11):
- Flask
- Flask-Bcrypt
- Flask-Login
- Flask-WTF
- WTForms
- Werkzeug
- bcrypt
- biopython
- bitarray
- mmh3
- numpy
- pandas
- requests
- scikit-learn
- psutil
- matplotlib
- pympler
- h5py
- Bio
- wheel
- ncbi-datasets-pylib
- seaborn
- pymmh3
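The requirements above can be checked before starting the application. The following is a minimal sketch, not part of XspecT: it reports whether the interpreter is 64-bit and which modules cannot be found. The import names are assumptions, since pip package names and import names differ for some entries (for example, scikit-learn is imported as sklearn), and the list is deliberately partial.

```python
import importlib.util
import struct
import sys

# Assumed import names for a subset of the modules listed above.
# pip names and import names differ (biopython -> Bio, scikit-learn -> sklearn).
IMPORT_NAMES = [
    "flask", "flask_bcrypt", "flask_login", "flask_wtf", "wtforms",
    "werkzeug", "bcrypt", "Bio", "bitarray", "mmh3", "numpy", "pandas",
    "requests", "sklearn", "psutil", "matplotlib", "pympler", "h5py",
    "seaborn",
]


def is_64bit_python() -> bool:
    """Return True when the running interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


def missing_modules(names):
    """Return the import names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "| 64-bit:", is_64bit_python())
    missing = missing_modules(IMPORT_NAMES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any modules are reported as missing, running pip install -r requirements.txt again is the first thing to try.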
Run the following commands in a console. A browser window opens automatically once the application has fully loaded. The first block (export) is for Linux and macOS shells; the second (set) is for the Windows command prompt.
$ export FLASK_APP=flaskr
$ export FLASK_ENV=development
$ python app.py
set FLASK_APP=flaskr
set FLASK_ENV=development
python app.py
Run XspecT_mini.py and pass the desired configuration as command-line arguments.
python XspecT_mini.py Genus XspecT ClAssT Oxa Fastq 100000 Metagenome "path/to/your/input-set"
Important:
- If you use reads, the number of reads must be specified directly after the file type
- The path to your dataset is the last argument
- All commands are explained in XspecT_mini Commands/.md
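The two ordering rules above can be sketched as a small helper that assembles the argument list. This is an illustration only: build_mini_command is a hypothetical function, not part of XspecT, and the keywords are copied from the example command rather than from the full command reference.

```python
import shlex
import sys


def build_mini_command(keywords, input_path, read_count=None, file_type="Fastq"):
    """Assemble an argument list for XspecT_mini.py (hypothetical helper).

    The read count, if given, is placed directly after the file-type keyword,
    and the input path is always appended as the last argument.
    """
    cmd = [sys.executable, "XspecT_mini.py"]
    for keyword in keywords:
        cmd.append(keyword)
        if keyword == file_type and read_count is not None:
            cmd.append(str(read_count))
    cmd.append(input_path)
    return cmd


# Rebuild the example command from the text above.
example = build_mini_command(
    ["Genus", "XspecT", "ClAssT", "Oxa", "Fastq", "Metagenome"],
    "path/to/your/input-set",
    read_count=100000,
)
print(shlex.join(example))
```

The resulting list can be passed to subprocess.run to start the tool from another script.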
New genera can be added to XspecT to classify additional bacteria.
Use the following script to install new Bloom Filters for the desired genus:
python XspecT_trainer.py genus mode
- genus: The genus name
- mode: Must be 1 for the automatic installation
This will download up to 8 assemblies (4 for Bloom Filter training and 4 for SVM training if available) from the NCBI database and train new Bloom Filters and the SVM automatically.
It's also possible to use your own datasets to train new Bloom Filters. Simply use the following command:
python XspecT_trainer.py genus mode path_to_bf_files path_to_svm_files
- genus: How your dataset will be named
- mode: Must be 2 for the custom installation
- path_to_bf_files: Filepath to your Bloom Filter training data
- path_to_svm_files: Filepath to your SVM training data
Bloom Filter training data:
- must be genome assemblies in .fna/.fa/.fasta format
- (ideally) 4 assemblies for each species concatenated into one file
- Species name must be the filename
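One way to prepare such a file is to concatenate the assemblies of a species into a single FASTA file named after that species. The sketch below is an assumption about how this could be done, not the procedure used by XspecT_trainer.py; concatenate_assemblies and the .fasta output suffix are illustrative choices.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fna", ".fa", ".fasta"}


def concatenate_assemblies(assembly_paths, species_name, out_dir):
    """Write all assemblies into out_dir/<species_name>.fasta and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{species_name}.fasta"
    with out_path.open("w") as out:
        for path in map(Path, assembly_paths):
            if path.suffix.lower() not in FASTA_SUFFIXES:
                raise ValueError(f"not a FASTA file: {path}")
            text = path.read_text()
            # Make sure the next header starts on its own line.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return out_path
```

For example, concatenate_assemblies(["a.fna", "b.fna"], "baumannii", "bf_data") would produce bf_data/baumannii.fasta; the file and directory names here are placeholders.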
SVM training data:
- must be genome assemblies in .fna/.fa/.fasta format
- (ideally) 4 assemblies for each species concatenated into one file
- (ideally) don't use the same assemblies as for Bloom Filter training
- filename must be of the following format: ID_speciesname.fasta/.fa/.fna
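The filename rule for SVM training data can be checked before starting a training run. The following is a rough sketch under one loud assumption: the ID is taken to be everything before the first underscore. IDs that themselves contain underscores (NCBI accessions such as GCF_... do) would need a different splitting rule. parse_svm_filename is a hypothetical helper, not part of XspecT.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fna", ".fa", ".fasta"}


def parse_svm_filename(filename):
    """Split 'ID_speciesname.fasta' into (ID, speciesname).

    Assumption: the ID ends at the first underscore.
    Raises ValueError if the suffix or the ID_speciesname pattern is wrong.
    """
    path = Path(filename)
    if path.suffix.lower() not in FASTA_SUFFIXES:
        raise ValueError(f"unsupported file type: {path.name}")
    file_id, separator, species = path.stem.partition("_")
    if not (file_id and separator and species):
        raise ValueError(f"expected ID_speciesname, got: {path.name}")
    return file_id, species
```

Calling this on every file in the SVM training directory surfaces naming problems early instead of partway through training.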