Augmented Thresholds for MONI

See PHONI, which this method is a variant of: https://github.com/koeppl/phoni

Modified version of one-pass MONI that uses a run-length compressed BWT index to generate matching statistics for a pattern. It uses the threshold/LCE trade-off described in Augmented Thresholds for MONI. Cite this paper if you use this tool.

Running all scripts and benchmarks requires the same dependencies as PHONI. We borrow their description below, modified where running this version differs.

Preparations

We require the pattern and the text to be available as sequences stored in the .fa (FASTA) format. To use our solution, you need recent versions of cmake, g++, zsh, and python 3 installed.

We need the following python 3 packages for extracting and concatenating .fa files:

	pip3 install biopython
	pip3 install fastaparser
	pip3 install psutil
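
Before compiling, it can help to confirm that the tools and packages listed above are visible from the current environment. The following is a minimal sketch, not part of the repository: the function name find_missing is hypothetical, and it assumes the packages are imported under the names Bio (for biopython), fastaparser, and psutil.

```python
import importlib.util
import shutil

# Tool and package names taken from the requirements listed above.
# "Bio" is the import name of the biopython package.
TOOLS = ["cmake", "g++", "zsh", "python3"]
MODULES = ["Bio", "fastaparser", "psutil"]


def find_missing(tools=TOOLS, modules=MODULES):
    """Return (missing_tools, missing_modules) without importing anything."""
    # shutil.which returns None when the executable is not on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = find_missing()
    print("missing tools:", tools or "none")
    print("missing python modules:", modules or "none")
```

This only checks that the names can be found. It does not check versions, so "recent" still has to be verified by hand.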

git clone --branch phoni https://github.com/drnatebrown/aug_phoni

Compile

mkdir build
cd build; cmake ..
make

Building the index

To build the index we use the command aug build from the build directory. Alternatively, replace aug with phoni to build the version without augmented thresholds.

aug build \
-r <filename of the reference> \
-t <number of threads> \
-g <grammar format> \
-f <input file is a fasta file>

For example, to build the aug-phoni index for the file yeast.fa using 4 threads and the plain grammar we run from the build folder:

python3 aug build -r ../data/yeast.fa -f -t 4 -g plain

This command will produce yeast.fa.aug and yeast.fa.plain.slp in the data folder, which represent the aug-phoni index.

Querying the index

To query the index we use the command aug ms from the build directory. To run without augmented thresholds instead, replace aug with phoni, assuming PHONI was also built.

aug ms \
-i <filename of the reference> \
-p <fasta pattern file> \
-g <grammar format>

For example, to query the aug-phoni index for the file yeast.fa using the plain grammar with the pattern samples.fa we run from the build folder:

python3 aug ms -i ../data/yeast.fa -p ../data/samples.fa -g plain

This command will produce samples.fa.positions and samples.fa.lengths in the data folder, which represent the matching statistics positions and lengths of samples.fa against yeast.fa, respectively.
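
To make the two outputs concrete: for each position i of the pattern, the matching statistics length is the length of the longest prefix of the pattern suffix starting at i that occurs somewhere in the text, and the position is where one such occurrence starts in the text. The sketch below computes both by brute force on toy strings. It is only an illustration of the definition. It is not the algorithm used by this tool, it does not read or write the .lengths and .positions files, and the function name and the -1 placeholder for an empty match are assumptions of the sketch.

```python
def matching_statistics(text, pattern):
    """Brute-force matching statistics of pattern against text.

    Returns (lengths, positions), one entry per pattern position.
    A position of -1 marks an empty match (a convention of this sketch).
    """
    lengths, positions = [], []
    for i in range(len(pattern)):
        length, pos = 0, -1
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern):
            hit = text.find(pattern[i:i + length + 1])
            if hit == -1:
                break
            length += 1
            pos = hit
        lengths.append(length)
        positions.append(pos)
    return lengths, positions


lengths, positions = matching_statistics("GATTACA", "TACAT")
print(lengths)    # [4, 3, 2, 2, 1]
print(positions)  # [3, 4, 5, 1, 2]
```

Here the first entry says that "TACA", the longest matching prefix of "TACAT", occurs in "GATTACA" starting at index 3 (0-based). When a match occurs at several places, this sketch reports the leftmost one; an index-based tool may report a different, equally valid occurrence.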

Benchmarks

We provide a script and benchmark files to evaluate Aug PHONI as described in the paper:

C. Martínez-Guardiola, N. K. Brown, F. Silva-Coira, D. Köppl, T. Gagie and S. Ladra, "Augmented Thresholds for MONI," 2023 Data Compression Conference (DCC), Snowbird, UT, USA, 2023, pp. 268-277, doi: 10.1109/DCC55655.2023.00035.

In our experiments we used the file chr19.1000.fa.xz as our text dataset, and prefixes of it as our pattern.
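
As an illustration of what "a prefix of the text as pattern" means, the sketch below takes the first n bases of the first record in FASTA-formatted text. It is not the repository's own script and makes no claim about how the benchmark patterns were actually cut. The function name fasta_prefix and the choice to stop at the first record are assumptions of this sketch.

```python
def fasta_prefix(fasta_text, n):
    """Return the first n bases of the first record in a FASTA string."""
    bases = []
    remaining = n
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts the next record: stop here
            seen_header = True
            continue
        # Take at most the number of bases still needed from this line.
        chunk = line[:remaining]
        bases.append(chunk)
        remaining -= len(chunk)
        if remaining <= 0:
            break
    return "".join(bases)


example = ">chr\nACGT\nTTGA\n>other\nCCCC\n"
print(fasta_prefix(example, 6))  # ACGTTT
```

If n exceeds the record length, the whole first record is returned.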

We have a shell script benchmark.sh for an automatic benchmark. For this to work, some variables in it have to be set: it is necessary to download and compile those projects individually, and then set the corresponding variables in benchmark.sh manually. Finally, the output of benchmark.sh can be processed by sqlplots to generate the plots shown in the paper.
