Skip to content

Latest commit

 

History

History
14 lines (10 loc) · 735 Bytes

README.md

File metadata and controls

14 lines (10 loc) · 735 Bytes

CRISPR-spacer identification

Identify CRISPR arrays using CRT and PILERCR
python identify_crispr.py -i example/GUT_GENOME147678.fna -o out

The programs CRT and PILERCR are located in the bin directory The program first splits your input sequence into chunks of 50 Mbp. This is done to limit RAM usage. Not an an issue for small genomes, but can be one for large metagenomes. Each sequence is padded with 50 Ns, which helps to find CRISPR arrays at contig boundaries CRT v1.2 and PILER v1.06 are then run on each chunk of data The results are parsed and written to the output directory

Merge overlapping CRISPR arrays identified using CRT and PILERCR
python merge_crispr.py out/crt out/pilercr out/merged