pacbio-14-nctc-assemblies

14 NCTC samples for testing PacBio assemblers. They are the 14 samples used in the Circlator paper.

Assemblies and files for analysis are all in this GitHub repository. Raw sequencing reads are in the ENA. The filtered subreads FASTQ and corrected reads FASTA files produced by the HGAP runs are available from ftp://ngs.sanger.ac.uk/production/pathogens/mh12/pacbio-14-nctc-assemblies.

The file sample_data.tsv lists, for each sample, the accession IDs of the raw reads and the reference assembly, plus some basic stats (assembly size, number of reads, etc.).
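A value for one sample can be pulled out of a file like this with a small awk helper. The following is only a sketch: it assumes the file has a header row and that the sample ID is in the first column, and the function name `tsv_lookup` is made up for illustration. Check the real header of sample_data.tsv before relying on any column name.

```shell
# Print the value of one named column for one sample from a tab-separated
# file with a header row.
# Assumption: the sample ID is in the first column of the file.
tsv_lookup() {
    file=$1
    sample=$2
    column=$3
    awk -F'\t' -v sample="$sample" -v column="$column" '
        # Header row: find the index of the requested column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == column) col = i; next }
        # Data rows: print the requested field for the matching sample.
        col && $1 == sample { print $col; exit }
    ' "$file"
}
```

For example, `tsv_lookup sample_data.tsv NCTC10005 some_column` prints the `some_column` entry for NCTC10005, where `some_column` stands for whichever header name the file actually uses. Nothing is printed if the column or sample is not found.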

Each directory NCTCxxxxx/ contains all the files relating to that sample. The FASTA files in each directory are:

ref.fa - the reference sequence.
canu.1.{0,1}.fa - assemblies made with versions 1.0 and 1.1 of canu.
miniasm.0.2.fa - an assembly made with miniasm (preprint here); miniasm.0.2.quiver.fa is the result of running quiver on that assembly.
hgap.fa - an assembly made with HGAP (publication here).
sprai.0.9.9.10.fa - an assembly made with version 0.9.9.10 of Sprai.
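To confirm that a checkout has the full set of FASTA files for a sample, something like the following could be used. It is a minimal sketch: the file names come from the list above, while the function name `missing_fastas` is made up for illustration.

```shell
# FASTA file names expected in each NCTCxxxxx/ directory, per the list above.
expected_fastas="ref.fa canu.1.0.fa canu.1.1.fa miniasm.0.2.fa miniasm.0.2.quiver.fa hgap.fa sprai.0.9.9.10.fa"

# Print the name of every expected FASTA file that is absent from a
# sample directory. Prints nothing if all of them are present.
missing_fastas() {
    sample_dir=$1
    for f in $expected_fastas; do
        [ -f "$sample_dir/$f" ] || echo "$f"
    done
}
```

Running `missing_fastas NCTC10005` from the top of the repository lists any file from the set that is not found in that directory.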

Canu assemblies

Made with canu versions 1.0 and 1.1. The filtered subreads were used as input, with

-pacbio-raw filtered_subreads.fq

and the genome size was set to the length of the reference genome for each sample, using

genomeSize=$length

where $length was taken from the file sample_data.tsv.

The only other options changed were cluster-specific:

maxThreads=8 maxMemory=16 useGrid=0
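Put together, the options above might be combined into a single command along these lines. This is a sketch, not the exact command that was run: `-p` (output prefix) and `-d` (output directory) are canu options that the text above does not mention, and the values given to them here, like the function name `canu_command`, are hypothetical. The function prints the command instead of running it, so it can be inspected first.

```shell
# Print (rather than run) a canu command line for one sample.
# Arguments: sample name, genome length, reads FASTQ file.
# Assumption: -p and -d values are illustrative; the original runs may
# have used different output names.
canu_command() {
    sample=$1
    length=$2
    reads=$3
    echo "canu -p $sample -d $sample.canu genomeSize=$length" \
         "maxThreads=8 maxMemory=16 useGrid=0 -pacbio-raw $reads"
}
```

For example, `canu_command NCTC10005 "$length" filtered_subreads.fq` prints one line that can be reviewed and then pasted into a shell.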

HGAP assemblies

Details TBC...

miniasm assemblies

Made with version 0.2 (and minimap version 0.2) using the filtered subreads output during a run of HGAP. The three commands run were:

minimap -Sw5 -L100 -m0 -t4 $reads $reads | gzip -1 > miniasm.paf.gz
miniasm -f $reads miniasm.paf.gz > miniasm.gfa
awk '$1=="S" {print ">"$2"\n"$3} ' miniasm.gfa > miniasm.fa

where $reads is the FASTQ file of reads. The final FASTA file of contigs (miniasm.fa in the commands above) is stored in this repository as miniasm.0.2.fa.
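The awk step converts the GFA output to FASTA by keeping only the segment ("S") lines. Its effect can be seen on a toy input; the contig names, sequences and file names below are invented for illustration and do not come from any of the real assemblies.

```shell
# Write a small GFA-like file: two "S" (segment) lines, which carry the
# contig name and sequence, and one other line that should be ignored.
tmp=$(mktemp -d)
printf 'S\tutg000001l\tACGTACGT\n' > "$tmp/toy.gfa"
printf 'a\tutg000001l\t0\tread1:1-8\t+\t8\n' >> "$tmp/toy.gfa"
printf 'S\tutg000002l\tGGCCTTAA\n' >> "$tmp/toy.gfa"

# Same awk command as above: for each S line, print ">name" and then
# the sequence on the following line.
awk '$1=="S" {print ">"$2"\n"$3} ' "$tmp/toy.gfa" > "$tmp/toy.fa"
cat "$tmp/toy.fa"
# Prints:
# >utg000001l
# ACGTACGT
# >utg000002l
# GGCCTTAA
```

Each sequence is written on a single line, so long contigs are not wrapped in the resulting FASTA file.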

Quiver was then run on each miniasm assembly. The resulting FASTA file is called miniasm.0.2.quiver.fa.

Sprai assemblies

Made with version 0.9.9.10 of Sprai using this wrapper script with the options --threads 8 --memory 16. Sprai runs the Celera assembler, and version 8.3rc2 of Celera was used. For each sample, the genome length given to the wrapper script was taken from the file sample_data.tsv.

To do

Gather HGAP assembler version/options etc
Add PBcR assemblies
Run Quast on all assemblies/refs
