
Spades 3.12 #114

Merged 4 commits into increase_assembly on Jun 1, 2018
Conversation

SilasK commented Jun 1, 2018

Adds SPAdes 3.12, which handles merged reads separately.
Added "loose parameters".
Added options to individually toggle:

  • normalization
  • error_correction
  • merging
  • error_correction of spades (BayesHammer),
  • filtering of contigs

Added pythonic symlinks to reduce storage use.

Merged reads are used separately by SPAdes.
Added the possibility to drop normalization.
Made filtering optional.
Not tested.
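The symlink idea can be sketched in Python: instead of copying intermediate read files between workflow steps, link them. The file and function names below are illustrative, not Atlas's actual code:

```python
import os

def relink(src, dst):
    """Point dst at src via a symlink instead of a copy, saving disk space."""
    if os.path.lexists(dst):
        os.remove(dst)
    # use a relative target so the working directory can be moved around
    os.symlink(os.path.relpath(src, os.path.dirname(dst) or "."), dst)

# illustrative usage with a stand-in file
open("reads_QC.fastq.gz", "w").close()
relink("reads_QC.fastq.gz", "assembly_input.fastq.gz")
print(os.path.islink("assembly_input.fastq.gz"))  # True
```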
@SilasK merged commit d4a4576 into increase_assembly on Jun 1, 2018.
@SilasK deleted the spades-3.12 branch on June 20, 2018.
Sofie8 commented Jun 22, 2018

Hi Silas,

Sorry, also a question here: I ran the new atlas with SPAdes 3.12, and I wanted to choose the k-mers 21, 33, 55 and 77. Although I specified this in the yaml file, it only does 21, 33, 55. I wanted to try 77 for 2x250 Illumina reads, and because we have a high abundance of certain strains, longer k-mers might improve the assembly. I hope this is a quick fix, so that the spades command reads in the parameters given here and not its default.

Thanks,
Sofie

Spades:

```yaml
#------------
spades_skip_BayesHammer: false
spades_k: 21,33,55,77
```

SilasK commented Jun 22, 2018

I corrected it on the master branch.
You can go even higher, up to k=127 (the maximum SPAdes supports; k-mer sizes must be odd).

Tell me if you see improvements.

Sofie8 commented Jun 25, 2018

Hi Silas,

I tried it, and it works: it runs all the specified k's, up to the 127 I tried! Thanks.
It's interesting: for the soil sample I tried, I ran Megahit (default), Megahit (meta-sensitive), Spades (default), Spades (k21,33,55,77) and Spades (k21,33,55,77,99,127), and then dRep on all bins.
Spades with k up to 127 seems to work well: of the winning genomes, 3/4 are from the k127 assembly and the other 1/4 from the Spades k77 assembly.

The only issue I run into (with both v1.0.33 and v1.0.34, but not with v1.0.32) is that MaxBin 2.2.1 stops Atlas when it reaches a sample where no decent bin can be formed because it can't find the marker genes. It quits everything, terminating with an error, instead of moving on to the next sample.

Error in maxbin2.log of sample AA3s:

```
MaxBin 2.2.1
Input contig: AA3s/AA3s_contigs.fasta
Located abundance file [AA3s/genomic_bins/AA3s_contig_coverage.tsv]
out header: AA3s/genomic_bins/AA3s
Min contig length: 200
Thread: 20
Probability threshold: 0.9
Max iteration: 50
Searching against 107 marker genes to find starting seed contigs for [AA3s/AA3s_contigs.fasta]...
Running FragGeneScan....
Running HMMER hmmsearch....
Try harder to dig out marker genes from contigs.
Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop.
```

SilasK commented Jun 26, 2018

I had a similar problem (#29) but don't know how to solve it. If MaxBin can't find enough marker genes, you probably have to check your assembly...

I didn't change anything in the maxbin rule since v1.0.32.

You can add the command line option `--keep-going`, which tells Atlas to continue with all the other samples that don't have an error.

SilasK commented Jun 26, 2018

Thank you for your genome stats. I should also try higher values of k for the spades assembly.

dRep only chooses the best genome; it doesn't "merge" genomes, does it?

Sofie8 commented Jun 28, 2018

Ok, yes: dRep uses gANI and Mash to calculate the genome-genome distance; it doesn't merge, if I understood well, and then selects the best bin. The advantage is that it does allow comparing bins from different assemblies.
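As a rough illustration of that selection step (this is not dRep's actual code; the cluster labels and scores below are stand-ins for its Mash/gANI clustering and completeness/contamination-based quality score):

```python
# Simplified sketch of dRep-style dereplication: one winner per cluster,
# no merging of genomes. Scores stand in for dRep's quality score
# (roughly completeness - 5 * contamination).
genomes = [
    # (name, cluster_id, score)
    ("bin_k127_01", "c1", 92.5),
    ("bin_k77_03",  "c1", 88.0),
    ("bin_k77_07",  "c2", 75.0),
]

def dereplicate(genomes):
    best = {}
    for name, cluster, score in genomes:
        # keep only the highest-scoring genome in each cluster
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (name, score)
    return sorted(winner for winner, _ in best.values())

print(dereplicate(genomes))  # ['bin_k127_01', 'bin_k77_07']
```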

DAS Tool is similar, but performs an aggregation step; it starts from a scaffold2bin.tsv and a contig file from one assembly. It would be interesting to run not only MaxBin2 (now in atlas) but also e.g. CONCOCT, MetaBAT, and tetraESOM binning tools, and then run DAS Tool to get high-quality, de-replicated bins and start annotating those. That would be great. It is more automated and can save time on manual bin curation in Anvi'o, which can still come downstream of all this.

Sofie

SilasK commented Jun 28, 2018 via email

Sofie8 commented Jun 28, 2018

Oh ok, and how can I specify this exactly, to run CONCOCT and MetaBAT in Atlas? Maybe you are right that merging doesn't improve things much, but the results in the paper looked very promising, so I thought I'd give it a try. It can only be as good as your assemblies and the quality of the individual binning tools, but well.

SilasK commented Jul 2, 2018

Have a look at #87

I implemented concoct and metabat on the mags branch.
The branch is not up to date but should still work.

I suggest you download the git repository into another folder, so you have two versions of Atlas.
Download the repo from GitHub:

git clone https://github.com/pnnl/atlas.git

change to the development branch:

git checkout <branch>

I suggest accessing the Snakefile directly; I didn't finalise everything in setup.py.

Instead of `atlas assemble -o {outdir} config.yaml <params>`, use:

snakemake -s path/to/atlas/atlas/Snakefile --use-conda \
    -d {outdir} \
    --configfile config.yaml \
    --config workflow="complete" combine_contigs=True \
    <params>

SilasK commented Jul 2, 2018

@Sofie8
It is true that the binning is only as good as the assembly; that's why I worked mainly on adding the new version of spades.
For metabat and concoct there is the additional step of combining the contigs from different samples (I don't like co-assemblies). I use dedupe from bbmap to do this, but it's not ideal.

Sofie8 commented Jul 8, 2018

Hi Silas,

I performed the steps you wrote above, but I get a syntax error, and I have checked that the output file I am writing to is consistent between my yaml and .pbs script.

```
SyntaxError:
Not all output, log and benchmark files of rule error_correction contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
File "/vsc-hard-mounts/leuven-data/314/vsc31426/miniconda3/envs/atlasmags/atlas/atlas/Snakefile", line 179, in
File "/vsc-hard-mounts/leuven-data/314/vsc31426/miniconda3/envs/atlasmags/atlas/atlas/rules/assemble.snakefile", line 128, in
```
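For context, Snakemake requires every wildcard used in a rule's `output` to also appear in its `log` and `benchmark` paths, so jobs for different samples never write to the same file. A minimal sketch of a conforming rule (the rule body and paths here are hypothetical, not the actual Atlas rule):

```
rule error_correction:
    output:
        "{sample}/assembly/reads/{sample}_ec.fastq.gz"
    log:
        # must contain the same {sample} wildcard as the output
        "{sample}/logs/error_correction.log"
    benchmark:
        "{sample}/benchmarks/error_correction.txt"
```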

SilasK commented Jul 9, 2018 via email

Sofie8 commented Jul 16, 2018

Hi Silas,
In the meantime I am running the mags branch with snakemake v4.7 which seems to do fine so far!
One question though, how do I specify metabat as binner?
I am now running concoct, as in the .yaml file:

    binner: concoct  # ['concoct'] Bin contigs according to abundance profiles, only if 'perform_genome_binning: true'

Other question: prokka was updated some days ago (tseemann/prokka#320), and now it breaks my atlas runs:

```
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/prokka/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/prokka/lib/site_perl/5.26.2 /opt/conda/envs/prokka/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/prokka/lib/5.26.2 .) at /opt/conda/envs/prokka/bin/prokka line 30.
```

When I reinstall prokka and run it outside atlas it works, but not within atlas. Can we edit the snakemake file so that it retrieves only the last working prokka version?
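One possible fix (a sketch only: the file path and version number below are assumptions, not Atlas's actual setup) is to pin prokka to a known-good release in the conda environment file used by the prokka rule, instead of always taking the latest:

```yaml
# e.g. envs/prokka.yaml -- path is an assumption
channels:
  - bioconda
  - conda-forge
dependencies:
  - prokka=1.12   # pinned to a pre-update version; adjust to the last working release
```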

SilasK commented Jul 17, 2018

To select metabat as binner:

combine_contigs_params:
    binner: metabat
perform_genome_binning: true

SilasK commented Jul 17, 2018

@Sofie8 @brwnj Do you think it's a good idea to use metabat on single samples? It was designed for binning based on abundances across different samples. But, for example, in https://www.nature.com/articles/s41564-017-0012-7 they used metabat on single samples.

Sofie8 commented Jul 18, 2018

Hi Silas,

I ran concoct, and the log says 100% complete. I see combined_contigs.fasta, but is there also an output of bins, or a scaffolds2bins.tsv? I attach my .yaml and log files.
RBC-yaml-log.zip

I would like to run DAS Tool afterwards, e.g.:

```
./DAS_Tool -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,\
sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,\
sample_data/sample.human.gut_metabat_scaffolds2bin.tsv \
-l concoct,maxbin,metabat \
-c sample_data/sample.human.gut_contigs.fa \
-o sample_output/DASToolRun1
```

The samples I have are actually 12 fractions of the same sample, but with different GC content, because the DNA underwent GC density fractionation. My co-assemblies thus still originate from one sample, but kind of GC-normalised.

Thanks for your prompt replies.
