
Spades 3.12 #114

Merged 4 commits into increase_assembly on Jun 1, 2018
Conversation

SilasK commented Jun 1, 2018

Adds SPAdes 3.12, which handles merged reads separately.
Added "loose parameters".
Added options to individually toggle:

  • normalization
  • error_correction
  • merging
  • error_correction of spades (BayesHammer),
  • filtering of contigs

Added pythonic symlinks to reduce storage use.

Merged reads are used separately by SPAdes.
Added the possibility to drop normalization.
Made filtering optional.
Not tested.
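The symlink idea can be sketched in Python: instead of copying intermediate read files between workflow steps, link them. The file and function names below are illustrative, not Atlas's actual code:

```python
import os

def relink(src, dst):
    """Point dst at src via a symlink instead of a copy, saving disk space."""
    if os.path.lexists(dst):
        os.remove(dst)
    # use a relative target so the working directory can be moved around
    os.symlink(os.path.relpath(src, os.path.dirname(dst) or "."), dst)

# illustrative usage with a stand-in file
open("reads_QC.fastq.gz", "w").close()
relink("reads_QC.fastq.gz", "assembly_input.fastq.gz")
print(os.path.islink("assembly_input.fastq.gz"))  # True
```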
@SilasK merged commit d4a4576 into increase_assembly on Jun 1, 2018.
@SilasK deleted the spades-3.12 branch on June 20, 2018.
Sofie8 commented Jun 22, 2018

Hi Silas,

Sorry, also a question here: I ran the new atlas with SPAdes 3.12, and I wanted to choose the k-mers 21, 33, 55 and 77. Although I specified this in the yaml file, it only does 21, 33, 55. I wanted to try 77 for 2x250 Illumina reads, and because we have a high abundance of certain strains, longer k-mers might improve the assembly. I hope this is a quick fix, so that the spades command reads in the parameters given here and not its default.

Thanks,
Sofie

Spades:

```yaml
#------------
spades_skip_BayesHammer: false
spades_k: 21,33,55,77
```

SilasK commented Jun 22, 2018

I corrected it on the master branch.
You can go even higher, up to k=127 (the maximum SPAdes supports; k-mer sizes must be odd).

Tell me if you see improvements.

Sofie8 commented Jun 25, 2018

Hi Silas,

I tried it, and it works: it runs all the specified k's, up to the 127 I tried! Thanks.
It's interesting: for the soil sample I tried, I ran Megahit (default), Megahit (meta-sensitive), Spades (default), Spades (k21,33,55,77) and Spades (k21,33,55,77,99,127), and then dRep on all bins.
Spades with k up to 127 seems to work well: of the winning genomes, 3/4 are from the k127 assembly and the other 1/4 from the Spades k77 assembly.

The only issue I run into (with both v1.0.33 and v1.0.34, but not with v1.0.32) is that MaxBin 2.2.1 stops Atlas when it reaches a sample where no decent bin can be formed because it can't find the marker genes. It quits everything, terminating with an error, instead of moving on to the next sample.

Error in maxbin2.log of sample AA3s:

```
MaxBin 2.2.1
Input contig: AA3s/AA3s_contigs.fasta
Located abundance file [AA3s/genomic_bins/AA3s_contig_coverage.tsv]
out header: AA3s/genomic_bins/AA3s
Min contig length: 200
Thread: 20
Probability threshold: 0.9
Max iteration: 50
Searching against 107 marker genes to find starting seed contigs for [AA3s/AA3s_contigs.fasta]...
Running FragGeneScan....
Running HMMER hmmsearch....
Try harder to dig out marker genes from contigs.
Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop.
```

SilasK commented Jun 26, 2018

I had a similar problem (#29) but don't know how to solve it. If MaxBin can't find enough marker genes, you probably have to check your assembly...

I didn't change anything in the maxbin rule since v1.0.32.

You can add the command line option `--keep-going`, which tells Atlas to continue with all the other samples that don't have an error.

SilasK commented Jun 26, 2018

Thank you for your genome stats. I should also try higher values of k for the spades assembly.

dRep only chooses the best genome; it doesn't "merge" genomes, does it?

Sofie8 commented Jun 28, 2018

Ok, yes: dRep uses gANI and Mash to calculate the genome-genome distance; it doesn't merge, if I understood well, and then selects the best bin. The advantage is that it does allow comparing bins from different assemblies.
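As a rough illustration of that selection step (this is not dRep's actual code; the cluster labels and scores below are stand-ins for its Mash/gANI clustering and completeness/contamination-based quality score):

```python
# Simplified sketch of dRep-style dereplication: one winner per cluster,
# no merging of genomes. Scores stand in for dRep's quality score
# (roughly completeness - 5 * contamination).
genomes = [
    # (name, cluster_id, score)
    ("bin_k127_01", "c1", 92.5),
    ("bin_k77_03",  "c1", 88.0),
    ("bin_k77_07",  "c2", 75.0),
]

def dereplicate(genomes):
    best = {}
    for name, cluster, score in genomes:
        # keep only the highest-scoring genome in each cluster
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (name, score)
    return sorted(winner for winner, _ in best.values())

print(dereplicate(genomes))  # ['bin_k127_01', 'bin_k77_07']
```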

DAS Tool is similar, but performs an aggregation step; it starts from a scaffold2bin.tsv and a contig file from one assembly. It would be interesting to run not only MaxBin2 (now in atlas) but also e.g. CONCOCT, MetaBAT, and tetraESOM binning tools, and then run DAS Tool to get high-quality, de-replicated bins and start annotating those. That would be great. It is more automated and can save time on manual bin curation in Anvi'o, which can still come downstream of all this.

Sofie

SilasK commented Jun 28, 2018 via email

Sofie8 commented Jun 28, 2018

Oh ok, and how can I specify this exactly, to run CONCOCT and MetaBAT in Atlas? Maybe you are right that merging doesn't improve things much, but the results in the paper looked very promising, so I thought I'd give it a try. It can only be as good as your assemblies and the quality of the individual binning tools, but well.

SilasK commented Jul 2, 2018

Have a look at #87

I implemented concoct and metabat on the mags branch.
The branch is not up to date but should still work.

I suggest you download the git repository into another folder, so you have two versions of Atlas.
Download the repo from GitHub:

git clone https://github.com/pnnl/atlas.git

change to the development branch:

git checkout <branch>

I suggest accessing the Snakefile directly; I didn't finalise everything in setup.py.

Instead of `atlas assemble -o {outdir} config.yaml <params>`, use:

snakemake -s path/to/atlas/atlas/Snakefile --use-conda \
    -d {outdir} \
    --configfile config.yaml \
    --config workflow="complete" combine_contigs=True \
    <params>

SilasK commented Jul 2, 2018

@Sofie8
It is true that the binning is only as good as the assembly; that's why I worked mainly on adding the new version of spades.
For metabat and concoct there is the additional step of combining the contigs from different samples (I don't like co-assemblies). I use dedupe from bbmap to do this, but it's not ideal.

Sofie8 commented Jul 8, 2018

Hi Silas,

I performed the steps you wrote above, but I get a syntax error, and I have checked that the output file I am writing to is consistent between my yaml and .pbs script.

```
SyntaxError:
Not all output, log and benchmark files of rule error_correction contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
File "/vsc-hard-mounts/leuven-data/314/vsc31426/miniconda3/envs/atlasmags/atlas/atlas/Snakefile", line 179, in
File "/vsc-hard-mounts/leuven-data/314/vsc31426/miniconda3/envs/atlasmags/atlas/atlas/rules/assemble.snakefile", line 128, in
```
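For context, Snakemake requires every wildcard used in a rule's `output` to also appear in its `log` and `benchmark` paths, so jobs for different samples never write to the same file. A minimal sketch of a conforming rule (the rule body and paths here are hypothetical, not the actual Atlas rule):

```
rule error_correction:
    output:
        "{sample}/assembly/reads/{sample}_ec.fastq.gz"
    log:
        # must contain the same {sample} wildcard as the output
        "{sample}/logs/error_correction.log"
    benchmark:
        "{sample}/benchmarks/error_correction.txt"
```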

SilasK commented Jul 9, 2018 via email

Sofie8 commented Jul 16, 2018

Hi Silas,
In the meantime I am running the mags branch with snakemake v4.7 which seems to do fine so far!
One question though, how do I specify metabat as binner?
I am now running concoct, as in the .yaml file:

    binner: concoct  # ['concoct'] Bin contigs according to abundance profiles, only if 'perform_genome_binning: true'

Other question: prokka was updated some days ago (tseemann/prokka#320), and now it breaks my atlas runs:

```
Can't locate Bio/Root/Version.pm in @INC (you may need to install the Bio::Root::Version module) (@INC contains: /opt/conda/envs/prokka/lib/site_perl/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/prokka/lib/site_perl/5.26.2 /opt/conda/envs/prokka/lib/5.26.2/x86_64-linux-thread-multi /opt/conda/envs/prokka/lib/5.26.2 .) at /opt/conda/envs/prokka/bin/prokka line 30.
```

When I reinstall prokka and run it outside atlas it works, but not within atlas. Can we edit the snakemake file so that it retrieves only the last working prokka version?
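One possible fix (a sketch only: the file path and version number below are assumptions, not Atlas's actual setup) is to pin prokka to a known-good release in the conda environment file used by the prokka rule, instead of always taking the latest:

```yaml
# e.g. envs/prokka.yaml -- path is an assumption
channels:
  - bioconda
  - conda-forge
dependencies:
  - prokka=1.12   # pinned to a pre-update version; adjust to the last working release
```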

SilasK commented Jul 17, 2018

To select metabat as binner:

combine_contigs_params:
    binner: metabat
perform_genome_binning: true

SilasK commented Jul 17, 2018

@Sofie8 @brwnj Do you think it's a good idea to use metabat on single samples? It was designed for binning based on abundances across different samples. But, for example, in https://www.nature.com/articles/s41564-017-0012-7 they used metabat on single samples.

Sofie8 commented Jul 18, 2018

Hi Silas,

I ran concoct, and the log says 100% complete. I see combined_contigs.fasta, but is there also an output of bins, or a scaffolds2bins.tsv? I attach my .yaml and log files.
RBC-yaml-log.zip

I would like to run DAS Tool afterwards, e.g.:

```
./DAS_Tool -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,\
sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,\
sample_data/sample.human.gut_metabat_scaffolds2bin.tsv \
-l concoct,maxbin,metabat \
-c sample_data/sample.human.gut_contigs.fa \
-o sample_output/DASToolRun1
```

The samples I have are actually 12 fractions of the same sample, but with different GC content, because the DNA underwent GC density fractionation. My co-assemblies thus still originate from one sample, but kind of GC-normalised.

Thanks for your prompt replies.
