create_full_length_virus #1

Open
antoine4ucsd opened this issue May 6, 2022 · 8 comments

@antoine4ucsd

Hello,
I am trying to apply Haploflow to a set of nanopore full-length (FL) SIV data.
I figured I would start with the toy3.fq example (I installed Haploflow with Anaconda).
It generates two outputs, contigs.fa and Cov.tsv (no graph, nothing else), without any error message; log attached.
My goal was to create full-length viruses, but it does not seem to work. Can you help?
I have tried
python create_full_length_virus.py contigs.fa
or with a reference
python create_full_length_virus.py contigs.fa HXB2.fasta

I do not have SNP files, coords_file or duplication_ratio_file as part of the Haploflow output. I guess I am missing something obvious here.

thank you!
log.txt

@AlphaSquad
Collaborator

Hi,
did you use the -debug option of Haploflow? I changed how the graph files are generated because, depending on the data set, Haploflow would sometimes produce a lot of graphs. The log looks fine; there should be 3 contigs in the contigs.fa file (0, 2 and 3)?

Regardless, the SNP/coords files etc. are not produced by Haploflow itself, but by running QUAST. What you need to do is run QUAST with the contig file and e.g. HXB2 as reference (I blasted the short contig from the log and it matched the JRCSF strain basically perfectly, so using that as reference might be even better), and then use these files as input for the create_full_length_virus.py script.
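
A minimal sketch of that QUAST step, for reference. It only assembles the command line, using QUAST's `-r` (reference) and `-o` (output directory) options. The output directory name `quast` and the helper name `quast_command` are placeholders chosen for illustration, not something Haploflow or QUAST prescribes.

```python
import shlex


def quast_command(contigs, reference, out_dir="quast"):
    """Build a quast.py call as an argument list; nothing is executed here."""
    # -r: reference genome the contigs are aligned against
    # -o: directory where QUAST writes its reports (placeholder name)
    return ["quast.py", contigs, "-r", reference, "-o", out_dir]


# Show the command as it would be typed in a shell.
print(shlex.join(quast_command("contigs.fa", "HXB2.fasta")))
```

To actually run it, the list can be handed to `subprocess.run(cmd, check=True)`, provided `quast.py` is on the PATH.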

@antoine4ucsd
Author

Thank you for your prompt response! I did not realize I needed QUAST, sorry. I will do it now.
I will also rerun with the debug option.
thank you ++

@antoine4ucsd
Author

I was able to install and run QUAST with the HXB2.fasta and HXB2.gff3 references.
I got no errors and many outputs, including the attached coords file.
Can you help and be more specific about the command line to run afterward?
I am still getting the same error when trying, for example:
python create_full_length_virus.py contigs.fa HXB2.fasta contigs.coords

thank you!

contigs.coord.txt

@AlphaSquad
Collaborator

Yes, sorry, the overview in this repository is not particularly easy to follow and was written with a previous version of QUAST in mind.
You will need more than just the coords file: you need all SNPs (unzip the corresponding file in the QUAST folder; see the command below), the coords file you linked, the general report.tsv file, and a mapping of the contigs to the reference as a BAM file. The latest QUAST version does not include this BAM file, so you need to run e.g. bowtie2 or bwa to create it. Finally, you also need to give the script an output folder where it will put the sequences. The command will then look something like this:
python create_full_length_virus.py contigs.fa HXB2.fa quast/contigs_reports/minimap_output/contigs.used_snps.txt quast/contigs_reports/minimap_output/contigs.coords quast/report.tsv contigs.bam out_path/
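
A rough sketch of preparing those inputs, under a few assumptions: it uses minimap2 plus samtools for the mapping (rather than bowtie2 or bwa; any mapper that yields a BAM should do), it assumes QUAST ships the SNP list gzip-compressed, and it sorts the BAM even though it is not stated here whether the script requires that. The helper names (`gunzip`, `bam_commands`, `prepare_command`) are hypothetical; the argument order is the one from the command above.

```python
import gzip
import shutil
import sys
from pathlib import Path


def gunzip(src, dst):
    """Decompress a gzip file (assumed format of QUAST's SNP list) to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def bam_commands(reference, contigs, bam):
    """Commands that map contigs to the reference and write a sorted BAM."""
    sam = str(Path(bam).with_suffix(".sam"))
    return [
        # minimap2 -a emits SAM; -o writes it to a file instead of stdout
        ["minimap2", "-a", "-o", sam, str(reference), str(contigs)],
        # samtools sort converts the SAM into a coordinate-sorted BAM
        ["samtools", "sort", "-o", str(bam), sam],
    ]


def prepare_command(contigs, reference, snps, coords, report, bam, out_dir,
                    script="create_full_length_virus.py"):
    """Check that all six input files exist, create the output folder,
    and return the argument list for the script."""
    inputs = [contigs, reference, snps, coords, report, bam]
    missing = [str(p) for p in inputs if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("missing input(s): " + ", ".join(missing))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [sys.executable, script, *map(str, inputs), str(out_dir)]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Checking the inputs up front makes a forgotten file show up as a clear message before the script is started.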

@antoine4ucsd
Author

I have everything but the BAM output. When I run what you suggest above, I still get the same error:

python create_full_length_virus.py contigs.fa HXB2.fa contigs.used_snps.txt contigs.coords report.tsv contigs.bam ./outpat

IndexError: list index out of range

See attached. If you have two minutes, does it work on your laptop? Hopefully we are close!
thanks again,

test.zip

@AlphaSquad
Collaborator

AlphaSquad commented May 6, 2022

I created a small BAM file using minimap and tried the following command:
python create_full_length_virus.py contigs.fa HXB2.fasta contigs.used_snps.txt contigs.coords report.tsv contigs.bam .
I attached the result as strains_cds.fa (along with the BAM file), so it worked for me (after the commit I added to the repository, since the file formats of QUAST changed). Could you try again with the latest version?
haploflow.zip

Note that QUAST reports one of the contigs as unaligned to the reference (probably because the "error rate" is too high), so there are only two full-length contigs. Maybe you need to set the --min-identity threshold of QUAST to a value below 95% (which is the default).
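
As a sketch of what rerunning with a lower threshold could look like: the value 90 below is an arbitrary example, not a recommendation, and QUAST may restrict the allowed range (see `quast.py --help`). The helper name and the output directory are placeholders.

```python
def quast_command_with_identity(contigs, reference, out_dir, min_identity):
    """Build a quast.py call with an explicit --min-identity (in percent)."""
    value = float(min_identity)
    if not 0 < value <= 100:
        raise ValueError("min_identity is a percentage, got %r" % min_identity)
    return ["quast.py", contigs, "-r", reference, "-o", out_dir,
            "--min-identity", str(value)]


# Example: accept alignments down to 90% identity (arbitrary example value).
cmd = quast_command_with_identity("contigs.fa", "HXB2.fasta", "quast_id90", 90)
print(" ".join(cmd))
```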

Also, all three contigs in the contigs.fa file are basically full length (9084, 8982, 8899 bases)
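
For anyone wanting to check contig lengths like these on their own data, a small sketch in plain Python (no Biopython assumed; the helper name `fasta_lengths` is made up for this example):

```python
def fasta_lengths(path):
    """Return {record name: sequence length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first word of the header as the record name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif line and name is not None:
                # Sequence lines may be wrapped, so add them up.
                lengths[name] += len(line)
    return lengths
```

Calling `fasta_lengths("contigs.fa")` gives one entry per contig, which can be compared against the length of the reference.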

@antoine4ucsd
Author

antoine4ucsd commented May 6, 2022

thank you. I will give it a try!

@antoine4ucsd
Author

It worked after updating to the last commit.
Thank you! Amazing support.
