I am using Hifiasm (version 0.19.7-r598) to assemble a plant genome (2n=28) with HiFi and Hi-C data.
First, assuming the plant is diploid, I ran:
hifiasm -o OS010681.asm -t 64 --h1 hic_r1.fastq.gz --h2 hic_r2.fastq.gz OS010681.hifi.fq.gz
I then used the resulting OS010681.asm.hic.p_ctg.gfa to make a mummerplot, with the A. thaliana genome as the reference.
According to the mummerplot, this plant looks like a tetraploid. To check the heterozygosity, I ran GenomeScope2 on the HiFi reads (p4_summary.txt); the estimated heterozygosity rate is quite high (~8%).
I also referred to issue #571 and then re-ran hifiasm with:
hifiasm -o OS010681.asm.v2 -t 64 -s 0.25 --n-hap 4 --h1 hic_r1.fastq.gz --h2 hic_r2.fastq.gz OS010681.hifi.fq.gz
This produced OS010681.asm.v2.hic.p_ctg.gfa (276M) and four haplotype files:
OS010681.asm.v2.hic.hap1.p_ctg.gfa (296M), OS010681.asm.v2.hic.hap2.p_ctg.gfa (267M), OS010681.asm.v2.hic.hap3.p_ctg.gfa (276M), OS010681.asm.v2.hic.hap4.p_ctg.gfa (353M).
In #431, you mentioned that “If you have HiC reads, the latest release Hifiasm-0.19.3-r572 will give you 4 haplotypes. But the results might be not perfect right now”. I am quite confused: which assembly files should I use for scaffolding with yahs?
I am also wondering whether you could help me with the following questions:
Q1. I noticed that OS010681.asm.v2.hic.p_ctg.gfa (276M) is much smaller than OS010681.asm.hic.p_ctg.gfa (387M) from the previous run. What causes this difference?
Q2. The FAQ (https://hifiasm.readthedocs.io/en/latest/faq.html#are-polyploid-genomes-supported) says “The *r_utg.gfa and *p_utg.gfa are lossless so that they also work for polyploid genomes”. What is the difference between *p_utg.gfa and *p_ctg.gfa? How can I use the information in p_utg.gfa for my polyploid assembly?
Q3. In #431, you mentioned “mannually set --hom-cov to the homozygous coverage”. Could you clarify how big the impact of manually setting --hom-cov is? If possible, could you also give a few more details on how to calculate the homozygous coverage?
Q4. In #537, you mentioned that “-l0 is designed for the homozygous sample, which will disable diploid phasing. Please do not use -l0 for the Hi-C phasing”. What is the default value of -l when hifiasm runs in Hi-C mode?
Sorry for the long list of questions, and thank you so much for your help!
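The sizes quoted above look like file sizes on disk, so the runs may be easier to compare by assembled length. Below is a minimal sketch, assuming each contig sits on an S line of the GFA with the sequence in the third column (or an LN:i: tag when the sequence is written as *). The function name gfa_segment_stats is hypothetical and not part of hifiasm.

```python
def gfa_segment_stats(gfa_lines):
    """Return (segment_count, total_bases, n50) for the S lines of a GFA.

    gfa_lines is any iterable of text lines, e.g. an open file handle.
    """
    lengths = []
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, and read-alignment lines
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq == "*":
            # Sequence omitted: fall back to the LN:i: tag if one is present.
            length = 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])
                    break
        else:
            length = len(seq)
        lengths.append(length)

    total = sum(lengths)
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return len(lengths), total, n50


if __name__ == "__main__":
    import sys

    # Usage (file names are whatever GFA files you want to compare):
    #   python gfa_stats.py OS010681.asm.hic.p_ctg.gfa OS010681.asm.v2.hic.p_ctg.gfa
    for path in sys.argv[1:]:
        with open(path) as handle:
            count, total, n50 = gfa_segment_stats(handle)
        print(f"{path}\tsegments={count}\ttotal_bp={total}\tN50={n50}")
```

Running it on the two p_ctg.gfa files and the four hap files would show whether the drop from 387M to 276M reflects fewer assembled bases or only a different file layout.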
Hello, I am afraid I can't answer all your questions, but I recently assembled a highly heterozygous tetraploid using hifiasm. I used the unitig (utg) assembly rather than the contigs for scaffolding, because the unitigs are haplotype-specific, whereas you cannot guarantee that the contigs are.
Unitigs can be thought of as high-confidence contigs in that they contain no conflicts. Contigs are produced by joining unitigs, and in doing so the assembler has to make decisions. For example, when it reaches a bubble in the graph (i.e. a heterozygous site), it chooses one of the four alleles (in a tetraploid), and the others are treated as part of the alternate assembly. p_ctg is the primary contig assembly, so wherever there was a bubble it contains only one of the paths; you can also output the alternate assembly using a flag in hifiasm.
I used an older version of hifiasm and found that the phasing did not work as well for separating the four haplotypes; however, I have not tested the current version.
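Scaffolders generally take FASTA rather than GFA, so the unitig graph has to be converted first. Below is a minimal sketch of that step, assuming the unitig sequences are stored on the S lines of p_utg.gfa. The function name gfa_to_fasta and the output file name are hypothetical; this is not the exact procedure I ran.

```python
import io


def gfa_to_fasta(gfa_lines, out, width=80):
    """Write every S-line sequence of a GFA to `out` in FASTA format.

    Returns the number of records written. Segments whose sequence is
    omitted ("*") are skipped, since there is nothing to write for them.
    """
    written = 0
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # only segment lines carry sequence
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            continue
        out.write(f">{name}\n")
        # Wrap the sequence so that lines stay at a readable length.
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up unitig names.
    demo = ["S\tutg1\tACGTAC\tLN:i:6\n", "L\tutg1\t+\tutg2\t-\t0M\n"]
    buffer = io.StringIO()
    gfa_to_fasta(demo, buffer, width=4)
    print(buffer.getvalue(), end="")
```

With real data, the same function can be called with an open p_utg.gfa handle and an output file opened for writing (for example a hypothetical utg.fa), which is then passed to the scaffolder together with the Hi-C alignments.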
Before the re-run, I also re-read the FAQ. One entry (https://hifiasm.readthedocs.io/en/latest/faq.html#which-types-of-assemblies-should-i-use) says “if Hi-C data is available, hic.hap.p_ctg.gfa produced in Hi-C mode is the best choice”, and another (https://hifiasm.readthedocs.io/en/latest/faq.html#are-polyploid-genomes-supported) says “The *r_utg.gfa and *p_utg.gfa are lossless so that they also work for polyploid genomes. However, currently the contig-generation modules of hifiasm are designed for diploid samples, which means both the partially phased assembly and the fully-phased assembly does not directly support polyploid genomes”.