Could you explain in detail how to extend reads? #25
The run log of miniasm: [M::main] ===> Step 1: reading read mappings <===
I guess you solved your minimap2 issue if you have generated output from teloclip-extract. Please try this example workflow. Make sure you have a stable release version of teloclip installed. You will also need samtools and minimap2.
# Install from PyPI
pip install teloclip
# Check version is 0.0.4
teloclip --version
#teloclip 0.0.4
First, it is advisable to view all reads that overhang the end of a contig (no filtering for telomeric motifs).
# Create index of reference fasta
samtools faidx ref.fa
# Map hifi reads with mm2;
# exclude secondary alignments;
# keep only alignments that overhang the end of a contig;
# sort alignments and save as bamfile.
minimap2 -ax map-hifi ref.fa hifi_reads.fq.gz | samtools view -h -F 0x100 | teloclip --ref ref.fa.fai | samtools sort > primary_overhangs.bam
Some contigs will have MANY more overhangs than others; these are likely to be mitochondrial or chloroplast genomes. Be careful to identify these contigs and exclude them from extension steps.
Next, you can filter the primary overhang alignments for reads that contain the motif 'AAACCCT' in the overhang section by running the previous output through teloclip with the --motifs option. We can then pass these filtered alignments to teloclip-extract.
samtools view -h primary_overhangs.bam | teloclip --ref ref.fa.fai --motifs AAACCCT | teloclip-extract --refIdx ref.fa.fai --extractReads --extractDir SplitOverhangs
You will have output files that look like this for each contig where at least one overhang with the motif was found:
The soft-clipped or "overhang" segment of the read will be in lowercase letters. This will be at the start for reads that overhang the left end of the contig, or at the end for reads overhanging the right end of the contig. To manually extend the contigs, you should first check the BAM file to see that:
If the alignments pass this check, you can usually select the longest overhang region (lowercase in the teloclip-extract output) and paste these bases onto the end of your contig.
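Selecting the longest overhang can be sketched in shell as follows. This is only an illustration, not part of teloclip: the function name `longest_overhang` is hypothetical, and it assumes the teloclip-extract output is FASTA with the soft-clipped bases in lowercase at the start or end of each read, as described above. It prints the read name, overhang length and overhang bases for the read with the longest lowercase run.

```shell
# Hypothetical helper (not a teloclip command): read FASTA on stdin and report
# the read with the longest lowercase (soft-clipped) run at either end.
# Assumes lowercase a/c/g/t/n marks the overhang, as in teloclip-extract output.
longest_overhang() {
  awk '
    BEGIN { best_len = 0 }
    # Score the record collected so far and keep it if its overhang is longest.
    function score_record() {
      if (name == "") return
      left = ""; right = ""
      if (match(seq, /^[acgtn]+/)) left = substr(seq, RSTART, RLENGTH)
      if (match(seq, /[acgtn]+$/)) right = substr(seq, RSTART, RLENGTH)
      clip = (length(left) >= length(right)) ? left : right
      if (length(clip) > best_len) {
        best_len = length(clip); best_name = name; best_clip = clip
      }
    }
    /^>/ { score_record(); name = substr($1, 2); seq = ""; next }
    { seq = seq $0 }   # join wrapped sequence lines
    END {
      score_record()
      if (best_name != "") print best_name "\t" best_len "\t" best_clip
    }
  '
}

# Usage (the path is a placeholder for one of the files in SplitOverhangs):
#   longest_overhang < path/to/one_extract_output.fasta
```

The reported bases are only a candidate extension; the alignments should still be inspected in the BAM file before anything is pasted onto the contig.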
Dear author,
I am assembling a plant genome, and it is now assembled to a gap-free level. It has 21 chromosomes, but so far only 39 telomeres (motif AAACCCT) have been counted, so 3 chromosome ends are still without telomeres.
I think the tool you developed will save a lot of manual work. I have tried it step by step:
1. First index the reference assembly
samtools faidx hifiasm.fasta
2. Reading alignments from SAM file
minimap2 -ax map-hifi -t 120 hifiasm.fasta ../hifi/all.hifi.fasta | samtools view -@ 40 -h -F 0x100 >in.sam
3. Report clipped alignments containing target motifs
teloclip --ref hifiasm.fasta.fai --motifs AAACCCT in.sam | samtools sort -@ 60 > out.motifs.bam
4. Extract clipped reads
teloclip --ref hifiasm.fasta.fai --motifs AAACCCT in.sam | teloclip-extract --refIdx hifiasm.fasta.fai --extractReads --extractDir SplitOverhangs
However, I am not familiar with the use of miniasm, and the result is empty!
Could you give me some suggestions? Thank you very much.
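A quick sanity check on the filtered alignments from step 3 is to count how many overhanging reads each contig received. The sketch below is an assumption-laden illustration rather than a teloclip feature: the function name `count_overhangs_per_contig` is hypothetical, and it expects SAM text on stdin. Contigs with far more overhangs than the rest may be organellar, and contigs that are absent from the output have no motif-containing overhangs to extend with.

```shell
# Hypothetical helper: tally alignments per reference contig from SAM text.
# Header lines (starting with "@") and unmapped records (RNAME "*") are skipped.
count_overhangs_per_contig() {
  awk -F '\t' '!/^@/ && $3 != "*" { n[$3]++ }
               END { for (c in n) print n[c] "\t" c }' |
    sort -k1,1nr -k2,2   # highest counts first, then by contig name
}

# Usage (assumes samtools is on PATH):
#   samtools view -h out.motifs.bam | count_overhangs_per_contig
```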