Merge pull request #3084 from lleroi/get-started-genome-assembly
Add introduction to get started genome assembly and annotation
abretaud authored Nov 30, 2021
2 parents 346580b + 5c2eddd commit fa26c78
Showing 14 changed files with 405 additions and 3 deletions.
Binary file added topics/assembly/images/collapsed-consensus.png
Binary file added topics/assembly/images/coverage-x-GC.png
Binary file added topics/assembly/images/genomes-size.png
Binary file added topics/assembly/images/gigabase-read-length.png
Binary file added topics/assembly/images/heterozygous.jpg
Binary file added topics/assembly/images/kmers-frequency.png
Binary file added topics/assembly/images/phased-assemblies.png
Binary file added topics/assembly/images/ploidy.png
402 changes: 402 additions & 0 deletions topics/assembly/tutorials/get-started-genome-assembly/slides.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion topics/genome-annotation/tutorials/funannotate/tutorial.md
@@ -71,7 +71,7 @@ In this tutorial, you will learn how to perform a structural genome annotation,

To annotate our genome using Funannotate, we will use the following files:

-- The **genome sequence** in fasta format. For best results, the sequence should be soft-masked beforehand. You can learn how to do it by following the [RepeatMasker tutorial]({% link topics/genome-annotation/tutorials/repeatmasker/tutorial.md %}). For this tutorial, we will try to annotate the genome assembled in the [Flye assembly tutorial]({{ TODO link topics/assembly/tutorials/flye-assembly/tutorial.md }}).
+- The **genome sequence** in fasta format. For best results, the sequence should be soft-masked beforehand. You can learn how to do it by following the [RepeatMasker tutorial]({% link topics/genome-annotation/tutorials/repeatmasker/tutorial.md %}). For this tutorial, we will try to annotate the genome assembled in the [Flye assembly tutorial]({% link topics/assembly/tutorials/flye-assembly/tutorial.md %}).
 - Some RNASeq data in fastq format. We will align them on the genome, and Funannotate will use it as evidence to annotate genes.
 - A set of **protein sequences**, like UniProt/SwissProt. It is important to have good quality, curated sequences here, that's why, by default, Funannotate will use the UniProt/SwissProt databank. In this tutorial, we have prepared a subset of this databank to speed up computing, but you should use UniProt/SwissProt for real life analysis.

4 changes: 2 additions & 2 deletions topics/genome-annotation/tutorials/repeatmasker/tutorial.md
@@ -57,7 +57,7 @@ We call this operation "masking" because, by making repeats lowercase, or replac

Multiple tools exist to perform the masking: [RepeatMasker](https://www.repeatmasker.org/), [RepeatModeler](https://www.repeatmasker.org/RepeatModeler/), [REPET](https://urgi.versailles.inra.fr/Tools/REPET), ... Each one have specificities: some can be trained on specific genomes, some rely on existing databases of repeated elements signatures ([Dfam](https://www.dfam.org/), [RepBase](https://www.girinst.org/repbase/)).

-In this tutorial you will learn how to soft mask the genome sequence of a small eukaryote: Mucor mucedo (a fungal plant pathogen). You can learn how this genome sequence was assembled by following the [Flye assembly tutorial]({{ TODO link topics/assembly/tutorials/flye-assembly/tutorial.md }}). We will use RepeatMasker, which is probably the simplest solution giving an acceptable result before annotating the genome in the [Funannotate annotation tutorial]({% link topics/genome-annotation/tutorials/funannotate/tutorial.md %}).
+In this tutorial you will learn how to soft mask the genome sequence of a small eukaryote: Mucor mucedo (a fungal plant pathogen). You can learn how this genome sequence was assembled by following the [Flye assembly tutorial]({% link topics/assembly/tutorials/flye-assembly/tutorial.md %}). We will use RepeatMasker, which is probably the simplest solution giving an acceptable result before annotating the genome in the [Funannotate annotation tutorial]({% link topics/genome-annotation/tutorials/funannotate/tutorial.md %}).

> ### Agenda
>
@@ -131,6 +131,6 @@ As we have used a generic species (Human), we only identified the most common re
# Conclusion
{:.no_toc}

-By following this tutorial you have learn how to mask an eukaryotic genome using RepeatMasker, after assembling ([Flye assembly tutorial]({{ TODO link topics/assembly/tutorials/flye-assembly/tutorial.md }})) an before annotating it ([Funannotate annotation tutorial]({% link topics/genome-annotation/tutorials/funannotate/tutorial.md %})).
+By following this tutorial you have learn how to mask an eukaryotic genome using RepeatMasker, after assembling ([Flye assembly tutorial]({% link topics/assembly/tutorials/flye-assembly/tutorial.md %})) an before annotating it ([Funannotate annotation tutorial]({% link topics/genome-annotation/tutorials/funannotate/tutorial.md %})).

Often times, annotation tools prefer to use soft masked genomes, as they primarily search for genes in non repeated regions, but tolerate that some genes overlap partially with these regions.
