Variation in gene sequence and genetic features between individuals is very important for natural variation research. For the reference accession, the genome sequence and genetic feature annotation (usually in GFF/GTF format) are well established, and the genomes of other lines/accessions are often sequenced as well. The genetic features of these variant individuals are also of great interest, but there is no well-established solution for transferring the genetic feature annotation of the reference accession/line to other individuals. This pipeline transfers the reference genetic features to variant individuals whose genome sequence is available from whole-genome resequencing or de novo assembly. It provides a solution to the inconsistent alignment problem, which can lead to false-positive predictions of disrupted splice sites or ORF shifts. The whole-genome MSA is likewise built on the genetic features. GEAN can also be used to transfer the well-annotated genetic features of a model species to phylogenetically close species with newly sequenced whole genomes.
GEAN solves this problem with a dynamic programming algorithm (Zebric stripped dynamic programming). The pipeline can support analyses in which the gene structure annotation of different accessions matters, such as studying the natural variation of a single gene of interest, quantifying gene expression for a non-reference accession/line, and detecting differences in expression level across accessions/lines. Using the gene annotation as anchors, GEAN can perform base-pair-resolution whole-genome sequence alignment and variant calling.
CPU with AVX2 support
GNU GCC >= 6.0
CMake >= 3.0
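The requirements above can be checked before building. The following is only a minimal sketch, assuming a Linux host (CPU flags are read from /proc/cpuinfo) and GNU sort with the -V option; the helper names version_ge and report are hypothetical and are not part of GEAN.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print one line per requirement.
# Arguments: tool name, detected version (may be empty), minimum version.
report() {
    if [ -n "$2" ] && version_ge "$2" "$3"; then
        echo "ok: $1 $2 (needs >= $3)"
    else
        echo "missing: $1 ${2:-not found} (needs >= $3)"
    fi
}

# AVX2: on Linux the CPU flags are listed in /proc/cpuinfo.
if grep -qw avx2 /proc/cpuinfo 2>/dev/null; then
    echo "ok: CPU reports avx2"
else
    echo "missing: avx2 flag not found in /proc/cpuinfo"
fi

# Detect compiler and CMake versions; an empty string means "not installed".
gcc_version=$(gcc -dumpversion 2>/dev/null || true)
cmake_version=$(cmake --version 2>/dev/null | sed -n '1s/^cmake version //p' || true)

report gcc "$gcc_version" 6.0
report cmake "$cmake_version" 3.0
```

Any line starting with "missing:" points at a requirement to install or upgrade before running the build. On systems without /proc/cpuinfo the AVX2 line is not meaningful and the CPU has to be checked another way.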
git clone https://github.com/baoxingsong/GEAN.git
cd GEAN
cmake CMakeLists.txt
make
These commands will generate an executable file named gean.
Several functions are still under testing and are included in the source code. Documentation for those functions will be released after testing. Best practices for different aims can be found under example.
- whole-genome pairwise sequence alignment and variant calling for de novo assembly
- genome-wide multiple sequence alignment using variant calling results
- transform maize reference genome annotation to de novo assembly
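After transferring an annotation to another assembly, a quick sanity check is to compare how many features of each type appear in the reference and in the transferred GFF file. The sketch below is an assumption about how one might do this with standard tools; gff_feature_counts is a hypothetical helper, not a GEAN command, and the file names in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: count features per type (column 3) in a GFF/GTF file.
# Comment lines (starting with '#') and lines with fewer than 9
# tab-separated columns are skipped.
gff_feature_counts() {
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                 END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

For example, running `gff_feature_counts reference.gff > ref.counts` and `gff_feature_counts lifted.gff > lifted.counts`, then `diff ref.counts lifted.counts`, shows which feature types changed in number. A difference is not necessarily an error, since some features may legitimately fail to transfer, but a large drop is worth inspecting.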
Please support us by citing GEAN in your publication.
Bug reports? Questions? Suggestions? Feature requests? Want to collaborate?
Please feel free to send an e-mail to songbaoxing168@163.com
If you use GEAN, please cite:
Song B, Sang Q, Wang H, Pei H, Gan X and Wang F. Complement Genome Annotation Lift Over Using a Weighted Sequence Alignment Strategy. Front. Genet. 10:1046. doi: 10.3389/fgene.2019.01046
Thanks to Prof. Usadel Björn for great suggestions on speeding up
Thanks to Dr. Hequan Sun from MPIPZ for discussions
Thanks to Lukas Baumgarten from MPIPZ and Qiushi Li from the University of Calgary for bug reports
Thanks to Elad Oren from the Hebrew University of Jerusalem for extending the usage of GEAN
NSFC:31900486