-
MDELTA
is an algorithm for Multifuricating Developmental cEll Lineage Tree Alignment. In essence, it compares two rooted, unordered, tip-labeled trees, and finds the best global/local correspondence between the nodes. The mDELTA program is designed for analyzing developmental cell lineage trees reconstructed through single-cell DNA barcoding (such as done byscGESTALT
orSMALT
, while greater cellular coverage is expected to yield more meaningful MDELTA alignments). -
Except for dealing with cell lineage trees instead of biological sequences, MDELTA is conceptually similar to sequence alignment. It helps quantify similarity among different lineage trees, disentangle the consensus and variation, find recurrent motifs, and facilitate comparative/evolutionary analyses.
-
Also included in this repository are Python/R scripts for statistical analyses and visualization of MDELTA results, which facilitates their biological interpretation.
-
MDELTA was developed by Jingyu Chen (EeWhile) under the supervision of Professor Jian-Rong Yang at the Zhongshan School of Medicine of Sun Yat-Sen University in China.
For more details, please refer to the following content:
- You can try to learn about the latest examples of dynamic MDELTAUI running results, which will help you better understand the purpose of mDERTA โ๏ธ http://eewhile.cn/mdelta_ui
- You can obtain a separate Python package for mDELTA for further development(Note that the version of this package is relatively old and many features have not been updated). https://github.com/Chenjy0212/mdelta
If you have any questions, please contact me. My contact information is located at the bottom
- It's best to use jupyter as your menu display tool, or you can install the corresponding plugin in vscode to use it.
- You can install the required packages automatically by running
mdelta_menu.ipynb
, or you can manually run thepackage_manger.py
to install the required packages.Tips : The first download time may be relatively long. If any installation package fails, please go ahead and install a package that is suitable for your current running environment (Python and R)
- You should first install stable versions of
Python
andR
language to facilitate subsequent operations
Open jupyter and run mdelta_menu.ipynb
, run the following code to automatically install the necessary Python packages and open the mdelta function menu.
%run package_manger.py
%run ./mdelta/mydefault.py
#from mdelta.mydefault import *
myargs = get_default()
Select, click, and swipe to achieve automatic parameter acquisition. The relevant parameters are detailed in the "Parameter parsing table" section of the menu. Continuing to run the next section of code will obtain the desired result.
%run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
!Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
%run $network $mdelta_json $output
!Rscript $densitree $mdelta_json $output $dcc
!Rscript $da $mdelta_json $output $dcc
show more ๐๐๐ CLICK
from itertools import product
TREE,TREE2,N2T,N2T2,XScoreFile,LScoreFile,top,mavv,mavvstep,mivv,mivvstep,ps,psstep,tqdm,n,mg,mgstep,xs,xsstep,o,PERM,cpu,mdelta,match_tree,network,densitree,da,dcc,pper = get_listvalue(myargs.values())
output = get_output(o)
notebook, Tqdm = TF_to_10(n, tqdm)
for ma, mi, p, mgg, diffs in product(forlist(mavv[0], mavv[1], mavvstep),
forlist(mivv[0], mivv[1], mivvstep),
forlist(ps[0], ps[1], psstep),
forlist(mg[0], mg[1], mgstep),
forlist(xs[0], xs[1], xsstep)):
%run $mdelta $TREE $TREE2 -nt $N2T -nt2 $N2T2 -xsd $XScoreFile -lsd $LScoreFile -t $top -ma $ma -mi $mi -p $p -T $Tqdm -n $notebook -mg $mgg -x $diffs -o $output -P $PERM -c $cpu -pper $pper
if not PERM > 0:
mdelta_json = output + '{}_{}_top{}_diff{}_pv{}_miv{}_mav{}_mg{}.json'.format(os.path.basename(TREE).split('.')[0], os.path.basename(TREE2).split('.')[0], top, str(diffs), str(p), str(mi), str(ma), str(mgg))
!Rscript $match_tree $mdelta_json $XScoreFile $output $LScoreFile $mi $ma
%run $network $mdelta_json $output
!Rscript $densitree $mdelta_json $output $dcc
!Rscript $da $mdelta_json $output $dcc
Running the following command can obtain the parsing of relevant parameters
mDELTA.py -h
show more ๐๐๐ CLICK
Parameter | Type | Description |
---|---|---|
TreeSeqFile | [path/filename] | A text file storing cell lineage tree #1 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed. |
TreeSeqFile2 | [path/filename] | A text file storing cell lineage tree #2 in newick format. Tips can be labeled by name or cell type. Branch lengths should be removed. |
Parameter | Type | Description |
---|---|---|
-h, --help | show this help message and exit | |
-nt NAME2TYPEFILE, --Name2TypeFile NAME2TYPEFILE |
[path/filename] | List of correspondance between tip name and cell type for cell lineage tree #1. |
-nt2 NAME2TYPEFILE, --Name2TypeFile2 NAME2TYPEFILE |
[path/filename] | List of correspondance between tip name and cell type for cell lineage tree #2. |
-xsd XSCOREDICTFILE, --ScoreDictFile SCOREDICTFILE |
[path/filename] | A comma-delimited text file used to determine similarity scores between cells. If there are exactly three columns, they will be interpreted as (1) the cell (name or type) in Tree #1, (2) the cell in Tree #2, and (3) the similarity score. If otherwise, the first column will be interpreted as the cell (name or type) and the remaining columns as features of the cell (e.g. expression of a gene). The similarity scores will be estimated between all pairs of cells based on the Euclidean distance calculated using all the features. Overrides `-ma` and `-mi`. |
-lsd LSCOREDICTFILE1, --ScoreDictFile1 SCOREDICTFILE1 |
[path/filename] | Calculate the Euclidean distance based on the characteristics of the cell (such as gene expression) to obtain the result x, and obtain the corresponding score through a series of operations such as - ln (x+1). |
-t TOP, --top TOP | [int > 0] | Performs local (instead of global) alignment, and output the top NUM local alignments with the highest score (e.g. `-t 10`). In the case of global alignment, this parameter should be omitted. |
-ma MAV, --mav MAV | [float] | Default=2. Score of two paired end nodes with the same name or type. |
-mi MIV, --miv MIV | [float] | Default=-1. Shorthand for a simple matching score scheme, where the matching score between a pair of the same cell types is MAV and all other pairs are MIV. (e.g. `-ma 2 -mi -2`). Overridden by `-sd`. |
-p PV, --pv PV | [float] | Default=-1. The score for pruning a tip of the tree (e.g. `-p -2`). Default to -1. |
-pper PPER, --prunepercent PRUNEPERCENT | [float] | Default=20. The comparison result requires either party's pruning rate to be less than or equal to PPER. The pruning rate is the proportion of pruned leaf nodes to all leaf nodes in the subtree. |
-T TQDM, --Tqdm TQDM | [0(off) or 1(on)] | Whether to display the running Progress bar |
-n NOTEBOOK, --notebook NOTEBOOK |
[0(off) or 1(on)] | Toggle for the jupyter notebook environment. |
-P PERM, --PERM PERM | [int > 0] | Toggle for the statistical significance. For each observed alignment, the aligned trees will be permuted PERM times to generate a null distribution of alignment scores, with which a P value can be calculated for the observed alignment score. |
-c CPUS, --CPUs CPUS | [int > 0] | Number of threads for multi-processing. Default to 50., it can reach the maximum number of local CPU cores - 1. |
-o OUTPUT, --output OUTPUT |
[path] | Output path, eg:'/home/username' |
-mg MERGE, --merge MERGE |
[float] | This is the scaling factor for calculating the score of merging an internal node (e.g. -mg -1), which is multiplied by the number of tips of the internal node to be merged. Default to 0. |
-x DIFF, --diff DIFF |
[int > 0] | Alignment must consist of a minimal of DIFF percent aligned cell pairs that are different from previous(better) local alignments in order to be considered as another new alignment (e.g. `-x 20` means 20 percent). |
ExampleFile ๐
// Qualitative calculation
Name2Type.csv
# CXscorefile.csv
// Quantitative Calculation
Output_example ๐
Circle.pdf
# ACluster.pdf
# BLine.pdf
# C
topN.pdf
// N represents the sequence number of the nth
_densitrees_info.csv
All density tree results are represented as ๏ผ
- MatchTimes_TreeX_Label_densitree_leaves_match.csv
- The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file '''
- MatchTimes_TreeX_Label_densitree_leaves_match.csv
- The nwk subtree sequence file of TreeX and its root node have been converted to a cell type nwk subtree sequence file
Using the idea of dynamic programming, update the score matrix. Except for the scores of the end nodes derived from default values or user input socrefile
, the scores of all other cells are calculated using the weighted bipartite graph most matching algorithm (KM) to obtain the optimal value
In order to speed up our computing speed, we can prepare the socrefile in advance, and use the data preprocessing program mdelta_prepro
to calculate our biological data in advance, which can be reused or regenerated in the future. Greatly improve efficiency.
้ๆฏๅฎ EeWhile ๐คตโโ๏ธ
Student of SYSU. ๐ซ
ZHONGSHAN SCHOOL OF MEDICINE,SYSU https://zssom.sysu.edu.cn/zh-hans ๐
Undergraduate majoring in computer science, master majoring in bioinformatics. ๐จโ๐ป
Below is the contact information of the author. ๐
If you use this project in your research, please cite this project.
@misc{mdelta2022,
author = {Jingyu Chen},
title = {mDELTA: Multifuricating Developmental cEll Lineage Tree Alignment},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Chenjy0212/mdelta_full}},
}