Official PyTorch implementation of the BERT-like baselines (RNABERT, RNA-MSM, RNA-FM) for the paper "Multi-purpose RNA Language Modeling with Motif-aware Pre-training and Type-guided Fine-tuning".
First, download the repository and create the environment.
```bash
git clone https://github.com/CatIIIIIIII/RNAErnie_baselines.git
cd ./RNAErnie_baselines
conda env create -f environment.yaml
```
Then, activate the ErnieFold environment:

```bash
conda activate ErnieFold
```
You need to download the pre-trained model weights for RNABERT and RNA-MSM and place them in the ./checkpoints folder. The pre-trained weights of RNA-FM will be downloaded automatically when you run the fine-tuning script.
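If you want to sanity-check the downloaded weights before fine-tuning, a minimal sketch (the checkpoint filename below is a placeholder, not the actual release name):

```python
# Illustrative sanity check: confirm a downloaded checkpoint deserializes
# with PyTorch. "rnabert.pth" is a placeholder; use whatever file the
# upstream RNABERT / RNA-MSM releases actually provide.
import torch

state = torch.load("./checkpoints/rnabert.pth", map_location="cpu")
print(type(state))
```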
You can download the training data from Google Drive and place it in the ./data/seq_cls folder. For the baselines, only the nRC dataset is available for this task.
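The scripts handle data loading internally; for orientation, a minimal loader sketch, assuming the nRC data ships in FASTA format with the class label in the header line (the path and filename are hypothetical):

```python
# Illustrative FASTA reader for the sequence-classification data. Adjust
# the path and parsing to match the actual files from Google Drive.
def read_fasta(path):
    records, header, seq = [], None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

records = read_fasta("./data/seq_cls/nRC/train.fa")  # hypothetical filename
print(len(records), records[0][0])
```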
Fine-tune a BERT-style large-scale pre-trained language model on the RNA sequence classification task with the following command:
```bash
python run_seq_cls.py \
    --device 'cuda:0' \
    --model_name RNAFM
```
You can configure the backbone model by changing `--model_name` to `RNAMSM` or `RNABERT`.
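Under the hood, the `RNAFM` backbone produces token-level embeddings that the classification head consumes. A sketch of extracting them directly, following the usage published in the RNA-FM repository (the example sequence is arbitrary):

```python
# Extract RNA-FM token embeddings, mirroring the usage shown in the
# RNA-FM repository (https://github.com/ml4bio/RNA-FM).
import torch
import fm  # the RNA-FM package

model, alphabet = fm.pretrained.rna_fm_t12()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("seq1", "GGGUGCGAUCAUACCAGCACUAAUGCCCUCC")]
_, _, tokens = batch_converter(data)
with torch.no_grad():
    out = model(tokens, repr_layers=[12])
embeddings = out["representations"][12]  # (batch, tokens, hidden)
print(embeddings.shape)
```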
You can download the training data from Google Drive and place it in the ./data/rr_inter folder.
Fine-tune a baseline model on the RNA-RNA interaction task with the following command:
```bash
python run_rr_inter.py \
    --device 'cuda:0' \
    --model_name RNAFM
```
You can configure the backbone model by changing `--model_name` to `RNAMSM` or `RNABERT`.
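Conceptually, an RNA-RNA interaction model encodes both sequences with the shared backbone and then scores the pair. The sketch below is a hypothetical illustration of such a pair-scoring head, not the repository's actual implementation (all names and dimensions are made up):

```python
# Hypothetical pair classifier: mean-pool the backbone embeddings of each
# sequence, concatenate, and classify whether the pair interacts.
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    def __init__(self, embed_dim=640, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, emb_a, emb_b):
        # emb_a, emb_b: (batch, seq_len, embed_dim) backbone outputs
        pooled = torch.cat([emb_a.mean(dim=1), emb_b.mean(dim=1)], dim=-1)
        return self.head(pooled)
```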
You can download the training data from Google Drive, unzip it, and place it in the ./data/ssp folder. Two tasks (RNAStrAlign-ArchiveII and bpRNA1m) are available.
Adapt a baseline model to the RNA secondary structure prediction task with the following command:
```bash
python run_ss_pred.py \
    --device 'cuda:0' \
    --model_name RNAFM
```
You can configure the backbone model by changing `--model_name` to `RNAMSM` or `RNABERT`, or switch tasks by changing `--task_name` to `RNAStrAlign` or `bpRNA1m`.
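Secondary structure is commonly represented as a binary contact map derived from dot-bracket notation. The generic utility below illustrates that conversion; it is not code from this repository, and it ignores pseudoknot brackets:

```python
# Standard conversion from dot-bracket notation to a symmetric binary
# contact map: each matched "(" / ")" pair marks one base pair.
import numpy as np

def dot_bracket_to_matrix(structure):
    n = len(structure)
    mat = np.zeros((n, n), dtype=np.int8)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()          # index of the matching "("
            mat[i, j] = mat[j, i] = 1
    return mat

print(dot_bracket_to_matrix("((..))").sum())  # 4: two base pairs, symmetric
```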