Training datasets #11

andrewjjung47 · 2022-05-20T14:29:08Z

Hello,

I am wondering if the training datasets used in the publication (RNAStralign, TR0, and fine-tuning dataset for bpRNA-new) are available? I was able to find the evaluation datasets from Google Drive (https://drive.google.com/drive/folders/143TJDgGqYOhumayiDx7w4XWLDVEOSZ2X) but do not see the training sets.

Also, I am wondering which datasets were used to train ufold_train.pt and ufold_train_alldata.pt?

Thank you so much in advance!

sperfu · 2022-05-24T11:49:19Z

Hi there,

The training datasets used in the publication (RNAStralign and TR0) are the ones from the MXfold2 paper; you can download them from its repository. For the bpRNA-new fine-tuning dataset, we kept the FASTA file produced after mutation and CD-HIT filtering. You can download it from the Google Drive link.
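
For readers who download that FASTA file, here is a minimal sketch for loading it, assuming the standard FASTA layout (header lines start with `>`, sequences may wrap over several lines). The function name `read_fasta` and the sample records are hypothetical and are not taken from the UFold repo.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence lines are appended to the current record.
            records[name] += line
    return records


# Made-up sample standing in for the real file contents.
sample = [">seq_a", "GGGAAA", "CCC", ">seq_b", "AUGC"]
records = read_fasta(sample)
```

An open file handle can be passed in place of `sample`, since iterating over a file yields its lines. Note that this only recovers sequences; FASTA carries no structure labels.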

As for ufold_train.pt and ufold_train_alldata.pt, the difference between the two files is the training data. ufold_train_alldata.pt was trained on all the data, training and testing datasets combined, and is meant for future prediction on other sequences. ufold_train.pt is the pre-trained model that was trained on the training data only.

Thanks.

andrewjjung47 · 2022-06-07T07:30:35Z

Thank you for your response!

andrewjjung47 · 2022-06-07T08:22:54Z

Hello there, I have another quick question regarding exactly which datasets were used to train ufold_train.pt.

Was it trained only on TR0 or RNAStralign & TR0? Also, was the provided ufold_train.pt also trained on the fine-tuning dataset for bpRNA-new?

CatIIIIIIII · 2022-10-19T15:09:23Z

Same question

CatIIIIIIII · 2022-10-19T15:23:42Z

I suspect the authors may have mistaken testing data for training data, since I can only reach an F1 of up to 0.5 when training on TR0 and testing on TS0 under the same settings as this repo.
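
For reference when comparing numbers, a minimal sketch of how an F1 score over predicted base pairs can be computed. This is a plain set-based definition and an assumption on my side; the repo's own evaluation code may differ in details. The name `pair_f1` and the toy pairs are hypothetical.

```python
def pair_f1(predicted, reference):
    """Return (precision, recall, F1) over base pairs given as (i, j) tuples."""
    # Normalize so that (i, j) and (j, i) count as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up pairs: 2 of 3 predictions are correct,
# and 2 of 4 reference pairs are recovered.
reference = [(0, 9), (1, 8), (2, 7), (3, 6)]
predicted = [(0, 9), (1, 8), (2, 6)]
precision, recall, f1 = pair_f1(predicted, reference)
```

In the toy example precision is 2/3 and recall is 1/2, which gives an F1 of 4/7.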

sperfu · 2022-10-20T08:23:13Z

Hi there,

As described in our paper, ufold_train.pt was trained on TR0, RNAStralign, and the augmented mutated dataset. TS0 is used only for testing (the datasets we used are the same as those of MXfold2 and SPOT-RNA), so we don't think we mistook testing data for training data. Also, ufold_train.pt was not trained on the fine-tuning dataset for bpRNA-new; it is designed for fine-tuning on the PDB dataset.

Thanks

CatIIIIIIII · 2022-10-21T14:56:46Z

Hello,

Thank you for your response. Maybe I need to train for more epochs to get better results.

CatIIIIIIII · 2022-10-28T04:51:01Z

Hi,

Could you share your augmented mutated dataset pickle file?

black0017 · 2023-06-01T11:07:05Z

Hi all @CatIIIIIIII @sperfu @andrewjjung47 !

I would like to ask whether it is possible to share the mutated bpRNA-new sequences that you generated (the synthetic/augmented data).

That would be very helpful!

Thanks in advance.
N.

black0017 · 2023-06-01T11:19:47Z

@CatIIIIIIII @sperfu @andrewjjung47

It would also be very helpful if you could indicate which file or folder corresponds to the PDB training set (669 structures) detailed in the supplementary material.

black0017 · 2023-06-09T08:37:20Z

@CatIIIIIIII @sperfu @andrewjjung47
Update:

PDB train data

Hi, I have found the 669 PDB files here: https://bprna.cgrb.oregonstate.edu/index.html
However, the .fasta files contain only the sequences; there are no contact maps.

Mutated data

merge_allbpnew_mutate.cdhit.fa probably contains the mutated sequences but the targets (contact maps) are again missing.
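
If a dot-bracket structure string is available for each sequence (an assumption; this thread does not confirm where to get one for these files), a contact map can be rebuilt from it. A minimal sketch, handling only nested `(` and `)` pairs and ignoring pseudoknot brackets such as `[` and `]`; the function name `dotbracket_to_contact_map` is hypothetical:

```python
def dotbracket_to_contact_map(structure):
    """Convert a dot-bracket string into a symmetric 0/1 contact matrix."""
    n = len(structure)
    contact = [[0] * n for _ in range(n)]
    stack = []  # positions of currently unmatched "("
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            # Mark the pair in both directions to keep the matrix symmetric.
            contact[i][j] = 1
            contact[j][i] = 1
        # Any other symbol (e.g. ".") is treated as unpaired.
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return contact


# Toy hairpin: positions 0-5 and 1-4 are paired.
contact_map = dotbracket_to_contact_map("((..))")
```

The result is an L x L nested list for a sequence of length L; it can be converted to an array type of your choice before training.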

andrewjjung47 closed this as completed Jun 7, 2022

andrewjjung47 reopened this Jun 7, 2022
