Can you provide the data with unprocessed code for the 28563 examples? #5

Jun-jie-Huang · 2022-03-31T07:24:15Z

Hi, thanks for your wonderful dataset and repo.

I'm developing a model for code summarization and want to use notebookCDG as one of my tasks, but I couldn't find the original dataset.

I followed the instructions in the README: I downloaded the fully processed data (as a pkl file) from your link [HERE](https://ibm.biz/Bdfpk6) and also checked the data in the Hugging Face dataset. I could only find the processed code strings in the code.seq file, and the punctuation marks have been removed from them.

So could you provide the data with the original code, without any tokenization or punctuation removal, for search? I'd appreciate it if you could release the dataset!

Thanks,
Junjie

xuyeliu · 2022-11-18T21:30:00Z

The README has already been updated with the raw notebooks and raw pairs.