Question generation from given context using Transformers.

Karthick47v2/question-generator

Welcome to question-generator 👋

License: MIT

This repo contains a simplified question generation model pipeline. It is part of the Quizzy project, where you can see the model in action. Questions are generated by fine-tuning a T5 transformer on the SQuAD and SciQ datasets, with PyTorch Lightning used for the fine-tuning. Context, question and answer are extracted from each dataset; the context and answer are given to the model as input, and the model learns to generate the question.

SQuAD is a reading comprehension dataset, so the model trained on it is used for general-purpose question generation. SciQ contains science exam questions with their contexts, so the model trained on it is used specifically for physics, chemistry and biology question generation in the Quizzy application.

Transformer model input

"context: {context} answer: {answer}"

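As a minimal sketch of how the input template above could be filled in, assuming plain Python string formatting: the helper name `build_model_input` and the whitespace normalization are illustrative assumptions, not code taken from this repo's notebooks.

```python
def build_model_input(context: str, answer: str) -> str:
    """Fill the README's input template with one context/answer pair.

    Hypothetical helper: the name and the whitespace handling are
    assumptions made for illustration only.
    """
    # Collapse runs of whitespace so newlines inside a passage do not
    # end up in the model input (an assumption, not a repo requirement).
    context = " ".join(context.split())
    answer = " ".join(answer.split())
    return f"context: {context} answer: {answer}"


example_input = build_model_input(
    "Photosynthesis converts light energy\ninto chemical energy.",
    "chemical energy",
)
```

The resulting string is what would then be tokenized and passed to the T5 model; the target sequence during training is the question.
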
Prerequisite

  • Python 3.7 or newer.

Dataset

Usage

All dataset links are available in the notebook. The SciQ dataset contains physics, chemistry and biology exam questions with their contexts, while the SQuAD dataset is a reading comprehension dataset, so choose the dataset that suits your use case.

  • Export training dataset from data_extraction.ipynb.
  • Run train.ipynb to start training.
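
For readers who want a feel for what the export step produces, the sketch below flattens SQuAD-style JSON into context/answer/question rows. It assumes the standard SQuAD v1.1 layout (`data` → `paragraphs` → `qas` → `answers`); the function name `iter_squad_examples` and the choice to keep only the first answer are assumptions for illustration and may differ from what data_extraction.ipynb actually does.

```python
from typing import Dict, Iterator


def iter_squad_examples(squad: dict) -> Iterator[Dict[str, str]]:
    """Yield one {context, answer, question} row per answerable question.

    Hypothetical helper, not taken from the repo's notebooks.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers") or []
                if not answers:
                    # Skip questions that have no answer span.
                    continue
                yield {
                    "context": context,
                    # Assumption: keep only the first reference answer.
                    "answer": answers[0]["text"],
                    "question": qa["question"],
                }


# Tiny hand-written sample in the SQuAD layout (not real dataset content).
sample = {
    "data": [
        {
            "title": "Water",
            "paragraphs": [
                {
                    "context": "Water boils at 100 degrees Celsius at sea level.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "At what temperature does water boil at sea level?",
                            "answers": [{"text": "100 degrees Celsius", "answer_start": 15}],
                        },
                        {"id": "q2", "question": "Unanswerable?", "answers": []},
                    ],
                }
            ],
        }
    ]
}

rows = list(iter_squad_examples(sample))
```

Each row's `context` and `answer` form the model input, and `question` is the training target.
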

Skip the ONNX conversion and quantization steps if you are using a GPU for inference. fastT5 is used to convert the PyTorch model to ONNX, and it currently supports only the CPU version of onnxruntime.

Author

👤 Karthick T. Sharma

Citation

SQuAD

@inproceedings{rajpurkar-etal-2016-squad,
    title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
    author = "Rajpurkar, Pranav  and
      Zhang, Jian  and
      Lopyrev, Konstantin  and
      Liang, Percy",
    booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D16-1264",
    doi = "10.18653/v1/D16-1264",
    pages = "2383--2392",
}

SciQ

@inproceedings{SciQ,
    title={Crowdsourcing Multiple Choice Science Questions},
    author={Johannes Welbl and Nelson F. Liu and Matt Gardner},
    year={2017},
    journal={arXiv:1707.06209v1}
}

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!