A literature review exploring the world of multi-armed bandit problems! 🤓
A bandit is a simple slot machine: you insert a coin, pull a lever, and receive an immediate reward. The multi-armed bandit model studied in this project balances exploration (gathering information) against exploitation (maximizing reward) in sequential decision-making problems. It has applications in recommendation systems, clinical trials 👨⚕️, and more!
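As a toy illustration of this exploration-exploitation trade-off (not taken from any of the reviewed papers; the arm means and ε below are arbitrary choices for the example), here is a minimal ε-greedy sketch on a Bernoulli bandit:

```python
import random

def epsilon_greedy(true_means, horizon=10_000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy run on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # empirical mean reward per arm
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:                 # explore: pick a random arm
            arm = rng.randrange(n_arms)
        else:                                      # exploit: best empirical mean so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, estimates

# Example with three hypothetical arms
reward, est = epsilon_greedy([0.2, 0.5, 0.7])
print(reward, [round(e, 2) for e in est])
```

With probability ε the learner explores a random arm; otherwise it exploits the arm with the best empirical mean observed so far.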
This literature review examines seminal papers on multi-armed bandits that advanced the field, including:
- The stochastic multi-armed bandit problem and Gittins indices
- Refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli bandit models
- Upper confidence bound 📈 algorithms (a small illustrative sketch follows this list)
- Best arm identification problems
- Contextual/linear 🔢 bandits
- Thompson sampling 🎯
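For the upper confidence bound item above, the snippet below is only an illustrative Python sketch of the UCB1 index (empirical mean plus an exploration bonus), not code from any of the reviewed papers; the arm means are made up for the example:

```python
import math
import random

def ucb1(true_means, horizon=10_000, seed=0):
    """Toy UCB1 run on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                            # pull each arm once to initialize
        else:
            # index = empirical mean + bonus that shrinks as an arm is pulled more
            arm = max(range(n_arms),
                      key=lambda a: estimates[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

reward, pulls = ucb1([0.2, 0.5, 0.7])
print(reward, pulls)  # most pulls should concentrate on the best arm
```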
Papers reviewed and summarized include work by:
- Kaufmann et al.
- Victor Gabillon
- Shivaram Kalyanakrishnan
- Jean-Yves Audibert
📝 How to Use This Repo
- Read the literature_review.pdf file for full summaries
- Check the References.bib file for full citations
- Let me know if you have any other bandit questions! 🙋
This is my first literature review project, carried out under the supervision of 👨💼 Prof. Manjesh K. Hanawal at IIT Bombay, India.