Comparison between the widely used BeautifulSoup4 and the new Requests-HTML parsing libraries, for parsing review data from Metacritic.

najeebuddinm98/metacritic_parsing

metacritic_parsing

The main aim of this project is to provide a foundational comparison between the BeautifulSoup4 library and the Requests-HTML library for parsing websites.

BS4 has long been the de facto library for parsing websites and HTML documents. It is known for being user-friendly, even for beginners. As an alternative, Requests-HTML was released by the same people who made the requests library. Its main advertised advantages are full JavaScript support and async support.
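One possible way to compare the two libraries is to time each parser on the same HTML document. The sketch below is only an illustration and not necessarily how this project measures anything: `time_parsers` and its parameters are hypothetical names, and each parser is passed in as a plain callable so that the harness itself needs only the standard library.

```python
import time
from typing import Callable, Dict


def time_parsers(
    parsers: Dict[str, Callable[[str], object]],
    html: str,
    repeats: int = 5,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each parser.

    `parsers` maps a label to a callable that takes an HTML string.
    This helper is hypothetical and not part of either library.
    """
    results: Dict[str, float] = {}
    for label, parse in parsers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            parse(html)  # the parsed result is discarded, only time matters
            best = min(best, time.perf_counter() - start)
        results[label] = best
    return results
```

In practice, the callables would wrap the two libraries, for example one function that builds a BeautifulSoup object from the string and one that builds the equivalent Requests-HTML document. Taking the best of several runs reduces noise from other processes.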

In this project, we parse the Metacritic webpages containing the ratings of all video games in their records. There are 181 pages in total at the time of writing, but I have parsed only 100 of them for convenience. The scripts can easily be extended to cover all 181 pages. The webpages do not need JavaScript support, so the playing field is level. The data, consisting of the name, score, release date and platform of each game, is stored in a CSV file after all the parsing is done.
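The overall flow described above (loop over the pages, collect the rows, then write the CSV once at the end) could be sketched roughly as follows. This is a minimal sketch using only the standard library, not the project's actual script: `scrape_to_csv`, `parse_page`, the column names in `FIELDS`, and the zero-based page index are all assumptions, and the fetching and parsing of each page (with BeautifulSoup4 or Requests-HTML) is left to the callable that is passed in.

```python
import csv
from typing import Callable, Dict, List

# Column names assumed from the fields listed in this README.
FIELDS = ["name", "score", "release_date", "platform"]


def scrape_to_csv(
    parse_page: Callable[[int], List[Dict[str, str]]],
    num_pages: int,
    out_path: str,
) -> int:
    """Collect rows from every page, then write them all to one CSV file.

    `parse_page` is a hypothetical callable that takes a page index
    (assumed here to start at 0) and returns one dict per game.
    Returns the number of rows written.
    """
    rows: List[Dict[str, str]] = []
    for page in range(num_pages):
        rows.extend(parse_page(page))

    # Write only after all parsing is done, as described above.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Covering all 181 pages would then only mean changing `num_pages`, and swapping the parsing library only means passing a different `parse_page`.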

I have also made a small visualization in Jupyter of the data obtained.

Improvements:

The obvious one is parsing all 181 pages. Also, Scrapy is another very commonly used library for such tasks. It has added functionality that makes parsing multiple webpages easier.
