A small project that scrapes company reviews from kununu.com using the Scrapy framework (Python).
"Scrapy is an application framework for crawling web sites and extracting structured data
which can be used for a wide range of useful applications,
like data mining, information processing or historical archival."
- Scrapy installed on your machine
→ Follow this installation guide: https://docs.scrapy.org/en/latest/intro/install.html
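Before moving on, it can help to confirm that Python can actually see Scrapy. The snippet below is a minimal sketch that uses only the standard library; the helper name `scrapy_available` is made up for illustration and is not part of this repo.

```python
import importlib.util


def scrapy_available():
    """Return True if the scrapy package can be imported in this environment."""
    # find_spec looks the package up without importing it.
    return importlib.util.find_spec("scrapy") is not None


if scrapy_available():
    print("Scrapy found")
else:
    print("Scrapy missing; see the installation guide above")
```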
1. Clone this repo into your scrapy folder (where the default tutorial folder should exist after installation).
2. Your folder structure should look something like this:

   scrapy/kununu/
       README.md
       scrapy.cfg
       __init__.py
       kununu_project/
           items.py
           middlewares.py
           pipelines.py
           settings.py
           spiders/
               __init__.py
               kununu.py
   scrapy/tutorial/
       scrapy.cfg
       tutorial/
           items.py
           ...

3. Open your Python CLI (I used Anaconda Prompt):
   3.1 Navigate into the spiders folder within the scrapy folder (scrapy/kununu/kununu_project/spiders)
   3.2 Execute the following command: scrapy runspider kununu.py
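The run step above can also be scripted instead of typed by hand. The following is a rough sketch, not part of this repo: the helper names `build_command` and `run_spider` are hypothetical, and the optional output file uses Scrapy's `-o` feed-export option, which this README does not itself mention. The path assumes you start the script from the folder that contains `scrapy/`.

```python
import subprocess
from pathlib import Path

# Path taken from step 3.1, relative to the folder that contains scrapy/.
SPIDERS_DIR = Path("scrapy/kununu/kununu_project/spiders")


def build_command(spider_file="kununu.py", output_file=None):
    """Build the `scrapy runspider` argument list."""
    command = ["scrapy", "runspider", spider_file]
    if output_file is not None:
        # -o is Scrapy's feed-export option; the file name is up to you.
        command += ["-o", output_file]
    return command


def run_spider(spiders_dir=SPIDERS_DIR, output_file=None):
    """Run the spider from inside the spiders folder; raise if the file is missing."""
    spiders_dir = Path(spiders_dir)
    if not (spiders_dir / "kununu.py").is_file():
        raise FileNotFoundError(f"kununu.py not found in {spiders_dir}")
    # check=True raises CalledProcessError if scrapy exits with an error.
    return subprocess.run(
        build_command(output_file=output_file), cwd=spiders_dir, check=True
    )
```

Calling `run_spider(output_file="reviews.json")` would then write the scraped items to a file inside the spiders folder; `reviews.json` is only an example name.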
By default, it scrapes reviews for ec4u expert consulting ag.
To scrape a different company, edit the links in the kununu.py spider.
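When swapping in your own links, a small sanity check can catch typos and duplicates before a crawl. This is a sketch under assumptions: it presumes the spider keeps its links in Scrapy's conventional `start_urls` list (not confirmed here, so check kununu.py), the helper `clean_start_urls` is hypothetical, and the example URLs are placeholders rather than real review pages.

```python
from urllib.parse import urlparse


def clean_start_urls(urls, allowed_domain="kununu.com"):
    """Return unique http(s) URLs on allowed_domain, preserving input order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        on_domain = host == allowed_domain or host.endswith("." + allowed_domain)
        if parts.scheme in ("http", "https") and on_domain and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


# Placeholder links; copy the real ones from the company's page on kununu.com.
start_urls = clean_start_urls([
    "https://www.kununu.com/de/example-company/kommentare",
    "https://www.kununu.com/de/example-company/kommentare",  # duplicate, dropped
    "https://example.org/not-kununu",  # other domain, dropped
])
```

The resulting list could then be pasted into the spider in place of the default company's links.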