Web Scraping kununu.com

A small project that scrapes company reviews from kununu.com using Scrapy, a Python framework.

"Scrapy is an application framework for crawling web sites and extracting structured data
which can be used for a wide range of useful applications, like data mining, information processing or historical archival."

Prerequisites

Scrapy must be installed on your machine.
→ Follow the installation guide: https://docs.scrapy.org/en/latest/intro/install.html
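
To confirm the prerequisite is met, you can ask Python whether the Scrapy package is installed. The following is a minimal sketch that uses only the standard library; the function name `scrapy_version` is made up for illustration and is not part of this project.

```python
from importlib import metadata
from typing import Optional


def scrapy_version() -> Optional[str]:
    """Return the installed Scrapy version, or None if Scrapy is not installed."""
    try:
        return metadata.version("scrapy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = scrapy_version()
    if version is None:
        print("Scrapy is not installed; follow the installation guide first.")
    else:
        print(f"Scrapy {version} is installed.")
```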

How to run this project

Clone this repo into your scrapy folder (where the default tutorial folder should exist after installation).

Your folder structure should look something like this:

  scrapy/kununu/
     README.md
     scrapy.cfg
     __init__.py
     kununu_project/
             items.py
             middlewares.py
             pipelines.py
             settings.py
             spiders/
                __init__.py
                kununu.py

  scrapy/tutorial/
     scrapy.cfg
     tutorial/
             items.py
             ...
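
If you want to double-check the clone against the layout above, a small script can list anything that is missing. This is only a sketch: `EXPECTED_FILES` and `missing_files` are hypothetical names, and the path `scrapy/kununu` assumes you cloned the repo exactly as described.

```python
from pathlib import Path
from typing import List

# Files taken from the folder structure listed above, relative to scrapy/kununu/.
EXPECTED_FILES = [
    "scrapy.cfg",
    "kununu_project/items.py",
    "kununu_project/middlewares.py",
    "kununu_project/pipelines.py",
    "kununu_project/settings.py",
    "kununu_project/spiders/__init__.py",
    "kununu_project/spiders/kununu.py",
]


def missing_files(project_root: str) -> List[str]:
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed clone location; adjust it if your scrapy folder lives elsewhere.
    missing = missing_files("scrapy/kununu")
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```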

Open your Python CLI (I used Anaconda Prompt), then:
1. Navigate into the spiders folder within the scrapy folder → (scrapy/kununu/kununu_project/spiders)
2. Execute the following command: scrapy runspider kununu.py
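
The steps above can also be driven from a Python script instead of typing them by hand. The sketch below is not part of this repo: `build_runspider_command` and `run_spider` are hypothetical helpers, and the optional output file uses Scrapy's `-o` option, which writes the scraped items to a feed file.

```python
import subprocess
from typing import List, Optional


def build_runspider_command(spider_file: str = "kununu.py",
                            output: Optional[str] = None) -> List[str]:
    """Build the 'scrapy runspider' command, optionally with a feed output file."""
    cmd = ["scrapy", "runspider", spider_file]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_spider(spiders_dir: str, output: Optional[str] = None) -> int:
    """Run the spider from inside spiders_dir and return Scrapy's exit code."""
    completed = subprocess.run(build_runspider_command(output=output), cwd=spiders_dir)
    return completed.returncode


# Example call (needs Scrapy on PATH; the output file name is only an example):
# run_spider("scrapy/kununu/kununu_project/spiders", output="reviews.json")
```

Passing `cwd` to `subprocess.run` plays the role of navigating into the spiders folder, so the script can be started from any directory.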
By default, it scrapes reviews of ec4u expert consulting ag.
You can change this by adapting the links in the kununu.py spider.
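
The contents of kununu.py are not reproduced in this README, so the following is only a sketch under assumptions: that the spider keeps its target links in a list (Scrapy spiders conventionally use a `start_urls` attribute), and that review pages follow the URL pattern in `REVIEW_URL_TEMPLATE`, which is a guess you should check against the links actually used in the spider. The company slugs are hypothetical placeholders.

```python
from typing import Iterable, List

# Assumed URL pattern for a company's review page; verify it against kununu.py.
REVIEW_URL_TEMPLATE = "https://www.kununu.com/{country}/{slug}/kommentare"


def review_urls(slugs: Iterable[str], country: str = "de") -> List[str]:
    """Build one review-page URL per company slug."""
    return [
        REVIEW_URL_TEMPLATE.format(country=country, slug=slug.strip().lower())
        for slug in slugs
    ]


# Hypothetical slugs; replace them with the companies you want to scrape and
# paste the resulting list over the links in the spider.
start_urls = review_urls(["example-company-a", "example-company-b"])
print(start_urls)
```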
