Scraping workshop

Where are the challenges?

The challenges are here.

All data come from the Titanic disaster (does it remind you of Kaggle?).

The challenges are:

Extract all persons from one page
Extract all persons from multiple pages (use pagination)
Bypass the user-agent
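
The pagination challenge comes down to a loop: scrape one page, find the link to the next page, resolve it against the current URL, and repeat until no link is left. Here is a minimal, framework-independent sketch of that loop. The names `crawl_pages`, `fetch` and `max_pages` are hypothetical and not part of the workshop code; `fetch` stands in for whatever downloads and parses a single page.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7, as required in Step 0


def crawl_pages(start_url, fetch, max_pages=50):
    """Collect persons from start_url and every page reachable via 'next' links.

    fetch(url) is a hypothetical callable returning a tuple
    (persons_on_page, next_href), where next_href is None on the last page.
    """
    persons = []
    seen = set()  # guards against a page that links back to itself
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_persons, next_href = fetch(url)
        persons.extend(page_persons)
        # Pagination links are often relative, so resolve them first.
        url = urljoin(url, next_href) if next_href else None
    return persons
```

Inside a Scrapy spider the same idea is usually written as yielding a new request from the parse callback, for example `scrapy.Request(response.urljoin(next_href), callback=self.parse)`, rather than as an explicit loop.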

How to complete the challenge?

Step 0: Meet the prerequisites

This workshop's Scrapy setup works only with Python 2.7.

Please install Python 2.7, and not Python 3.x!

Step 1: Clone the repository

git clone https://github.com/fabienvauchelles/scraping-challenge-workshop.git
cd scraping-challenge-workshop

Step 2: Install all Python dependencies

pip install -r requirements.txt

Step 3: Edit your scraper to complete the challenge

The scraper code is in the file myscraper/spiders/myscraper.py.

The item definitions are in the file myscraper/items.py.
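
For the user-agent challenge, Scrapy takes the User-Agent header it sends from the `USER_AGENT` setting. Here is a minimal sketch, assuming the project has the usual `myscraper/settings.py` that Scrapy generates (that file is not mentioned above). The browser-like string is only an illustration; which value the challenge site accepts is not stated here.

```python
# myscraper/settings.py (assumed location, standard for Scrapy projects)

# Illustrative browser-like value, not taken from the workshop.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
```

The same key can also be set for a single spider through its `custom_settings` dictionary if you prefer not to change the project-wide settings.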

Step 4: Start the scraper

cd scraping-challenge-workshop
scrapy crawl myscraper -t jsonlines -o persons.json

The exported items are written to the file persons.json.
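
Because the command uses the jsonlines format, persons.json holds one JSON object per line rather than a single JSON array, so `json.load` on the whole file will fail. Here is a small sketch for checking the result of a run; the helper name `load_persons` is hypothetical and not part of the repository.

```python
import io
import json


def load_persons(path="persons.json"):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    persons = []
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                persons.append(json.loads(line))
    return persons


# Usage, after running the crawl from Step 4:
#     persons = load_persons()
#     print("Loaded %d persons" % len(persons))
```

Counting the loaded items is a quick way to see whether the pagination challenge picked up more persons than the single-page one.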

Licence

See the Licence.

