
Running the example project

This example illustrates how to share a spider's requests queue across multiple spider instances, which makes it well suited for broad crawls.
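
Under the hood, scrapy-redis stores both the requests queue and the duplicates filter in Redis, so any number of spider processes can pull work from the same queue. Below is a minimal sketch of the settings that enable this, assuming scrapy-redis defaults and a Redis server on localhost:6379; the example project's actual settings.py may differ in details:

# settings.py -- sketch of the scrapy-redis settings behind the shared queue
SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # schedule requests through Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # deduplicate across processes
SCHEDULER_PERSIST = True                                    # keep the queue on close, enabling resume
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,            # push serialized items to <spider>:items
}
REDIS_URL = "redis://localhost:6379"                        # assumed Redis location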

Set up the scrapy_redis package in your PYTHONPATH (for example, pip install scrapy-redis) and make sure a Redis server is reachable.

Run the crawler for the first time, then stop it:

cd example-project
scrapy crawl dmoz
... [dmoz] ...
^C

Run the crawler again to resume the interrupted crawl:

scrapy crawl dmoz
... [dmoz] DEBUG: Resuming crawl (9019 requests scheduled)

Start one or more additional Scrapy crawlers:

scrapy crawl dmoz
... [dmoz] DEBUG: Resuming crawl (8712 requests scheduled)
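
Every instance pops requests from the same dmoz:requests key in Redis, which is why the number of scheduled requests drops as more crawlers join. As a quick sanity check you can inspect the queue size with redis-py; the key name and sorted-set type below assume scrapy-redis defaults:

# peek at the shared queue (assumes the default "<spider>:requests" sorted set)
import redis

r = redis.Redis()  # localhost:6379 by default
print("pending requests:", r.zcard("dmoz:requests"))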

Start one or more post-processing workers:

python process_items.py dmoz:items -v
...
Processing: Kilani Giftware (http://www.dmoz.org/Computers/Shopping/Gifts/)
Processing: NinjaGizmos.com (http://www.dmoz.org/Computers/Shopping/Gifts/)
...
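
The worker pops serialized items from the dmoz:items list that RedisPipeline feeds and handles each one. A minimal sketch of that loop, assuming the pipeline's default JSON serialization; the real process_items.py in the example project is more elaborate:

# sketch of a post-processing worker loop (not the actual process_items.py)
import json
import redis

r = redis.Redis()
while True:
    # BLPOP blocks until an item arrives; returns a (key, value) pair
    _, data = r.blpop("dmoz:items")
    item = json.loads(data)
    print("Processing: %s (%s)" % (item.get("name"), item.get("link")))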