To run the spider, navigate to the project’s top-level directory and run:
scrapy crawl quotes
This command starts the spider, which sends requests to the quotes.toscrape.com
domain. The output will look something like this:
2024-01-01 19:19:10 [scrapy.core.engine] INFO: Spider opened
...
2024-01-01 19:19:10 [quotes] DEBUG: Saved file quotes-1.html
2024-01-01 19:19:10 [quotes] DEBUG: Saved file quotes-2.html
2024-01-01 19:19:10 [scrapy.core.engine] INFO: Closing spider (finished)
Two new files have now been created: quotes-1.html and quotes-2.html,
containing the content for the URLs listed in the spider.
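The "Saved file" lines in the log come from the spider's parse() callback, defined earlier in the project. As a reminder, here is a minimal sketch of that callback; it assumes the page number is taken from the request URL, and the exact implementation may differ:

    from pathlib import Path

    def parse(self, response):
        # Derive the page number from a URL such as .../page/1/ -> "1".
        page = response.url.split("/")[-2]
        filename = f"quotes-{page}.html"
        # Write the raw response body to disk (quotes-1.html, quotes-2.html).
        Path(filename).write_bytes(response.body)
        # Produces the "[quotes] DEBUG: Saved file ..." lines shown above.
        self.log(f"Saved file {filename}")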
Alternatively, instead of implementing start_requests(), you can define a
start_urls class attribute as follows:
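The listing for this step is not included in this excerpt; the sketch below shows how the spider might look with start_urls, assuming the two page URLs implied by the files saved above. Scrapy's default start_requests() implementation iterates over start_urls and routes each response to the parse() callback, so no start_requests() method is needed:

    from pathlib import Path

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        # Scrapy generates the initial requests from this list automatically,
        # sending each response to parse() by default.
        start_urls = [
            "https://quotes.toscrape.com/page/1/",
            "https://quotes.toscrape.com/page/2/",
        ]

        def parse(self, response):
            # Same file-saving logic as before.
            page = response.url.split("/")[-2]
            Path(f"quotes-{page}.html").write_bytes(response.body)
            self.log(f"Saved file quotes-{page}.html")

Running scrapy crawl quotes again produces the same quotes-1.html and quotes-2.html files.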