To begin learning the CSS method, open the Scrapy shell and try this:
>>> response.css("title")
[<Selector query='descendant-or-self::title' data='<title>Quotes to Scrape</title>'>]
The result is a list-like object called SelectorList: a list of Selector objects, each wrapping an XML/HTML element, on which you can run further queries.
To extract data, you need to refine the query. The code below, for instance, extracts the text from the <title> element.
>>> response.css("title::text").get()
'Quotes to Scrape'
There are two things to note here. First, ::text has been added to the CSS query to select only the text inside the <title> element. Second, the get() method returns only the first match, here the text of the first <title> element.
If there is more than one instance of the element, use the getall() method to retrieve all results. The output is a list containing every match:
>>> response.css("title::text").getall()
['Quotes to Scrape']
Besides the getall() and get() methods, you can also use the re() method to filter and extract content using regular expressions:
>>> response.css("title::text").re(r"Quotes.*")
['Quotes to Scrape']
>>> response.css("title::text").re(r"Q\w+")
['Quotes']
>>> response.css("title::text").re(r"(\w+) to (\w+)")
['Quotes', 'Scrape']
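To see why the three patterns above give those results, it may help to mimic them with Python's standard re module. The helper flat_findall below is a hypothetical name introduced for this sketch. It only approximates what re() does (find all matches, then flatten any capture groups into one list) and is not Scrapy's actual implementation.

```python
import re


def flat_findall(pattern, text):
    """Return all matches as a flat list of strings.

    Without capture groups, each whole match is one item. With several
    groups, re.findall yields tuples, which are flattened here.
    """
    results = []
    for match in re.findall(pattern, text):
        if isinstance(match, tuple):
            results.extend(match)
        else:
            results.append(match)
    return results


title = "Quotes to Scrape"

print(flat_findall(r"Quotes.*", title))        # whole match
print(flat_findall(r"Q\w+", title))            # first word starting with Q
print(flat_findall(r"(\w+) to (\w+)", title))  # two capture groups
```

The third pattern shows the key point: when the expression contains capture groups, the groups are returned rather than the whole match.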
To find the right CSS selectors, it can help to open the response page in your web browser from the shell using view(response). You can then use the browser’s developer tools to inspect the HTML and come up with a selector.
Disclaimer: Opinions and views expressed on www.ashokcharan.com are the author’s personal views, and do not represent the official views of the National University of Singapore (NUS) or the NUS Business School. © Copyright 2013-2025 www.ashokcharan.com. All Rights Reserved.