To begin learning the CSS method, open the Scrapy shell and try this:
>>> response.css("title")
[<Selector query='descendant-or-self::title' data='<title>Quotes to Scrape</title>'>]
The result is a list-like object called SelectorList, which holds Selector objects that wrap XML/HTML elements.
To narrow the selection, refine the query. The code below, for instance, extracts the text from the <title> element.
>>> response.css("title::text").get()
'Quotes to Scrape'
There are two things to note here. First, ::text has been added to the CSS query to select only the text inside the <title> element; without it, you would get the full element, tags included. Second, the get() method returns only the first match, or None if nothing matches.
If there is more than one instance of the element, use the getall() method to retrieve all results. The output is a list containing every match:
>>> response.css("title::text").getall()
['Quotes to Scrape']
Besides the getall() and get() methods, you can also use the re() method to filter and extract content using regular expressions:
>>> response.css("title::text").re(r"Quotes.*")
['Quotes to Scrape']
>>> response.css("title::text").re(r"Q\w+")
['Quotes']
>>> response.css("title::text").re(r"(\w+) to (\w+)")
['Quotes', 'Scrape']
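The last result may look surprising: a pattern with two capture groups yields a flat list of two strings rather than one tuple. A rough way to picture this, using only Python's standard re module, is "find all matches, then flatten the groups". The helper name re_like is hypothetical, and this is a simplified approximation rather than Scrapy's actual implementation, which handles further cases that are ignored here.

```python
import re


def re_like(pattern, text):
    """Approximate the flattening behaviour of re(): hypothetical helper.

    re.findall() returns plain strings when the pattern has zero or one
    group, and tuples when it has several groups. Tuples are flattened
    into the result list here.
    """
    results = []
    for match in re.findall(pattern, text):
        if isinstance(match, tuple):
            results.extend(match)   # several groups: add each one
        else:
            results.append(match)   # no group or one group: add as is
    return results


title = "Quotes to Scrape"

whole_match = re_like(r"Quotes.*", title)
single_word = re_like(r"Q\w+", title)
two_groups = re_like(r"(\w+) to (\w+)", title)

print(whole_match)
print(single_word)
print(two_groups)
```

Under these assumptions the three calls reproduce the three shell results shown above, which explains why each capture group becomes its own list item.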
To find the right CSS selectors, it can help to open the response page in your web browser directly from the shell using view(response). You can then use the browser’s developer tools to inspect the HTML and work out a selector.
Disclaimer: Opinions and views expressed on www.ashokcharan.com are the author’s personal views, and do not represent the official views of the National University of Singapore (NUS) or the NUS Business School | © Copyright 2013-2025 www.ashokcharan.com. All Rights Reserved.