Pandas supports a range of data cleaning operations, such as removing rows that contain empty cells with dropna() and removing duplicate rows with drop_duplicates(). These methods are illustrated in the code in Exhibit 25.56.
import pandas as pd  # df is assumed to be an existing DataFrame

# Return a new DataFrame with no empty cells:
new_df = df.dropna()
# By default, dropna() returns a new DataFrame and leaves the original unchanged.
# To modify the original DataFrame instead, pass inplace=True:
df.dropna(inplace=True)
# To remove duplicate rows, use the drop_duplicates() method:
df.drop_duplicates(inplace=True)
# To correct or change a column's format, e.g. convert date strings to datetime values:
df['Date'] = pd.to_datetime(df['Date'])
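The three steps above can be tried end to end on a small table. The following is a minimal sketch, not the exhibit itself: the sample data, the 'Sales' column, and the variable name cleaned are hypothetical, and the reset_index(drop=True) call is an addition used here only to renumber the remaining rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the third row duplicates the second,
# and the last row has an empty (NaN) cell.
df = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "Sales": [100.0, 250.0, 250.0, np.nan],
})

# dropna() and drop_duplicates() each return a new DataFrame, so df is untouched.
# drop_duplicates() keeps the first occurrence of each repeated row by default.
# reset_index(drop=True) renumbers the surviving rows from 0 (not in the original exhibit).
cleaned = df.dropna().drop_duplicates().reset_index(drop=True)

# Convert the 'Date' strings to datetime values.
cleaned["Date"] = pd.to_datetime(cleaned["Date"])

print(len(df), len(cleaned))
print(cleaned.dtypes)
```

Because inplace=True is not used here, the original df still holds all four rows, which makes it easy to compare the data before and after cleaning.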