Python Pandas — Data Cleaning

Pandas can perform a range of data cleaning operations, such as removing rows that contain empty cells with dropna(), removing duplicate rows with drop_duplicates(), and converting a column such as dates to a consistent format with pd.to_datetime(). These methods are illustrated in the code in Exhibit 25.56.

Data Cleaning Operations
# Assumes pandas is installed and df is an existing DataFrame with a 'Date' column
import pandas as pd

# Return a new DataFrame with no empty cells:
new_df = df.dropna()

# By default, dropna() returns a new DataFrame and leaves the original unchanged.
# To modify the original DataFrame instead, pass inplace=True (the call then returns None):
df.dropna(inplace=True)

# To remove duplicate rows, use the drop_duplicates() method:
df.drop_duplicates(inplace=True)

# To correct a column's format, e.g. convert the 'Date' column to datetime values:
df['Date'] = pd.to_datetime(df['Date'])

Exhibit 25.56 Data cleaning operations.
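The behaviour described in Exhibit 25.56 can be checked on a small, self-contained example. This is a minimal sketch: the DataFrame, its "Name" and "Score" columns, and its values are hypothetical, chosen so that one row has an empty cell and one row is an exact duplicate. It shows that dropna() leaves the original untouched unless inplace=True is passed, and that drop_duplicates() keeps the first of each set of identical rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the "Cat" row has an empty cell (NaN),
# and the second "Bob" row duplicates the first
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Bob", "Cat"],
    "Score": [90, 85, 85, np.nan],
})

# dropna() returns a new DataFrame and leaves df unchanged
no_empty = df.dropna()
print(len(df), len(no_empty))  # 4 3

# drop_duplicates() keeps the first occurrence of each identical row
deduped = no_empty.drop_duplicates()
print(deduped["Name"].tolist())  # ['Ann', 'Bob']

# With inplace=True the original is modified and the call returns None
result = df.dropna(inplace=True)
print(result is None, len(df))  # True 3
```

Because the inplace=True form returns None, writing df = df.dropna(inplace=True) would replace the DataFrame with None; use either the assignment form or the inplace form, not both.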

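Format correction and empty-cell removal are often combined. The sketch below assumes a hypothetical 'Date' column stored as text that contains one value that cannot be parsed. Passing errors="coerce" to pd.to_datetime() turns such values into NaT (pandas' missing-value marker for datetimes) instead of raising an error, and dropna() then treats NaT as an empty cell.

```python
import pandas as pd

# Hypothetical 'Date' column stored as text, including one unparseable entry
df = pd.DataFrame({"Date": ["2020-12-01", "2020-12-02", "not a date", "2020-12-04"]})

# errors="coerce" converts values that cannot be parsed into NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
print(pd.api.types.is_datetime64_any_dtype(df["Date"]))  # True

# NaT counts as an empty cell, so dropna() removes the unparseable row
df = df.dropna()
print(len(df))  # 3
```

Coercing and then dropping is only one option; whether to discard, correct, or flag unparseable dates depends on the data.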

Use the Search Bar to find content on MarketingMind.