Word Cloud (FB data) in Python

A broader perspective on visualizing social media data with Python is covered in Section Appendix — Python Visualization. This section is devoted to the application of word clouds for text analytics, using the Facebook data in fb_data.txt.

A word cloud is a visual representation of text data where word size indicates frequency or importance, highlighting key themes within the text. It simplifies complex information, making it easier to understand and analyze. For details on the creation of word clouds and their benefits, refer to Section Word Clouds in the Appendix — Python Visualization.

Exhibit 25.21 provides the Python code for generating a word cloud from the Facebook data stored in the file fb_data.txt. Here is a breakdown of what it does:

  1. Importing Libraries:
    • wordcloud: This library provides tools for creating word clouds.
    • matplotlib.pyplot: This library helps visualize the word cloud.
  2. Facebook Data File: The code opens the fb_data.txt file and reads its contents into a variable called text.
  3. Setting Up Stop Words: The code imports a set of common words considered unimportant for analysis, like “the”, “and”, “a”, etc. These are stored in stopwords.
  4. Creating the Word Cloud: WordCloud is called to create an image object. We specify:
    • Size: width and height are set to 800 pixels, making it an 800 × 800 image.
    • Background: background_color is set to “white” for a clean background.
    • Stop Words: We tell the word cloud to exclude the stop words defined earlier using stopwords.
    • Minimum Font Size: min_font_size is set to 10 so that even the smallest words remain legible.
    • generate(text): This line takes the text data from text (presumably containing the cleaned Facebook data) and uses it to create the word cloud.
  5. Visualizing the Word Cloud:
    • plt.figure: Creates a new figure for displaying the image.
    • plt.imshow(wordcloud): This displays the generated word cloud on the figure.
    • plt.axis("off"): Hides the x and y axes since they are not relevant for the word cloud.
    • plt.tight_layout(pad = 0): Adjusts spacing to ensure the word cloud fills the entire area without extra padding.
    • plt.show(): Finally, this line displays the generated word cloud image on your screen.

Word Cloud
from wordcloud import WordCloud, STOPWORDS  # for generating the word cloud
import matplotlib.pyplot as plt  # for displaying the word cloud image

# Open fb_data.txt file and assign it to the variable f
with open('data/fb_data.txt') as f: 
    # read file f and assign the resulting string to variable text
    text = f.read()  

# Copy the built-in stop-word set so it can be extended without altering STOPWORDS
stopwords = set(STOPWORDS)

# Generate an 800 x 800 word cloud image from the words in text
wordcloud = WordCloud(width=800, height=800, background_color='white',
                      stopwords=stopwords, min_font_size=10).generate(text)

# plot the WordCloud image                        
plt.figure(figsize = (8, 8), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show() 
    Matplotlib - Word Cloud    

Exhibit 25.21   This code demonstrates how to generate a word cloud from a textual data file containing Facebook posts. Jupyter notebook.

Overall, this code reads the Facebook data, removes unimportant words (stop words), and then creates a visual representation in which the size of each word reflects how often it appears in the text.
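To make the link between word size and frequency concrete, the following minimal sketch counts word occurrences after dropping stop words, using only the Python standard library. It illustrates the idea and is not how the wordcloud package is implemented. The sample text, the abbreviated stop-word set SAMPLE_STOPWORDS and the helper word_frequencies are hypothetical names introduced for this example.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word set (the wordcloud package ships a much longer one).
SAMPLE_STOPWORDS = {"the", "and", "a", "is", "of", "to"}

def word_frequencies(text, stopwords):
    """Count how often each non-stop word appears in text (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Hypothetical sample text standing in for the contents of fb_data.txt.
sample_text = "The launch is great and the offer is great. Great launch of a great offer!"
freqs = word_frequencies(sample_text, SAMPLE_STOPWORDS)

# The most frequent words are the ones a word cloud would draw largest.
for word, count in freqs.most_common(3):
    print(word, count)
```

In this sample, "great" occurs most often, so it is the word that would dominate the cloud.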

Another coding example for generating a word cloud is provided in Section Word Cloud in Python.

Time Series Visualization (FB data) in Python

The representation of data that is collected over time helps to identify trends, patterns, and anomalies within the data. By plotting data points along a timeline, you can visually analyze the movement of values over time.

Exhibit 25.22 provides the Python code for the visualization of time series data. The example uses data stored in the file fb_comments_metrics.csv, which contains information on the likes and shares of comments on Facebook posts throughout a day.

The Seaborn library is used to create the bar plots in this example. Built on top of Matplotlib, Seaborn provides a high-level interface for creating attractive and informative statistical graphics. For more information on Seaborn and to explore additional examples of its use in statistical analysis and visualization, refer to the section Seaborn Visualization in Python.  

The code in Exhibit 25.22 is a continuation of our analysis of Facebook textual data, demonstrating how representing data collected over time can highlight trends, patterns, and anomalies.

The code analyzes data about likes and shares received by Facebook posts throughout a day. Here’s a breakdown of the steps in the analysis:

  1. Import libraries: Import the tools needed to work with the data (like math functions, data frames, and plotting tools).
  2. Read data: Read information about likes and shares from a file called fb_comments_metrics.csv.
  3. Prepare the data for analysis: Set the size and font size for the charts to be clear and easy to read. Extract the time information (hour, day, month, year) from the existing “created_time” column to analyze how metrics change over time.
  4. Likes by Hour: Create a bar chart where the x-axis shows the hours of the day (0-23) and the y-axis shows the average number of likes (blue bars) received for each hour.
  5. Shares by Hour: Repeat step 4 for the number of shares (green bars).
  6. Shares by Day: Set the background of the chart to white and create a bar chart where the x-axis shows the days of the month (1-31) and the y-axis shows the average number of shares (black bars) received for each day.

In short, this code helps you see how the number of likes and shares for Facebook posts changes throughout the day and across different days.
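By default, Seaborn's barplot draws the mean of the y values for each x category, which is why the bars represent average likes or shares. The sketch below computes those averages directly with pandas so the numbers behind the bars can be inspected. The rows in sample are hypothetical; only the column names (created_time, likes, shares) are taken from the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names mirror those used in the exhibit.
sample = pd.DataFrame({
    "created_time": ["2024-03-01 09:15:00", "2024-03-01 09:45:00",
                     "2024-03-01 18:05:00", "2024-03-02 18:30:00"],
    "likes": [10, 20, 40, 60],
    "shares": [1, 3, 5, 7],
})

# Parse the timestamps and extract the hour of the day (0-23).
sample["hour"] = pd.to_datetime(sample["created_time"]).dt.hour

# Average likes and shares per hour: the values the bar heights represent.
hourly_mean = sample.groupby("hour")[["likes", "shares"]].mean()
print(hourly_mean)
```

Grouping by a day column instead of hour would give the averages plotted in the shares-by-day chart.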

Read and View the Facebook Posts Dataset
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

df = pd.read_csv(r"data/fb_comments_metrics.csv", encoding="latin-1")
# The `r` prefix marks a raw string literal: backslashes (\) are treated as
# literal characters rather than as the start of an escape sequence, as they
# would be in a regular string literal.
# set figure size and font size
sns.set(rc={'figure.figsize':(40,20)})
sns.set(font_scale=3)
Facebook dataset for time series analysis    
Add Fields to separate out time by Hour, Day, Month and Year
# Separating time by Hour, Day, Month and Year for further analysis using datetime package
import datetime as dt
'''
A lambda function is a small anonymous function that can take any number of arguments, but can only have one expression.
e.g.:
lambda x: x.hour
parameter: x
expression: x.hour ... converts time in datetime format to hour and returns hour
'''
df['time'] = pd.to_datetime(df['created_time'])
df['hour'] = df['time'].apply(lambda x: x.hour)
df['month'] = df['time'].apply(lambda x: x.month)
df['day'] = df['time'].apply(lambda x: x.day)
df['year'] = df['time'].apply(lambda x: x.year)
df.head()
Facebook dataset with additional fields separating out time by Hour, Day, Month and Year for time series analysis    
Bar Plot of Likes by Hour
# set x labels
x_labels = df.hour

#create bar plot
sns.barplot(x=x_labels, y=df.likes, color="blue")

# display the plot
plt.show()

Bar plot of likes of Facebook post over the hours during a day - time series analysis    
Bar Plot of Shares by Hour
#create bar plot
sns.barplot(x=x_labels, y=df.shares, color="green")

# display the plot
plt.show()
Bar plot of shares of Facebook post over the hours during a day - time series analysis    
Bar Plot of Shares by Day
# Set the background color to white
sns.set_style(rc = {'axes.facecolor': 'white'})

#create bar plot
sns.barplot(x=df.day, y=df.shares, color="black")

# display the plot
plt.show()
Bar plot of shares of Facebook post over days - time series analysis    

Exhibit 25.22   Seaborn visualization of time series: This analysis of the likes and shares of Facebook posts demonstrates how to generate bar plots of the metrics using Seaborn. Jupyter notebook.
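The exhibit derives the hour, day, month and year fields by applying a lambda function to each timestamp. As an alternative sketch, pandas also exposes these components through the .dt accessor on a datetime column, which avoids the per-row function call. The demo DataFrame and its two timestamps are hypothetical and serve only to show that both approaches agree.

```python
import pandas as pd

# Hypothetical timestamps standing in for the created_time column.
demo = pd.DataFrame({"created_time": ["2024-03-01 09:15:00", "2024-12-31 23:59:59"]})
demo["time"] = pd.to_datetime(demo["created_time"])

# Vectorized extraction of the date and time components with the .dt accessor.
demo["hour"] = demo["time"].dt.hour
demo["day"] = demo["time"].dt.day
demo["month"] = demo["time"].dt.month
demo["year"] = demo["time"].dt.year

# The lambda-based approach from the exhibit yields the same hours.
hour_via_apply = demo["time"].apply(lambda x: x.hour)
print(demo[["hour", "day", "month", "year"]])
```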

