A broader perspective on visualizing social media data with Python is covered in the Appendix on Python Visualization. This section is devoted to the application of word clouds for text analytics, using the Facebook data in fb_data.txt.
A word cloud is a visual representation of text data in which the size of each word indicates its frequency or importance, highlighting key themes within the text. It condenses a large body of text into a single image, making the dominant terms easier to spot and analyze. For details on the creation of word clouds and their benefits, refer to the Section Word Clouds in the Appendix on Python Visualization.
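To make the link between word frequency and word size concrete, the following minimal sketch counts the words in a short made-up sentence and maps each count to a font size. The function name relative_font_sizes, the sample text, and the linear scaling between min_size and max_size are assumptions chosen for illustration; the wordcloud library applies its own scaling and layout rules.

```python
from collections import Counter
import re


def relative_font_sizes(text, min_size=10, max_size=100):
    """Map each word to a font size that grows linearly with its frequency.

    Illustrative only: this is not the scaling used by the wordcloud library.
    """
    # Lowercase the text and keep runs of letters (and apostrophes) as words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())  # count of the most frequent word
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }


# Hypothetical sample text, not taken from fb_data.txt
sample = "great phone great camera poor battery great price"
sizes = relative_font_sizes(sample)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(word, size)
```

In this sketch the most frequent word receives max_size, and every other word is scaled down in proportion to its count.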
Exhibit 25.21 provides the Python code for generating a word cloud from the Facebook data stored in the file fb_data.txt. Here is a breakdown of what it does:
Import libraries:
wordcloud: This library provides tools for creating word clouds.
matplotlib.pyplot: This library helps visualize the word cloud.
Read the text data: The code opens the fb_data.txt file and reads its contents into a variable called text.
Define stop words: The library's built-in STOPWORDS collection is stored as a set called stopwords.
Generate the word cloud: WordCloud is called to create an image object. We specify:
width and height are set to 800 pixels, making it an 800 × 800 image.
background_color is set to "white" for a clean background.
stopwords is set to the stopwords set defined earlier, so that these common words are left out of the cloud.
min_font_size is set to 10 to ensure all words are at least visible.
generate(text): This call takes the text data in text (the contents of fb_data.txt) and uses it to create the word cloud.
Display the word cloud:
plt.figure: Creates a new figure for displaying the image.
plt.imshow(wordcloud): Displays the generated word cloud on the figure.
plt.axis("off"): Hides the x and y axes, since they are not relevant for the word cloud.
plt.tight_layout(pad = 0): Adjusts spacing so that the word cloud fills the entire area without extra padding.
plt.show(): Finally, this line displays the generated word cloud image on your screen.
from wordcloud import WordCloud, STOPWORDS # for generating the word cloud
import matplotlib.pyplot as plt # for visualization of data
# Open fb_data.txt and read its contents into the string variable text
with open('data/fb_data.txt') as f:
    text = f.read()
stopwords = set(STOPWORDS) # build a set from the library's built-in stop words
# Generate an 800 x 800 word cloud image from the tokens in text
wordcloud = WordCloud(width=800, height=800, background_color='white',
                      stopwords=stopwords, min_font_size=10).generate(text)
# plot the WordCloud image
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
Overall, this code reads the Facebook data, removes unimportant words (stopwords), and then creates a visual representation in which the size of each word reflects how often it appears in the text.
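The effect of stop-word removal can be seen without drawing anything. The sketch below is a simplified illustration, not the library's internal procedure: it filters a made-up sentence against a short, hypothetical stop-word set (STOP_WORDS, much smaller than the library's STOPWORDS) and lists the most frequent remaining words. The helper name top_words and the sample text are also assumptions.

```python
from collections import Counter

# Hypothetical stop-word set, far shorter than the wordcloud library's STOPWORDS
STOP_WORDS = {"the", "is", "a", "and", "to", "of", "on"}


def top_words(text, stop_words, n=3):
    """Return the n most frequent words in text after dropping stop words."""
    # Split on whitespace, strip common punctuation, and lowercase each token
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    kept = [token for token in tokens if token and token not in stop_words]
    return Counter(kept).most_common(n)


# Hypothetical sample text, not taken from fb_data.txt
sample = "The camera is great and the battery is great. The price is a bit high."
result = top_words(sample, STOP_WORDS)
print(result)
```

Without the filter, words such as "the" and "is" would dominate the counts, and therefore the cloud, even though they say nothing about the content.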
Another coding example for generating a word cloud is provided in Section Word Cloud in Python.
The representation of data that is collected over time helps to identify trends, patterns, and anomalies within the data. By plotting data points along a timeline, you can visually analyze the movement of values over time.
Exhibit 25.22 provides the Python code for the visualization of time series data. The example uses data stored in the file fb_comments_metrics.csv, which contains information on the likes and shares of comments on Facebook posts throughout a day.
The Seaborn library is used to create the bar plots in this example. Built on top of Matplotlib, Seaborn provides a high-level interface for creating attractive and informative statistical graphics. For more information on Seaborn and to explore additional examples of its use in statistical analysis and visualization, refer to the section Seaborn Visualization in Python.
The code in Exhibit 25.22 is a continuation of our analysis of Facebook textual data, demonstrating how representing data collected over time can highlight trends, patterns, and anomalies.
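The code analyzes data about likes and shares received by Facebook posts throughout a day. Here’s a breakdown of the steps in the analysis:
Load the data: The code reads the file fb_comments_metrics.csv into a pandas DataFrame.
Set plot defaults: The figure size and font scale are set through Seaborn.
Extract time components: The created_time column is converted to datetime format, and the hour, month, day and year are extracted into separate columns.
Plot the data: Bar plots are drawn for likes by hour, shares by hour, and shares by day.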
In short, this code helps you see how the number of likes and shares for Facebook posts changes throughout the day and across different days.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
df = pd.read_csv(r"data/fb_comments_metrics.csv", encoding ="latin-1")
'''
`r` prefix preceding a string literal denotes a raw string literal.
Specifically, backslashes (\) are treated as literal characters, and are not
used for escaping special characters like they are in regular string literals.
'''
# set figure size and font size
sns.set(rc={'figure.figsize':(40,20)})
sns.set(font_scale=3)
# Separating time by Hour, Day, Month and Year for further analysis using datetime package
import datetime as dt
'''
A lambda function is a small anonymous function that can take any number of arguments, but can only have one expression.
e.g.:
lambda x: x.hour
parameter: x
expression: x.hour ... converts time in datetime format to hour and returns hour
'''
df['time'] = pd.to_datetime(df['created_time'])
df['hour'] = df['time'].apply(lambda x: x.hour)
df['month'] = df['time'].apply(lambda x: x.month)
df['day'] = df['time'].apply(lambda x: x.day)
df['year'] = df['time'].apply(lambda x: x.year)
df.head()
# set x labels
x_labels = df.hour
#create bar plot
sns.barplot(x=x_labels, y=df.likes, color="blue")
# display the plot
plt.show()
# create bar plot of shares by hour
sns.barplot(x=x_labels, y=df.shares, color="green")
# display the plot
plt.show()
# Set the background color to white
sns.set_style(rc = {'axes.facecolor': 'white'})
#create bar plot
sns.barplot(x=df.day, y=df.shares, color="black")
# display the plot
plt.show()
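By default, sns.barplot draws one bar per x value at the mean of the y values in that group. The sketch below reproduces those bar heights as plain numbers with a pandas groupby, assuming a few made-up rows in place of fb_comments_metrics.csv (only the column names created_time, likes and shares come from the code above). It also uses the .dt accessor, which returns the same values as the lambda-based apply calls.

```python
import pandas as pd

# Hypothetical rows standing in for fb_comments_metrics.csv
df = pd.DataFrame({
    "created_time": ["2024-01-05 09:15:00", "2024-01-05 09:40:00",
                     "2024-01-05 14:05:00", "2024-01-06 14:30:00"],
    "likes": [10, 20, 4, 8],
    "shares": [1, 3, 0, 2],
})

# Convert the timestamp strings to datetime, then extract hour and day
df["time"] = pd.to_datetime(df["created_time"])
df["hour"] = df["time"].dt.hour  # same result as apply(lambda x: x.hour)
df["day"] = df["time"].dt.day

# Mean likes and shares per hour: the heights of the hourly bars
hourly = df.groupby("hour")[["likes", "shares"]].mean()
# Mean shares per day of the month: the heights of the daily bars
daily_shares = df.groupby("day")["shares"].mean()

print(hourly)
print(daily_shares)
```

Printing the grouped means alongside the charts is a quick way to check that a bar reflects the average for that hour or day rather than a total.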
Disclaimer: Opinions and views expressed on www.ashokcharan.com are the author’s personal views, and do not represent the official views of the National University of Singapore (NUS) or the NUS Business School. © Copyright 2013-2024 www.ashokcharan.com. All Rights Reserved.