Exhibit 25.60 provides the Python code for generating a word cloud from the YouTube data stored in the file youtube_comments.csv. Here is a breakdown of what it does:

wordcloud: This library provides tools for creating word clouds.

matplotlib.pyplot: This library helps visualize the word cloud.

pd.read_csv: Opens the youtube_comments.csv table and reads it into the dataframe variable df.

comment_words: A string variable that accumulates all the words, converted to lowercase, extracted from the CONTENT field of the dataframe.

stopwords: The set of common words (e.g. "the", "and") that should be excluded from the word cloud.

WordCloud is called to create an image object. We specify:

width and height are set to 800 pixels, making it an 800 × 800 image.

background_color is set to "white" for a clean background.

stopwords: words in this set are filtered out before the cloud is drawn.

min_font_size is set to 10 to ensure all words are at least visible.

generate(comment_words): This line takes the words from comment_words and uses them to create the word cloud.

plt.figure: Creates a new figure for displaying the image.

plt.imshow(wordcloud): This displays the generated word cloud on the figure.

plt.axis("off"): Hides the x and y axes since they are not relevant for the word cloud.

plt.tight_layout(pad = 0): Adjusts spacing to ensure the word cloud fills the entire area without extra padding.

plt.show(): Finally, this line displays the generated word cloud image on your screen.
# importing all necessary modules
from wordcloud import WordCloud, STOPWORDS # for generating the word cloud
import matplotlib.pyplot as plt # for visualization of data
import pandas as pd # panel data analysis/python data analysis
import nltk # natural language toolkit

# Read 'youtube_comments.csv' file - containing data from YouTube
df = pd.read_csv("data/youtube_comments.csv")
df[:5] # print first records

stopwords = set(STOPWORDS) # Convert stop words list into a set.

# Extract words from the CONTENT field and save into string variable comment_words
comment_words = ''
for val in df.CONTENT: # iterate through the table's CONTENT field
    # typecast each val to string
    val = str(val)
    # split the text in val and save the words into tokens
    tokens = val.split() # list of words from CONTENT field
    # convert each token in tokens to lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    # join (i.e. concatenate) all tokens, separated by blank spaces
    comment_words += " ".join(tokens) + " " # this builds one large string of tokens

print(comment_words[:1000]) # print the first 1000 characters in comment_words

# Generate an 800 x 800 word cloud image from the tokens in comment_words
wordcloud = WordCloud(width = 800, height = 800, background_color = 'white',
                      stopwords = stopwords, min_font_size = 10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
This code processes the comments in a YouTube data file, removes unimportant words (stopwords), and then creates a visual representation of the comments in which each word's size reflects how often it appears in the text.
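The sizing logic just described can be checked independently of the wordcloud library: word size is driven by token frequency after stopword removal. Here is a minimal sketch using only Python's standard collections.Counter, where the sample comments and the tiny stopword set are hypothetical stand-ins for the CSV data and STOPWORDS:

```python
from collections import Counter

# Hypothetical sample comments standing in for the CONTENT field of the CSV
comments = [
    "Great video, love the music",
    "The music is great",
    "Check out my channel",
]

stopwords = {"the", "is", "my", "out"}  # tiny illustrative stopword set

# Tokenize, lowercase, strip basic punctuation, and drop stopwords,
# mirroring the loop in the exhibit above
tokens = []
for comment in comments:
    for word in comment.lower().split():
        word = word.strip(",.!?")
        if word not in stopwords:
            tokens.append(word)

freq = Counter(tokens)
print(freq.most_common(3))  # the most frequent words would be drawn largest
```

Here "great" and "music" each occur twice, so they would dominate the resulting cloud, just as the most repeated words dominate the YouTube comments cloud.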
Another coding example for generating a word cloud is provided in section Word Cloud (FB data) in Python.
Note that you can also use online resources such as Word Cloud Generator to quickly generate word clouds.