Exhibit 25.60 provides the Python code for generating a word cloud from the YouTube data stored in the file youtube_comments.csv. Here is a breakdown of what it does:

wordcloud: This library provides tools for creating word clouds.

matplotlib.pyplot: This library helps visualize the word cloud.

pd.read_csv: Opens the youtube_comments.csv table and reads it into the dataframe variable df.

comment_words: A string variable that accumulates all the words, converted to lowercase, extracted from the CONTENT field of the dataframe.

stopwords: The set of common words (e.g. "the", "and") that should be excluded from the word cloud.

WordCloud is called to create an image object. We specify:

width and height are set to 800 pixels, making it an 800 × 800 image.

background_color is set to "white" for a clean background.

stopwords: words in this set are filtered out before the cloud is drawn.

min_font_size is set to 10 to ensure all words are at least visible.

generate(comment_words): This line takes the words from comment_words and uses them to create the word cloud.

plt.figure: Creates a new figure for displaying the image.

plt.imshow(wordcloud): This displays the generated word cloud on the figure.

plt.axis("off"): Hides the x and y axes since they are not relevant for the word cloud.

plt.tight_layout(pad = 0): Adjusts spacing to ensure the word cloud fills the entire area without extra padding.

plt.show(): Finally, this line displays the generated word cloud image on your screen.
# importing all necessary modules
from wordcloud import WordCloud, STOPWORDS # for generating the word cloud
import matplotlib.pyplot as plt # for visualization of data
import pandas as pd # panel data analysis/python data analysis
import nltk # natural language toolkit

# Read 'youtube_comments.csv' file - containing data from YouTube
df = pd.read_csv("data/youtube_comments.csv")
df[:5] # print first records

stopwords = set(STOPWORDS) # Convert stop words list into a set.

# Extract words from the CONTENT field and save into string variable comment_words
comment_words = ''
for val in df.CONTENT: # iterate through the table's CONTENT field
    # typecast each val to string
    val = str(val)
    # split the text in val and save the words into tokens
    tokens = val.split() # list of words from CONTENT field
    # convert each token in tokens to lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    # join (i.e. concatenate) all tokens, separated by blank spaces
    comment_words += " ".join(tokens) + " " # this builds one large string of tokens

print(comment_words[:1000]) # print the first 1000 characters in comment_words

# Generate an 800 x 800 word cloud image from the tokens in comment_words
wordcloud = WordCloud(width = 800, height = 800, background_color = 'white',
                      stopwords = stopwords, min_font_size = 10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize = (8, 8), facecolor = None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad = 0)
plt.show()
This code processes the comments in a YouTube data file, removes unimportant words (stopwords), and then creates a visual representation of the comments in which each word's size reflects how often it appears in the text.
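The sizing logic just described can be checked independently of the wordcloud library: word size is driven by token frequency after stopword removal. Here is a minimal sketch using only Python's standard collections.Counter, where the sample comments and the tiny stopword set are hypothetical stand-ins for the CSV data and STOPWORDS:

```python
from collections import Counter

# Hypothetical sample comments standing in for the CONTENT field of the CSV
comments = [
    "Great video, love the music",
    "The music is great",
    "Check out my channel",
]

stopwords = {"the", "is", "my", "out"}  # tiny illustrative stopword set

# Tokenize, lowercase, strip basic punctuation, and drop stopwords,
# mirroring the loop in the exhibit above
tokens = []
for comment in comments:
    for word in comment.lower().split():
        word = word.strip(",.!?")
        if word not in stopwords:
            tokens.append(word)

freq = Counter(tokens)
print(freq.most_common(3))  # the most frequent words would be drawn largest
```

Here "great" and "music" each occur twice, so they would dominate the resulting cloud, just as the most repeated words dominate the YouTube comments cloud.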
Another coding example for generating a word cloud is provided in section Word Cloud (FB data) in Python.
Note that you can also use online resources such as Word Cloud Generator to quickly generate word clouds.