Exhibit 25.27 demonstrates the implementation of sentiment analysis using Python. With a dataset of tweets (tweets.csv), the code processes each tweet in the text column and generates sentiment scores using the VADER SentimentIntensityAnalyzer.
The SentimentIntensityAnalyzer class provides a method called polarity_scores() that takes a piece of text as input and returns a dictionary of sentiment scores. This dictionary contains four keys: “neg”, “neu”, “pos”, and “compound”.
The initial visualization presents a summary that shows the count of positive, negative, and neutral tweets in a column chart.
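The counting step behind that chart can be sketched with the standard library alone. The sketch assumes a list of compound scores is already available; the names label and bucket_counts, the threshold parameter, and the sample scores are hypothetical and serve only as illustration. With threshold=0.0 it mirrors the sign test used in the exhibit. The VADER documentation suggests a cutoff around 0.05 as a common convention, which treats weakly scored text as neutral.

```python
from collections import Counter


def label(compound, threshold=0.0):
    """Map a compound score to 'positive', 'negative' or 'neutral'."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"


def bucket_counts(scores, threshold=0.0):
    """Return counts in the order [positive, neutral, negative]."""
    counts = Counter(label(s, threshold) for s in scores)
    return [counts["positive"], counts["neutral"], counts["negative"]]


# Made-up compound scores, not real VADER output.
sample_scores = [0.64, -0.30, 0.0, 0.02, -0.71]

print(bucket_counts(sample_scores))        # sign test, as in the exhibit
print(bucket_counts(sample_scores, 0.05))  # wider neutral band
```

Raising the threshold moves borderline tweets (such as the 0.02 score above) from the positive bucket into the neutral one, so the chart depends on the cutoff you choose.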
The “compound” score (refer VADER Classifier) returned by the polarity_scores() method represents the overall sentiment polarity of each tweet. It is a single value that summarizes both the positive and negative sentiments expressed in the text.
This score is computed by combining the sentiment scores of the individual words in the text and normalizing the result. It ranges from -1 to 1, where values near 1 indicate highly positive sentiment, values near -1 indicate highly negative sentiment, and 0 indicates neutral sentiment.
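To make the range concrete, here is a minimal sketch of the normalization step as it appears in the open-source VADER implementation: the summed word valence x is mapped to x / sqrt(x² + alpha), with alpha = 15. Treat the constant as an assumption if your version differs. The valence values below are invented for illustration and are not taken from the VADER lexicon, and the real analyzer also applies rules for negation, intensifiers, capitalization, and punctuation before this step.

```python
import math


def normalize(total_valence, alpha=15):
    """Squash an unbounded valence sum into the interval [-1, 1]."""
    score = total_valence / math.sqrt(total_valence * total_valence + alpha)
    # Clamp to guard against floating-point overshoot.
    return max(-1.0, min(1.0, score))


# Hypothetical per-word valences (not real lexicon entries).
word_valences = [1.9, 3.1, -0.5]
total = sum(word_valences)

print(round(normalize(total), 4))  # moderately strong positive
print(normalize(0))                # no sentiment-bearing words -> 0.0
```

Because the function approaches ±1 only asymptotically, even a text full of strongly positive words scores just under 1 rather than exactly 1.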
The script concludes with a table listing the tweets alongside their compound sentiment scores, allowing you to assess the effectiveness of the VADER algorithm.
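One simple way to spot-check such a table is to look at its extremes. The sketch below assumes the tweets and their compound scores are available as (text, score) pairs; the function name extremes and the sample rows are hypothetical, and the scores are invented placeholders rather than real VADER output.

```python
def extremes(rows, n=2):
    """Return the n most negative and n most positive (text, score) pairs."""
    ranked = sorted(rows, key=lambda row: row[1])  # ascending by score
    most_negative = ranked[:n]
    # Take the last n rows and reverse them so the highest score comes first.
    most_positive = ranked[max(0, len(ranked) - n):][::-1]
    return most_negative, most_positive


# Invented rows for illustration only.
rows = [
    ("Great service, loved it!", 0.84),
    ("Worst flight ever.", -0.62),
    ("Flight departs at 9.", 0.0),
    ("Not bad at all.", 0.43),
]

most_negative, most_positive = extremes(rows, n=1)
print("most negative:", most_negative)
print("most positive:", most_positive)
```

Reading the highest- and lowest-scored tweets by eye is a quick check on whether the scores match human judgment, although it is not a substitute for comparing against labelled data.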
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import pandas as pd
# read the tweets and place into dataframe df
df = pd.read_csv("data/tweets.csv")
df[:3]
sentiment = SentimentIntensityAnalyzer()
# apply sentiment analysis to the text column in df (tweets) and place the
# compound polarity scores in a new column - sentiment.
df['sentiment'] = df.text.apply(lambda x: sentiment.polarity_scores(x)['compound'])
pos = len(df[df.sentiment > 0]) # count of 'positive' tweets
neg = len(df[df.sentiment < 0]) # count of 'negative' tweets
neu = len(df[df.sentiment == 0]) # count of 'neutral' tweets
y = [pos, neu, neg] # vector of y-values
print("positive, neutral, negative: ", y)
# plot of y
plt.title("Sentiment Analysis")
plt.ylabel('Number of tweets')
# the x-axis positions, range(len(y)), are 0, 1, 2. Label these as 'positive', 'neutral', 'negative'
plt.xticks(range(len(y)), ['positive', 'neutral', 'negative'])
# plot a bar chart where the y-axis (height) is y, and the x-axis (0, 1, 2) is labelled 'positive', 'neutral', 'negative'
plt.bar(range(len(y)), height=y, width = 0.75, align = 'center', alpha = 0.8)
plt.show()
positive, neutral, negative: [3272, 1351, 1821]
df[['text','sentiment']] # show the columns text and sentiment
# Setting 'display.max_colwidth' to None removes any restriction on the column
# width, so pandas displays the full content of each cell instead of truncating
# long text values. This is particularly useful for text-heavy data such as
# tweets, reviews, or comments, which need to be fully visible.
pd.set_option('display.max_colwidth', None)
df[['text','sentiment']]
SentimentIntensityAnalyzer(). Jupyter notebook.