Standard Sentiment Analysis

Exhibit 25.27 demonstrates how to implement sentiment analysis in Python. Using a dataset of tweets (tweets.csv), the code processes each tweet in the text column and generates sentiment scores with the VADER SentimentIntensityAnalyzer.

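The SentimentIntensityAnalyzer class provides a method, polarity_scores(), that takes a piece of text as input and returns a dictionary of sentiment scores with four keys: “neg”, “neu”, “pos”, and “compound”. The first three give the proportions of the text that are negative, neutral, and positive (together they sum to roughly 1), while “compound” is a single normalized summary score.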

The first visualization summarizes the results in a column chart showing the number of positive, neutral, and negative tweets.

The “compound” score (refer to VADER Classifier) returned by polarity_scores() represents the overall sentiment polarity of each tweet. It condenses both the positive and the negative sentiment expressed in the text into a single value.

This score is computed by summing the sentiment (valence) scores of the individual words in the text and then normalizing the total. It ranges from -1 to 1, where 1 indicates a highly positive sentiment, -1 a highly negative sentiment, and 0 a neutral sentiment.
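To make the -1 to 1 range concrete, here is a minimal sketch of the normalization step, assuming VADER maps the summed valence x to x / sqrt(x² + α) with α = 15 (the default in the reference implementation, to our understanding). In VADER the raw sum also reflects rule-based adjustments such as negation, intensifiers, and punctuation emphasis, which this sketch does not model. The raw sums below are illustrative inputs, not values from tweets.csv.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of word valences onto [-1, 1].

    Assumed form of VADER's compound normalization:
    score / sqrt(score**2 + alpha), clipped to [-1, 1].
    """
    norm_score = score / math.sqrt(score * score + alpha)
    # Clip defensively; the ratio already lies strictly inside (-1, 1).
    return max(-1.0, min(1.0, norm_score))


# Illustrative raw valence sums: strongly negative, neutral, fairly positive.
for raw in (-6.0, 0.0, 4.0):
    print(raw, round(normalize(raw), 4))
```

Because the denominator always exceeds |x|, the result can approach but never reach ±1; a raw sum of 4.0, for instance, maps to about 0.718, and a sum of 0 stays at 0.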

The script concludes with a table listing each tweet alongside its compound sentiment score, allowing you to assess how well the VADER algorithm performs.

Import Libraries and Read the Tweets Data (tweets.csv)
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import pandas as pd

# download the VADER lexicon (needed once per environment)
nltk.download('vader_lexicon')

# read the tweets and place into dataframe df
df = pd.read_csv("data/tweets.csv")
df.head(3)  # show the first three rows
(Output: first three rows of the tweets dataset)
Sentiment Analysis
sentiment = SentimentIntensityAnalyzer()

# apply sentiment analysis to the text column in df (tweets) and place the
# compound polarity scores in a new column, sentiment
df['sentiment'] = df.text.apply(lambda x: sentiment.polarity_scores(x)['compound'])

pos = len(df[df.sentiment > 0]) # count of 'positive' tweets
neg = len(df[df.sentiment < 0]) # count of 'negative' tweets
neu = len(df[df.sentiment == 0]) # count of 'neutral' tweets

y = [pos, neu, neg]  # vector of y-values (bar heights)
print("positive, neutral, negative: ", y)

# plot of y
plt.title("Sentiment Analysis")
plt.ylabel('Number of tweets')

# the x-axis positions, range(len(y)), are 0, 1, 2; label them 'positive', 'neutral', 'negative'
plt.xticks(range(len(y)), ['positive', 'neutral', 'negative'])

# plot a bar chart where the y-axis (height) is y, and the x-axis (0, 1, 2) is labelled 'positive', 'neutral', 'negative'
plt.bar(range(len(y)), height=y, width=0.75, align='center', alpha=0.8)
plt.show()
positive, neutral, negative:  [3272, 1351, 1821]
(Output: column chart of tweet counts by sentiment)
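The counts above use an exact-zero rule, so a tweet scoring 0.01 counts as positive. The VADER authors suggest treating scores between roughly -0.05 and 0.05 as neutral instead. The sketch below compares the two rules; label() and count_labels() are hypothetical helpers, and the scores are illustrative inputs rather than results from tweets.csv. Note that this sketch treats a score exactly at ±threshold as neutral, whereas the VADER README counts it as positive or negative.

```python
from collections import Counter


def label(compound: float, threshold: float = 0.0) -> str:
    """Map a compound score to 'positive', 'neutral' or 'negative'.

    threshold=0.0 reproduces the exhibit's rule (> 0, == 0, < 0);
    a positive threshold widens the neutral band to [-threshold, threshold].
    """
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    return "neutral"


def count_labels(scores, threshold=0.0):
    """Return counts in the exhibit's order: positive, neutral, negative."""
    counts = Counter(label(s, threshold) for s in scores)
    # Counter returns 0 for labels that never occur.
    return [counts["positive"], counts["neutral"], counts["negative"]]


# Illustrative compound scores, including two that sit close to zero.
scores = [0.6249, 0.0258, 0.0, -0.0191, -0.4404]
print(count_labels(scores))        # exact-zero rule
print(count_labels(scores, 0.05))  # neutral band of +/-0.05
```

The near-zero scores 0.0258 and -0.0191 move from the positive and negative groups into neutral once the band is widened. With the exhibit's DataFrame, the same rule could be applied as df.sentiment.apply(lambda s: label(s, 0.05)).value_counts().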
Compound Score
df[['text','sentiment']] # show the columns text and sentiment
(Output: tweets and their VADER compound scores)
Tweets and Compound Sentiment Score
# Setting 'display.max_colwidth' to None removes any restriction on the column width,
# so pandas displays the full content of each cell instead of truncating long text.
# This is particularly useful for text-heavy data such as tweets, reviews, or comments,
# which need to be fully visible when you check their sentiment scores.
pd.set_option('display.max_colwidth', None)
df[['text','sentiment']]
(Output: full tweet text with VADER compound sentiment score)

Exhibit 25.27 Sentiment analysis of tweets — Python implementation using VADER SentimentIntensityAnalyzer(). Jupyter notebook.
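pd.set_option() in the exhibit changes the column-width setting for the rest of the session. If you only want untruncated text for a single display, pandas' option_context() context manager applies the setting temporarily and restores the previous value on exit. A minimal sketch, using a hypothetical one-row DataFrame in place of the tweets data:

```python
import pandas as pd

# Hypothetical frame with one long text value (stands in for a tweet).
frame = pd.DataFrame({"text": ["word " * 30]})

# Inside the with-block the column width is unlimited.
with pd.option_context("display.max_colwidth", None):
    inside = pd.get_option("display.max_colwidth")
    print(frame)

# Outside the block the previous setting (50 by default) is back in force.
after = pd.get_option("display.max_colwidth")
print(after)
```

If you have already called set_option globally, pd.reset_option('display.max_colwidth') restores the default.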



Use the Search Bar to find content on MarketingMind.