Social Network Analysis Process

Exhibit 25.47 Social network analysis process.

Social network analysis (SNA) is a method used to understand and map out relationships and interactions within social networks. The process typically involves four key steps: data collection, data preparation, data visualization, and analysis.

1. Data Collection

The first step in SNA is gathering data, often from an API, depending on the social platform being analysed. For example, Pinterest’s REST API allows developers to access data about users, boards, and pins.

Pinterest Data Collection via REST API

Pinterest offers a REST API for developers or third parties to extract valuable information, such as user activity, pin creation, and board descriptions. To access Pinterest data, developers must:

Create an application under their Pinterest account.
Obtain an App ID and App Secret to authenticate API requests, similar to the processes followed for Facebook or Twitter APIs.
Retrieve an access token to access the authenticated user’s data (e.g., boards and pins created by the user).
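
As a rough illustration of the last step, the sketch below builds (but does not send) a request for the authenticated user's boards. The base URL, the `/boards` path and the `page_size` parameter are assumptions based on Pinterest's v5 REST API and should be checked against the current documentation; `build_boards_request` and the token string are hypothetical names used only for this example.

```python
import urllib.parse
import urllib.request

# Assumption: base URL and path follow Pinterest's v5 REST API; verify before use.
API_BASE = "https://api.pinterest.com/v5"


def build_boards_request(access_token, page_size=25):
    """Build (without sending) a GET request for the authenticated user's boards."""
    query = urllib.parse.urlencode({"page_size": page_size})
    return urllib.request.Request(
        f"{API_BASE}/boards?{query}",
        # The access token is passed as a bearer token in the Authorization header.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder token: replace with the token obtained for your own application.
request = build_boards_request("YOUR_ACCESS_TOKEN")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without credentials.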

Limitations of the Pinterest API

Pinterest’s API limits the data to pins and boards owned by the authenticated user. For more comprehensive data on public boards and pins, web scraping techniques may be employed. Scraping can extract lists of pins and boards, the users associated with them, and their titles and descriptions.

2. Data Preparation

Once data is collected, the next step is to prepare it for analysis. This involves cleaning the data and extracting relationships between entities.

Data cleaning removes noise that is irrelevant to the analysis. Typical steps include stripping extra whitespace, HTML tags, and URLs; standardizing words (e.g., splitting attached words); converting text to lowercase; removing stopwords; and tokenizing the text for further analysis.
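
The cleaning steps can be sketched with Python's standard library alone. This is a minimal illustration, not a complete pipeline: the `clean_text` function, the tiny `STOPWORDS` set and the sample pin description are all assumptions made for the example, and a real project would use a fuller stopword list such as the one in nltk.

```python
import re

# Assumption: a tiny illustrative stopword list, not a complete one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def clean_text(text):
    """Return lowercase tokens with HTML tags, URLs and stopwords removed."""
    text = re.sub(r"<[^>]+>", " ", text)               # strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)          # strip URLs
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)   # split attached words, e.g. "StreetStyle"
    tokens = re.findall(r"[a-z]+", text.lower())       # lowercase, then tokenize on letters
    return [token for token in tokens if token not in STOPWORDS]


# Hypothetical pin description used only to exercise the function.
sample = "<b>StreetStyle</b> trends for the summer: https://example.com/pin"
tokens = clean_text(sample)
print(tokens)
```

Tokenizing on letters also discards punctuation and extra whitespace, so no separate step is needed for those.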

Bigram Extraction

To understand common topics or themes within the dataset, bigrams (pairs of words that frequently appear together) can be extracted from the cleaned text. This provides insights into the prevalent relationships and topics.

The code to extract bigrams using Python’s nltk library is provided in Exhibit 25.48.

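# Import nltk for text processing
import nltk

# Import the collocation (word pairing) classes used below
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Import stopwords from the nltk corpus (for the cleaning step described above)
from nltk.corpus import stopwords

# Import the re (regular expressions) module
# to handle text pattern matching and manipulation (also for cleaning)
import re

# Placeholder input: 'documents' must be a list of tokenized texts.
# These two token lists are illustrative only, not real Pinterest data.
documents = [["sustainable", "fashion", "trends"], ["summer", "fashion", "trends"]]

# Initialize an object that scores bigrams (word pairs) using statistical measures
bigram_measures = BigramAssocMeasures()

# Create a finder object that identifies bigrams in
# the given 'documents' (a list of tokenized texts)
bigram_finder = BigramCollocationFinder.from_documents(documents)

# List up to ten bigrams, ranked by how frequently each pair occurs
top_bigrams = bigram_finder.nbest(bigram_measures.raw_freq, 10)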

Exhibit 25.48 Code to extract bigrams using Python’s `nltk` library.
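
To make clear what a bigram finder tallies, the following dependency-free sketch counts adjacent word pairs with `collections.Counter`. It is a simplified stand-in for the nltk finder, not a reimplementation of it; `count_bigrams` and the three token lists are hypothetical.

```python
from collections import Counter


def count_bigrams(documents):
    """Count adjacent word pairs within each tokenized document."""
    counts = Counter()
    for tokens in documents:
        # zip pairs each token with its successor; pairs never span two documents.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical cleaned pin descriptions.
documents = [
    ["sustainable", "fashion", "trends"],
    ["summer", "fashion", "trends"],
    ["sustainable", "fashion", "brands"],
]
bigram_counts = count_bigrams(documents)
print(bigram_counts.most_common(2))
```

Raw counts favour pairs built from very common words, which is why association measures such as those in `BigramAssocMeasures` are often preferred for ranking.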

After preparing the data, the next step is to visualize the social network, usually as a graph in which nodes represent users and edges (links) represent the relationships between them.

3. Visualizing Relationships and Centrality

In a typical Pinterest analysis, centrality measures can be used to identify key topics or users. The following types of centrality measures are commonly used in SNA:

Degree Centrality
Closeness Centrality
Betweenness Centrality
Eigenvector Centrality

For instance, in a small dataset related to fashion on Pinterest, you may find that topics like “fashion trends” have high degree and closeness centrality, indicating a strong influence in the network.
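
Two of these measures are simple enough to compute by hand, which helps when interpreting library output. The sketch below assumes an undirected, connected graph with at least two nodes; the topic names and edges are invented for illustration, and the helper functions are hypothetical rather than part of any library.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected graph)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency, source):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    distances = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


# Hypothetical topic network.
edges = [
    ("fashion trends", "street style"),
    ("fashion trends", "sustainable fashion"),
    ("fashion trends", "summer outfits"),
    ("street style", "sneakers"),
]
adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency, "fashion trends")
print(degree["fashion trends"], closeness)
```

In this toy network "fashion trends" is linked to three of the four other topics and is at most two steps from any of them, so it scores highest on both measures.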

The code for calculating centrality measures, detecting communities, and visualizing networks using Python’s networkx library is provided in Exhibit 25.49.

# 1. Loading a dataset:
import matplotlib.pyplot as plt
import networkx as nx

# Load an edge list from a CSV file (one "source,target" pair per line)
G = nx.read_edgelist("data.csv", delimiter=",")

# 2. Calculating centrality measures:
# Calculate degree centrality
degree_centrality = nx.degree_centrality(G)

# Calculate betweenness centrality
betweenness_centrality = nx.betweenness_centrality(G)

# 3. Detecting communities:
# Detect communities using the Louvain algorithm
communities = nx.algorithms.community.louvain_communities(G)

# 4. Visualizing a network:
# Draw the network
nx.draw(G, with_labels=True)
plt.show()

Exhibit 25.49 Calculating centrality measures, detecting communities and visualizing networks using Python’s `networkx` library.

4. Analysis

Identifying Influential Users

Beyond visualizing the data, it is important to identify key influencers within the network. Instead of focusing on metrics like follower count, SNA focuses on content creation and topic relevance. For example, in a fashion network, users who publish content around trending topics (as identified by bigram extraction) can be considered influencers.
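
One possible way to operationalize this idea is to score each user by how often trending bigrams appear in their pins. The sketch below is an assumption about how such a score could be built, not a standard SNA metric; the function name, user names and pin tokens are hypothetical.

```python
from collections import Counter


def rank_users_by_trending_content(pins_by_user, trending_bigrams):
    """Rank users by the number of trending bigrams found in their pins."""
    trending = set(trending_bigrams)
    scores = Counter()
    for user, pins in pins_by_user.items():
        for tokens in pins:
            # Count every adjacent word pair in the pin that is a trending bigram.
            scores[user] += sum(
                1 for pair in zip(tokens, tokens[1:]) if pair in trending
            )
    return scores.most_common()


# Hypothetical tokenized pins per user and trending bigrams.
pins_by_user = {
    "alice": [["sustainable", "fashion", "trends"], ["fashion", "trends", "summer"]],
    "bob": [["vintage", "denim", "jacket"]],
    "carol": [["sustainable", "fashion", "brands"]],
}
trending_bigrams = [("sustainable", "fashion"), ("fashion", "trends")]
ranking = rank_users_by_trending_content(pins_by_user, trending_bigrams)
print(ranking)
```

A score like this rewards volume as well as relevance, so it may be worth normalizing by the number of pins per user before comparing accounts of very different sizes.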

Uncovering User Communities

Clustering algorithms can reveal distinct user communities. In visualized graphs, these communities are often represented by different colours. Larger clusters are typically found at the centre of the graph, while smaller, more targeted clusters form on the periphery. Each cluster represents users with similar interests, and analysing these clusters can provide insights into niche communities within the larger network.

Clustering Example

Consider a Pinterest analysis focused on fashion. The graph might show:

Central clusters: Representing users with diverse interests or highly connected users.
Peripheral clusters: Representing niche communities with more focused content.

Each cluster is colour-coded, revealing insights into the subtopics of interest, such as fashion trends, sustainable fashion, or seasonal styles.
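
The distinction between large and niche clusters can be illustrated with a deliberately simple grouping. The sketch below uses connected components as a crude stand-in for a clustering algorithm: it only separates users who have no path between them, whereas an algorithm such as Louvain can also split a densely linked group from a loosely attached one. The user IDs and edges are hypothetical.

```python
from collections import defaultdict


def connected_groups(edges):
    """Group nodes into connected components, largest group first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal from an unvisited node
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


# Hypothetical user-to-user links (e.g., users who pinned on the same topic).
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u3", "u4"), ("u5", "u6")]
groups = connected_groups(edges)
print([sorted(group) for group in groups])
```

Sorting by size puts the broad, well-connected group first and the small niche group last, mirroring the central and peripheral clusters described above.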
