Exhibit 25.47 Social network analysis process.
Social network analysis (SNA) is a method used to understand and map out relationships and interactions within social networks. The process typically involves four key steps: data collection, data preparation, data visualization, and analysis.
The first step in SNA is gathering data, often from an API, depending on the social platform being analysed. For example, Pinterest’s REST API allows developers to access data about users, boards, and pins.
Pinterest Data Collection via REST API
Pinterest offers a REST API for developers or third parties to extract valuable information, such as user activity, pin creation, and board descriptions. To access Pinterest data, developers must:
- Create an application under their Pinterest account.
- Obtain an App ID and App Secret to authenticate API requests, similar to the processes followed for Facebook or Twitter APIs.
- Retrieve an access token to access the authenticated user’s data (e.g., boards and pins created by the user).
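The steps above can be sketched as a single authenticated request. This is a minimal illustration, not Pinterest's official client: the base URL, the `boards` path, and the bearer-token header are assumptions that should be checked against the current Pinterest developer documentation, and `YOUR_ACCESS_TOKEN` is a placeholder.

```python
import json
import urllib.request

# Assumed base URL for Pinterest's REST API; verify against the current docs.
API_BASE = "https://api.pinterest.com/v5"


def build_request(path, access_token):
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        f"{API_BASE}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder token; a real one is issued to your app for the authenticated user.
request = build_request("boards", "YOUR_ACCESS_TOKEN")
# boards = fetch_json(request)  # uncomment once a valid token is in place
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any network call is made.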
Limitations of the Pinterest API
Pinterest’s API limits the data to pins and boards owned by the authenticated user. For more comprehensive data on public boards and pins, web scraping techniques may be employed. Through scraping, you can extract lists of pins, boards, and their associated users, along with their titles and descriptions.
Once data is collected, the next step is to prepare it for analysis. This involves cleaning the data and extracting relationships between entities.
Data cleaning is essential for removing irrelevant noise such as extra whitespace, HTML tags, and URLs. It also involves standardizing words (e.g., splitting attached words), converting text to lowercase, removing stopwords, and tokenizing the text for further analysis.
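The cleaning steps listed above might look like the following sketch, which uses only Python's standard library. The function name `clean_text`, the sample description, and the very short stopword list are illustrative assumptions; a real pipeline would normally use a fuller stopword list.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "is"}


def clean_text(text):
    """Return a list of lowercase tokens with noise removed."""
    text = re.sub(r"<[^>]+>", " ", text)              # strip HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # strip URLs
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # split attached words
    text = text.lower()                               # normalize case
    tokens = re.findall(r"[a-z]+", text)              # tokenize on letters
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords


sample = "<b>SummerOutfits</b> for the  beach! https://example.com/pin"
print(clean_text(sample))  # ['summer', 'outfits', 'beach']
```

Tags and URLs are removed before lowercasing so that the capital letters needed to split attached words such as "SummerOutfits" are still present.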
Bigram Extraction
To understand common topics or themes within the dataset, bigrams (pairs of words that frequently appear together) can be extracted from the cleaned text. This provides insights into the prevalent relationships and topics.
The code to extract bigrams using Python’s nltk library is provided in Exhibit 25.48.
import nltk
from nltk.collocations import *
Exhibit 25.48 Code to extract bigrams using Python’s nltk library.
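As a complement to the exhibit, here is one way bigram extraction with nltk could look. It is a sketch under stated assumptions rather than the exhibit's full code: the token lists in `documents` are invented sample data, and the minimum frequency of 2 is an arbitrary threshold.

```python
from nltk.collocations import BigramCollocationFinder

# Hypothetical cleaned tokens, one list per pin description.
documents = [
    ["summer", "fashion", "trends", "beach", "outfits"],
    ["sustainable", "fashion", "trends", "capsule", "wardrobe"],
    ["fashion", "trends", "street", "style"],
    ["capsule", "wardrobe", "basics"],
]

# from_documents counts bigrams inside each description only, so the last
# word of one pin is never paired with the first word of the next.
finder = BigramCollocationFinder.from_documents(documents)

# Keep only bigrams that occur at least twice (threshold is an assumption).
finder.apply_freq_filter(2)

top_bigrams = finder.ngram_fd.most_common()
for bigram, count in top_bigrams:
    print(" ".join(bigram), count)
```

With this sample data, "fashion trends" appears three times and "capsule wardrobe" twice, so those are the only bigrams that survive the filter.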
After preparing the data, the next step is to visualize the social network, often represented as graphs where nodes represent users, and edges (links) represent relationships.
In a typical Pinterest analysis, centrality measures can be used to identify key topics or users. Degree centrality and closeness centrality are among the measures commonly used in SNA.
For instance, in a small dataset related to fashion on Pinterest, you may find that topics like “fashion trends” have high degree and closeness centrality, indicating a strong influence in the network.
The code for calculating centrality measures, detecting communities, and visualizing networks using Python’s networkx library is provided in Exhibit 25.49.
Exhibit 25.49 Calculating centrality measures, detecting communities and visualizing networks using Python’s networkx library.
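To make the centrality discussion concrete, the sketch below computes degree and closeness centrality with networkx on a small topic graph. The edge list is invented for illustration, and betweenness centrality is included as an additional measure that the text above does not name.

```python
import networkx as nx

# Hypothetical co-occurrence edges: two topics are linked when they
# appear together in a pin description.
edges = [
    ("fashion trends", "summer outfits"),
    ("fashion trends", "street style"),
    ("fashion trends", "sustainable fashion"),
    ("fashion trends", "capsule wardrobe"),
    ("sustainable fashion", "capsule wardrobe"),
    ("street style", "sneakers"),
]
graph = nx.Graph(edges)

degree = nx.degree_centrality(graph)            # share of nodes directly linked
closeness = nx.closeness_centrality(graph)      # inverse of average distance
betweenness = nx.betweenness_centrality(graph)  # extra measure, not in the text

# List topics from most to least connected.
for node in sorted(graph, key=degree.get, reverse=True):
    print(f"{node:20s} degree={degree[node]:.2f} "
          f"closeness={closeness[node]:.2f} betweenness={betweenness[node]:.2f}")
```

In this toy graph "fashion trends" is directly linked to four of the other five topics, so it ranks first on both degree and closeness centrality.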
Identifying Influential Users
Beyond visualizing the data, it is important to identify key influencers within the network. Instead of focusing on metrics like follower count, SNA focuses on content creation and topic relevance. For example, in a fashion network, users who publish content around trending topics (as identified by bigram extraction) can be considered influencers.
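One simple way to operationalize this idea is to count how many trending bigrams each user's pins contain. The scoring rule, the function name `influence_scores`, and the sample users and pins below are all illustrative assumptions, not a method prescribed by the text.

```python
from collections import Counter

# Hypothetical inputs: trending bigrams (e.g., from bigram extraction)
# and the cleaned tokens of each pin, keyed by the user who created it.
trending = {("fashion", "trends"), ("capsule", "wardrobe")}
pins = [
    ("alice", ["summer", "fashion", "trends"]),
    ("alice", ["capsule", "wardrobe", "basics"]),
    ("bob", ["fashion", "trends", "street", "style"]),
    ("carol", ["beach", "outfits"]),
]


def influence_scores(pins, trending):
    """Count, per user, the trending bigrams found across their pins."""
    scores = Counter()
    for user, tokens in pins:
        pairs = set(zip(tokens, tokens[1:]))   # adjacent word pairs in the pin
        scores[user] += len(pairs & trending)  # trending pairs in this pin
    return scores


scores = influence_scores(pins, trending)
print(scores.most_common())  # [('alice', 2), ('bob', 1), ('carol', 0)]
```

A score like this rewards users for publishing on trending topics regardless of follower count; in practice it could be combined with the centrality measures discussed earlier.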
Clustering algorithms can reveal distinct user communities. In visualized graphs, these communities are often represented by different colours. Larger clusters are typically found at the centre of the graph, while smaller, more targeted clusters form on the periphery. Each cluster represents users with similar interests, and analysing these clusters can provide insights into niche communities within the larger network.
Clustering Example
Consider a Pinterest analysis focused on fashion. The graph might show:
- Central clusters: Representing users with diverse interests or highly connected users.
- Peripheral clusters: Representing niche communities with more focused content.
Each cluster is colour-coded, revealing insights into the subtopics of interest, such as fashion trends, sustainable fashion, or seasonal styles.
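Community detection and colour assignment can be sketched with networkx as follows. The choice of greedy modularity maximization is an assumption, since the text does not name a specific clustering algorithm, and the user names and edges are invented sample data.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical user graph: a tightly knit group of four, a triangle of
# three, and a single link between the two groups.
edges = [
    ("ana", "ben"), ("ana", "cat"), ("ana", "dan"),
    ("ben", "cat"), ("ben", "dan"), ("cat", "dan"),
    ("eve", "fay"), ("eve", "gus"), ("fay", "gus"),
    ("dan", "eve"),
]
graph = nx.Graph(edges)

# Each community is a set of nodes; larger communities come first.
communities = [set(c) for c in greedy_modularity_communities(graph)]

# Map every node to the index of its community, usable as a colour code.
colour_index = {
    node: i for i, members in enumerate(communities) for node in members
}

for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
```

If matplotlib is installed, the colour codes can be passed to `nx.draw(graph, node_color=[colour_index[n] for n in graph], with_labels=True)` to produce a colour-coded plot of the clusters.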