Facial expressions are the most eloquent form of body language. The emotions that they convey also tend to be universal, cutting across countries and cultures.
Facial coding is the process of quantifying emotions by tracking and analysing facial expressions. Recorded online through web cameras, consumers’ expressions are analysed by computer algorithms to reveal their emotions.
Dating back to Charles Darwin’s ground-breaking theories, the language of the emotions forms the basis of facial coding. Darwin suggested that facial expressions were innate and common to humans and other mammals.
A century later, in 1972, Paul Ekman’s studies corroborated that a number of facial expressions are found universally across different cultures. He defined six basic facial expressions — anger, happiness, surprise, disgust, sadness and fear — in terms of facial muscle movements. Subsequently, he and Wallace Friesen created the Facial Action Coding System (FACS) that maps the coordinates of the face to the muscle movements associated with the key emotions.
Today, thanks to advances in information technology, facial coding is widely adopted by marketers and researchers for measuring emotions in advertising. Computer algorithms record and analyse human expressions formed by facial features such as eyes, mouth and eyebrows, including tiny movements of facial muscles, to detect a range of emotions. These real-time insights into viewers’ spontaneous, unfiltered reactions to visual content yield a continuous flow of emotional and cognitive metrics. Many of these responses are so fleeting that consumers may not even remember them, let alone be able to objectively report them.
Well-known service providers include EyeSee, Emotient (acquired by Apple in 2016), Affdex, Realeyes and Kairos. Affdex, a cloud-based solution developed by Affectiva, measures a range of emotions and facial expressions. You may try their demo video at the Affectiva website.
Facial expressions are formed by the movement of facial muscles. There are 43 facial muscles, almost all innervated by the facial nerve (also known as the seventh cranial nerve), and attached either to bone and facial tissue, or to facial tissue only.
Facial expressions can appear intentionally, as when putting on a forced smile, or involuntarily, as when laughing at a clown.
The facial nerve, which emerges from the brainstem, controls involuntary and spontaneous expressions. Intentional facial expressions, on the other hand, are controlled by a different part of the brain, the motor cortex. This is why a fake smile does not look or feel the same as a genuine smile: it does not reach the eyes.
There are 3 types of facial expressions, categorized as macro expressions, micro expressions and subtle expressions. Macro expressions last up to 4 seconds, and are obvious to the naked eye. Micro expressions, on the other hand, last only a fraction of a second, and are harder to detect. They appear when the subject is either deliberately or unconsciously concealing a feeling.
Subtle expressions are associated with the intensity of the emotion, not the duration. They emerge either at the onset of an emotion, or when the emotional response is of low intensity.
While facial coding can detect macro expressions, it is unable to capture finer micro expressions, or the subtle facial expressions, where the underlying musculature is not active enough to move the skin.
An alternative technique, Facial Electromyography (fEMG), can detect these finer movements by measuring the activation of individual muscles. It also has much higher temporal resolution, which makes it ideal for recording subtle, fleeting expressions. However, since it is not practical to place more than one or two electrodes on the face, the range of expressions fEMG can track is limited. Moreover, the additional hardware makes it much less versatile than facial coding for large-scale consumer research studies.
Facial Action Coding System (FACS) is a method of measuring facial expressions, and describing observable facial movements. It breaks down facial expressions into elementary components of muscle movement called Action Units (AUs).
Labelled as AU0, AU1, AU2 etc., AUs correspond to individual muscles or muscle groups. They combine in different ways to form facial expressions. The analysis of the AUs of a facial image, therefore, leads to the detection of the expression on the face.
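The way AUs combine into expressions can be illustrated with a small lookup. The following is a minimal sketch, not the FACS specification: the AU combinations listed are prototypes commonly cited in the FACS literature and are an assumption here, as are the names `PROTOTYPES` and `match_expressions`.

```python
# Illustrative prototypes only: these AU combinations are commonly cited in the
# FACS literature, but they are an assumption here, not taken from the text.
PROTOTYPES = {
    "happiness": {6, 12},       # cheek raiser + lip corner puller
    "sadness": {1, 4, 15},      # inner brow raiser + brow lowerer + lip corner depressor
    "surprise": {1, 2, 5, 26},  # inner/outer brow raiser + upper lid raiser + jaw drop
}


def match_expressions(active_aus):
    """Return the expressions whose prototype AUs are all active, sorted by name."""
    active = set(active_aus)
    return sorted(name for name, aus in PROTOTYPES.items() if aus <= active)


print(match_expressions([6, 12, 25]))   # ['happiness']
print(match_expressions([1, 2, 5, 26])) # ['surprise']
print(match_expressions([4]))           # []
```

A strict subset match like this is deliberately crude. Commercial systems interpret AU combinations statistically rather than by exact lookup.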
Automated facial coding (AFC), powered by machine learning algorithms and webcams, has led to the propagation of the technology across numerous sectors, including marketing analytics.
AFC is essentially a 3-step process:
Face detection: When you take a photo with a camera or a smartphone, you will see boxes framing the faces of the individuals in the photo. These boxes appear because your camera is using face detection algorithms. AFC employs the same technology.
Facial landmark detection (Exhibit 25.17): AFC detects facial landmarks such as eyes and eye corners, brows, mouth corners, and nose tip, and creates a simplified face model that matches the actual face, but only includes the features required for coding.
Coding: Machine learning algorithms analyse the facial landmarks and translate them into action unit codes. The combination of AUs is statistically interpreted to yield metrics for facial expressions.
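To make the coding step concrete, here is a toy sketch of how landmark coordinates might be turned into an action unit code. It assumes face and landmark detection have already run upstream and produced a dictionary of points. The landmark names, the `lift_threshold` value and the geometric rule are all hypothetical. Real AFC systems use trained machine learning models rather than a hand-written rule like this one.

```python
import math


def lip_corner_lift(landmarks):
    """Lift of the mouth corners above the mouth centre, in inter-ocular units.

    Image coordinates are assumed: y grows downward, so a raised corner has a
    smaller y than the mouth centre.
    """
    (lx, ly), (rx, ry) = landmarks["left_eye"], landmarks["right_eye"]
    inter_ocular = math.hypot(rx - lx, ry - ly)  # normalises for face size
    corner_y = (landmarks["mouth_left"][1] + landmarks["mouth_right"][1]) / 2
    return (landmarks["mouth_center"][1] - corner_y) / inter_ocular


def code_action_units(landmarks, lift_threshold=0.05):
    """Toy rule: flag AU12 (lip corner puller) when the corners are lifted.

    The threshold is an arbitrary illustrative value, not a calibrated one.
    """
    aus = set()
    if lip_corner_lift(landmarks) > lift_threshold:
        aus.add(12)
    return aus


# Hypothetical landmark sets, in pixel coordinates.
smiling = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "mouth_left": (35, 76), "mouth_right": (65, 76), "mouth_center": (50, 80),
}
neutral = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "mouth_left": (38, 80), "mouth_right": (62, 80), "mouth_center": (50, 80),
}

print(code_action_units(smiling))  # {12}
print(code_action_units(neutral))  # set()
```

Dividing by the distance between the eyes keeps the feature comparable across faces that appear at different sizes in the frame.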
Emotion, expression and emoji metrics are probabilistic measures. The scores, ranging from 0 to 100, indicate the likelihood of a specific emotion or expression.
Facial action coding systems typically capture a wide range of metrics and variables, such as the following pertaining to Affectiva:
International Affective Picture System (IAPS): a database of about 1,000 standardized colour photographs rated on their emotional content. Designed for emotion and attention research, IAPS is widely used in academic and commercial research. Access is restricted to academics.
The input is essentially a video feed, whether from a laptop, tablet, phone, GoPro, or standalone webcam.
For analysis, the video is split into a number of short intervals or epochs (for instance, 1 second each). A median facial expression score is computed for each respondent, over each epoch, based on a signal threshold algorithm. The respondent is counted if her score exceeds the threshold level. This permits easy quantification and aggregation of the data.
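The epoch-based aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: per-frame scores on the 0 to 100 scale are already available for each respondent, and the "signal threshold algorithm" is approximated by a fixed cut-off on the epoch median. The function name, parameters and default threshold of 50 are hypothetical.

```python
from statistics import median


def respondents_per_epoch(scores_by_respondent, fps, epoch_seconds=1.0, threshold=50.0):
    """Count, for each epoch, the respondents whose median score exceeds the threshold.

    scores_by_respondent maps a respondent id to a list of per-frame scores (0-100).
    The fixed threshold is an assumed stand-in for a vendor's thresholding rule.
    """
    if not scores_by_respondent:
        return []
    frames_per_epoch = max(1, int(round(fps * epoch_seconds)))
    # Only analyse the span that every respondent covers.
    n_frames = min(len(scores) for scores in scores_by_respondent.values())
    counts = []
    for start in range(0, n_frames, frames_per_epoch):
        count = 0
        for scores in scores_by_respondent.values():
            if median(scores[start:start + frames_per_epoch]) > threshold:
                count += 1
        counts.append(count)
    return counts


# Hypothetical joy scores for three respondents, sampled at 2 frames per second.
joy = {
    "r1": [10, 20, 80, 90, 60, 70],
    "r2": [5, 5, 55, 65, 10, 20],
    "r3": [0, 0, 0, 10, 90, 95],
}
print(respondents_per_epoch(joy, fps=2))  # [0, 2, 2]
```

The resulting list is one count per epoch, which is the series a chart of respondents showing a given expression over time would plot.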
Consider for instance Exhibit 25.18, pertaining to the analysis of a video. It depicts the number of respondents with joyful expressions over the course of the video.
A typical analysis would comprise a series of similar charts, as shown in this Realeyes video, relating to different emotions, facial expressions, as well as metrics pertaining to engagement and sentiment valence.
The information is auto-generated, and is easy to interpret.
The potential of facial coding technology is huge: it is easy to implement (webcam-based, with no need for a controlled location), easy to deploy, scalable and affordable, and the information it produces is easy to interpret and visualize.
As people around the world watch TV or gaze at their computer screens, marketers might be staring right back, tracking their expressions and analysing their emotions.
The BBC, an early adopter, uses webcams and facial coding technology to track faces as people watch show trailers, to see what kinds of emotions the trailers produce. The broadcasting house has also used the technology to study participants’ reactions to programmes like Top Gear and Sherlock.
Research firms, such as GfK with its EMO Scan, use consumers’ own webcams, with their permission, to track their facial expressions in real time as they view advertising.
One of the limitations is that facial expressions vary across cultures, individuals, and even demographics, such as age. Some individuals are expressive, others are impassive. This variance in facial muscle activity is why baselining is often recommended for 5 to 10 seconds at the start of a session.
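Baselining can be sketched as subtracting a respondent's average score over the opening seconds from the rest of the session. This is a minimal illustration, assuming per-frame scores and a known frame rate. The function name and the mean-subtraction rule are assumptions, since vendors may correct for baseline differently.

```python
from statistics import mean


def baseline_correct(scores, fps, baseline_seconds=5.0):
    """Subtract the mean of the baseline window from the scores that follow it.

    Returns only the post-baseline scores, expressed relative to the baseline.
    """
    n = int(round(fps * baseline_seconds))
    if n <= 0 or n > len(scores):
        raise ValueError("baseline window must fit inside the recording")
    baseline = mean(scores[:n])
    return [s - baseline for s in scores[n:]]


# Hypothetical session sampled once per second: 5 s of baseline, then the stimulus.
session = [10, 12, 8, 10, 10, 30, 45, 10]
print(baseline_correct(session, fps=1))  # [20, 35, 0]
```

Expressing scores relative to each person's own resting level makes an impassive respondent and an expressive one somewhat more comparable, although it does not remove the cross-cultural differences noted above.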
Because expressiveness varies so widely, it is problematic to measure the intensity of emotional expressions across different stimuli, individuals or cultures. So while computer-based facial coding reveals the valence (positive/negative) and class of emotion, it cannot accurately assess emotional arousal (intensity).
One should consider using it in combination with other biometric technologies, such as galvanic skin response, that are able to capture emotional arousal.