“If the quality of emotional experience is derived from expressive behavior, then, if people were induced to express an emotion, it would be expected that they would subsequently report feeling that emotion” Laird (1974).
When I did my little review of studies that Laird claimed showed evidence that facial feedback results in an emotional experience (and experience is key), I didn’t pay much attention to theory. I focused on method and results. It is not because theory is uninteresting. It is more that the theoretical background may not be that well developed. It was used to justify the experiments, but I think the methods and results are more interesting at this point, in order to better chisel out new versions of theories. (Theories may well be wrong, and they are subject to fashion, such as the old computer model of psychology, which I noted when I went through the Srull & Wyer trace.)
But, I looked a little bit at what motivated Laird in the first place – because it does come through in the additional studies.
As the title of his paper suggests, he thinks of the emotional experience as kind of an outcome of self-attribution. Several parts go into this self-attribution as data (in his words), and these can strengthen and discount the different components.
This is grounded in the Schachter and Singer (1962) experiment, as well as the James-Lange notion that experienced emotion is the result of perception of bodily changes.
Physiological arousal serves as data about the intensity of the experience. Expectations and context can then shape this experience in different ways. One could take it as a signal of an intensity of emotional experience, but, if one has another explanation for why one feels aroused (epinephrine, coffee, high bridges, quick run) the emotion-attribution can be discounted. (I’m not afraid, I just had too much coffee).
The ambiguity or non-ambiguity of the situation also matters. If, for some unfathomable reason I found myself riding Himmels-skippet at Tivoli, I would correctly attribute my feelings to my very reasonable fear of heights, and not to the lovely double-shot latte I would have downed shortly before, whereas if I sit in my first-floor office, and feel the same thing, I might wonder if I had had one too many coffees.
But, he argues, more information than just arousal and situation could contribute to this self-attribution of emotion. If my eyebrows are raised, my mouth agape, my shoulders hunched, the feedback from my body would contribute to my emotional experience – which is what he is interested in.
So he is actually interested in whether people experience the emotion they are physically expressing.
Which, of course, means that you have to disguise this interest, considering that participants usually want to be helpful, and answer as suggested.
So much of the experiment happens “in plain sight”, so to speak. The faces are posed and the participants’ emotions are assessed (frequently), but it is all very cleverly disguised by the cover story: the stated interest is in perception when tensing and relaxing muscles, and participants are told that what they feel needs to be measured because their emotional state might influence the results. Participants who guessed that the position of their face might have something to do with their emotional state were removed (though the details of the debriefing aren’t stated, just that if at any point participants indicated a connection between the two, they were considered aware, and then excluded).
I can see why the Strack paradigm became so appreciated. The set-up is very elaborate, the cover story more complex than the cover story for Strack, and, even with all the care, not everyone who was aware of the connection may have been removed.
Moving quickly into “use”
One thing that struck me is how quickly the paradigm was moved into use, rather than testing for boundary conditions and replicability. (Of course, I have a very small sample, and only the studies that Laird provided as evidence for facial feedback, but a very quick check on citations suggests that there may not have been many more that posed facial expressions at that time – this is something to probe further in the future).
In paper two they establish an individual difference between those that are susceptible to the facial feedback and those that seem to be more susceptible to environmental cues (and, somewhat interestingly, the data on how facial feedback results in emotional experience is actually so weak in this paper that this really should be counted as a non-replication rather than a replication).
This division into those that are susceptible to feedback and those that are not is then used in several subsequent papers. The result from the manipulation (that is, the participants’ emotion ratings) is often not reported directly. The method for dividing them up is also somewhat convoluted. Participants have their faces posed, they view something of the opposite emotional meaning, and they rate their emotional state. Then, the researchers combine the ratings to get a final number that indicates whether participants were more influenced by the facial position, or by the emotion signal from what they viewed.
The upshot is that even if the posing method is the same for just about all of the experiments (bar one), the DVs that are reported vary quite a bit, to the point that one has to be really careful interpreting the p-curve I included. Can you really compare the number of emotion-congruent items correctly recalled with the ratings of emotions on a check-list?
The p-curve I included in the blog-post was also the most generous p-curve (I included the DV with the highest F or t from each experiment). I played around with it a bit and did a p-curve where I only included ratings on aggression/negativity whenever it was feasible. (I kept 2 where there is no valence indicated). The curve looks flatter, does not suggest evidentiary value, but still doesn’t suggest p-hacking. (P-curve aggression/negativity)
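For readers who have not worked with p-curves, here is a rough sketch of the basic idea in Python. It is not the analysis I ran, and the p-values are made up purely for illustration. The function names are my own, and the binomial test is a simplified stand-in for what the full p-curve procedure does: it only asks whether more of the significant p-values fall below .025 than above it, which is what right skew (evidentiary value) would predict.

```python
from math import comb

def bin_p_values(p_values, alpha=0.05, n_bins=5):
    """Count significant p-values in equal-width bins over (0, alpha)."""
    width = alpha / n_bins
    counts = [0] * n_bins
    for p in p_values:
        if 0 < p < alpha:  # non-significant results are not part of a p-curve
            idx = min(int(p / width), n_bins - 1)
            counts[idx] += 1
    return counts

def right_skew_binomial(p_values, alpha=0.05):
    """One-sided binomial test: do more significant p-values fall below alpha/2?

    Returns (k, n, p_binom), where k of the n significant p-values are below
    alpha/2 and p_binom is P(X >= k) for X ~ Binomial(n, 0.5).
    """
    sig = [p for p in p_values if 0 < p < alpha]
    n = len(sig)
    k = sum(1 for p in sig if p < alpha / 2)
    p_binom = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_binom

# Hypothetical p-values, NOT taken from the Laird papers.
example_ps = [0.001, 0.004, 0.012, 0.031, 0.044, 0.2]
print(bin_p_values(example_ps))
print(right_skew_binomial(example_ps))
```

A flat curve, like the one I got for the aggression/negativity ratings, corresponds to roughly equal counts across the bins and a binomial test that is nowhere near significant.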
Emotion congruence – example of moving quickly to use
I wanted to highlight the paper on emotion-congruent recall (Laird, Wagener, Halal & Szegda, 1982). It was published the year after Bower’s Mood and Memory, which I think is considered the Ur mood and memory paper (it is the one that should be cited). They use the “individual difference” technique, and end up with a very small sample in the self-produced (that is, emotional) groups: 9 and 10. They do use repeated measures, because, as they state, there is too much heterogeneity between people when it comes to memory to safely use a between-subjects design. But, the samples are very small for an induction that is rather weak.
I spent my graduate years working on emotion-cognition questions (mostly about congruent processing and emotional response categorization), and getting data on emotion congruence is – difficult. It requires quite a bit of power. We routinely started out with 30 participants in each condition, and used repeated measures designs (although we kept the emotion condition between participants). We used movies and/or music to induce emotion. Participants spent 12 minutes getting induced before doing their task, and, if at all possible, we kept the emotional music going during the task.
The induction was very effective. We always did a manipulation check, using something very similar to the MSCL – the BMIS: 16 adjectives that were rated on a 4-point scale. That the induction worked was never evident from simply looking at each participant’s BMIS ratings. It only became clear when we aggregated across adjectives, but then it was evident. In fact, I collected a lot of the emotion ratings that I still have for a paper on how we use film in psychological research to induce emotions.
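To make “aggregated across adjectives” a bit more concrete, here is a minimal sketch of the kind of scoring I mean. The adjective labels, the ratings and the choice of which adjectives count as negative are placeholders for illustration, not our actual data or the official scoring key. The only assumptions are a 1 to 4 rating scale and reverse-scoring of the negative adjectives before averaging.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 4

# Placeholder set of negatively valenced adjectives (an assumption for this sketch).
NEGATIVE = {"sad", "gloomy"}

def valence_score(ratings):
    """Average one participant's ratings after reverse-scoring negative adjectives."""
    scored = [
        (SCALE_MIN + SCALE_MAX - r) if adj in NEGATIVE else r
        for adj, r in ratings.items()
    ]
    return mean(scored)

def group_mean(participants):
    """Mean valence score across the participants in one induction condition."""
    return mean(valence_score(p) for p in participants)

# Made-up ratings for two participants per condition.
happy_group = [
    {"happy": 3, "content": 3, "sad": 2, "gloomy": 1},
    {"happy": 4, "content": 3, "sad": 1, "gloomy": 2},
]
sad_group = [
    {"happy": 2, "content": 2, "sad": 3, "gloomy": 3},
    {"happy": 1, "content": 2, "sad": 3, "gloomy": 2},
]
print(group_mean(happy_group), group_mean(sad_group))
```

Any single adjective rating is noisy, but the averaged score per participant, and then per condition, is where the difference between inductions shows up.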
But, a lot of the data I collected were from research that we never published, because on the subsequent task we either got nothing, or got results we couldn’t interpret. So, when I see the results for emotion-congruent recall for a lot fewer participants with a much weaker induction, it gives me pause. It is not that I don’t think they got what they got. I’m just not sure what it should be attributed to, especially since we don’t have their ratings of emotionality.
Reporting of information
This leads me into a brief comment on the reporting of information in old studies. It is very hit and miss. I saw no standard deviations anywhere. Lots of intermediate results missing (what were the aggregated emotion ratings after having posed the face – even though that wasn’t the main part that was interesting). It is also not very standardized (which, perhaps, wasn’t to be expected). For several of the papers, there was no information that I could use.
A plus is that they do use repeated measures for the posing in many instances. Each participant is exposed to all of the conditions, and sometimes there are quite a few.
The long durations of holding the expression
But, perhaps one would question some of the long durations for holding one’s face still. The original Laird paradigm asks participants to hold their face in a pose for 15 seconds. That’s a while. Now, of course, I think many of us have experienced those situations where nothing could wipe the expression off our faces (I particularly recall youthful encounters with enchanting individuals, or times of skeptically pulling one’s eyebrows together for a longish time), but generally expressions are quick, dynamic and fleeting. Fifteen seconds is still somewhat reasonable. Seven minutes (as in the Rhodewalt & Comer paper), while writing a counter-attitudinal essay… In fact, in a couple of papers, they employed an experimenter who watched the participant to make sure they kept their face in the pose for the duration (which was a lot longer than 15 seconds).
The high-level constructs
The initial theoretical background is, in many ways, rather high-level. There is input from the face, from the viscera and from the environment (the pictures), which then may give rise to experienced emotion. But, I don’t see anything there moving closer to anything more physiological. It is all posing followed by adjective ratings, followed by classical social-psychological tasks. But, in at least one paper (one of the works comparing normal weight and underweight) the posing of the face and subsequent experience of emotion (or not) is taken as a measure of proprioception. And, perhaps it is, but there seem to be a number of additional steps that one would need to check. Now, I know from my other readings that there have been experiments where they have posed the face and then measured physiological reactions, so it may be that it is just in this small sample that this is not well covered.
Finally, the individual differences (which I mentioned above). The handful of papers is dominated by this individual difference between those that use facial feedback for emotion attribution, and those that use the situation. But in the small handful of papers on facial feedback published later that I have looked at, this is not mentioned. Has it been forgotten? Is it not interesting? Has this possible individual difference in sensitivity to bodily feedback been pursued elsewhere?