Blog post on the occasion of Strack, Martin & Stepper not replicating, and thoughts about what to do next.

Strack didn’t replicate. STRACK DIDN’T REPLICATE. If you wonder which Strack (and you should, as he is prolific), I mean the one where you stick a pen in your mouth, and the way holding the pen screws up your face makes you find a cartoon more (or less) amusing. Properly, it should be cited as Strack, Martin & Stepper (1988).

And, I’m a bit sad. We were going to be part of the replication effort, but last fall semester hit hard, and I had to give up. We needed to collect data before students had heard of the experiment, and we just did not get it together in time. I had predicted there would be an effect.

But perhaps that prediction should have been a bit more moderate. I believed SOMETHING would happen, based on work that was done in the Niedenthal lab, but that effect was a little bit different.

Niedenthal did a series of morphing studies, where faces changed from one expression to another, and people had to detect the change. The main question was whether emotional state affects the perception of emotional stimuli. But in one variant she used the Strack manipulation, for good reason: you want to tease out the edges of an effect. Would facial feedback alone be enough, or would we need a full-blown emotion induction? What seems to have happened instead was that holding the pen in the mouth disrupted mimicry. That is the published story.

But I really think we need to put the study in context. As I wrote in my blog post on the citation trace I have begun, the paper is just one tile in the mosaic of studies investigating the role of facial feedback in emotion processing (and I’m deliberately vague).

One can easily trace this back all the way to the James-Lange theory of emotion, which introductory textbooks crudely (and somewhat erroneously) portray as "you feel afraid because you are running away from the bear."

But let’s narrow it a bit more: the Strack experiment was part of a much larger body of research looking at the role of facial feedback.

The facial feedback story (as I tell it to my undergraduates) goes roughly like this (admittedly with plenty of licence).

When Ekman, Friesen and Ellsworth were figuring out which facial movements could be considered the primitives of expressions (the FACS), they apparently noticed that when they worked on furrowing their brows and pulling their mouths into snarly shapes, they found themselves snapping and actually snarling more.* Could it be that screwing up your face into an emotional expression feeds back to the emotion-processing areas of the brain, possibly giving rise to a faint experience of that actual emotion? From there, they proceeded to experiment on that notion, usually by asking people to position their faces in a certain way (e.g., pull down the outer corners of your mouth, stick out your tongue, wrinkle your nose).

I don’t think they were alone in pursuing facial feedback. Zajonc has worked on this. Laird has worked on this. Hess has worked on this. Alan Fridlund looked at this. So did Levenson, Lanzetta, and Gross (the beginning of emotion regulation work), and on and on and on.

What the Strack paradigm specifically addressed was the objection that people might figure out they had been asked to screw up their faces in, say, disgust, and, being compliant participants (in my experience more common than recalcitrant ones), would report more disgust, amusement, etc.

It addressed this particular objection in a very clever way, and by the standards of the time, it worked. For some reason, it then became THE experiment (in textbooks etc.) demonstrating the existence of facial feedback.

Which, of course, it isn’t. No single study ever is!

If someone thinks this refutes the facial feedback hypothesis, or embodiment, that person is committing the same fallacy as someone who tests a gaggle of undergraduates in Georgia and then claims to have found evidence for some universal principle of human functioning.

Instead, be more precise: sticking a pen in someone’s mouth to make them pose their face in a semblance of a smile or a pout, without alerting them to the fact that you are interested in what happens when the face is put in different positions, seems to have no effect on how funny a (by now) fairly large sample of participants find the cartoons.

And, yes, after this replication, I actually strongly believe just that, which is a very, very narrow claim.

If we really want to evaluate the veracity of the facial feedback thesis, we must do better than a single targeted RRR, because the evidence for the theory is a whole web of experiments. We need to undertake a comprehensive review.

First, there needs to be a review of mimicry, the human tendency to mimic the facial expressions we are exposed to. There are lots of experiments here: some filming faces, some measuring EMG, some looking at brain correlates (I used to read this literature as a doctoral student).

Next, what happens when mimicry is disrupted, whether through instruction (don’t show what you feel, keep a stone face) or through physical disruption (e.g., pens in the mouth, Botox)?

Then we need to review what we know about the consequences of this mimicry (or disrupted mimicry) for the individual. (Suggestions: a mild experience of the same emotion, changes in physiological signatures, perceptual sensitivity to congruent material, enhanced emotional reactions to other material.)

I don’t think the facial feedback hypothesis is stupid. Humans have a tendency to imitate and entrain (we think, anyway), and it is a plausible first mechanism for understanding how we communicate and how we understand one another. (Even my son, who is 13, has heard that mimicry is the basis of empathy. It has face validity, but the evidence needs scrutiny.) I tend to take an ecological/evolutionary view of things, which is why I think it is non-stupid.

Now, there is a lot of research on this. Why not evaluate it, see how strong (or not) it is, possibly run some very targeted experiments once there is a better map (if warranted), and test more than undergraduates?

I think I will actually do this – but, of course, I will have to get help.

* I have no idea where I picked up this anecdote. It could be YouTube, a conference I went to in 2003, or some paper.


  1. Sanjay Srivastava says:

    [Cross-posted from PsychMAP.] Terrific post, Ase. This highlights an important broader point about replications: They can give us valuable information about a particular experimental protocol, but how that protocol should inform theory is a separate question. The answer may differ from one replication to another, depending on a lot of other particulars.

    As is often the case, commentary and criticism about a replication turns out to apply as well or better to the original. Consider the idea that an experimental protocol is a good test of a theory if two things are true: (a) If the theory is correct, we should expect some particular result with high probability, and (b) if the theory is incorrect, that same result should be very improbable. We gain some confidence in a theory when it passes a severe test, more so than when it passes a not-severe one.

    What I think you’re suggesting is that it is very possible that the facial feedback theory (broadly construed) could be correct even if the specific pen-in-mouth protocol doesn’t produce a replicable effect. If that’s the case (and I think that’s a very reasonable reading of the theory), the implication is that the original Strack et al. study was never a particularly strong test of the facial feedback theory in the first place. Our appraisal of the theory stays about the same no matter how the experiment comes out. So if we are tempted to shrug off the RRR, we should ask how the original study came to be so widely taken as a critical demonstration of the theory.

    • asehelene says:

      Thank you, and yes, I think we should maybe update how we decide on RRRs. I think starting with very famous results was a reasonable first stab. But perhaps we need to shift and ask: can we identify the key results, the key experiments for a theory, and then replicate those? But we’re new at this. We can only improve!

  2. Anonymous says:

    I’ve read your work on following the citation-trail concerning another paper, and have read on FB that you might do something similar regarding the Strack et al. finding. I hope you will follow through with this, because I reason this could provide us with interesting information regarding citations, the process of science building on other findings, theory development, and the possible effect of projects like Registered Replication Reports (RRR) for all of this, especially because the Strack et al. finding did not replicate.

    If the Strack et al. finding is mostly cited to make a general statement, but is just one of many possible papers that could be cited to provide evidence for it, then it could be argued that the results of a failed replication will have little effect, if any. People can just pick a different paper to provide evidence for the statement.

    If the Strack et al. finding is being cited in the method section for instance, to use the same paradigm in a new study, one could wonder how that new study found the new effect. How could this be, when the original paradigm has now been shown to not work? Does this indicate that the new finding is probably a chance finding?

    I think you and/or other people can come up with better, more interesting and informative questions that could be asked. I guess the main point I am trying to make is 1) I like your idea of following a citation trail and seeing how citations are being used 2) with the Strack et al. finding you now have a situation where a much cited article has been “debunked”. Perhaps some interesting, and useful, questions can be asked concerning the studies that cited the original work, the process of science in general, the possible effect of a failed replication for the underlying theory, etc.

  3. asehelene says:

    Thank you, and yes, I’m working on a trace for Strack, and I am also following up on the papers he listed in his response as having replicated his work. For the Strack trace, I have (so far) only pulled citations from the first 150 papers, but it is a bit like you say: the paper is most often cited just once. My experience with the prior trace (and as a citing scientist myself) suggests that this means the claim itself isn’t under scrutiny; the citation functions as partial support in a chain of arguments.

    And, like you, I think this is an opportunity to delve deeper into the facial feedback/mimicry ideas, see how strong the evidence really is, and try to build better theories.

