Bully for you, Chilly for me: Scientific fame



Perspectives on Psychological Science published an invited symposium on eminence in psychology, which starts with Robert Sternberg’s introductory article, “Am I Famous Yet? Judging Scholarly Merit in Psychological Science: An Introduction”.*

As Bobbie Spellman pointed out on Facebook – Only One Woman. Guess what topic?

Sure, judging scholarly merit is an interesting question (Meehl discussed it in his recorded last lecture series – along with its problems), and inquiring into why some individuals are considered eminent in a field, and others not is certainly a legitimate area of research both in psychology and sociology (not to speak of history).

But the question, and the answers, seem ill-posed. Science is about ideas. It is about advancing knowledge. It is created by people, but most likely not by lone individuals, and the symposium authors seem to be looking for features of individuals that can predict eminence, rather than looking at systems for advancing ideas.

I’m reminded of Duncan Watts’s book “Everything Is Obvious: Once You Know the Answer”. Take anything that has reached fame, be it the Mona Lisa, Harry Potter, Star Wars, or your choice of famous scientist. What he claims is that beforehand, there was nothing in particular that suggested that this piece of art or that particular scientist was especially noteworthy. (The Mona Lisa hung around for a long time until someone stole it. Harry Potter and Star Wars were rejected multiple times.) But once something or someone is famous, we tend, in hindsight, to attribute to it a lot of special features that we think are of obvious merit. Now, if these features are so obvious, how come it took so long to discover them, and why hasn’t anybody been able to come up with nice sure-fire metrics for identifying the wheat among all the chaff?

I teach two papers that he wrote with Salganik where he investigated what might be the forces that create eminence (hint – people). I’ve blogged in more detail about it here (and here), but I’ll do a short summary.

Their product of choice was the pop song. In fact, 48 pop songs by unknown bands. Participants were thousands of people who were contacted via the internet (this was prior to Facebook). They were invited to listen to and rate the songs, and as thanks they could download one of them for free. The only catch was that they would have to listen to them. (They didn’t have to listen to all 48.) The participants were divided into “worlds”. In one control world they got no information about what other people thought of the songs. But in the experimental worlds, participants got access to ratings and to popularity (number of times downloaded). The ratings in the control world could be seen as a baseline on appeal (how “good” the songs were when you are not influenced by what others think). In the worlds where participants got information about ratings and downloads, top songs emerged. Interestingly, they were different in all of the worlds (in the second paper they had 8). Clearly people use other people’s endorsements when they decide which song to listen to and rate. The only thing that they noted, quality-wise, is that those rated low in the independent world never made it to the top. To put it bluntly – we know crap, but we don’t know quality.
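To get a feel for how separate “worlds” can crown different winners, here is a toy simulation. It is only a sketch and not the procedure from the papers: the choice rule (probability proportional to appeal plus a weight times downloads so far), the appeal values, the `social_weight` parameter and the number of listeners are all made-up assumptions.

```python
import random

def simulate_world(appeal, n_listeners, social_weight, seed):
    """Toy cumulative-advantage model: each listener downloads one song.

    The chance of picking a song is proportional to
    appeal + social_weight * downloads_so_far (an assumed rule).
    With social_weight = 0 this behaves like an independent world.
    """
    rng = random.Random(seed)
    downloads = [0] * len(appeal)
    for _ in range(n_listeners):
        weights = [a + social_weight * d for a, d in zip(appeal, downloads)]
        choice = rng.choices(range(len(appeal)), weights=weights, k=1)[0]
        downloads[choice] += 1
    return downloads

# 48 hypothetical songs with made-up appeal values between 0.5 and 1.5
appeal_rng = random.Random(0)
appeal = [appeal_rng.uniform(0.5, 1.5) for _ in range(48)]

# Eight social-influence worlds: same songs, same rule, different random histories
worlds = [simulate_world(appeal, 1000, social_weight=0.5, seed=s) for s in range(1, 9)]
winners = [w.index(max(w)) for w in worlds]
print("Top song in each world:", winners)
```

Setting `social_weight=0` gives a rough analogue of the control world, so the two settings can be compared for how much the download counts spread out.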

Might it be the same in science?

And, as always, I’m reminded of Hull’s “Science as a process”. As he states in the introduction, he was particularly interested in the interaction of the social and the knowledge-seeking aspects of research. His thesis is an explicitly evolutionary account of scientific progress. Individual scientists may have good ideas but if nobody is engaging with them – either collaboratively or adversarially – they won’t be passed on. Now, that lots of people engage with an idea is, of course, not a guarantee that in the end it is right. As long as it is engaged with (and allowed to morph) there will be an advance. For example, we like to, in hindsight, sneer at the idea of Phlogiston. But as rmathematicus lays out in this blog post, it was a highly fertile idea that likely paved the way for the discovery of the role of oxygen in combustion.

Do any of you know the names of the Phlogiston theorists? The names of the early oxygen proponents may be more known (Kuhn brings them up), but I can only recall Scheele, because he’s Swedish. The idea lives on.

Sure, when one overproduces scientists, there may be a need for understanding better how to staff one’s university, and for making bets that may be a bit better than a coin toss. But, like grooming the next boy-band, that is extreme decision making under uncertainty.

But eminence and fame shouldn’t be what to look for in science. It is ideas. And the aim might instead be to work out how to assemble good teams that can tackle interesting questions. No eminence needed.

*I must confess, besides reading all of Sternberg’s article, and skimming through Eagly’s, I have only read the abstracts. Some of them are somewhat thoughtful, but still, to my mind, misguided.



Reflection on Laird and Facial Feedback.

“If the quality of emotional experience is derived from expressive behavior, then, if people were induced to express an emotion, it would be expected that they would subsequently report feeling that emotion” Laird (1974).


When I did my little review of studies that Laird claimed showed evidence that facial feedback results in an emotional experience (and experience is key), I didn’t pay much attention to theory. I focused on method and results. It is not because theory is uninteresting. It is more that the theoretical background may not be that well developed. It was used to justify the experiments, but I think the methods and results are more interesting at this point, in order to better chisel out new versions of theories. (Theories may well be wrong, and they are subject to fashion – such as the old computer model of psychology which I noted when I went through the Srull & Wyer trace.)

But, I looked a little bit at what motivated Laird in the first place – because it does come through in the additional studies.

As the title of his paper suggests, he thinks of the emotional experience as kind of an outcome of self-attribution. Several parts go into this self-attribution as data (in his words), and these can strengthen and discount the different components.

This is grounded in the Schachter and Singer (1962) experiment, as well as the James-Lange notion that experienced emotion is the result of perception of bodily changes.

Physiological arousal serves as data about the intensity of the experience. Expectations and context can then shape this experience in different ways. One could take it as a signal of an intensity of emotional experience, but, if one has another explanation for why one feels aroused (epinephrine, coffee, high bridges, quick run) the emotion-attribution can be discounted. (I’m not afraid, I just had too much coffee).

The ambiguity or non-ambiguity of the situation also matters. If, for some unfathomable reason I found myself riding Himmels-skippet at Tivoli, I would correctly attribute my feelings to my very reasonable fear of heights, and not to the lovely double-shot latte I would have downed shortly before, whereas if I sit in my first-floor office, and feel the same thing, I might wonder if I had had one too many coffees.

But, he asks, could more information than just arousal and situation contribute to this self-attribution of emotion? If my eyebrows are raised, my mouth agape, my shoulders hunched, the feedback from my body would contribute to my emotional experience – which is what he is interested in.

So he is actually interested in whether people experience the emotion they are physically expressing.

Which, of course, means that you have to disguise this interest, considering that participants usually want to be helpful, and answer as suggested.

So much of the experiment happens “in plain sight” so to speak. The faces are posed, their emotions are assessed (frequently), but it is all very cleverly disguised by giving the cover-story (interest in perception when tensing-relaxing muscles. Need to measure what you feel, because the emotional state might influence the results), and removing those participants that guessed that the position of their face might have something to do with the emotional state (though the details of the debriefing aren’t stated – just that if at any point participants indicated a connection between the two, they were considered aware, and then excluded).

I can see why the Strack paradigm became so appreciated. The set-up here is very elaborate, the cover story more complex than the cover story for Strack, and, even with all the care, not everyone who was aware of the connection may have been removed.

Moving quickly into “use”

One thing that struck me is how quickly the paradigm was moved into use, rather than testing for boundary conditions and replicability. (Of course, I have a very small sample, and only the ones that Laird provided as evidence for facial feedback, but a very quick check on citations suggests that there may not have been many more that posed facial expressions at that time – this is something to probe further in the future).

In paper two they establish an individual difference between those that are susceptible to the facial feedback and those that seem to be more susceptible to environmental cues (and, somewhat interestingly, the data on how facial feedback results in emotional experience is actually so weak in this paper that this really should be counted as a non-replication rather than a replication).

This division into those that are susceptible to feedback and those that are not is then used in several subsequent papers. The result from the manipulation (that is, their emotion rating) is often not reported directly. The method for dividing them up is also somewhat convoluted. Participants have their faces posed, they view something of the opposite emotional meaning, and they rate their emotional state. Then the researchers combine the ratings to get a final number that indicates whether participants were more influenced by the facial position, or by the emotion signal from what they viewed.

The upshot is that even if the posing method is the same for just about all of the experiments (bar one), the DVs that are reported vary quite a bit, to the point that one has to be really careful interpreting the p-curve I included. Can you really compare the number of correctly recalled emotion-congruent items with the ratings of emotions on a check-list?

The p-curve I included in the blog-post was also the most generous p-curve (I included the DV with the highest F or t from each experiment). I played around with it a bit and did a p-curve where I only included ratings on aggression/negativity whenever it was feasible. (I kept 2 where there is no valence indicated). The curve looks flatter, does not suggest evidentiary value, but still doesn’t suggest p-hacking. (P-curve aggression/negativity)
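For readers who have not met p-curves, the basic idea is to look only at significant p-values and ask whether they pile up near zero (right skew, suggesting evidential value) or are spread evenly. The sketch below shows just the simplest ingredient, a half-split with a binomial test against a flat curve. It is not the full p-curve analysis, and the input is illustrative only: the five significant p-values listed for Experiment 1 in the overview post further down, which come from a single experiment and are not the set used for the curves described here.

```python
from math import comb

def half_split(p_values, alpha=0.05):
    """Count significant p-values in the lower and upper half of (0, alpha)."""
    significant = [p for p in p_values if p < alpha]
    low = sum(1 for p in significant if p < alpha / 2)
    return low, len(significant) - low

def binomial_right_skew_p(low, high):
    """Chance of at least `low` results below alpha/2 if the curve were flat."""
    n = low + high
    return sum(comb(n, k) for k in range(low, n + 1)) / 2 ** n

# Illustrative input only (see the caveats in the text above)
example_p = [0.007, 0.048, 0.011, 0.041, 0.020]
low, high = half_split(example_p)
print(low, high, binomial_right_skew_p(low, high))  # prints: 3 2 0.5
```

With so few values the split says next to nothing, which is part of why a flat-looking curve from a handful of studies is hard to interpret.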





Emotion congruence – example of moving quickly to use

I wanted to highlight the paper on emotion-congruent recall (Laird, Wagener, Halal & Szegda, 1982). It was published the year after Bower’s Mood and Memory, which I think is considered the Ur mood and memory paper (it is the one that should be cited). They use the “individual difference” technique, and end up with a very small sample in the self-produced (that is, emotional) groups: 9 and 10. They do use repeated measures, because, as they state, there is too much heterogeneity between people when it comes to memory to safely use a between subjects design. But, the samples are very small for an induction that is rather weak.

I spent my graduate years working on emotion-cognition questions (mostly about congruent processing and emotional response categorization), and getting data on emotion congruence is – difficult. It requires quite a bit of power. We routinely started out with 30 participants in each condition, and used repeated measures designs (although we kept the emotion condition between participants). We used movies and/or music to induce emotion. Participants spent 12 minutes getting induced before doing their task, and, if at all possible, we kept the emotional music going during the task.

The induction was very effective. We always did a manipulation check, using something very similar to the MSCL – the BMIS: 16 adjectives that were rated on a 4 point scale. That the induction worked was never evident by simply looking at each participant’s BMIS ratings. It only became clear when we aggregated across adjectives, but then it was evident. In fact, I collected a lot of the emotion ratings that I still have for a paper on how we use film in psychological research to induce emotions.

But, a lot of the data I collected were from research that we never published, because on the subsequent task we either got nothing, or got results we couldn’t interpret. So, when I see the results for emotion-congruent recall for a lot fewer participants with a much weaker induction, it gives me pause. It is not that I don’t think they got what they got. I’m just not sure what it should be attributed to, especially since we don’t have their ratings of emotionality.

Reporting of information

This leads me into a brief comment on the reporting of information in old studies. It is very hit and miss. I saw no standard deviations anywhere. Lots of intermediate results missing (what were the aggregated emotion ratings after having posed the face – even though that wasn’t the main part that was interesting). It is also not very standardized (which, perhaps, wasn’t to be expected). For several of the papers, there was no information that I could use.

Repeated measures

A plus is that they do use repeated measures for the posing in many instances. Each participant is exposed to all of the conditions, and sometimes they are quite a few.

The long durations of holding the expression

But perhaps one would question some of the long durations for holding one’s face still. The original Laird paradigm asks participants to hold their face in a pose for 15 seconds. That’s a while. Now, of course, I think many of us have experienced those situations where nothing could wipe the expression off our faces (I particularly recall youthful encounters with enchanting individuals, or times of skeptically pulling one’s eyebrows together for a longish time), but generally expressions are quick, dynamic and fleeting. Fifteen seconds is still somewhat reasonable. Seven minutes (as in the Rhodewalt & Comer paper), while writing a counter-attitudinal essay…. In fact, in a couple of papers, they employed an experimenter who watched the participant to make sure they kept their face in the pose for the duration (which was a lot longer than 15 seconds).

The high-level constructs

The initial theoretical background is, in many ways, rather high-level. There is input from the face, from the viscera and from the environment (the pictures), which then may give rise to experienced emotion. But I don’t see anything there moving closer to anything more physiological. It is all posing followed by adjective ratings, followed by classical social-psychological tasks. But in at least one paper (one of the works comparing normal weight and underweight) the posing of the face and subsequent experience of emotion (or not) is taken as a measure of proprioception. And perhaps it is, but there seem to be a number of further steps that one would need to check. Now, I know from my other readings that there have been experiments where they have posed the face and then measured physiological reactions, so it may be that it is just in this small sample that this is not well covered.

Individual differences.

Finally, the individual differences (which I mentioned above). The handful of papers is dominated by this individual difference between those that use facial feedback for emotion attribution, and those that use the situation. But in the small handful of papers on facial feedback published later that I have looked at, this is not mentioned. Has it been forgotten? Is it not interesting? Has this possible individual difference in sensitivity to bodily feedback been pursued elsewhere?


Posing the Face – an overview of early Laird research

Let me start with this link to a lovely blog by Lynneguist on the meaning of Frowns.


Because, evidently, it varies! I always took it to mean that you pull your eyebrows down in a somewhat angry expression – frowning on something that you disapprove of. But, clearly (and I had come across this), there are those who consider the frown a sad face.

Now, the term “frown” is used in the scientific literature on emotional expressions – as you will see below. And I’ll tell you up front that in most cases it was used as synonymous with the angry face, but in one case it was used for the sad face. Let that be a warning about using folk-psychological terms, because they may indicate very different things.

Nevertheless, I won’t heed my own advice in the work below, but I will let you know if the frown means a sad face.



In all the brouhaha that the non-replication of Strack brought, someone linked to a relatively early paper by Laird, where he responded to another non-replication of facial feedback from surreptitiously posing faces.

The paper is “The real role of facial response in the experience of emotion: A reply to Tourangeau and Ellsworth, and others”, published in JPSP in 1984. On the first page, he lists a simple nose-count of papers that have replicated the face-posing effects. As we should know, at least since Meehl’s asterisk paper, simple nose-counts are not good enough evidence that an effect exists, considering that we now understand there are as many uninteresting ways to get significant results as there are uninteresting ways of failing to get a result, and only a file-drawer separates the two.

So, I figured as a warmup for a longer review I should find those nose-counted papers and look at what they say (there were not that many).

It starts with his 1974 paper “Self-attribution of emotion: The effects of expressive behavior on the quality of emotional experience”. The paper was, in part, based on his doctoral dissertation. Is this important? I don’t know. People talk so much about expertise as being a factor. It was early career work, at least research wise.

His theoretical background (which I’m less interested in – I want to look at what was measured, how it was measured, and the results thereof – theories develop, one would hope) is grounded in Bem’s self-perception theory, and Schachter’s work on arousal and external cues. He thinks that changes in physical arousal, and changes in patterns of bodily expression are both parts that will change self-attribution of emotion. If one knows that there may be an external reason for arousal, one can then discount this effect.

Experiment 1

Let’s start with experiment 1. Sixty-five undergraduate males participated. Not all (as we will see) were in the experimental group, and even among those, some were excluded.

The experiment is quite elaborate: There is a cover story: Participants were told that it was about “the activity of facial muscles under various conditions”.  This was backed up by the presence of scientific looking apparatus, and by placing electrodes between the eye-brows and to the corner of their jaws. The electrodes seem to have had a function – but not as electrodes. That was a complete sham. Instead they were used to direct the participants to pose their faces so they appeared like facial expressions of emotion without letting on that that was being done. Here are the quotes from what they were told:

For the “angry” Position:

[Touching lightly the electrodes between the eyebrows] Now I’d like you to contract these muscles. [If this was unsuccessful, then ] Contract them by drawing them together and down [and if this was unsuccessful, then ] Pull your brows down and together. [Whenever the experimenter was satisfied, he said ] Good, now hold it like that. [Now touching lightly the electrodes at the corners of the jaw] Now contract these [if this was unsuccessful, then ] Contract them by clenching your teeth.


For the “Happy” Expression:

[Touching lightly the electrodes near the corners of the mouth] Now I’d like you to contract these muscles under here [If this was unsuccessful, then ] Contract them by drawing the corners of your mouth back and up [When satisfied, the experimenter said] Good, now hold it like that.

In the discussion he actually notes the mean number of steps in instruction to get the expressions right: 2.80 for the smiles, and 2.63 for the angry expression (but they did not differ).

But, to slightly move back. While the electrodes were placed on the face, the experimenter explained that there could be some subtle emotional changes, so after each trial, the participant would rate their emotional experience so that could be controlled for.

So, what we have here is – placing fake electrodes, explaining that emotional experience could be a confound (to justify measuring their emotional state), stating that the experiment involved tensing and relaxing facial muscles, and instruction on how to do that tensing.

We are ready for the experiment.

Once the face was positioned, participants were shown a picture for 15 seconds, before filling in the mood-adjective questionnaire. There were 4 pictures total – two of Ku Klux Klan members, and two of playing children. The participants saw all 4 pictures. A KKK and Kid picture while “smiling” and the other KKK and kid picture while looking angry.

The mood checklist

The mood adjective list was adapted from the Nowlis-Green Mood Adjective Check List (Nowlis 1968). It contained 40 mood words, and these were related to factors indicating Aggression, Anxiety, Remorse, Elation, Social Affection and Surgency. (Interesting to look at the names of the factors actually). The interesting set of adjectives would be those related to Aggression, Elation and Surgency (which is reasonable). Each adjective was rated on a 5 point scale ranging from “did not feel” to “Strongly felt”. Then, to get an index, the ratings for all adjectives that would indicate Aggression were averaged. Fairly standard procedure (it was what I used with the BMIS when we measured emotion).
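The averaging step can be sketched in a few lines. This is only an illustration: the adjective-to-factor key below is a placeholder (which of the 40 words belong to which factor is not listed here), and the 0 to 4 numeric coding of the 5 point scale is an assumption.

```python
from statistics import mean

# Hypothetical scoring key: placeholder items, not the actual 40-word checklist.
SCORING_KEY = {
    "aggression": ["angry", "annoyed", "defiant"],
    "elation": ["carefree", "elated", "pleased"],
}

def factor_indices(ratings, key=SCORING_KEY):
    """Average the ratings of the adjectives that belong to each factor."""
    return {factor: mean(ratings[item] for item in items)
            for factor, items in key.items()}

# One made-up checklist, coded 0 ("did not feel") to 4 ("strongly felt")
ratings = {"angry": 3, "annoyed": 2, "defiant": 1,
           "carefree": 0, "elated": 1, "pleased": 2}
print(factor_indices(ratings))
```

Each participant then contributes one index per factor per trial, and those indices are what go into the analysis.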


He performed an interesting control for experimenter bias also. As much as possible, participants were run in pairs. One participant got his (they were all dudes) face manipulated, whereas the other one didn’t. The two subjects were separated by a screen so they could not see each other, but the experimenter could see both of them. The idea here is that if the researcher would inadvertently indicate what was intended, both participants would show this particular bias, but as only one received the manipulation, the bias could possibly be detected by looking at how similar the mood scores were between the two participants.

This pairing didn’t work perfectly. There were only 20 instances where both showed up, and 25 where the subject was alone. In total then, there were 45 manipulated participants, and 20 controls.

Seven of the manipulated participants seemed to be aware of a connection between the facial manipulation and their mood, and were then excluded from the analysis.

To recap: the point of interest is whether facial feedback results in an emotion signal, even if you don’t realize that your face is posed into an expression, and that the supposed control questionnaire is the actual dependent measure.

The emotional content of the pictures seems to have not been of a main interest here, but it is analyzed, and, not surprisingly, all people rated themselves as more aggressive after viewing the KKK pictures, and more elated after viewing the kid pictures.

This is how I translated the table of the results into graphs for the manipulated participants.


Laird reports the F-values, so I actually took those and the degrees of freedom and stuck them into Schimmack’s nifty R-index sheet so I could get some p-values.

Study 1, Experimental  Aggression    N     F      df1   df2   p
Expression main effect               38    8,18   1     37    0,007
Expression x Picture interaction     38    4,18   1     37    0,048
Expression main effect               38    7,21   1     37    0,011
Expression x picture interaction     38    4,5    1     37    0,041
Expression main effect               38    5,91   1     37    0,020
Aggression x picture interaction     20    3,26   1     19    0,087
Expression main effect               20    1,66   1     19    0,213
Expression x picture interaction     20    1,54   1     19    0,230

Note that in the control condition (where the participants didn’t screw up their faces), there were no effects expected. Laird notes down some of the F-values (that are not less than 1), so I stuck them in here just for completeness.
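The p-values in the table above can also be recovered without the spreadsheet. The sketch below is one way to do it with only the standard library, and it is not how the R-index sheet works internally. It relies on the fact that an F with 1 numerator df is a squared t, and it integrates the t density numerically (Simpson’s rule), so it only applies to the F(1, df2) tests reported here. The function names are my own.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def p_from_f_df1_1(f_value, df2, steps=2000):
    """p-value for F(1, df2), using F = t**2 and Simpson's rule (steps must be even)."""
    t = math.sqrt(f_value)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area_0_to_t = total * h / 3
    return 1.0 - 2.0 * area_0_to_t  # both tails of the t distribution

for f_value, df2 in [(8.18, 37), (4.18, 37), (3.26, 19)]:
    print(f"F(1, {df2}) = {f_value}: p = {p_from_f_df1_1(f_value, df2):.3f}")
```

The printed values should land close to the p column of the table; small differences in the last digit would come from rounding of the reported F-values.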

He also does a manipulation check between the experimental and observer participants (there end up being only 16 pairs), and finds that they do differ as expected, although the difference is quite weak, but I won’t discuss it here. In fact, I would recommend people read his own discussion, because it is quite detailed and thoughtful.

Some commentary

He uses a complete within-subjects design, with an interesting control. He measures their emotions quite openly, but most of them think that this is not of interest. They have to hold their facial expressions for quite a long time. He actually asked whether it was distracting or uncomfortable, and some said it was: three for all, six for anger and four for smile. Most of them didn’t.

Experiment 2

In experiment 2 he addresses what will happen when the situational cue is ambiguous. (the pictures in experiment 1 weren’t ambiguous). To do so, he uses cartoons. He argues that the participants will attribute the source of their emotion to the cartoons. The selected cartoons had received a moderate humor rating.

The setup was similar to that in experiment 1, but with a few differences. It was (again) a within-subjects design, but it appears there were only two repeats – one in the happy condition, one in the angry condition. The main measure was the ratings of the cartoon (but that was, as in the earlier experiment, tossed off as a control measure rather than the main measure), this time on a 9 point scale ranging from “not at all funny” to “funniest ever”. The mood checklist had been shortened to just 6 items, 3 from aggression and 3 from elation. The same post-experimental questionnaire was used. No observer subjects this time.

32 undergraduates this time (no mention of gender). Six were excluded because they guessed the hypothesis.

And here are the results, copied from the paper, and with Cohen’s d added (using Daniel Lakens’s nifty effect-size spreadsheet).

                Angry   Happy   t       p       d
Humor rating    4,42    5,5     2,8     0,01    0,55
Elation         4,11    4,42    < 1.0
Aggression      23,81   1,88    2,46    0,021   0,48
N = 26
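The d values in the table are consistent with the within-subjects effect size d_z = t / √N. That the spreadsheet used exactly this formula is an assumption, so take this as a minimal sketch for checking the numbers rather than a description of the spreadsheet.

```python
import math

def cohens_dz(t_value, n):
    """Within-subjects effect size from a paired t-test: d_z = t / sqrt(n)."""
    return t_value / math.sqrt(n)

# t-values and N as listed in the table above
print(round(cohens_dz(2.8, 26), 2))   # humor rating -> 0.55
print(round(cohens_dz(2.46, 26), 2))  # aggression   -> 0.48
```

Note that d_z depends on the correlation between the repeated measures, so it is not directly comparable with a between-subjects d.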

Laird & Crosby (1974) individual differences in the self-attribution of emotion

Laird & Crosby’s work was a chapter in the book “Thought and Feeling: Cognitive Alteration of feeling states.”

I’ll focus on the results of the face-manipulation only.

They started out with 32 undergraduates, but removed 6 because they were aware of the hypothesis.

The cover story and face-manipulation were the same as in Laird 1974. The stimuli were cartoons, and for the emotion measure they used three adjectives for Elation (carefree, elated, pleased) and three for Aggression (angry, annoyed, defiant), rated on 5 point scales. The scores for each factor were summed. Then the aggression factor was subtracted from the elation factor, resulting in a single score for emotional experience.
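As a worked illustration of that single score: sum the three elation ratings, sum the three aggression ratings, and subtract. The adjectives are the ones listed above; the 0 to 4 coding of the 5 point scale and the example ratings are assumptions made up for the sketch.

```python
ELATION = ("carefree", "elated", "pleased")
AGGRESSION = ("angry", "annoyed", "defiant")

def emotion_score(ratings):
    """Elation sum minus aggression sum; positive means net positive feeling."""
    elation = sum(ratings[adjective] for adjective in ELATION)
    aggression = sum(ratings[adjective] for adjective in AGGRESSION)
    return elation - aggression

# Made-up ratings for one trial, coded 0 to 4 (assumed coding)
trial = {"carefree": 3, "elated": 4, "pleased": 3,
         "angry": 1, "annoyed": 0, "defiant": 2}
print(emotion_score(trial))  # (3 + 4 + 3) - (1 + 0 + 2) = 7
```

A difference score like this treats elation and aggression as opposite ends of one dimension, which is worth keeping in mind when comparing it with the separate factor scores in the other papers.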

The participants went through the procedure on two separate sessions, with 2-3 days delay. In each session they were asked to do both poses, while presented with a cartoon.

         Smile   Frown   t       p       p one-tailed   d
Day 1    2,23    2,38    ns      ns
Day 3    3,19    1,04    1,78    0,087   0,044          0,35
N = 26


The first day the manipulation did not make any difference in the ratings of emotional state. And, in some ways, it didn’t happen the second day either, as the difference is only significant with a one-tailed test. They proceed to divide people up into those that rated their emotions as negative both days, positive both days and those that switched, in order to investigate individual differences. It is interesting, but less interesting for a review on whether there is good evidence that posing the face in emotional expressions gives rise to emotional feelings. But it turns out that some of the subsequent papers use the results from this part to divide participants up into internal-cue sensitive and external-cue sensitive.

Paper 3.

The Duncan & Laird (1977) paper is very much more elaborate, but I think one can simply look at the face-manipulation part. The title of the paper is “cross-modality consistencies in individual differences in self-attribution”, and involves a very complex set-up where people are first tested on their attitudes, then about a month later are asked to do a counter-attitudinal video, which has some snags in it so, oh, by the way, could you help with this other work on perception while tensing and relaxing facial muscles.

As in the Laird & Crosby paper above, I’ll only focus on the results of posing the face.

They started out with 40 undergraduates (men and women, but, as in paper 2, they found no gender difference). In the end, they removed 14 subjects, because they were aware of aspects in the two different paradigms.

The set up for posing the face is the same as above. But, rather than pictures or cartoons, they are told the experimenters are interested in the reversing perspective of the Necker cube. They also added a neutral condition, in order to make a clearer base-line comparison.

All participants did two smile and two frown trials (and, presumably also a neutral trial), properly counterbalanced. After each trial, they filled in a mood adjective list, as always. This time it consisted of 26 descriptive adjectives from that same Nowlis-Green Mood Adjective Check list, again rating them on a 5 point scale (0-4). They used 6 items from aggression, 5 from Surgency and 4 from Elation, and some fillers. They summed the scores within each factor, and averaged them across the two trials of each type.

             Frown   Neutral   Smile   t frown vs neutral   p frown vs neutral   t smile vs neutral   p smile vs neutral
Elation      1,9     3         4,4     3,49                 0,002                2,83                 0,008
Surgency     3,7     4,9       6,2     2,1                  0,022                2,14                 0,041
Aggression   6,3     2,4       2,3     2,43                 0,011                0,21                 0,835


Paper 4

The next study, Laird, Wagener, Halal and Szegda, “Remembering what you feel: The effects of emotion on memory” (JPSP 1982), also uses the same face-posing work, but using it in a p-curve is – a stretch. I will, but using emotion-congruent recall as a measure of facial feedback is several processing steps away.

As the title says, they were interested in emotion-congruent recall. Half of them started out reading a couple of Woody Allen anecdotes (positive stories), the other half a couple of editorials (anger inducing stories).

Then, as in the earlier studies, participants had their faces posed in frowns and smiles. There is a casual mention that felt emotion may bias the results, so could they fill in this questionnaire after each pose. They actually even have a pre-measure of emotion before they start the facial poses.

The perceptual stimuli this time are four abstract paintings that have received titles with an emotion connotation: For happy “spring” and “dancing”. For angry “rip-off” and “betrayal”. And, the little twist here is that they were shown the angry-titled pictures while their faces were screwed up in smiles, and the happy titled pictures when they were frowning.

They do a rather elaborate summing of the mood scores (which I don’t want to go into). What they want to do is sort the participants into two separate groups – the self-produced group (the facial expression seems to dominate in the mood measure) and the situation cue group (those that take their mood cue from the pictures rather than from their faces). This results in 19 people that seem to take their cues from facial feedback, and thirty-two in the situational cue.

There is no report on the results from this section. The outcome is simply used as a separator for individual differences.

Instead, they proceed to the next stage, where people again pose their faces and then recall as much as they can of each story (written response). In the self-produced group, nine had read the Woody Allen anecdotes and recalled one while smiling and one while frowning. Ten had read the editorials and likewise recalled one while smiling and one while frowning. The cell sizes for the situational-cue group were 17 and 15, respectively.

Their dependent variables were the number of correctly recalled facts and the number of errors (assessed by two independent judges). Everybody recalled more from the editorials, but that was, in part, because they had more statements to recall. Thus, that is not terribly interesting.

What they were more interested in was whether there was evidence for more emotion-congruent recall for the self-producing participants when comparing them to the situational-cue group. (This was a planned comparison).

So they do a planned comparison on number of facts recalled (expression x passage x individual differences), and it reaches significance: F(1,47) = 4,31, p = .043 (per p-checker). They do the same for number of errors, and the result there is F(1,47) = 18,76, p < .000 (is that one weirdly high?).

For the situational cue group, there is no interaction between the posed expression and either correct recall or errors.


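Self-produced cue group

| | Woody Allen (n = 9): smile | Woody Allen (n = 9): frown | Editorials (n = 10): smile | Editorials (n = 10): frown | F (passage x expression) | p value |
|---|---|---|---|---|---|---|
| Correct recall | 3,3 | 2,2 | 6,7 | 8,3 | 9,98 | 0,0065 |
| Errors | 0,6 | 1,2 | 2,1 | 1,6 | 4,13 | 0,0602 |

df per paper: (1,15). Some participants must have dropped out, as this should have a df of (1,17).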


Here are the results but, as I have noted, there are some issues. The df listed in the paper is 1,15, but they do not note any drop-outs; the df really should be 1,17. There is also a small discrepancy between the first F value as reported in the text and in the table (9,96 vs 9,98). In addition, there is a discrepancy in the reported p-value for the errors: in the text it is reported as < 0.55, but the p-value I get from Schimmack’s r-index is higher than that (I report the one for df 1,15).

They claim the results are as expected but somewhat ambiguous (we don’t know whether the supposed emotion-congruent recall is due to actual congruence or to a general positivity/negativity effect), which they then attempt to address in Experiment 2.
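To get a feel for how much the choice between df (1,15) and df (1,17) matters, the p-values can be recomputed directly from the F values. Below is a minimal sketch in Python using only the standard library. The helper names (`f_pvalue`, `_betai`, `_betacf`) are made up for this illustration, and the incomplete-beta routine is a generic textbook continued-fraction method, not whatever p-checker or the r-index spreadsheet use internally.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f, df1, df2):
    """Upper-tail p-value for an F statistic with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f)
    return _betai(df2 / 2.0, df1 / 2.0, x)


# F values from the self-produced table, under the reported and the expected df.
for label, f in [("correct recall", 9.98), ("errors", 4.13)]:
    for df2 in (15, 17):
        print(f"{label}: F(1,{df2}) = {f} -> p = {f_pvalue(f, 1, df2):.4f}")
```

With df (1,15) this should land on the p-values in the table; with df (1,17) the same F gives a somewhat smaller p, so the df question does not change which side of .05 the correct-recall effect is on, but it does nudge the errors result.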

Experiment 2

They note three major changes, and I quote:

“a) to use different expressions during the memory and mood parts of the procedure, b) to employ only material and expressions of negative emotions, which were fear, anger, and sadness, and c) to manipulate expressions during the initial encounter with the material as well as during recall.”

This time, there were 22 undergraduates – two were removed for awareness.

In the first part, they went through the same expression manipulation as in Experiment 1 (I think, to separate out those that do produce an emotional feeling from those that don’t).

Then they were to judge “72 different slides on a variety of emotional scales”. I actually don’t know what was on those slides, because it is not described. What was more interesting (to the researchers) were two sentences that were read prior to each slide, one by a woman and the other by a man. The sentences were read with emotional intonation and also had emotional content, such as “did you hear that noise?” (for fear). During this part, the participants’ faces were also manipulated, this time into fearful, angry and sad positions. All participants had their faces placed in all three positions.

To be more precise – the sentences/pictures were presented in 6 blocks. During each block, the participant held their face in one of the three positions (so they held each position for two blocks). Each block contained 24 sentences, 8 of each emotion. So, a total of 144 sentences (which really then should be considered the trials).

The blocks were about 3,5 minutes long – which is a long time to hold a static facial expression. In fact, they state that an experimenter was watching them so they could be reminded to hold their face in position.

What they were really interested in was recall of the sentences, which took place while participants again had their faces positioned in the same three expressions (also within subjects). The subjects thought this was just a manipulation check; the researchers didn’t want them to spend any effort trying to memorize the sentences. The recall also took place after each block.

The recall was scored for correctness (and they were fairly generous with that).


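| | Self-produced: fear | Self-produced: anger | Self-produced: sad | Situational cue: fear | Situational cue: anger | Situational cue: sad |
|---|---|---|---|---|---|---|
| Fearful | 4,9 | 3,6 | 3,2 | 3,9 | 4 | 3,7 |
| Angry | 2,9 | 5,7 | 2,6 | 3,3 | 5,6 | 4,9 |
| Sad | 2,5 | 3 | 5,5 | 2,9 | 4,1 | 5,2 |

n = 10 in each group.
Overall interaction: F(4,72) = 3,68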


If participants were recalling everything more or less correctly, they would have gotten 16 in each cell.

I’ll just post the means here, because I can’t really make heads or tails of the df’s for the various sub-analyses. The one I post above seems correct when it comes to df’s, anyway. When one throws in all of the data and analyzes it in a mixed ANOVA, with the between factor being self-produced vs situational-cue participants (2) and the two within-subjects factors being posed face (3) and emotional tone of sentence (3), those df’s make sense.

They do a planned comparison (hey, maybe that is where the df problem comes from) to check the difference in the sentence/face interaction between the two groups, and come up with a result that misses the .05 criterion, F(1,72) = 3.38, p = .066, although in those days a result like that was not treated as non-significant.

Then they report the self-produced and the situational cue separately, and I think they mess up on the df’s here again. There is a significant interaction between story and expression for the self-produced, F(4,72) = 2,78 – but I think it should be F(4,36), as they are only testing half the subjects here.

For the situational group, the same effect was not significant, F(4,72) = 1.03, ns (again, I think it should be F(4,36)).
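The df reasoning above can be written out as a small piece of arithmetic. This is a sketch under the usual textbook rule for a within-subjects interaction in a mixed design (numerator (a-1)(b-1), denominator (N - groups)(a-1)(b-1)); the function name `interaction_df` is made up for the illustration.

```python
def interaction_df(n_subjects, n_groups, levels_a, levels_b):
    """df for an A x B within-subjects interaction in a mixed ANOVA.

    Numerator:   (a - 1)(b - 1)
    Denominator: (N - groups)(a - 1)(b - 1)
    """
    numerator = (levels_a - 1) * (levels_b - 1)
    denominator = (n_subjects - n_groups) * numerator
    return numerator, denominator


# Full design: 20 participants in 2 groups, 3 posed faces x 3 sentence emotions.
print(interaction_df(20, 2, 3, 3))  # (4, 72)

# One group analysed on its own: 10 participants, same within-subjects factors.
print(interaction_df(10, 1, 3, 3))  # (4, 36)
```

One caveat: if the authors tested each group's interaction against the pooled error term from the full design, reporting (4,72) for the separate groups would be defensible. The paper, as summarized here, does not let one tell which was done.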

So, what they are claiming is that there is emotion-congruent recall, which is emotion specific, but only for those that are sensitive to facial feedback.

I have no idea how I should go about coding this into my r-index and my p-curve data-sheets. At least not now.

And, I really don’t know whether this should be interpreted as a replicated instance of facial feedback. They are actually assuming that facial feedback occurs (at least in some of the participants), which then spills over into emotion-congruent recall. For both types of participants, there appears to be more correct recall for the emotion-congruent sentences. It is just not significant for the situational-cue group.

But is recall a reasonable measure of whether facial feedback works (as in, giving rise to an emotion that corresponds to the expression)? In this work, it is simply assumed that the facial feedback does exactly that. The measures where they ask about how participants feel are simply used for sorting people into two types, and in that measure participants are receiving two conflicting types of information: from their face, and from the label.
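The "more correct recall for the emotion-congruent" impression can be checked descriptively from the Experiment 2 means: the diagonal cells of each group's 3 x 3 block are the congruent ones. A minimal sketch, with the cell means copied from the table (decimal commas converted to points) and the variable and function names invented for the illustration:

```python
# Mean recall per cell, in table order: rows Fearful, Angry, Sad;
# columns fear, anger, sad. Diagonal cells are the emotion-congruent ones.
SELF_PRODUCED = [
    [4.9, 3.6, 3.2],
    [2.9, 5.7, 2.6],
    [2.5, 3.0, 5.5],
]
SITUATIONAL = [
    [3.9, 4.0, 3.7],
    [3.3, 5.6, 4.9],
    [2.9, 4.1, 5.2],
]

# Ceiling per cell: each expression is held for 2 blocks, 8 sentences per emotion per block.
MAX_PER_CELL = 2 * 8


def congruence_summary(cells):
    """Return (mean of congruent cells, mean of incongruent cells)."""
    size = len(cells)
    congruent = [cells[i][i] for i in range(size)]
    incongruent = [cells[i][j] for i in range(size) for j in range(size) if i != j]
    return sum(congruent) / len(congruent), sum(incongruent) / len(incongruent)


for name, cells in [("self-produced", SELF_PRODUCED), ("situational cue", SITUATIONAL)]:
    con, incon = congruence_summary(cells)
    print(f"{name}: congruent {con:.2f}, incongruent {incon:.2f}, "
          f"difference {con - incon:.2f} (ceiling {MAX_PER_CELL})")
```

This only summarizes the reported means. It says nothing about significance, since the variances are not available here.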

Kellerman & Laird

In the last Laird paper, Kellerman & Laird, “The effect of appearance on self-perception”, there is no data to scrape! They did the facial positioning, had people rate their emotions, went through an elaborate scoring, and then used it simply to sort people into self-produced and situational-cue responders. Evidently they could do that, but it provides nothing that I can use to keep assessing whether we have decent evidence for some sort of facial feedback.


I now move on to the papers that Laird cites but did not co-author.

Rhodewalt & Comer

The first one is Rhodewalt & Comer (1979), “Induced-compliance attitude change: Once more with feeling”.

A total of 60 participants, divided across 4 conditions. Well, they started out with 69, but there were drop-outs as usual.

It is all very elaborate, in order to get to their research question, but I’ll gloss over the parts that are not directly about measures of facial feedback.

They start with a pretest session, done in groups, which is mainly about information, but where, oh by the way, another researcher needed help with filling out an opinion survey on 18 issues.

A week later they return (individually) for the experimental session. There are 4 groups: smile, frown, neutral and control. In the first three, participants are asked to write counter-attitudinal essays while holding the expression. In the fourth, they simply copy down some written materials.

The posing instructions are taken from Laird (1974). As in the previous studies, possible changes in mood are explained to participants as an artifact to control for (hence the measure). Mood was measured with the Nowlis-Green Mood Adjective Check List, using 18 adjectives measuring Elation, Surgency, Social Affection, Anxiety, Remorse and Aggression (3 adjectives for each).

Each participant had 7 minutes to write the counter-attitudinal essay (the topic, of course, coming from the pre-session) while keeping their face frozen in whatever expression was assigned to them. There was an observer present to make sure they kept their face in the pose. (Boy, that is a long time!)


For the mood measure, they created a single composite score for the positive factors, and a single composite score for the negative factors. Plus, they calculated a difference score.

| | Positive mood | Negative mood | Difference |
|---|---|---|---|
| Smile | 3,8 | 1,91 | 1,89 |
| Neutral | 2,43 | 2,54 | -0,11 |
| Frown | 1,71 | 3,58 | -1,87 |
| Control | 1,84 | 1,71 | 0,13 |
| F(3,56) | 3,21 | 3,32 | 4,93 |
| p | 0,029775 | 0,026179 | 0,004148 |

n = 15 in each group.


I’m not reporting the attitude change data. It is just too many steps away to say anything interesting about facial feedback.

Zebrowitz McArthur et al

Next up is Zebrowitz McArthur, Solomon & Jaffe (1980), “Weight differences in Emotional Responsiveness to Proprioceptive and Pictorial Stimuli”.

The topic here is differences in emotional responsiveness between overweight and normal-weight participants. For this they recruit 24 overweight and 36 normal-weight participants.

They use the Laird paradigm – but with some changes. The smile one was the same, but here they use the “bottom mouth” meaning for the frown – they place the mouth area into a sadness expression. Here is the instruction:

Please contract your lips by drawing them together and down. Now push out your lower lip a little… Good, now hold it like that.

They also had a neutral instruction

Please relax your face, keeping your mouth closed…. Good, now hold it like that

Now, on to the set-up. Each participant went through 9 trials. In each trial, they were shown a picture for 15 seconds. The pictures depicted humans in sad postures (3), animals in “happy” postures (3) and microorganisms (also 3, which I presume are considered neutral).

The first three pictures were shown while in one facial configuration, the next three in the next, and the final three in the final expression. And, of course, there was a happy, a sad and a neutral picture for each expression. Nice counter-balancing and all.

After each projection, the participant rated how they felt on a sub-set of the MACL. The target emotions were two adjectives for elated, and two adjectives for sad, and then there were 4 fillers. Instead of 5 pt scale, they used a 9 pt scale.

So, it is a within-subject manipulation, all possible combinations.

Additional part for control – they tested participants in pairs, where each individual in the pair posed a different expression – to control for possible experimenter influence. They also tried, as much as they could to counter balance the seats used by males and females, as well as by over-weight and normal weights. The participants could not see each other. (Of course, because they all were viewing the stimuli together, the order of the pictures was the same for everyone).


I’ll do the matrix of mean scores first (the mean score is a composite of elation and sadness: more negative means more sad, more positive means more happy).


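Posed facial expression

| Picture | Weight group | Smile | Neutral | Sad | All |
|---|---|---|---|---|---|
| Positive | normal | 6,661 | 5,917 | 4,806 | 5,778 |
| Positive | overweight | 6,583 | 6,75 | 6,042 | 6,458 |
| Neutral | normal | 0,361 | 0,333 | -0,083 | 0,204 |
| Neutral | overweight | 0,5 | -0,5 | 0,333 | 0,111 |
| Negative | normal | -4,722 | -7,528 | -5,917 | -5,056 |
| Negative | overweight | -4,75 | -3,625 | -3,458 | -3,944 |
| All | normal | 0,75 | 0,574 | -0,398 | |
| All | overweight | 0,778 | 0,875 | 0,972 | |

n overweight = 24 per cell, N = 72 in row and column totals.
n normal weight = 36 per cell, N = 108 in row and column totals.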

The interesting part (for us) is the two bottom rows, because they indicate the net effect of posing the faces. Clearly for the overweight participants there is none (which is what they were checking), but there is some for the non-overweight.
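Because every cell has the same n within a weight group, the "All" rows and columns should simply be the unweighted means of the cells they summarize, which makes a quick consistency check on the typed-in numbers possible. A minimal sketch, with the values copied from the table as typed (decimal commas converted to points) and all names invented for the illustration:

```python
# Cell means as typed in the table, rows in table order (positive, neutral,
# negative), columns smile, neutral, sad pose.
CELLS = {
    "normal": [
        [6.661, 5.917, 4.806],
        [0.361, 0.333, -0.083],
        [-4.722, -7.528, -5.917],
    ],
    "overweight": [
        [6.583, 6.75, 6.042],
        [0.5, -0.5, 0.333],
        [-4.75, -3.625, -3.458],
    ],
}
# Reported marginals: the "All" column (per row) and the "All" rows (per pose).
ROW_ALL = {"normal": [5.778, 0.204, -5.056], "overweight": [6.458, 0.111, -3.944]}
COL_ALL = {"normal": [0.75, 0.574, -0.398], "overweight": [0.778, 0.875, 0.972]}


def mismatches(group, tol=0.002):
    """List (kind, index, recomputed, reported) where a marginal disagrees with its cells."""
    grid = CELLS[group]
    found = []
    for i, row in enumerate(grid):
        mean = sum(row) / len(row)
        if abs(mean - ROW_ALL[group][i]) > tol:
            found.append(("row", i, round(mean, 3), ROW_ALL[group][i]))
    for j in range(len(grid[0])):
        mean = sum(row[j] for row in grid) / len(grid)
        if abs(mean - COL_ALL[group][j]) > tol:
            found.append(("column", j, round(mean, 3), COL_ALL[group][j]))
    return found


for group in CELLS:
    print(group, mismatches(group))
```

Run on the numbers as typed, the overweight block comes out internally consistent, while the normal-weight block has marginals that do not match its cells in the first and third rows and the smile and neutral columns. That pattern would fit two mistyped cell values (for instance 6,611 instead of 6,661 and -4,528 instead of -7,528), but that is a guess from the arithmetic, not something this table can confirm. The bottom rows themselves, which are the ones that matter for the expression effect, are not affected by it.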

And, luckily, they have a planned simple effects analysis for that:

Expression effect for normal weight: F(2,72) = 4.77, p = .0113

For overweight, F was less than 1, so nothing is reported.

They go further into the results and see that this seems to be a sadness result, with most of the effect being driven by that expression. In addition, they look at the scores for the other emotions, but find no effect. So the suggestion is that the feedback effect is expression specific. There were picture effects also, but those are not so important here.

Kleinke & Walton

The Kleinke & Walton (1982) paper is different from those above. The title is “Influence of Reinforced Smiling on affective responses in an interview”. They claim that the results support a facial feedback theory, and possibly they do, but it is rather messy. It doesn’t involve posing faces into different expressions. Instead, subjects in the experimental condition get reinforced every time they smile (they get a nice green light when they smile, and their task is to try to have as much green light as possible). They weren’t told that it was smiling that would be reinforced. In two yoked groups participants were either told to smile whenever a light came on (same schedule as the reinforcement). I won’t go deeper into the experiment, because I think there are many possible reasons for the results other than some kind of facial feedback (the paper would be important in a larger meta-analysis).


The final article that Laird cites is Barbara Edelman’s 1984 paper, “A multiple factor study of body weight control”. And, again, there is nothing here that I can use. The facial manipulation (which, they claim, closely follows Laird & Crosby) is simply used to separate participants into self-perceivers and situational-sensitive participants.

Some brief comments

I think this is interesting to note. Early on, Laird and Crosby use their feedback manipulation to separate participants who are sensitive to facial feedback from those who are not, and in some of the subsequent articles that is simply what it is used for, with no possibility for anyone to evaluate how strong the facial feedback effect was.

Of course, this notion of individual differences in facial feedback sensitivity was not part of the original Strack study, it was not part of the work I did with Niedenthal where we interpreted the effects of pen-holding as a mimicry disruptor, it was not considered for the Strack replication, and, as far as I have gotten in the review of Strack’s list of conceptual replications, it is not considered there either.



I did a p-curve using one focal test from each experiment (I also did it with all of them, but that throws in repeats for the same manipulation, just measured in slightly different ways). My shiny-app scripts are listed below.

It does suggest some evidential value, but we only have 8 data-points, and some of them are rather oblique (mood-congruent recall).

This is my first pass at this. I’ll do some better coding/clean-up. The posing is very similar for all. The measures aren’t always reported. There are several repeated measures designs, where there really are several repeats. Frequently they are asked to pose the face for a long time. 15 seconds – 7 minutes! (yikes).

In all of this, the only measure of facial feedback is the self-reported moods after each trial. There are theoretical accounts, but they are rather abstract – self-perception, perception of arousal. Situational cues.

This is just a small sliver of the literature, but I’m interested in whether there is work connecting this closer to additional biological function (physiological measures, brain-imaging etc), and that may very well exist. Or not. It is also hard to know if they really really really didn’t realize it was about emotion. It is measured, after all, even if mentioned that it is to control for unwanted affect. There are just a lot of questions, that one needs to see if the literature has answered.


Duncan, J.W., & Laird, J.D. (1977). Cross-modality consistencies in individual differences in self-attribution. Journal of Personality, 45, 191-206.

Edelman, B. (1984). A multiple-factor study of body weight control. Journal of General Psychology, 40, 363-369.

Kellerman, J., & Laird, J.D. (1982). The effect of appearance on self-perception. Journal of Personality, 50, 296-315.

Kleinke, C.L., & Walton, J.H. (1982). Influence of reinforced smiling on affective responses in an interview. Journal of Personality and Social Psychology, 42, 3, 557-565.

Laird, J.D. (1974). Self-attribution of emotion: The effects of expressive behavior on the quality of emotional experience. Journal of Personality and Social Psychology, 475-486.

Laird, J.D. (1984). The real role of facial response in the experience of emotion: A reply to Tourangeau and Ellsworth, and Others. Journal of Personality and Social Psychology, 47, 909-917.

Laird, J.D., & Crosby, M. (1974). Individual differences in the self-attribution of emotion. In H. London & R. Nisbett (Eds.), Thinking and feeling: The cognitive alteration of feeling states. Chicago: Aldine.

Laird, J.D., Wagener, J.J., Halal, M., & Szegda, M. (1982). Remembering what you feel: The effects of emotion on memory. Journal of Personality and Social Psychology, 42, 4, 646-657.

McArthur, L. A., Solomon, M. R., & Jaffee, R.H. (1980). Weight and sex differences in emotional responsiveness to proprioceptive and pictorial stimuli. Journal of Personality and Social Psychology, 39, 308-319.

Rhodewalt, F., & Comer, R. (1979). Induced-compliance attitude change: Once more with feeling. Journal of Experimental Social Psychology, 15, 35-47.







My shiny app data – two versions. One where I throw in all, so to speak, although several are separate measures for the same thing. In the second, I cross out duplicates. That is, I only select one measure for each experiment (and, when possible, the one that seems to be the strongest).


# Easy mode (‘#’ starts a comment)

#Paper 1 Laird 1974

#Experiment 1

F(1,37) = 8.18 #Aggression

F(1, 37) = 7.21 # Elation

F(1, 37) = 5.91 #Surgency

#Experiment 2

t(25) = 2.8 #Humor rating

t(25) = 2.46 #Aggression

#Laird Crosby 1974

t(25) = 1.78 #Day 3 effect

#Duncan Laird

t(39) = 3.49 #Elation, frown vs neutral

t(39) = 2.83 #Elation, Neutral vs. Smile

t(39) = 2.1 #Surgency, frown vs neutral

t(39) = 2.14 #Surgency, Neutral vs. Smile

t(39) = 2.43 #Aggression, frown vs neutral

t(39) = 0.21 #Aggression, Neutral vs. Smile

#Laird Wagener Halal Szegda

#Experiment 1

F(1,15) = 9.98 #correct recall, self perceivers

F(1,15) = 4.13 # Errors, self perceivers

#Experiment 2

F(4,72) = 2.78 #Interaction sentence, expression self perceivers

#Rhodewalt and Comer

F(3,56) = 3.21 #Positive index

F(3,56) = 3.32 #Negative Index

F(3,56) = 4.93 #Difference score

#zebrowitz et al

F(2,72) = 4.77 #Facial expression effect of normal weights



# Easy mode (‘#’ starts a comment)

#Paper 1 Laird 1974

#Experiment 1

F(1,37) = 8.18 #Aggression

#F(1, 37) = 7.21 # Elation

#F(1, 37) = 5.91 #Surgency

#Experiment 2

t(25) = 2.8 #Humor rating

#t(25) = 2.46 #Aggression

#Laird Crosby 1974

t(25) = 1.78 #Day 3 effect

#Duncan Laird

t(39) = 3.49 #Elation, frown vs neutral

t(39) = 2.83 #Elation, Neutral vs. Smile

#t(39) = 2.1 #Surgency, frown vs neutral

#t(39) = 2.14 #Surgency, Neutral vs. Smile

#t(39) = 2.43 #Aggression, frown vs neutral

#t(39) = 0.21 #Aggression, Neutral vs. Smile

#Laird Wagener Halal Szegda

#Experiment 1

F(1,15) = 9.98 #correct recall, self perceivers

#F(1,15) = 4.13 # Errors, self perceivers

#Experiment 2

F(4,72) = 2.78 #Interaction sentence, expression self perceivers

#Rhodewalt and Comer

#F(3,56) = 3.21 #Positive index

#F(3,56) = 3.32 #Negative Index

F(3,56) = 4.93 #Difference score

#zebrowitz et al

F(2,72) = 4.77 #Facial expression effect of normal weights
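The difference between the two versions is only which lines are commented out. The following is a small Python sketch that mimics that convention for the second list: lines starting with `#` are skipped and the rest are split into test type, df, value and label. The regular expression and the names (`PCURVE_INPUT`, `parse_entries`) are made up for this illustration and are not the p-curve app's own parser, whose exact rules may differ.

```python
import re

# The second (duplicates crossed out) list, without the blank lines.
PCURVE_INPUT = """
#Paper 1 Laird 1974
#Experiment 1
F(1,37) = 8.18 #Aggression
#F(1, 37) = 7.21 # Elation
#F(1, 37) = 5.91 #Surgency
#Experiment 2
t(25) = 2.8 #Humor rating
#t(25) = 2.46 #Aggression
#Laird Crosby 1974
t(25) = 1.78 #Day 3 effect
#Duncan Laird
t(39) = 3.49 #Elation, frown vs neutral
t(39) = 2.83 #Elation, Neutral vs. Smile
#t(39) = 2.1 #Surgency, frown vs neutral
#t(39) = 2.14 #Surgency, Neutral vs. Smile
#t(39) = 2.43 #Aggression, frown vs neutral
#t(39) = 0.21 #Aggression, Neutral vs. Smile
#Laird Wagener Halal Szegda
#Experiment 1
F(1,15) = 9.98 #correct recall, self perceivers
#F(1,15) = 4.13 # Errors, self perceivers
#Experiment 2
F(4,72) = 2.78 #Interaction sentence, expression self perceivers
#Rhodewalt and Comer
#F(3,56) = 3.21 #Positive index
#F(3,56) = 3.32 #Negative Index
F(3,56) = 4.93 #Difference score
#zebrowitz et al
F(2,72) = 4.77 #Facial expression effect of normal weights
"""

# Hypothetical pattern: test letter, df in parentheses, value, optional trailing comment.
ENTRY = re.compile(
    r"^(?P<test>[Ft])\((?P<df>[\d\s,]+)\)\s*=\s*(?P<value>[\d.]+)\s*(?:#\s*(?P<label>.*))?$"
)


def parse_entries(text):
    """Return (test, df tuple, value, label) for every line that is not commented out."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, heading, or crossed-out duplicate
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        dfs = tuple(int(part) for part in match["df"].split(","))
        entries.append((match["test"], dfs, float(match["value"]), (match["label"] or "").strip()))
    return entries


active = parse_entries(PCURVE_INPUT)
print(len(active), "active entries")
for entry in active:
    print(entry)
```

This leaves nine active lines. If p-curve only counts results with p < .05, which is my understanding of the method, the t(25) = 1.78 entry would drop out, and that would square with the 8 data-points mentioned earlier.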






Blog post on the occasion of Strack Stepper & Martin not replicating, and thoughts about what to do next.

Strack didn’t replicate. STRACK DIDN’T REPLICATE. If you wonder which Strack (which, really, one should, as he is prolific), I’ll clarify: it is the one where you stick a pen in your mouth and it makes you think a cartoon is more (or less) amusing depending on how that pen-holding is screwing up your face. The correct citation is Strack, Martin & Stepper (1988).

And, I’m a bit sad. We were going to be part of the replication effort, but last fall semester hit hard, and I had to give up. We needed to collect data before students had heard of the experiment, and we just did not get it together in time. I had predicted there would be an effect.

But perhaps that should have been a bit moderated. I believed SOMETHING would happen, based on work that was done in the Niedenthal lab, but that effect was a little bit different.

She did a series of morphing studies, where faces changed from one expression to another, and people had to detect the change. The main exploration was whether emotional state has an effect on perception of emotional stimuli. But, in one variant she used the Strack manipulation. For good reason. You want to tease out the edges of an effect. Would it be enough with just facial feedback, or did we need to do the full-blown emotion induction? But, what seems to have happened instead was that holding the pen in the mouth disrupted mimicry – that is the published story.

But, I really think we need to put the study in context. As I wrote in my blog on my beginning trace, the paper is just one tile in the mosaic of studies investigating the role of facial feedback in emotion  processing (and I’m deliberately vague).

One can easily trace this back all the way to the James-Lange theory of emotion, which is crudely (and somewhat erroneously) portrayed in introductory books as: you feel afraid because you are running away from the bear.

But let’s narrow it a bit more: the Strack experiment was part of a much larger body of research looking at the role of facial feedback.

The facial feedback story (as I tell it to my undergraduates) goes roughly like this (admittedly with plenty of licence).

When Ekman, Friesen and Ellsworth were figuring out which facial movements could be considered primitives of expressions (the FACS), evidently they noticed that when they worked on furrowing brows and gaping their mouths into snarly shapes, they got into more snapping and actual snarling.* Could it possibly be that screwing up your face into emotional expressions results in feedback to the emotion processing areas in the brain, possibly giving rise to a faint experience of that actual emotion? And, from there, they proceeded to experiment on that notion, usually by asking people to position their face in a certain way (e.g. pull down the outer corners of your mouth, stick out your tongue, wrinkle your nose).

I don’t think they were alone pursuing facial feedback. Zajonc has worked on this. Laird has worked on this. Hess has worked on this. Alan Fridlund looked at this. Levenson, Lanzetta, and Gross (the beginning of emotion regulation work), and on and on and on.

What the Strack paradigm specifically addressed was the objection that people may figure out that they were asked to screw up their face in disgust, and, being compliant participants (which my experience says is more common than the recalcitrant) they reported more disgust or amusement etc.

It really was addressing this  particular objection in a very clever way. According to the standards of the time, it worked. And, for some reason, it became THE experiment (in textbooks etc) which demonstrated the existence of facial feedback.

Which, of course it isn’t. No single study ever is!

If someone thinks this refutes the facial feedback hypothesis, or embodiment, that person is doing the same reasoning fallacy as when someone tests a gaggle of undergraduates in Georgia, and then claims to have found evidence for some universal principle of human function.

Instead, be more precise: sticking a pen in the mouth of an individual in order to make them pose their face in a semblance of a smile or a pout, without alerting them to the fact that you are interested in what happens when the face is put in different positions, seems to have no effect on how funny a (by now) fairly large sample of participants think funny cartoons are.

And, yes, after this experiment, I actually strongly believe just that – which is a very very narrow area.

If we really want to evaluate the veracity of the facial feedback thesis, we must do better than a single directed RRR, because this is a web of experiments evaluating a theory. We need to undertake a comprehensive review.

There needs to be a review of mimicry, the human tendency to mimic the facial expressions one is exposed to. There are lots of experiments: some filming faces, some measuring EMG, some looking at brain correlates. There are a lot of papers here (I used to read this literature as a doctoral student).

Next, what happens when mimicry is disrupted, either through instruction (don’t show what you feel, keep a stone face) or through physical disruption (e.g. pens in mouth, botox)?

Then, we need to review what we know this mimicry (or disrupted mimicry) results in for the individual. (Suggestions – mild experience of the same emotion, changes in physiological signatures, perceptual sensitivity to congruent materials, enhanced emotional reaction to other materials).

I don’t think the facial feedback hypothesis is stupid. Humans have a tendency to imitate and entrain (we think anyway), and it is a feasible first mechanism for trying to understand how we communicate, and how we understand one another. (Even my son has heard that mimicry is the basis of empathy – and he is 13 – it has face validity, but the evidence needs scrutiny). I tend to take an ecological/evolutionary view of things, which is why I think it is non-stupid.

Now, there is a lot of research on this. Why not evaluate it, see how strong (or not) it is, possibly do some very directed experiments once there is a better map (if warranted), and do it on more than undergraduates.

I think I will actually do this – but, of course, I will have to get help.

* I have no idea where I picked up this anecdote. Could be YouTube, could be the conference I went to in 2003, could be some paper.


On Brannigan (rise and fall of Social Psych), and Henrich (the secret of our success) and psychological research in general.

In David Hull’s “Science as a Process” (1988), he recounts one controversy that I found interesting for psychology. His area was systematics: how best to classify animals and plants (the stuff of Linnaeus; science is never done). The controversy was between those who thought classification needed an evolutionary grounding (species have a history, and that ought to be reflected in the classification) and those who thought one needed to classify based on (more or less) observable features existing right now. (The controversy is discussed in the two chapters “Down with Darwinism – Long Live Darwinism” and “Down with Cladism – Long Live Cladism”, if I recall right.)

From my naïve outsider view I first thought that of course you want to use the evolutionary history to figure out how to classify living things, but as the opposing side pointed out – even if one doesn’t doubt the importance of evolution, the actual evidence available was so spotty that it wasn’t possible to use it as basis for classification. Instead, one should stick to what is observable now for classification. As Hull points out, what is observable is also not quite straight forward. (Visible traits? Genetic markers – which requires a whole lot of apparatus to detect?  Also, what a particular trait is had at one point been hotly debated – his example is what is the dorsal and ventral part of an animal. Observation of current traits is theory-laden, which perhaps we as scientists forget).

My mind went to social psychology/evolutionary psychology. Of course humans, and their psychology are a product of evolution, but, as many critics have pointed out, minds and behaviors don’t fossilize well, so much of the work has to be done by careful (but still in part speculative) theory applied to present day humans. Perhaps there is a real point in cataloging current humans and their traits and behaviors, before considering evolution. Or some iterative work combining the two.

This weekend I read two books (well, I’m not finished with one of them yet): Augustine Brannigan’s “The Rise and Fall of Social Psychology” (subtitle: “The Use and Misuse of the Experimental Method”), published 2004, and Joseph Henrich’s “The Secret of Our Success: How Culture Is Driving Human Evolution”, published 2015.

I got the tip for Brannigan’s book in a Facebook discussion (from a European student). Of course, Social Psychology doesn’t seem to be in any kind of post-experimental wasteland, although it has been in the focus of the current so-called crisis following the refusal to publish a non-replication of Bem’s ESP work, and the vast fraud of Diederik Stapel.

I think it is a must read for anyone interested in social psychology, and anyone interested in psychology as a science. Did you know that Festinger left social psychology to take up work on perception, and then eventually went on to “exploring prehistoric and archaeological data” (as per the Wikipedia page here), evidently being disappointed in psychology?

Did you know that there have, time and again, been accomplished researchers who critiqued Social Psychology scathingly? Decades ago? I recognized none of the names, possibly because, as Hull also points out, they were alone in the wilderness with no deme advancing their position.

As a grad student, I felt frustrated that there seemed to be no larger theoretical framework from which to reason about psychology, and my adviser pointed out that, yes that is the case. The field is filled with mini-theories, but nothing over-arching. Evidently, from this book, far better scientists than I have noted this, complained about it (for example – chapters in a social psychology book could be shuffled, with no ill effect) thus, there is no cumulative understanding, no placing effects in a larger frame (e.g what can be attributed to situation, what to traits, what to larger social circumstances etc – Brannigan is a sociologist, and works in criminology).

He is especially critical about the sine qua non of the experiment in social psychology (at the exclusion of field work and other methods). This, he claims, has in part lent social psychology an air of proper hard science which has allowed it a great deal more influence in the actual world than he thinks is warranted (e.g. work on violence in movies). But, the experiments seem more to be performances and demonstrations rather than actual tests of theories. There are few falsifications (as we know). Positive supportive results are the only thing presented. In effect, the experiments are de facto anecdotes that support a narrative that is already decided.

As his cases in point he uses Festinger’s dissonance theory (there are aspects of it that are absurd – like the enormous payments some of the students get. 20 dollars then was quite a bit more than it is now – something I first saw Tom Stafford point out); Muzafer Sherif’s work on the autokinetic effect, which was claimed as evidence for norms and who one conforms to – but which is (per Brannigan) very much removed from actual social situations, and a rather minute piece of evidence for building a larger piece of theoretical narrative; Zimbardo’s Prison experiment – with the ethics problems; Milgram’s work on obedience, which perhaps is not so much about obedience to authority as it has to do with the expectations of an experiment (e.g. it is an experiment, they will not allow anything bad to happen, so it is OK to comply – this is not what happened in Nazi Germany); Asch’s work on group pressure; and all the work on the horrors of TV imparting violent behavior to our children/making guys watching Porn more OK with rape – which he deems rather shallow (so many more interesting questions), an expression of class (we don’t belong to those nasty unwashed TV watchers), unduly influential (bans on violence on TV, bans on Porn), and neglecting truly interesting questions such as immersion, narrative, separation of story from reality (although I do think there is work on this – albeit maybe not so splashy in the news).

Yes, there is more to Social Psychology, but, no, we are much too enamored of the experiment, see earlier critiques by Paul Rozin (2001) Social Psychology and Science. Some Lessons from Solomon Asch, Robert Cialdini (2009) We have to break up, Martin Orne (1962) On the social psychology of the psychological experiment. We may not know enough about a phenomenon to actually do an experiment (difficulty falsifying, because we are making too long chains of assumptions between how we do it, and what we actually want an answer to), we do it on populations that already have an idea about how to be good subjects, and will thus behave in a manner that has to do with the experimental situation and not give us any answer to what we want to test, etc.

But, go read. Even if you don’t agree. As scientists, we need to have statements that we can use as foundation of our critique.

Which brings me to the Henrich book. His thesis (in my interpretation) is that our special feature is our capacity for cumulative cultural learning. Moreover, that capacity is something that feeds back on genetic evolution. One example is lactose tolerance (which is fairly standard). If you have animals that give milk, there may be an advantage to those individuals that don’t shut down production of lactase, the lactose-digesting enzyme, once weaning is done, since they get better nutrition and hydration from unprocessed milk (making cheese and yoghurt chews up the lactose, so you don’t have to be a lactase mutant).

Another example is the human as long-distance runner – in order to track down and kill large animals. He suggests a number of adaptations – one of particular interest is how to keep cool while running long distance in a hot climate – hairlessness and sweating. But, sweating means that there must be a good supply of water that can be sweated out to cool us. Now, that is not something we are born with, unlike, for example, camels. We can only store so much water. His point is that this adaptation must have occurred after humans figured out how to externally access water: Water pouches, straws to access pools in tree-trunks, recognition of plants and other signs that indicate where water or watery plants may be, lore to keep track of water-holes, etc.

By now, I’m reading about kinship, sharing rules, food preparation rules, imitation and faith. All of these abilities must hinge on some psychological capacity, some bred in bias on where it is best to look: who to imitate, who to listen to, how to police others to do the “right thing”, and all without us necessarily understanding the why. He claims cumulative cultural evolution is smarter than us.

Many of these “biases” do show up in Cialdini’s six ways to yes in persuasion (and I’m sure Cialdini is quite aware that there is an evolutionary quality to those). We reciprocate, we look to authority (possibly more the prestige type than the dominant type), we look to the crowd, tough rituals make us more committed, we look to those we like and who like us and are like us. These little biased hooks are what allow us to accumulate culture over long times.

I’m thinking, here is a more overarching theoretical framework from which to reason about psychological phenomena. It may not be right, but it is useful, and it is something one could test. There will be cultural differences – where is the underlying invariant?

I’m a big fan of ecological psychology also – but it seems like it is still best applied to perception/action (although I have seen attempts at ecological social psychology). This is also, in many ways, grounded in an evolved thinking – minds and bodies have evolved to capitalize on our surroundings. Perhaps eventually these can be brought together (or not).

There is a point in doing research on contemporary beings (because, who else?), without necessarily using a deeper evolutionary thought. But, perhaps a thought on what brought this part out could help guide where to look and what to attempt to falsify. I’m a little bit tired of the narratives in social psychology textbooks. The effects must be placed in the context of effects of traits/personality, class, social systems, cultural systems, etc., and I rarely see that. (Nazar Akrami has looked at the relative contribution of personality factors and more social psychological factors and found that personality dominates – but more like this is needed.)

Cialdini, Influence.

Cialdini (2009). We have to break up. Perspectives on psychological science, 4, 5-6.

Orne, Martin (1962) On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776-786.

Rozin, Paul (2001) Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5, 12-14.

Brannigan, Augustine (2004) The rise and fall of social psychology.

Henrich, Joseph (2015). The secret of our success.


Tracing Strack, the preamble

I’ve started a second trace! A bit to pursue proof of concept, get a feel for extendability. A trace is, after all, a bit like a case study. I’ve selected Strack, Martin and Stepper (1988), better known as the one where participants get to hold a pen in their mouths to get their face shaped like a smile or a frown, without them really realizing this is happening. This is also a paper that is under PoPS registered replication. I was going to participate as one of the independent labs, but work hit me and my team (I became director of our international Masters), and I just had to give up. (Still feel a bit sad about that).

It is, to cite Jens Förster (citation # 102 in my trace), a Classic.* It is so generally well known that in the instruction for the registered replication we were asked to make sure that the population we tapped was not aware of the effect – that is, get the psych undergrads before the Emotion module, or tap other undergrads (we were planning on using the film/linguistics/humanities set). The paper has a reputation! Of course we want to replicate it.

I pulled the data for the trace on the 29th of April, 2016, and at that time, the paper had been cited 544 times (all from the Social Science Citation Index). Not as many as Srull & Wyer (1979), which I pulled a year before, but this was published a decade after. Not shabby.

I also decided to be more ambitious with the trace this time. Instead of the first 5 years (53 articles), I decided to trace the first 150 articles. (I am going solo so far, and I am going the artisanal way. No automatic scraping of info here!)** The first citation is in 1989, the last in 2006, so we are spanning over a decade and a half. I figure that might also be enough to find possible citation distortions (I have). This actually includes a paper where I’m co-author (Niedenthal et al 2001). We used the Pen in the Mouth technique, but it didn’t seem to work as enhancement, but more as a mimicry-disruptor, which still is some kind of effect on Facial Feedback that is interesting.

I realized early on that the trace here had a different nature from the Srull & Wyer trace. The Srull & Wyer paper was very much an origin paper for subsequent work on Social Priming***, whereas the Strack et al paper is a relatively recent paper in the tradition of facial and bodily feedback, which can reasonably be traced back to the James-Lange theory of emotion, was under ongoing investigation by the Ekman Deme**** and the Zajonc Deme, and in many ways was an ingenious technical solution to the pesky demand objection that came from asking people to pose their faces in emotional configurations.

Classifying the papers (based on the abstracts) also was different. For the Srull & Wyer trace I classified papers as either extending, related, or oblique. My intention was to particularly pay attention to those papers that extended the priming idea, whereas for other papers I would only look closer at the citation patterns. This was not so evident for the Strack Paper. Yes, there were clear obliques (Emotions and God, Education, Robots – although that turned out to actually fit within extension), but it was far less obvious which papers extended the work and which were related but not extending. This is quite possibly because the Strack Paper isn’t an origin paper for a particular area of research, but a paper that is mid-stream in ongoing research on bodily feedback on affective processes. Even if I did a rough sorting, I then went in and made a somewhat more fine-grained classification of the topics. The majority involve research on facial feedback (39), but there are also papers on Arm Flexion (17), emotional expression (7), Embodiment (7), Emotion regulation (6), a mix of other types of bodily feedback including head nodding (11), and effects of induced mood or emotional states (25), which all seem to be somewhat relevant and could potentially be extending.

So far, I have pulled citations from all the papers I could find without having to go too far out of my way. (I have a handful from journals like Cognition and Emotion and Cortex, for which I evidently have to request prints rather than download PDFs, and there are also a few non-English papers that I can’t get to – I included a bit more than just peer reviewed papers in this trace).

That is 128 papers. One thing I noticed when doing my Srull & Wyer (1979) trace was that in papers that extended their work, they tended to be cited multiple times (both for the theoretical and empirical background as well as for methods and in the discussion). In the related and oblique papers they tended to be cited maybe one or two times. These are the citation patterns so far for Strack et al.

Times cited in paper    Frequency
 1                      90
 2                      15
 3                      12
 4                       7
 5                       0
 6                       1
 7                       1
 8                       0
 9                       1
10                       0
11                       0
12                       0
13                       0
14                       0
15                       1


As I actually go and pull the citations manually (with the help of the search function, when that works), I do get a quick feel for what is going on. The paper is highly cited, because this is an important addition in the ongoing work on bodily feedback, as it rules out demand. But, direct extensions of the technique are not that common. (The one with 15 cites most definitely did a replication).
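To put numbers on that impression, here is a minimal Python sketch that summarizes the frequency table above. It only re-uses the counts already listed; the variable names are made up for illustration, and the cut-off of "3 or more" cites as a rough sign of deeper engagement is an assumption, not a rule from the trace.

```python
# Times Strack et al. (1988) is cited within a citing paper -> number of such papers.
# Counts copied from the table; rows with frequency 0 are left out.
citation_counts = {1: 90, 2: 15, 3: 12, 4: 7, 6: 1, 7: 1, 9: 1, 15: 1}

# Number of citing papers examined so far.
total_papers = sum(citation_counts.values())

# Share of papers that mention the source exactly once.
single_cite_share = citation_counts[1] / total_papers

# Papers citing the source 3 or more times (assumed threshold for "deeper" use).
three_plus = sum(n for times, n in citation_counts.items() if times >= 3)

# Total in-text citations and the mean number per citing paper.
total_cites = sum(times * n for times, n in citation_counts.items())
mean_cites = total_cites / total_papers

print(f"papers: {total_papers}")
print(f"cited once: {single_cite_share:.0%}")
print(f"cited 3+ times: {three_plus}")
print(f"mean cites per paper: {mean_cites:.2f}")
```

The heavy concentration at a single cite is what one would expect if most papers use the source as background rather than extending it.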

In the trace, I’m most of all interested in the direct extension of the source work (it doesn’t have to be like that. Lots of questions can be asked of a trace), so what I’m directly scrutinizing may be rather small in the end. But, I’m starting to look closer at the various experiments on bodily feedback to see what that can yield.

Some refs

Förster, J (2004) How body feedback influences consumers’ evaluation of products. JOURNAL OF CONSUMER PSYCHOLOGY.

Hull, David (1988). Science as a process.

Niedenthal, PM; Brauer, M; Halberstadt, JB; Innes-Ker, AH (2001). When did her smile drop? Facial mimicry and the influences of emotional state on the detection of change in emotional expression. Cognition and Emotion.

Strack, F, Martin, L, & Stepper, S. (1988) Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54, 768-777.



*Förster collaborated with Strack quite a bit on similar questions, so I think he spoke about it from inside this particular Deme.

** because I haven’t spent time to figure out how to.

*** Yes, I know people object to this, because there are so many different variants that this doesn’t capture what it is about. But, it is useful to distinguish it from the type of priming that seems just focused on associative networks – like the doctor-nurse, apple-orange thing, which is quite robust.

**** A deme is, in biology, a local breeding population. I also found out (looking for a definition) that it is an old Greek word for a village or district (distinct from Polis). I got it from Hull (1988) (who most likely got it from biology). In his meaning, a scientific Deme is a group of scientists that work more or less cooperatively on a particular idea in science (the cooperation doesn’t need to be uncontentious).

*****Simine Vazire’s penchant for asterisks is spreading. A bit of cultural evolution (the social copying kind).


Tracing Srull & Wyer manuscript

I wrote a manuscript detailing my Srull & Wyer trace (the one I have been blogging about). If anybody cares to read and give some comment, I’d be grateful. At some point I’d like to submit it.



All references are equal, but should some be treated as more equal than others? On the data-set authorship discussion.

There’s been an interesting discussion on both Facebook and Twitter about data-sharing and how to properly give credit when you are using someone else’s data. Candice and Richard Morey did a nice blog-post on why sharing data should not automatically mean authorship. Talking to other researchers, that seems to be part of what the Vancouver criteria (and beyond) suggest for authorship. The proper way to credit a shared data-set is to include a reference.

Authorship and references are the two traditional ways of assigning credit. For the individual scientist, authorship signals origination, and reference signals the use other scientists find in the original work.

But, references are a strange measure of success of an idea/work. When I was going through my Srull & Wyer (1979) trace, I collected all the places in the manuscripts where they had been cited in the first 53 articles that cited them. The reasons for citing them ranged from the peripheral to the profound. Examples of the peripheral were an opening sentence where the author cited them (along with others) as evidence that social psychologists were now interested in cognitive explanations for social phenomena, and a foot-note where the authors stated that the current paper was not interested in the priming phenomenon, but one should look to Srull & Wyer 1979 if one was interested. In the profound, they were cited multiple times because the research essentially extended the original research.

This shouldn’t be surprising. We are trained to cite just about everything we have gotten from other researchers, be it trivial, profound or antagonistic, and this is perfectly fine. I like being able to look in the references to pursue ideas that may not be central to the present research. I even find it disconcerting when it doesn’t exist. I started reading William James’s “Principles of Psychology” and found it distracting that there were no references for statements that he had clearly learned from others. But, of course, in our citing practices, papers will vary in their degree of centrality.

None of that is evident from a reference list.

It seems we may need to look over how we are apportioning credit, especially when authorship and references are given so much weight in important measures of success. I don’t have a clear thought on how to do this, because there are always downsides, and simply complicating things by grading the importance of a cite is something that I instinctively think can become problematic.

Perhaps abandoning the traditional ways of indexing success is the way to go (I doubt that will be the case).

But, should we distinguish between peripheral and central contributions from earlier research? Sharing stimuli or sharing data-sets or using tested paradigms, questionnaires, analysis schemes – are they “worth” more than the more peripheral citations, or do we run other risks of conflict and credit arbitrage?


A longish tl;dr conclusion of Srull & Wyer trace

Social Priming is in the news again. Well, Pashler et al published a critique of a recent money-priming paper, and Neuroskeptic wrote it up. So, I figured I would advertise for my Srull & Wyer trace from last spring, where I follow citations forward from Srull & Wyer (1979) – the Ur-Donald paper (and, possibly along with Higgins, Rholes & Jones 1977, one of the papers that started the social priming area). I’m still working on a manuscript for this. Not so easy when you use non-traditional (for psychology) methods. (Plus teach too much).


I got a tweet message from Michael Inzlicht asking for the TL;dr (or, take-home message as he said). It isn’t that easy. No tweet-size take-home really. But, I thought I should try to summarize a little bit about what I have learned.

First, I think priming happens – when the priming is strong, relentless and conscious. Srull & Wyer had participants unscramble an awful lot of hostile sentences, and the more they unscrambled, the more they judged Donald to be hostile. The effect was similar for kind, but somewhat weaker. There is nothing subtle about this. What people did not guess was that the rating of Donald had anything to do with the sentences.

I’m much more doubtful about the subtle primes. The few instances, the oblique influence chains, the outside awareness primes. The differences in means there are smaller. In fact, sometimes they seem more like published null results than anything else. But, with only 11 studies in total, and just a handful doing subtle primes, there isn’t much I can say.

But, putting it differently – I think Srull & Wyer would replicate. I’m guessing those that show films or pictures might (but am less certain), and I have big doubts that the subtle/outside-consciousness ones would do so.

A few other interesting things:

The early papers inspired by Srull & Wyer don’t really work on extensions, but rather, in essence, say “oooooh, look at this cool paper. Wonder if we can adapt it for our own purposes”.

The number of participants in each cell is very small in general, so most of the experiments are underpowered.

Standard deviations are only reported in one of the 11 extension papers. It is also the only paper that reports effect sizes. It is also never cited… But, there are also a few papers that publish the F-table, which actually is nice.

When I try the R-index and p-curving, it appears that there is over-reporting of significant results, but there may be some evidential value in the lot (but it is so heterogeneous, it is hard to tell).

It is also surprising how quickly the subtle primes dominate.

Some of the papers make me sad. So much careful work, such low power.

Others irritate me with their handwavey confidence, noise-inducing methods, and certainty in pronouncing results that, for all I can see, come from analyzing noise.
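For readers who have not met the R-index mentioned above, here is a rough Python sketch of the usual calculation: median observed power minus the inflation, where inflation is the share of significant results minus the median observed power. It is a simplified illustration that treats every reported p-value as coming from a two-sided z-test. The function names and the p-values in the usage lines are hypothetical and are not taken from the 11 extension papers.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal distribution
ALPHA = 0.05               # conventional significance level


def observed_power(p_value, alpha=ALPHA):
    """Post-hoc power implied by a two-sided p-value (z-test approximation)."""
    z_obs = STD_NORMAL.inv_cdf(1 - p_value / 2)   # z-score behind the p-value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)    # critical z for alpha
    # Chance of crossing the critical value again if the true effect equals
    # the observed one (the negligible opposite tail is ignored).
    return 1 - STD_NORMAL.cdf(z_crit - z_obs)


def r_index(p_values, alpha=ALPHA):
    """R-index = median observed power - (success rate - median observed power)."""
    median_power = median(observed_power(p, alpha) for p in p_values)
    success_rate = sum(p < alpha for p in p_values) / len(p_values)
    inflation = success_rate - median_power
    return median_power - inflation


# Hypothetical p-values, invented purely to show the call.
example_p_values = [0.01, 0.03, 0.04, 0.049, 0.20]
print(round(r_index(example_p_values), 2))
```

A set of results that are nearly all significant but sit just under .05 gives a low R-index, which is the pattern that suggests over-reporting.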

So, shorter: I believe what happened right before will influence how you see the next thing, even if you don’t think they go together (a kind of path dependence), especially if the next thing is kind of ambiguous. Especially if the former thing is kind of strong and conscious. But, I severely doubt the more subtle kinds as a general effect.


A past crush passes.

When I was eleven, my parents were involved with a school-theatre project for my province: Dala-teatern. This meant that we were visited by the actors and the director working on setting this up, having discussions and what not.

I was particularly taken by the young director. He was skinny, brown-eyed with long-swoopy dark hair, and looked all dashing, smoking his pipe.

He occupied my fantasies, and I kind of knew that I was still not quite old enough to understand all the parts of this, but if he could just wait until I was a bit older.

He was 22. It was obvious to everybody that I was having a crush. I’d fetch his ashtray, I’d sit in the room listening to him and my parents talking. My mom even commented that he obviously had a little slave in me.

Nothing really came of the project. I guess a few attempts, and then the big ideas petered out and the actors and the dashing director stopped visiting.

I knew before I moved to the US that he had kept working in theatre and published some books, because when you know someone, you just notice the name.

Even the first time I returned to Sweden my sister told me he was doing directing, and had continued writing books, and it was interesting that my old crush was kind of a public person.

Actually, at some later visit, I got some of his books – mostly murder mysteries. Ripping yarns, but very gruesome murders. Some were set up North, others way down South.

As time passed, I realized that his books were now translated into many many languages – an American friend of mine recommended one. Once back, I also realized that the murder mysteries set down South had also been turned into several TV series on Swedish television.

Then the BBC did one with Kenneth Branagh starring as Kurt Wallander.

And, today, a push-notice on my phone told me he had died. From cancer. 67 years old. RIP Henning Mankell.

(I thought of bringing this little story up in my class on basking in others’ glory, but I was way too embarrassed. It was my 11-year-old crush. I had a few others, and none of them became internationally famous. I’d be sad finding out they died, too, but I’m unlikely to find out via push-notices)

On Edit:

I found a picture. From a few years later. Click the Dalateatern – second from the left.
