Wyer, Bodenhausen and Gorman. Final paper in the trace of Srull & Wyer 1979

The final paper in this early sampling that directly looks at social priming is one on rape judgments by Wyer, Bodenhausen and Gorman: “Cognitive Mediators of Reactions to Rape.”

This is another complex design with not many participants. As I mentioned on Twitter: lots of sparsely populated cells.

They do mention that they consider the work exploratory rather than confirmatory. I kinda like that.

The idea here, like much of this literature, is that when you are faced with making a judgment about a situation, person, or item that is somewhat uncertain (in this case, descriptions of rape cases), you won’t take the time to search through all of your memory for some matching prototype, but will most likely stop with some information you already have in mind. Such as something that was presented in that earlier experiment that has absolutely nothing to do with this experiment…

As with the others, there is a great deal of reasoning about how different kinds of primes may push around judgments. I won’t really go into that here, because I don’t think a cell size of 5 can properly answer those questions; instead I’ll focus on what they did.

They recruited an equal number of men and women – students. 35 of each to be exact.

The cover story for the priming was that they wanted to investigate reactions to pictures that are shown in public media, that some people may think are morally objectionable. The priming materials consisted of 10 slides of pictures, where the “to be primed” concept was placed in the 3rd, 8th and 10th positions.

They came up with 7 different priming conditions (the six primes below, plus a control)!

  1. Negative outcomes of aggression (basically, dead people)
  2. Aggressive acts that are considered socially OK (e.g. police subduing a criminal)
  3. Non-sexual intimacy between man and woman (e.g. holding hands)
  4. Female sex object (photos and cartoons)
  5. Sexual arousal (for men, or whoever digs women)
  6. Sexual arousal again – even more explicit

In the actual task, they first viewed all 10 slides without doing anything. In the second viewing, they reported their reactions to the slides using a checklist. The paper is not entirely clear about what the checklist consists of, but the items correspond to the following 5 factors:

  1. People are cruel and inhumane
  2. Aggression is socially sanctioned
  3. Intimate relations are desirable
  4. Women are sex objects
  5. Sexual arousal.

Come to think of it, could you really consider showing pictures of dead peeps, officers subduing perps, and women masturbating “priming”? I can kind of see them thinking of it like that in 1985 (or earlier, as the research must have been done before), but these are the types of stimuli that are used to induce momentary affect – such as Lang’s IAPS.

After this, they went on to the second experiment, which was more of a forensic judgment task. Each participant was asked to judge 5 different descriptions of rape cases on 10 different factors.

Each case described a rape. In each case, there was one section that described the perpetrator as either a stranger or an acquaintance. Another section stated that the woman had either resisted or not resisted (“fearing that she might provoke more serious injury to herself”).

A final fifth version omitted both types of statements. It wasn’t analyzed, but was simply there so they could present the order of the cases in a Latin-square manner across the 5 individuals per cell.
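The Latin-square counterbalancing can be sketched as follows. This is a minimal, illustrative construction of my own (a cyclic square; the paper does not spell out which square it used): each of the 5 participants in a cell sees the 5 case versions in an order where every version lands in every serial position exactly once.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square: row i gives the presentation
    order of the n case versions for participant i."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

orders = cyclic_latin_square(5)
for row in orders:
    print(row)
# Every row and every column is a permutation of 0..4, so each case
# version occupies each presentation position exactly once across
# the 5 participants.
```

The point of the design is that any order or position effects are spread evenly over the case versions rather than confounded with them.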

The participants rated each case according to the following (using a 0–10 scale):

  a) extent to which the woman provoked the rape
    b) likelihood she could have avoided it
    c) likelihood she responded correctly
    d) extent to which her life was in danger
    e) how emotionally upset she was
    f) how harmful an effect the rape had
    g) belief that the defendant should be convicted
    h) likelihood he will be convicted
    i) likelihood the story is true

The ratings were aggregated into 4 composites:
1) Perception of crime (d, e, f)
2) Perception of victim – truth (i)
3) Perception of victim – responsibility (a, b, c*)
4) Conviction judgment (g, h)

* means reverse scored.
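The composite scoring can be sketched in a few lines. This is a hypothetical illustration of mine (the item keys, the averaging, and the assumption that reverse-scoring on a 0–10 scale means 10 minus the rating are not spelled out in the paper):

```python
def composites(r):
    """r maps item letters 'a'..'i' to ratings on the 0-10 scale.
    Returns the four composite scores described above."""
    rev_c = 10 - r['c']  # item c is reverse-scored (assumed 10 - x)
    return {
        'perception_of_crime': (r['d'] + r['e'] + r['f']) / 3,
        'victim_truth': r['i'],
        'victim_responsibility': (r['a'] + r['b'] + rev_c) / 3,
        'conviction': (r['g'] + r['h']) / 2,
    }

# Hypothetical participant:
ratings = dict(a=3, b=4, c=8, d=7, e=9, f=8, g=10, h=5, i=9)
print(composites(ratings))
```

Reverse-scoring item c means that a high rating on “responded correctly” pulls the responsibility composite down, as it should.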

In this judgment task, we are up at what would now be considered appropriate levels of observations. All 70 participants rated all of the cases, which were all properly randomized/Latin-squared. They do report, very briefly, on what they consider the effect of the situational variations, but the inferential statistics consist simply of “All results to be noted were significant at F(1, 56) > 4.40, p < .05.” They also note the means for only one of these results.

I won’t go into detail here about what they found. It could be interesting, but it is aggregated across 7 different types of primes, so that should add some systematic noisiness (and it also isn’t my main concern).

Then they go on to analyze the effects of priming on the 4 composite judgments. They divide this up into three sections: the first looks at the two types of aggressive priming, compared to control; the second at the priming of relationships; and the third at priming women as sex objects.

Remember: there are 5 individuals in each cell, because they analyze men and women separately (and they report no standard deviations).

Let’s start with the aggression priming, and the judged responsibility of the victim.

Victim's responsibility for rape

                          Aggressive outcomes   Aggressive acts   Control
Defendant stranger
  males                          3.7                  3.4           2.33
  females                        4.1                  3.4           2.7
Defendant acquaintance
  males                          5.23                 2.5           2.47
  females                        3.77                 2.7           3.73

                          Aggressive outcomes   Aggressive acts   Control
Victim resisted
  males                          2.93                 3.2           1.37
  females                        3.97                 2.33          2.3
Victim did not resist
  males                          6                    3             3.43
  females                        3.9                  3.77          4.13

The two ratings that stick out are both in the aggressive-outcome priming, and both from the male group (5 individuals): when the defendant is an acquaintance (5.23) and when the victim did not resist (6). The means here are above the half-way point of the scale (which they are not for the others). They also report significant interactions between priming type, sex of subject, and each of those factors (remember, these are two different analyses).

Both F’s are actually the same: F(2,56) = 4.09, p < .05.

I’m wondering if that was a typo, though. I’m not sure what the likelihood is that the actual F value would be exactly the same.

But, with only 5 in each cell, who knows if this is due to one particular individual in that particular cell.
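The harping on cell size is warranted. Here is a quick simulation (entirely my own, not from the paper) of what a two-group comparison with n = 5 per cell can detect: even a "large" true effect of d = 1.0 is picked up less than a third of the time at alpha = .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, sims = 1.0, 5, 10_000   # true effect size, per-cell n, simulations
hits = 0
for _ in range(sims):
    a = rng.normal(0.0, 1.0, n)   # "control" cell
    b = rng.normal(d, 1.0, n)     # "primed" cell, shifted by d
    if stats.ttest_ind(a, b).pvalue < 0.05:
        hits += 1
print(f"estimated power: {hits / sims:.2f}")
```

The estimated power comes out around .3, so most true effects of even this size would be missed, and any effect that does reach significance in such cells is likely inflated.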

Conviction of defendant

When they report their analysis of conviction of the defendant, they actually separate the responses (ought to be convicted vs. will be convicted), but collapse across gender. As the two measures are within subjects, this means that the cells now contain 10 individuals.

Should be convicted
                          Aggressive outcomes   Aggressive acts   Control
Defendant stranger
  victim resists                 9.3                  9.9           9.3
  victim does not resist         8.6                  9.6           9.5
Defendant acquaintance
  victim resists                 9.4                  9.9           8.8
  victim does not resist         7.5                  9.1           8.3

Will be convicted
                          Aggressive outcomes   Aggressive acts   Control
Defendant stranger
  victim resists                 5.4                  4.2           4.5
  victim does not resist         6.4                  2.6           3.9
Defendant acquaintance
  victim resists                 5.5                  3.5           4.7
  victim does not resist         3.1                  3.6           2.7

Priming doesn’t do anything to the ratings of whether the defendant should be convicted, regardless of whether he is a stranger or an acquaintance.

They report two interactions for this – one 3 way, and one 4 way.

priming x acquaintance x resistance: F(2,56) = 5.81, p < .01
priming x acquaintance x resistance x type of judgment: F(2,56) = 4.37, p < .05

(Yes, I have a hard time understanding what is going on too.)

Priming aggression seems to not have had any discernible effects on the other two types of judgments.

Reading through the discussion, they are appropriately muted about interpreting the results. They raise a little flag that this is possibly consistent with just-world beliefs: being primed with aggressive outcomes resulted in higher ratings that the defendant ought to be convicted. But, as I keep harping on, 10 participants in each cell.

They offer an interpretation of the “is she partially responsible” results – where the five guys judged women who were raped by an acquaintance, and women who did not resist, as more responsible. We have to recall that this involves 3 different scenarios, although each measure involves two. When the defendant is an acquaintance, there is one scenario where the woman resisted and one where she didn’t. Likewise, when the defendant is a stranger, there is one scenario where she resisted and one where she didn’t. One of these overlaps both judgments.

I don’t know what to make of it. I don’t think anybody should, considering how few participants there are.

Intimacy priming

Here they are comparing the 10 people who were primed with the couples to the control group.

They report a whole bunch of effects. First, how priming may have altered the perception of harm to the victim:

                         Priming   Control
General perception         8.2       7.2     F(1,56) = 8.21, p < .01
  acquaintance             8         6.5
  stranger                 8.3       7.8     F(1,56) = 4.51, p < .01

Means are overall higher on the scale for those that were primed.

It also seems to have increased the degree to which the participants thought the victim told the truth.

          Priming   Control
Truth       8.5       7.7

They separate this out by men and women, as well as by the circumstances, and report a significant effect. Men seemed to move around a bit here, but they report no means, just an F statistic: F(1,56) = 5.67, p < .02. The claim is that men who judged a victim who resisted a rape by an acquaintance (that is, one scenario only) did not show an elevated belief in her truthfulness compared to control. Got that?

So, in other words, the 5 men in the priming condition judged the degree of truth of one story more like those in the control group did, but we have no idea by how much. Is it close to the 7.7 overall? Does it even make sense to parse it down like this?

Finally, the victim's responsibility (the conviction judgments did not yield any differences).

                 Priming   Control
Males              3.6       2.4
Females            2         3.2     F(1,56) = 6.88, p < .01

                 Priming   Control
Resisted           2.6       1.8
Did not resist     3         3.8     F(1,56) = 4.20, p < .05

I think what I want to point out first is that all of the values are on the low end. In the text, they suggest that males judged the victim more responsible than in the control, but that the reverse was true for the females.

Then, in the other analysis, they pool across gender to look at the effect of resisting, and they report it weirdly (although technically I can see it being correct). The primed rated the responsibility of the resisting victim higher than the controls did. The reverse was the case for judging the responsibility of the non-resisters. Yes, technically that is true, but non-resisters are overall judged more responsible (or less non-responsible). But can you even say anything with so few points? (I know, I keep harping on this.)

Women as sex objects.

I actually have no idea how they aggregated this. They start with the three sex primes (sex object and two different sexual-arousal ones) and the control, which they put together in a 2 x 2 design. From this they find that the sexual-arousal primes have only a couple of effects that are independent of the sex-object prime, so, as they say, “therefore, all results to be presented are independent of the effects of priming stimuli on sexual arousal.”

Did they throw out the nudie-primed? Or what did they do? I suspect so, as the df is 1,56 in their analyses, so what I present here seems to use only those individuals who were primed with sex objects (and the controls).

There was nothing on perception of the crime.

However, there were effects on the belief that the story was truthful, and on responsibility: 4-way interactions between type of prime (object, control), sex of perceiver, whether the rapist was an acquaintance or a stranger, and whether the woman resisted or not.

So here, for each data point, you have 5 individuals making two judgments.

First, truthfulness

Truth of testimony

                      Defendant stranger           Defendant acquaintance
                      Resisted   Did not resist    Resisted   Did not resist
Males
  sex object            7.6          6.2             7.3           6.7
  control               8.7          7.8             8.4           5.9
Females
  sex object            9.2          9.3             9.1           8.2
  control               8.1          7.4             7.2           7.8

One result I could possibly believe is that men and women rate the victim differently when primed with women as sex objects: the women tend to believe her more, and the men less, compared to control. Yes, things are moving around due to the acquaintance and resistance factors, but geez…

The inferential evidence is this 4-way interaction

priming x sex of subject x acquaintance x resistance: F(1,56) = 7.45, p < .01

They also found an effect – same type of 4 way interaction – for victim responsibility.

Victim responsibility

                      Defendant stranger           Defendant acquaintance
                      Resisted   Did not resist    Resisted   Did not resist
Males
  sex object prime      2.1          3.4             2.9           2.7
  no sex object prime   2.4          2.6             1.2           3.7
Females
  sex object prime      2.1          2               2             3.9
  no sex object prime   2.5          3.6             3.1           4.2

Again, here is the 4-way statistic.

priming x sex of subject x acquaintance x resistance: F(1,56) = 9.31, p < .01

What does this say? It's an aggregate of 3 ratings, but over 2 vignettes, and still only 5 participants behind each point.

Possibly the most striking is that the women who were primed rated the victim very low on responsibility, except when the defendant was an acquaintance and she did not resist. But I am really not sure what the results from 5 women can say here.

The judgment of conviction yielded nothing. They report an “uninterpretable interaction approaching significance”; the p-value was .10. I think we’re well satisfied calling that not significant.


I kind of feel exhausted after having gone through this. It is, in a way, such a complex design, with the 7 different primes, the different variants of stories, and the four types of questions, where some were aggregated and others not. And, with 5 people in each cell, and no standard deviations anywhere, what can you say? Other than that it would be interesting to follow up to see if these effects hold. The paper has been cited 61 times, so it is up for a forward trace.

But, as I mentioned, I’m not so sure these are primes rather than emotion inductions, and they differ from the verbal primes earlier.

I’ve gone through and noted the results in such detail, as I want to try to pull something together on these, but mainly I feel depressed over so much work done with so few participants.

Wyer, Robert S., Bodenhausen, Galen V., & Gorman, Theresa F. (1985). Cognitive mediators of reactions to rape. Journal of Personality and Social Psychology, 48, 324–338.


Halo attenuation, and the availability of the letter “T” – furthering the trace.

The Halo attenuation paper is the first paper with reported Standard Deviations! And Eta Squared!

Kors I taket (Cross on the ceiling) as we would say in Sweden.

The n of each cell is also 40 (although I’m wondering if the correct n should really be 20, and the 40 is for a particular marginal effect, but I will get back to that).

We’re now up in 1984, and this is a paper on the halo effect, looking at whether you can use priming as a way of attenuating it.

The halo effect, if you don’t recall, is that individuals (and even things like companies, as one of my bachelor's students found) that have some really good traits – they are nice, they are beautiful, they are successful – are also judged more positively in other areas as well. The positive traits shine like a halo, and brighten everything around them.

This can be a problem when we want to form accurate judgments, for example, in order to do fair performance judgments.

The halo-effect, they claim, has been stubbornly difficult to short-circuit in judgment situations, and they cite work where raters have been informed about the Halo-effect in various ways prior to making judgments, to no avail.

However, the authors have been impressed by Srull & Wyer’s work on how priming particular traits later alters the judgment of an ambiguous character – making the judgments assimilate towards the prime.

Perhaps, they think, if we first prime participants with some trait, they will then be more resistant to letting the halo effect influence judgments on that trait.

The trait they choose is physical appearance. The “trait” (or whatever you should call it) that casts its halo is a teacher who has a warm personality and a lenient teaching philosophy, as opposed to a cold-hearted bastard with a strict teaching philosophy.

The traits to be rated are liking, physical appearance, mannerism and accent.

As an aside, I find this order of work…interesting. I would think that the true problem with halo is when something like good looks makes people think performance is better or more desirable than warranted, but what they are testing is in some ways the other way around. I’d be happy if my warm personality and lenient teaching style would also make me look pretty, and if I was a cold-hearted bitch with strict rules, why would I care what you think about my looks.

But, no matter, it is an interesting question.

Those who are primed (81 out of the 161 participants) get to rate what they think about a number of physical traits. For example: “I find moustaches: 1) extremely irritating … 8) extremely appealing.”

The to-be-judged material was two videotapes from Nisbett & Wilson (1977), depicting a teacher advocating either a lenient teaching philosophy or a rigid teaching philosophy. I’m assuming they use the same teacher in both. He is definitely male.

Taylor et al decided to double down on the halo of these two dudes, so they created short vignettes to be read prior to watching the films. The first one of a warm family man, and the other of a cold grouchy lonely man.

These were all crossed, resulting in a 2 x 2 x 2 between subjects design. That is priming vs no priming, warm vs cold vignette and lenient vs harsh philosophy.

In their analysis, they eliminated all of those who got mixed messages (that is, warm vignette with harsh philosophy, or cold vignette with lenient philosophy). These are not reported at all. Of course, it is interesting to look at the double whammies, but I would have liked to see the more ambiguous/contradictory ones too.

Finally, their measure is one where they rate the teacher, on a 7-point scale, on how much they like him. After that they rate the teacher on physical appearance, mannerism and accent, using the same scale as in the prime: 1) extremely irritating … 8) extremely appealing.

They also had a sub-set of participants do a memory test of appearance, vignette and videotape, to see if the priming influence this (they looked at this as a more objective measure than the impression formation above).


In a nutshell, priming attenuated the positive halo effect for physical appearance and mannerism. Nothing much happened for the negative descriptions, and nothing happened for the judgment of the accent.

             Unprimed   Primed
Positive       4.3        3.4
Negative       3.2        3.3

             Unprimed   Primed
Positive       5          3.9
Negative       3.6        3.6

             Unprimed   Primed
Positive       2.9        2.4
Negative       2.4        2.7
Priming x physical appearance: F(1, 157) = 9.35, p < .05, eta squared = .05
Priming x mannerism: F(1, 157) = 10.73, p < .05, eta squared = .05

For some of the participants they also did a memory check – which they considered a more objective measure. Mainly they were interested in seeing whether priming also influenced how well they recalled the information in the vignettes and the films.

The priming of physical appearance increased memory for physical appearance, but nothing else. The effect was small: a 7.8 vs. 7.2 score (out of a maximum of 10, eta squared .6). I would think this is a bona fide priming effect too – have people rate what they think about various appearance features, and later, when asked to recall appearance, they are better at it (perhaps because they were primed to pay more attention to appearance). They didn’t remember the vignettes and the videos any better otherwise (outside of appearance).


Here they used 40 participants in each group, which I would think is much more robust, from the vantage point of 2015. (Nelson et al. had some rules of thumb about what effects can be detected with what sample size. I think this is touching it.)

But, this is not the type of priming effect that is described by Srull & Wyer. The vignettes and movies are not ambiguous. Instead they are very specifically positive or negative. The role of the priming is to try to regulate the halo effect – the warm and relaxed guy looks better than the cold and rigid guy, not because they look different, but because the warmth and coldness spills over. It seems to have done so in the positive account, but no evidence in the negative account. But, my suspicion is that there wasn’t any negative halo taking place there anyway, so nothing to attenuate.

Is this really of the same nature as pushing around ambiguity? I’m not sure. I’m not sure how to think about ostensible priming effects, although I like Andrew Wilson’s suggestion that it is some kind of canalization. Appearance is brought to mind, then, when judging appearance perhaps one is more likely to pay better attention to it. It wasn’t a big effect, but it was there.

I’ll stick this short paper (published next in the order) here. I’m not sure it really belongs to the part of the trace that extends. It primes, and does so subliminally, but it is not about perceiving people; it is about judging the frequency of words with the letter T.

I suspect that it cites Srull & Wyer (and Bargh & Pietromonaco) because the research was done as an honors thesis under Russell Fazio.

Participants were presented with 2 blocks of 20 words that had the letter T in them. They were presented tachistoscopically at 1/500 ms.

At the end, they were asked to rate, on a 9-point scale, which one of two letters appeared more often.

For example: “Do more words contain T or S”. Anchors would be “Many more contain T” and “Many more contain S”.

Target comparisons were between T and the letters D, M, P, R and S. (There were other comparisons.)

The primed participants judged that T was more frequent: M = 5.25 vs. M = 0.43, t(13) = 2.43, p < .05.
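A quick sanity check of my own on the reported statistic: a t of 2.43 with 13 degrees of freedom is indeed significant at the .05 level, two-tailed.

```python
from scipy.stats import t

t_obs, df = 2.43, 13
p = 2 * t.sf(t_obs, df)   # two-tailed p from the survival function
print(f"two-tailed p = {p:.3f}")
```

So the reported p < .05 checks out, for what a one-shot comparison with df = 13 is worth.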

I’ll just end with ETA OIN SHRDLU

Taylor, Karen, Bernardin, H. John, & Riegelhaupt, Barry J. (1984). Halo error: An assessment of priming as a reduction technique. Perceptual and Motor Skills, 59, 447–457.

Gabrielcik, Adele, & Fazio, Russell H. (1984). Priming and frequency estimation: A strict test of the availability heuristic. Personality and Social Psychology Bulletin, 10, 85–89.


Higgins, Bargh & Lombardi, more on the trace.

I’m skipping a couple of papers (just for the blog series, I will get back to them) to post about this one.

Higgins, Bargh & Lombardi (1984) is definitely one that extends the priming literature.

Three words first:

5 per cell.


Yes, granted, they actually collapse over those cells, which then ups it to 15 per cell, but still.

Let me get back to the purpose, and how they go about the experiment.

What they want to do is to distinguish between 3 models that can account for the priming effect on categorization.

They consider two types first of all: The mechanistic model, and the excitation transmission model. The first is a very computational one (from Srull & Wyer), whereas the second is more electric. They subdivide the transmission model further into two; The battery model and the synapse model.

And I think I'll leave it there, because the models are perhaps not that important for what I’m trying to pursue. I like the idea that they are setting up models and deriving alternative predictions that can then be tested, of course. None of those “differs from null” things here. But I’m not entirely sure how well this ends up working in the end.

Instead, I think I will focus on what they did, and what the results are. I’m a bit Nassim Taleb inspired here. Theory/schmeory, look at the damned phenomenon.

In this work, they think the crucial dividing point between the models is whether something has been frequently primed or more recently primed.

The Srull & Wyer work suggests that frequency matters. So far, nobody has really looked at recency, although, squinting enough, one could think the Fazio et al. study, with the puzzle placed in the 7th position (which due to duplication becomes the 7th and the 17th position over 20 presentations), could possibly be considered a mild recency effect – but then again, I’m not sure that effect actually happened.

So, how do they go about this?

The general template is the Donald paradigm: a priming sentence-unscrambling task, followed by judgment of an ambiguously described individual.

But they didn’t want to have just one ambiguous trait dimension. They wanted more, to see if the effect generalizes. So they created ambiguous stories that could be read as either independent/aloof, adventurous/reckless, or persistent/stubborn. This is actually not analyzed, so I’m not sure what happened. For simplicity’s sake, I’m using the adventurous/reckless example to describe the priming manipulation.

The idea here is to see whether the more frequently primed, or the more recently primed, construct will influence the subsequent judgment of the ambiguous character. And, of course, the frequently primed and the recently primed need to have opposite valence. That is, in the adventurous/reckless example, positive synonyms for adventurous are presented more frequently (bold, courageous, brave) whereas a negative synonym for reckless is presented as the last prime (foolhardy, to pick one of their synonyms). And vice versa. It is perfectly nicely crossed.

The priming task was a sentence-unscrambling task, 4 words presented on the screen, same specification as in the original Srull & Wyer. Participants are to say their sentence out loud.

First they go through two 20-sentence practice blocks (they don’t know it is practice). Then they go through the 20-sentence priming block. The 7th, 12th and 15th sentences contain the synonyms for one of the valences, and the 20th a synonym for the opposite.

Once they are done with this, they are asked to count backwards by threes from some large number, for either 15 seconds or 120 seconds. This is an interference task. The delays are selected so that they can distinguish between the models.

Finally they are presented with what they are supposed to judge, and the method here is actually – ambiguous.

They are presented with a series of ambiguous descriptions that they are supposed to label with one word (written). In the first series they get descriptions of animals, and in the third they get the description of an individual that behaves either adventurous/reckless or any of the other combinations.

The description is very unclear. They talk about series, and I can’t make out whether all participants get to label all of the ambiguous persons, or only the one particularly fitting the prime. I think the latter makes more sense (I’m not sure the priming would work across traits like this), which means that, again, this is a one-shot measure.

That is all the involvement of the participants (they also probe for suspicion, and get rid of 3 participants, whom they then replace).

The labels are then rated by judges as to how synonymous they are with the primed traits on a 6 point scale. A one indicates that the word coincides with one of the negative synonyms (“same as negative alternative construct”) and a 6 that it coincides with the positive.

So, get that? An additional layer of judgment, done by others.

So what do we have here, design-wise? We have two types of priming (positive frequent/negative recent vs. negative frequent/positive recent). We have 2 types of delay. And we have 3 types of traits. 12 cells. Five in each.

All of this is thrown into an ANOVA, but only the 2 x 2 is reported.

                         Brief   Long
+ frequent / − recent     3.1     3.4
− frequent / + recent     4.8     2.9

The interaction here is significant: F(1,48) = 4.84, p < .05.

They then look at how often participants classify the ambiguous person using either the recent construct or the frequent construct (they throw in an ambiguous category too, because I guess not everyone was that clear in their labeling).

             Brief   Long
Recent         21     11
Ambiguous       1      3
Frequent        8     16

They test this with a chi-square: χ²(2, N = 60) = 6.79, p < .05.
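The reported chi-square can in fact be reproduced from the frequency table above (a quick check of my own; no continuity correction, df = 2 for a 3 x 2 table):

```python
from scipy.stats import chi2_contingency

table = [[21, 11],   # classified with the recent construct
         [ 1,  3],   # ambiguous
         [ 8, 16]]   # classified with the frequent construct

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}, N=60) = {chi2:.2f}, p = {p:.3f}")
```

The result lands on the reported 6.79, which is reassuring as far as the arithmetic goes, whatever one thinks of the cell sizes.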

Looks like recency works across short delays, but I'm not sure what happens over long delays – the numbers suggest a reversal.

This supposedly discriminates between the three models (only the synapse model would predict this pattern, they say; I won’t evaluate that claim).

What I’m much more baffled by is the very low N. Even when collapsing over the different trait types, there are only 15 individuals in each condition. There are many places for uncertainty and noise to creep in, and I’m not sure how replicable this is.

Methodologically I think it is interesting; even the reasoning is interesting. But I think there are parts in here that are, well, open to intrusion, so the results may not be as robust as they appear.


Leadership categories, prototypes, and a failure to prime.

Lord, Foti & De Vader’s paper is more… well, inspired, in part, by Srull & Wyer, especially the third experiment.

The first two are much more interested in understanding how we think about leadership – more precisely, leadership categories in a Rosch manner. In the first experiment, they simply have participants list features that they think a good leader should have, and then they analyze these features to understand whether leadership categories are related through family resemblance, and which features have cue validity (if this feature is present, it is probably a leader).

In the second experiment, they are interested in the accessibility of features that are either leadership-related, neutral, or anti-leadership-related. They create a questionnaire that they call the “Akron Leadership Questionnaire,” or ALQ. It consists of 25 two-word items, such as “emphasizes goals,” “makes jokes,” and “neglects details.” I assume that the first is congruent with leadership, the second neutral, and the last incongruent. Participants are asked to respond to each using one of 5 computer buttons that correspond to Likert-type categories (“not at all well” to “extremely well”). The researchers are particularly interested in reaction time, as quick reaction times likely mean that a particular description is highly accessible, which would give a cue to the underlying category structure of leaders.

I’m not going to discuss the results, because they are not of the main interest here. But, I described the task in some detail, because most of the participants in this experiment also participated in experiment 3, and these participants were considered primed with the leadership concept.

In the third experiment they are investigating what it is that makes someone perceived as a leader. They propose two mechanisms: the first is how well the described individual matches a prototypical leader, and the second is whether the prototype of leader has recently been brought to mind, in the way that Srull & Wyer brought hostility to mind via their sentence-unscrambling task.

To investigate this, Lord et al. created three short vignettes of a manager, John Perry. The vignettes were either prototypical, neutral, or antiprototypical of leadership. (They dipped into experiment 1 to construct these.)

After having read one of these vignettes, participants got to rate John Perry on the following:

His “contribution to store managers’ effectiveness”

“his influence in determining the new product’s success”

“his leadership exhibited”

“His desirability as a district manager”

They also rated how often they thought John Perry would engage in each of the two-word behaviors from the ALQ.

The raters were most of those who had done the ALQ rating in experiment 2 (the primed condition), plus an additional 34 participants who did it without the priming.

The main finding for us is that priming did not do anything to any of the dependent measures.

I thought I should, maybe, speculate why this is. As with all the earlier work, the n per cell isn’t high. There were 61 participants in the priming condition, but as these were divided across three vignettes (one shot), that leads to about 20 participants in each cell. Even fewer for the non-primed: about 11 per cell.

This, so far, seems to be the standard.

But, should we really expect a priming effect? In the earlier work, the ostensible effect (robust or not) is that the prime biases the judgment of information that is ambiguous. When we can’t make sense of things in and of themselves, what has been brought to mind earlier will influence how we judge them. Hence, we have the more hostile Donald. But, of these three vignettes, only the neutral one could possibly be considered ambiguous, or at least not displaying any particular cues as to whether John Perry is a good leader or not. This should be the only place where you would see an effect of a prime biasing responses.

Also, in all the earlier priming work, the primes (or variants of primes) are designed so that they should be able to bias the subsequent responses. I’m not sure that responding to the ALQ can be considered a biasing prime. It contains both prototypical and anti-prototypical leadership behaviors. If anything, the prime could possibly have narrowed standard deviations (the concept of leadership is already activated, so the participants don’t have to create one ad hoc), and possibly speeded up responses – although they don’t measure that.

It is mildly interesting that the exposure doesn’t seem to have an effect, but there isn’t much more that can be said.

Lord, Robert G., Foti, Roseanne J. & De Vader, Christy L. (1984). A Test of Leadership Categorization Theory: Internal Structure, Information Processing, and Leadership Perceptions. Organizational Behavior and Human Performance, 34, 343-378.


Will funny TV prime kids with Funny?

The first paper looking at non-college students – in fact, young children – turns up in 1983. Byron Reeves and Gina Garramone tested whether exposure to a TV program could prime traits that are then used to judge another character – very much in line with the work of Srull & Wyer and all the others.

The participants were kids in 2nd, 4th and 6th grade.* Two classes for each: one was the experimental class, and the other the control class.

What they wanted to prime was “Funny”. First, they put together a 10-minute video with clips from “prime time syndicated situation comedy programs”. The clips had been rated by other kids on how much they made them laugh, and how funny they thought the characters were.

Then, they created a vignette of Andy which was ambiguous as to how funny he was, although the situations described were ones where he could have been funny. For example “Later in the day, Andy’s class went on a field trip and Andy made jokes on the bus”.

The experimental classes got to see the film, and then they read about Andy. The control classes only read about Andy. Then they rated him on 25 traits, using a scale from 1-4.

Target traits, they claim, were Funny, Attractive and Strong. I’m not sure how they came up with this. I certainly buy Funny, but wonder if they did a bit of exploring to find effects for both Attractive and Strong also.

Overall, collapsed over grade, priming did not result in any difference. Neither did class. But there were interactions for the above three traits. Mainly, it looks like there was a lot more variance between the grades in the control condition than in the experimental condition. The second graders especially tended to rate Andy as more funny, attractive and strong in the control condition than in the experimental condition. The 4th graders don’t seem to differ much, whereas the 6th graders tended to go in the other direction from the 2nd graders, but the differences are not large.

        Funny                     Attractive                Strong
        Control   Experimental    Control   Experimental    Control   Experimental
2nd     3,6       3               2,67      2               2,6       2,2
4th     3,1       3               2,18      1,9             2,25      2,12
6th     2,95      3,27            1,75      2,16            1,87      2,29

The cell sizes ranged from 19-25 (whole classes), so mainly in range of what has been done before.

Reeves, Byron & Garramone, Gina M. (1983). Television’s influence on children’s encoding of person information. Human Communication Research, 10, 257-268.


The two next papers in my Srull & Wyer trace: Carver et al (1983) and Fazio et al (1983)

Onward to two more papers in the trail of Srull & Wyer citations.

The first one is the easier one: Carver, Ganellen, Froming and Chambers’s probe into Modeling and Category Accessibility. Modeling here is not related to SEM or neural nets, or to posing in front of a camera, but to Observational Learning, where the model is the person we emulate.

It consists of 2 experiments where the first uses the Donald Paragraph*, and the second uses the sentence priming task from Srull & Wyer.

In experiment 1, participants are first exposed to a video of a businessman and his secretary (ungendered, but one can guess). In one video, the businessman was hostile and derogatory, in the other he was neutral.

Then they judged toned-down-for-Florida Donald, on the same scales as Srull & Wyer.

Originally, they intended to also look at gender-differences, so they recruited 20 males and 19 females (appears that one guessed the connection and got dropped), but they found no gender differences. They did find a model difference, as predicted.

                        Hostile   Neutral
Descriptively related   38,83     35,14     F(1,74) = 5,58, p < .03

Nothing for the evaluatively related (interesting).

Experiment 2 reads like a mash-up between Milgram and Srull & Wyer – minus the scientist demanding obedience.

The ostensible task that the participants are doing is a learning task. They will be the teacher, and their job is to administer electric shocks to the learner when he makes a mistake (the learner is a confederate of undisclosed gender, but I don’t think “he” is a stretch). The shocking apparatus has 10 settings, and the participant gets to experience and rate the intensity of the shock. The instruction to the participant is to teach the problem to the learner as effectively as possible.


Then, oops, a poor masters-student would like to get some help (this is definitely a she), could they please help?

This task is the sentence-unscrambling task from Srull & Wyer. There are 30 items, and the mix is either 80% hostile or 20% hostile.

They then proceed to the learning task. There are 34 trials, and the confederate makes mistakes on 20 of these. Yes, as in Milgram, no shocks are actually administered. I presume that, unlike Milgram, there are no shrieks of pain.

In a debriefing, participants did not realize that the priming and the shocking had anything to do with one-another, but one participant didn’t believe he had administered any shocks, so he was eliminated from the analysis.

The Dependent measure was, of course, the average shock-intensity, with the predicted effect that the 80% mix would administer stronger shocks.

It came true!

80% hostile   20% hostile   t                         Cohen’s d
3,31          2,24          t(29) = 2,24, p < .05     0,82
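As a side note, converting a two-group t to Cohen’s d can be sketched in a couple of lines. This uses the common approximation d = 2t/√df, which assumes roughly equal group sizes; the authors don’t say which formula they used, and slightly different conversions land between about 0,80 and 0,83, so this is a rough check rather than their exact computation.

```python
import math

def cohens_d_from_t(t: float, df: int) -> float:
    """Approximate Cohen's d for a two-group comparison:
    d = 2t / sqrt(df). Assumes roughly equal group sizes."""
    return 2 * t / math.sqrt(df)

# The Carver et al. experiment 2 result, t(29) = 2,24:
print(round(cohens_d_from_t(2.24, 29), 2))  # prints 0.83, close to the reported 0,82
```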

*In, presumably, a pre-study, they found that University of Miami students perceived Donald as more hostile than University of Illinois students did, so they softened and/or deleted some elements to make it more ambiguous.

The second paper, by Fazio, Powell & Herr is a lot more… complicated.

Fazio is Mr. Attitudes, and attitudes are what he has explored throughout his career. (I figure I should disclose again that he was one of my professors, and I took his attitudes class.)

The question they are exploring here is whether being exposed to an attitude object will influence subsequent judgments of something that is unrelated.

To back-track a bit, an attitude is considered an evaluation of an attitude object. To make it more concrete – I like my iPhone, so my attitude is positive. I loathe ketchup, so my attitude is negative, in fact very strongly so. For lots of things that we are at least somewhat familiar with, we do have this kind of mild positive/negative evaluation. We like or dislike them.

Now, if I’m exposed to something that I have either a positive or negative attitude towards, quite incidentally, could that possibly bias me so that the judgment I make of an ambiguous person will be more like my attitude? Or, would exposing me to ketchup drive me to judge Donald as more Hostile?

That is a rather oblique chain.

Their first experiment is a conceptual replication of Srull & Wyer. They simply need to find out if priming with evaluative adjectives could bias judgment in a person perception task.

The priming task was the color task described in the earlier Fazio study (which I believe is adapted from Higgins et al. 1977). The ten pairs were presented twice (as they had been in the earlier study, too).

They created four conditions:

a) positive applicable, b) negative applicable, c) positive non-applicable, d) negative non-applicable.

The prime words were selected to fit each of the four conditions. (The word-list table is not reproduced here.)
The ambiguous story was about Ted, a high-school student waiting for his ride who is then asked to participate in an experiment where he solves a number of problems.

Participants are then asked to indicate why they think Ted participated by rating the following on a 0-10 point scale:

  1. In order to earn the extra money
  2. To have something to do while waiting for his ride
  3. Because he liked and was interested in the experimental task.

What they were particularly interested in was whether the prime could move around the judgment of the third reason – the intrinsically motivated reason.

They do a somewhat convoluted summary of the ratings. They take the mean of the first two (the extrinsically motivated reasons), then subtract the rating of the third. In the resulting index, a lower number means that the participant attributed the reason to participate more to intrinsic motivation.
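In code, the index is trivial (a minimal sketch; the function name and the example ratings are mine, not theirs):

```python
def motivation_index(money: float, waiting: float, liking: float) -> float:
    """Fazio et al.'s composite: mean of the two extrinsic ratings
    (money, waiting) minus the intrinsic rating (liking).
    All three ratings are on a 0-10 scale, so the index runs
    from -10 (fully intrinsic) to +10 (fully extrinsic)."""
    return (money + waiting) / 2 - liking

# A participant rating money=3, waiting=4, liking=8 is attributing
# Ted's participation mostly to intrinsic motivation:
print(motivation_index(3, 4, 8))  # prints -4.5
```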

The results are weak. The mean rating does not reach significance, although the pattern looks like they had hoped. The reason, they speculate, is that the standard deviation in each cell is very high. There are 15 individuals in each, so maybe it’s not surprising that it is unstable and non-significant.







Positive applicable   Negative applicable   Positive non-applicable   Negative non-applicable
1,567                 3,200                 3,033                     3,033

Mean standard deviation: 8,57. Cohen’s d for the applicable primes: 0,19.
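That d is just the difference between the two applicable-cell means divided by the (mean) standard deviation; a quick check (the little function is my sketch, not theirs):

```python
def cohens_d(m1: float, m2: float, sd: float) -> float:
    """Standardized mean difference between two cell means,
    using a common (here: the mean) standard deviation."""
    return abs(m1 - m2) / sd

# Applicable cells: positive 1,567 vs negative 3,200, mean SD 8,57:
print(round(cohens_d(3.200, 1.567, 8.57), 2))  # prints 0.19
```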







               Positive     Negative     Positive         Negative
               applicable   applicable   non-applicable   non-applicable
Above median   4            10           9                6
Below median   11           5            6                9

So, instead, they do a median split and count the number of participants above and below the median, as you can see above. They find an interaction between applicability and valence using a non-parametric analysis, and in the applicable condition they find a significant difference between the priming conditions.

Just for laughs I stuck the applicable and the non-applicable in separate Chi-square analyses, and found the first one significant, but not the other one.
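That check can be reproduced with a plain Pearson chi-square on each 2x2 table. I’m assuming no continuity correction here; with the Yates correction, the applicable table falls just short of the .05 cut-off, so the correction choice matters.

```python
def pearson_chi2_2x2(table):
    """Pearson chi-square statistic (no continuity correction)
    for a 2x2 count table given as [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Above/below-median counts from the table above:
applicable = [[4, 10], [11, 5]]
non_applicable = [[9, 6], [6, 9]]

# The critical value for df = 1 at alpha = .05 is 3,84:
print(round(pearson_chi2_2x2(applicable), 2))      # prints 4.82 -> significant
print(round(pearson_chi2_2x2(non_applicable), 2))  # prints 1.2 -> not significant
```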

They admit that this is disappointingly weak, but proceed to the next task anyway.

Experiment 2.

Here the connections are even more stretched – and this is the main experiment.

They really didn’t want to prime with attitudes, but to prime with an attitude-objects; simply exposing someone to this object to see if that changes the rating (as in the ketchup-Donald suggestion).

But, our attitudes can be somewhat idiosyncratic, of varying strengths and really difficult to control in an experimental setting, so they do what they have done in lots of experiments since: they create and manipulate attitudes.

Participants are presented with 5 puzzles, in two versions. For 1/3 of the participants (about 37 or so) the worksheets are not filled in, and they are asked to work on solving the puzzles. The remaining 2/3 (whatever n fills up the 112 participants) get the same puzzles, but now solved, and they listen to a tape that explains the puzzles and how to solve them. These two conditions are the direct-experience and indirect-experience conditions.

After they have been exposed to the puzzles, they get to rate each puzzle on a -5 to +5 scale, where -5 means extremely boring and +5 extremely interesting.

For half of those in the indirect condition the experience doesn’t stop here. No, they also get to repeat their ratings of the puzzles twice. This is done under the guise that the experimenter needs some help with the data-entry and getting the ratings to a professor.

So, here we have created three types of attitude formation:

  • Direct experience, 1 repetition of the explicit attitude
  • Indirect experience, 1 repetition of the explicit attitude
  • Indirect experience, 3 repetitions of the explicit attitude

One would think that the attitude would be stronger in the first and third conditions than in the middle one.

In the next step, the priming takes place, and the individual priming is in part tailor-made as follows.

The experimenter is presented with the participant’s favorite and least favorite puzzle. The participant is then randomly assigned to either a positive or a negative condition. The priming task is then created with the selected puzzle in the 7th position of that same color priming task.

Now, the participant will be primed either by her or his favorite or least favorite puzzle.

Finally, they get to rate Ted, who has now become even more ambiguous as to what motivates him to participate in the experiment (incidentally, of course, it is a puzzle experiment). They add an open-ended question also, which they simply correlate with the other ratings.

The ratings are combined as in experiment 1, and here are the means.

            Direct experience,   Indirect experience,   Indirect experience,
            1 repetition         1 repetition           3 repetitions
Positive    -,309                ,063                   -,170
Negative    ,285                 -,234                  ,366

Notice first of all that none of the means are far from zero in absolute value. Remember that the component scales go from 0-10, and that the intrinsically motivated score was subtracted from the mean of the two extrinsically motivated scores. This means that the ratings were all very similar. I’m not sure one can fruitfully compare to experiment 1, as so many of the numbers are embedded in one another, but the means were much higher there, and there seems to be much more variation.

The reported results are marginal. The main effect of puzzle valence: F(1,106) = 2,72, p = .11.

Condition x valence interaction F(2,106) = 2,87, p = .06.

But, let’s look at the means to see what they are after. The idea is that you may be able to have a stronger reaction if the attitude is strong. This is not the same as extreme. It is just strong. It will be reliably and quickly evoked in whatever direction. And, a way to create a strong attitude is to either interact with the attitude object, or to repeat ones attitude several times. That is the case in the two conditions on the flanks. Those primed with positive attitude objects end up having a negative score, meaning that they attributed Teds behavior more to intrinsic than extrinsic motivation, whereas when primed with negative objects, the reaction is the reversed. It is as if the evaluation of the object kinda spills over into the evaluation of puzzle solving ted (he does it because he likes it). The stats for the direct experience are t(106) = 2,01, p < .05, and for the repetition t(106) = 1,84, p < .07.

In the weak-attitude condition, the pattern goes the opposite way, although I’m not sure why that would be the case. t(106) = 1,01.

Then again, none of this is very strong evidence. I keep wondering (now) if they are chasing noise.

In both these papers, there are never more than 20 people in each cell, measuring effects that really should be thought of as weak, and which are showing up as weak also. I’m not sure how to put this all together.

Carver, Charles S., Ganellen, Ronald J., Froming, William J., & Chambers, William (1983). Modeling: An analysis in terms of category accessibility. Journal of Experimental Social Psychology, 19, 403-421.

Fazio, Russell H., Powell, Martha C., & Herr, Paul M. (1983). Toward a process model of the attitude-behavior relation: Accessing one’s attitude upon mere observation of the attitude object. Journal of Personality and Social Psychology, 44, 723-735.


A psychologist goes to her 40 year reunion.

In our Advanced Social Psychology book, there is a claim that people tend to be very good at recognizing facial identity (that is, people) across very long time-spans. I got to test that this weekend. I returned to my hometown for our 40 year reunion. This is Sweden, so the reunion was for “Grundskolan”, the 9 year obligatory school from 7-16.

I left Ludvika in 1978, when I was 19, after the additional 3 years of “Gymnasium”. Although I have been back, visiting my parents on occasion before even they left around 2002, the visits have been sparse and brief, with no known meetings with old school mates. Many, like me, also left town and settled elsewhere. As I settled in Los Angeles, that didn’t give much opportunity to meet up and keep the neural networks updated on identity despite the subtle shifts of looks.

It was hard!

I came early. One of the organizers greeted me, and clearly knew who I was – and I was at a loss! It didn’t help that she was an impish one, letting me stew on it for a bit (though she finally took mercy on me).

The meetings really were guessing games. Sometimes it didn’t go both ways – people recognized me even though I didn’t recognize them, and vice versa. But there really was that hesitating: do I remember, do I recognize, will something kick in?

There were a few where I think I did get them from the looks. Two women with fairly distinctive features that were still there.

There were others where the name did something to alter recognition so I knew, yes, I remember who you are, and how you looked, and can match it with how you look now.

But, had I run into any of them in the street, there would not have been a glimmer of recognition.

We’re all chunkier, and wrinklier, and saggier, and greyer or balder than the 16 year olds we last remembered, as we should be.

Just fascinating.

And now, I find that whatever networks are processing are slowly connecting the looks now with the looks then, creating more of a continuity – the same that I have with those that I have seen more frequently across time, where sometimes chunks like 10 years seems to do very little to impair recognition, and the sense is that they look exactly the same as then.

I had a lovely lovely time. It was nice meeting up, and talking to people, and hearing about their lives.


Next Up, Herr, Sherman & Fazio. On Ferocious and Large animals.

Next up in my Srull & Wyer tracing is Herr, Sherman & Fazio’s paper, ”On the consequences of priming: Assimilation and contrast effects”, published 1983 in the Journal of Experimental Social Psychology.*

The research here is more conceptually related than directly pushing further on either the Donald story, or using sentence unscrambling tasks.

Where they are similar is that the work investigates incidental priming (ferocity or size in this case), and how this prime then potentially influences the ratings of ambiguous stimuli (non-existing animals).

Where they extend is that they are checking whether priming always leads to assimilation (which one can argue is the case in the original Srull & Wyer, as well as in the Bargh & Pietromonaco article), or if there are instances where the prime leads to a contrast effect instead.

Some notes first: I had trouble getting my head around the results, until I graphed it in a way I could understand – and then things became much clearer. I’ll see if I can share those graphs.

The paper sports two allegedly significant 3-way interactions, one with a p-level of .058, and the other .067. We don’t know if those p-values are approaching or running away from that arbitrary cut-off level of significance, as the authors don’t say. No matter. It was 1983, and those p-values like to boogie anyway.

The background

The puzzle they try to answer (according to the introduction) is that whereas the priming work shows an assimilation effect (the judgment gets drawn closer to the accessible category), it is much more common in the social judgment literature to see a contrast effect. Perhaps, they think, it is the ambiguity of the stimulus-to-be-judged that matters. For non-ambiguous stimuli, a contrast effect might emerge. In addition, the extremity of the prime may matter. If the primed category is very extreme, perhaps there will be no assimilation, even for ambiguous stimuli. It is simply too far away to serve as a reasonable category, and thus a contrast effect will be evident. They cite plenty of older evidence for this.

They don’t evaluate this hypothesis in the domain of hostility or kindness, though. Instead they use judgment of features of animals that are either known or unknown.

The prediction would then be that for known animals, priming will simply result in the judgment being contrasted away from the activated category. For the unknown animals, priming with extreme exemplars will also result in contrast, whereas priming with moderate exemplars will result in assimilation. I will show a table of predictions below.

The set up.

In the first experiment, they test rating of ferocity, and in the second rating of size.

First, they gather together 20 animals as exemplars of ferocity. Four are rated as extremely ferocious (Grizzly bear, Tiger, Lion, Shark), four as moderately ferocious (Vulture, Wolf, Rhinoceros, Badger). Then we have four moderately non-ferocious (Kangaroo, Opossum, Cat, Seal), and finally the meekest of all (Dove, Kitten, Rabbit, Puppy).

Then there are the 4 remaining – the moderate set: Fox, Porcupine, Weasel and Bat. These are the animals that will later be rated on ferocity, along with two imaginary animals, the Jabo and the Lemphor.

Priming task

The priming task is an ostensible Stroop task. Participants see a series of pairs of slides (10 pairs in total). The first slide in each pair shows a word that the participant is supposed to memorize. After a while, they see the second slide – for example, RED. Now they are supposed to say the memorized word, and the color of the ink, as fast as possible. The researcher is standing there with a stop-watch, tick-tock.

Most words are neutral, but in the 3rd, 5th, 7th and 8th position they slip in the animal name.

The task is modeled on the Higgins, Rholes and Jones (1977) task. (This seems to be one of the Ur-papers for this particular line of questioning).

This ends up being 4 priming conditions, 20 people in each cell (better than 8). The conditions can be divided into two dimensions: Ferocious or not, Extreme or Moderate.

This is where I got confused, as extremity bundles together the kitten and the lion, but I think I got the understanding sorted in the end.

Next, in good Social Psychology tradition, the students are lied to. It is the standard, “two experiment” set-up, where , oh, by the way, we have a short other experiment that we would be ever so grateful if you took part in. Won’t take long, promise.

This is the rating task. They have cards with the animals (I’m assuming they give the name – who has ever seen a Lemphor?) and a rating sheet. For each animal, they give a rating, on an 11-point scale, of 1) how ferocious the animal is, 2) the “likelihood that the animal would cause harm”, and 3) “the seriousness of harm the animal could inflict”.

Half of the participants rate the real animals before the imaginary, and the other half the imaginary before the real.

I note there are 4 real animals and 2 fake, but the mean ratings for all of them are thrown into the ANOVA later. Hmmmm.

Then they debrief, and nobody believed that the two experiments really were related.

The results

First, something about how they created the scores. They averaged each rating across the animals (so a mean score of ferocity across the 4 real animals, and a mean score of ferocity across the 2 imaginary animals). This results in 3 means. These means were then added up to a composite score.

I was a little bit surprised finding scores above 11 on an 11-point Likert scale, but this explains it. I think I would have preferred averages of averages (to relate it better to the scale), but these are just linear combinations, so no matter.
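A sketch of the scoring as I read it (the ratings below are hypothetical, just to show the arithmetic): each animal contributes three ratings on 11-point scales, so the composite can range from 3 to 33.

```python
def composite_score(ratings):
    """ratings: one (ferocity, harm_likelihood, harm_seriousness)
    tuple per animal, each rating on an 11-point (1-11) scale.
    Average each rating across animals, then sum the three means."""
    n = len(ratings)
    means = [sum(animal[i] for animal in ratings) / n for i in range(3)]
    return sum(means)

# Hypothetical ratings for the two imaginary animals:
print(composite_score([(4, 3, 5), (6, 5, 7)]))  # prints 15.0
```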

As promised, this is the pattern of ratings that we would expect:

                 Low ferocious prime       High ferocious prime
                 real      ambiguous       real      ambiguous
extreme          Higher    Higher          Lower     Lower
moderate         Higher    Lower           Lower     Higher

And, what do they find? Let me give the actual ratings.

                 Low ferocious prime       High ferocious prime
                 real      ambiguous       real      ambiguous
extreme          10,37     11,8            10,9      10,45
moderate         10,05     10,8            9,54      12,88

The ambiguous animals are, overall, rated as more ferocious than the real animals. This is also significant in the ANOVA. Then there’s the alleged 3-way interaction between type of animal (real or imaginary), ferocity (meek or ferocious) and extremity (extreme or moderate).

They seem to take this as a reason to do a planned contrast to test the above prediction (that only the ambiguous animals get assimilated in the moderate-prime condition, whereas the rest get contrasted away). This one is significant, F(1,72) = 6,50, p < .01. (I don’t know how they put this together, though. I’ve spent quite a bit of time puzzling it out.)

If you look at it, the one that sticks out (and they mention it too) is the rating of the ambiguous animals after having been exposed to a moderately ferocious prime. This is by far the highest composite ferocity score.

You can look at the above again for either the real or the ambiguous animals (and this is what they do analyze). You would expect that the real animals would be rated as more ferocious after being exposed to the low ferocious prime, and less ferocious after being exposed to the high ferocious prime (contrast effect). Just sheer eye-balling says that this is not there, and neither does their ANOVA.

For the ambiguous animals, one would expect higher ferocity after the extreme low-ferocious prime than after the extreme high-ferocious prime (contrast), which eyeballing may suggest is weakly the case. Also, one would think that (according to assimilation) the rating would be lower after the moderate low-ferocious prime, and higher after the moderate high-ferocious prime. Which you kinda see. Though the interaction is not significant.

Like they say, the patterns are in the expected direction, but not statistically significant. Well, good enough for publishing in 1983.



Experiment 2

On to experiment 2.

Set-up is the same, but there are a few differences:

First, they double the number of participants, from 80 to 160 (40 per cell). Yay. More power!

Second, instead of priming ferocity, they prime size. As in the first experiment, they pretest a number of animals, as follows:

  • Large: Whale, Elephant, Hippo, Rhinoceros.
  • Moderately large: Antelope, Cow, Lion, Tiger
  • Moderate: Wolf, Sheep, Pig, Goat
  • Moderately small: Porcupine, Gopher, Groundhog, Cat.
  • Small: Snail, Flea, Minnow, Ant.

As in the first experiment, the middle category was rated, whereas the other categories were used in the priming task.

Here are the results.

                 Low size prime            High size prime
                 real      ambiguous       real      ambiguous
extreme          13,73     11,99           12,10     10,68
moderate         13,96     10,76           13,16     12,68

Eyeballing the ratings of the real animals, they don’t seem to differ much. They are actually always rated as larger than the ambiguous animals. But, their analysis also shows that the low-size prime results in larger ratings of the real animals than the high-size prime does.

For the unreal animals there is the predicted interaction. The moderate primes seem to result in assimilation, whereas the extreme prime results in contrast, and this interaction is significant: F(1,152) = 6,70, p = .011

Yay for more power!

Herr, Paul M., Sherman, Steven J., & Fazio, Russell H. (1983). On the consequences of priming: Assimilation and contrast effects. Journal of Experimental Social Psychology, 19, 323-340.

*Jim and Russ were professors in my department when I was going to graduate school, although this was published long before I ever thought of doing a PhD in anything anywhere.


Tracking Srull & Wyer (1979): Bargh & Pietromonaco, 1982

I’m working my way through the papers that have cited Srull & Wyer (1979), which is rather illuminating. Right now I’m going through those papers that can be considered direct extensions of that research (and hence warrants a more careful look at the experiments). I’m planning on writing this up more comprehensively, but paper 3 in the list is Bargh and Pietromonaco’s 1982 paper, which uses the same Donald vignette, and trait measures as the Srull & Wyer paper from 1979, but attempts to prime hostility outside awareness. I think this is a classic paper also. Here’s a proto-writeup.

Bargh & Pietromonaco, 1982.

The first extension of the priming task that is not done by Srull & Wyer is Bargh and Pietromonaco’s 1982 paper, which I would think is also a classic.

They take the Donald Vignette, and the trait rating task directly from Srull & Wyer (1979). (It doesn’t seem like they take the two vignettes, just the one that is published).

But, the priming set up here is different. They are really after the “outside awareness” priming idea, and I’m fairly impressed by the amount of care they place on demonstrating that individuals are not aware of the priming.

Making sure people are not aware that they are primed.

The priming task here is a vigilance task. Participants are brought into the lab and told their task is to indicate on which side of the screen they see a flash, by pressing one of two buttons, not surprisingly labeled “left” and “right”. The flashes are really words that are presented for 100 ms. Some of them are related to hostility, and some are neutral. All of the fifteen hostile words come from Srull & Wyer (1979). The 15 neutral words are also carefully selected according to standards. (Yes, words do get presented more than once.) The words are presented parafoveally to ensure that people don’t become aware of the meaning of the words. A great deal of space is taken up by describing the details, which, for my cognitively trained self, is very very nice. There are visual angles, and distance from the monitor, and all that. No chin rest, though – but as they point out, should participants move closer, the presented words will just move further away from the fovea.
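For the curious, eccentricity in degrees of visual angle is just trigonometry. The numbers below are made up for illustration (the paper’s exact geometry isn’t reproduced here); the point is that a modest lateral offset at a normal viewing distance easily lands a word outside the roughly 2-degree fovea.

```python
import math

def eccentricity_deg(offset_cm: float, distance_cm: float) -> float:
    """Angular distance from fixation, in degrees, for a stimulus
    displaced offset_cm to the side, viewed from distance_cm away."""
    return math.degrees(math.atan(offset_cm / distance_cm))

# A word 6 cm from fixation on a screen viewed from 80 cm:
print(round(eccentricity_deg(6, 80), 1))  # prints 4.3
```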

In experiment 1, there are actually 3 conditions that are solely geared towards probing whether participants are aware of the meaning of the flashed words. In the first two conditions, the “rate” conditions, participants are exposed to either 20% or 80% hostile words. After the vigilance task, participants are given a recognition memory test – 60 words: half hostile, half control. Of these, half appeared in the vigilance task (targets), and the other half did not (distractors). The task is to indicate which words they recognize from the vigilance task.

In the third, the “guess” condition, participants were presented with an 80% hostile mix, but did not do the vigilance task. Instead, they were told that the flashes were words, and they were supposed to guess what each one said.

Experiment 2 is essentially a repeat of the “guess” condition and the 80% mix “rate” condition, but with even tighter measures. For the “rate” task, participants’ recognition memory was tested after each trial in a 3-word choice task, and in the “guess” task, participants are no longer allowed to pass – they must guess something.

It actually doesn’t much matter. If we take the two “guess” repeats first, participants were really bad at guessing correctly, even with (what the authors describe as) rather lax coding rules. In Experiment 1, 16 out of 900 trials (9 participants, 100 trials each) yielded correctly guessed words. 4 of these were hostile, the rest neutral. Participants were no better when forced to guess: out of 1000 trials in Experiment 2, 10 hostile and 6 neutral words were correctly guessed.
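For orientation, those guess-condition counts work out to hit rates under 2% in both experiments. A trivial sketch, using only the numbers quoted above:

```python
# Guess-condition hit rates, taken from the counts reported in the paper.
exp1_hits, exp1_trials = 16, 900    # Experiment 1: 9 participants x 100 trials
exp2_hits, exp2_trials = 16, 1000   # Experiment 2 (forced guessing): 10 hostile + 6 neutral

print(exp1_hits / exp1_trials)  # roughly 0.018 – under 2% correct
print(exp2_hits / exp2_trials)  # 0.016 – still under 2% when forced to guess
```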

For the “test” conditions, recognition performance in both versions was not different from chance.

These are a lot of repetitions, and I agree that most likely participants were not consciously aware of the content of the primes. I’m quite impressed by the care they took here.

The Priming experiment

Then comes the main experiment – the reason for all of this care – which is the three “rate” conditions. Here participants start with the vigilance task with either a 0%, a 20%, or an 80% mix of hostile words. The idea, following the results from S & W (1979), is that the more you are exposed to the hostile words, the more likely you are to rate Donald higher on traits related to hostility.

But, before they get into the results of this one-shot rating task, they investigate responses in the vigilance task to detect whether there is some difference in processing depending on the proportion of hostile words in the mix. They call this the “amount of processing” measure. What they are after is evidence that the hostile words have activated some kind of processing, which then spills over into the rating task. What they claim (quoting directly):

“…direct support for the proposed mediating process of automatic category activation would be provided by poorer performance on the vigilance task by the 80% hostile word group relative to the 20% group, and by the 20% group relative to the 0% group.”


The reasoning being that “the subject would have less of his limited processing capacity for the demands of the vigilance task”.

Then, combining this with lack of awareness, they think it would be compelling evidence for automatically activating these categories.

Perhaps. I kinda buy it.

To test this, they looked at percent correct, percent incorrect, non-responses, and reaction times. (The RTs included both correct and incorrect responses.) They did not look at these collapsed across the entire experiment, but divided them up into chunks of 20 responses. That way, they get a time-line of errors and reaction times. Interesting.
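The blocked analysis can be sketched roughly like this: split each 100-trial response sequence into five blocks of 20 and compute the proportion correct per block. The trial data below are simulated, not from the paper.

```python
from random import random, seed

seed(1)
# Hypothetical 100-trial response sequence: 1 = correct, 0 = incorrect
responses = [1 if random() < 0.96 else 0 for _ in range(100)]

block_size = 20
# Chop the sequence into consecutive blocks of 20 trials
blocks = [responses[i:i + block_size] for i in range(0, len(responses), block_size)]
# Proportion correct within each block – the "time-line" of accuracy
prop_correct = [sum(b) / len(b) for b in blocks]
print(prop_correct)  # five proportions, one per block
```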

The reaction times yield nothing of interest. The non-responses are too few to analyze (doesn’t surprise me – I have not bothered with non-responses in my analyses), and the error rates end up, not surprisingly, being the complement of the proportion correct.

Proportion correct is the one measure they do spend some time analyzing. There seems to be an effect of the proportion of hostile words. From my own calculation (transcribing their graph into numbers and taking means), it looks like the more hostile the mix, the fewer correct responses:

Mix   Prop. correct
0%    0.972
20%   0.968
80%   0.958

These are % correct rates that I would be fine with in an RT task under speed-accuracy tradeoff rules.

Proportion correct also varies across blocks, which is not surprising. There are fewer correct responses in the first and last blocks than in the middle blocks. There is also a marginal interaction (p = .07). Looking at the pattern, performance in the vigilance task without hostile words looks flat. For the mixes with hostile words, there is a pronounced drop in performance in the fifth block.


Interesting. I would have liked to see something similar for the test task.

And, now, finally, how did the primed participants do on the Donald rating? Remember, it is a 0–10 Likert scale, the same as the one used by S & W (1979).

      Descriptively related   Evaluatively related
80%   7.47                    5.94
20%   6.75                    5.77
0%    6.99                    4.95

Let’s compare that to the immediate ratings from Srull & Wyer (1979). Note that “60 80%” etc. refers to the length of the sentence-unscrambling task: in this case, 60 sentences to unscramble, 80% of which were related to hostility.

         Descriptively related   Evaluatively related
60 80%   9.7                     7.9
30 80%   8.5                     6.8
60 20%   6.7                     5.0
30 20%   5.7                     3.2

And, finally, the immediate ratings from Srull & Wyer (1980). (Note: there are two 70% and two 30% rows because they were testing the type of delay – either between the priming and reading the vignette, or between the vignette and the judgment. These are only the immediate conditions, so they should be equivalent.)

      Descriptively related   Evaluatively related
70%   6.9                     5.5
30%   5.2                     4.6
70%   7.1                     6.2
30%   5.5                     4.3

(In all cases, I have transformed graphs into numbers, so they could be mildly off).

The contrast between high and low proportion of priming is not as strong, but there are some effects. (The paper actually provides the output for the ANOVA, where they compare the descriptively related and the evaluatively related).

But, again, this is a one-shot measure. They have 25 individuals in each cell, compared to the 8 per cell in both Srull & Wyer papers (yes, the ratings I show above are based on 8 individuals per cell – high likelihood of over-estimation).

I’m not sure what to think about it. I do think there is a priming effect, but it should be replicated with more than 25 participants in each cell. (Still, that is an improvement over 8 participants per cell.) The effect looks small. I’m more impressed with the work assuring that the words were presented outside awareness than with the evidence for the priming itself.

Bargh, John A. & Pietromonaco, Paula (1982) Automatic information processing and social perception: The influence of trait information presented outside of conscious awareness on impression formation. Journal of Personality and Social Psychology, 43, 437-449.

N per cell – because I think this will be interesting to track.

Experiment     Condition   N    N/cell
Experiment 1   Rate        75   25
               Test        24   12
               Guess        9    9
Experiment 2   Test        10
               Guess       10

Finger-ratio curiosity: A satisfying answer!

So, I posted some curiosities here on the blog. And, got one of them satisfied! Within hours of wondering about how robust the digit-ratio marker is for exposure to androgen in the womb, Ruben Arslan tweeted a link to this meta-analysis by Martin Voracek. Conclusion: Not very.

The meta-analysis collects studies that have compared repeat polymorphisms in the androgen receptor gene with digit ratio. At first, I wasn’t sure this was getting to the point. I guess I’m a bit wary after all those SNP searches for genes for traits that we know are heritable (intelligence, personality), which have turned up no results. But, I think I understand that this is different, more clearly established, and why finding no relationship is actually very suggestive that the finger-ratio is not a useful marker for the type of research I might be interested in.

Why would it be interesting with a marker such as the 2D:4D digit ratio? From what I understand, it is well established that exposure to androgens during gestation has enduring organizational effects on the mind. Exposure will alter the individual in a way that is permanent (for example, making them men, or not making them men, as in those XY individuals who do not have androgen receptors). How well established this is, I don’t know. Again, I have to take the researchers’ word for it (both Voracek, and Marc Breedlove from an earlier review). Mostly, I think, it has been established through animal studies, and studies of unusual humans, such as those XY females. Of course, it would be very interesting to study, in more detail, how androgen exposure may influence human traits.

But, as Voracek points out, testing this directly by checking for actual prenatal androgen exposure in humans is not usually feasible for many reasons. For one, amniocentesis is not risk free. I doubt anybody would allow this simply for research purposes. And, even if it was, it would be a rather massive undertaking to measure and follow a large enough sample of kids to see how this would work. But, if there could be a fairly reliable, non-invasive, and easily measured marker for androgen exposure, one could use it as proxy in research. Which is exactly what they have done for the 2D:4D marker.

Breedlove (who, overall, is a lot more convinced by evidence that it is useful) is careful to mention that the ratio does not have discriminant value. That is, you can’t look at someone’s fingers and determine from the ratio alone whether they are male or female, gay or straight, autistic or not, nice to their partner or not. (Like I mentioned in my other post, both my daughter and MIL have longer 4D than 2D – to the degree that my daughter once mentioned to me that the ring-finger was longer than the index finger, and I mentioned that this really varied with people in interesting ways). What it can be used for, according to Breedlove, is doing research on groups.

I should understand this, being the kind of researcher that I am. But, as Breedlove also notes, in the daily press and popular accounts, it is always talked about as something discriminant. Look at their hands, what does it say about them. (After all, this is something we would like to have. Some marker we can notice to make judgments about others without having to take the risk of getting to know them closer).

Still, even if it is not a marker you can use to directly say something about an individual, it could potentially be useful for research about how certain individual differences (possibly related to what we think of as masculinity) may arise from prenatal androgen exposure.

If it is a reliable marker, that is.

In 2003, Manning et al. published a small study (50 people) suggesting that the length of a particular region that codes for androgen receptors is positively correlated with finger ratio (actually, it looks like it was positively correlated with the ratio on the right hand, and also with the difference between the ratios of the two hands). This was taken as evidence that, yes indeed, finger ratio is a marker for androgen exposure in the womb.

I think the chain goes somewhat like this. Cells will not respond to androgens if they don’t have receptors that bind to the androgens and then allow changes to take place. (XY women lack androgen receptors.) The genes coding for androgen receptors are evolutionarily old, and fairly well preserved across species. There are two different coding sites – the CAG and the GGC. Most of the comparisons with finger-ratio have been done on the CAG site. Moreover, these are “repeat polymorphic” sites. That is, the snippet that codes for the receptor (or part of the receptor) comes in multiple copies. The measured ranges are 7-37 or 9-41 repeats. Moreover, the repeats are active – they code for the receptor. The effect is linear: the more repeats, the more receptors. And the idea, then, is: the more receptors, the more sensitive to prenatal androgen. This can all be established without measuring fingers (and I assume it has been). The next step is to establish that, indeed, finger-ratios are related to the length of the coding sites.

Length of coding -> prenatal androgen exposure -> 2D:4D finger ratios

And, it turns out, they are not.

The finger measures are the Right hand 2D:4D, the Left hand 2D:4D and then the difference in ratio between the right hand and left hand.

For the 18 CAG studies (2909 individuals overall), the correlation for the right hand is .005 [-.032 to .042]. For the 16 studies (2803 individuals) looking at the left hand, r is -.003 [-.041 to .034]. Finally, for the 16 studies (2796 individuals) looking at the difference, r is .013 [-.024 to .051].

It doesn’t look much different for the 5 studies (1497 individuals) that look at the GGC site: right hand, r = .045 [-.006 to .095]; left hand, r = .034 [-.017 to .085]; and difference, r = .019 [-.032 to .070].
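As a sanity check on the reported intervals: a standard Fisher-z confidence interval for the right-hand CAG correlation comes out very close to the one quoted. (This assumes the meta-analysis used an ordinary Fisher-z interval, which I haven’t verified.)

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    # 95% CI for a correlation via the Fisher z-transform: SE ~ 1/sqrt(n - 3)
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = fisher_ci(0.005, 2909)  # right-hand CAG: r = .005, n = 2909
print(round(lo, 3), round(hi, 3))  # roughly -0.031 0.041; zero is well inside
```

With nearly three thousand individuals, an interval this tight around zero is what makes the null result convincing.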

The samples are from all over the world: UK, USA, Spain, Australia, China, Belgium, Slovakia, Tanzania. They look at men and women of different ages, as well as a couple of samples of male-to-female transsexuals.

It really doesn’t look convincing at all.

In section 4.6 of the paper, Voracek considers whether there may be alternative ways that 2D:4D ratios could still index prenatal androgen exposure, even if the genes coding for receptors are not correlated with finger length. The paths may be rather complex, with unknown feedback loops. (Could very well happen, but it is not clear how.) But, as it stands now, I see no clear reason to believe that the ratio is related to prenatal androgen exposure.

Sure, there could still be interesting correlations between finger-ratios and behaviors related to masculinization, but this really weakens the explanatory power of the measure. I don’t think I would want to use it for explaining any traits in humans at this point.

Thank you Ruben, for satisfying my curiosity!

Voracek, Martin (2014). No effects of androgen receptor gene CAG and GGC repeat polymorphism on digit ratio (2D:4D): a comprehensive meta-analysis and critical evaluation of research. Evolution and Human Behavior, 35, 430-437.

Breedlove, S. Marc (2010). Minireview: Organizational hypothesis: Instances of the Fingerpost. Endocrinology, 151, 4116-4122.

February 18 edit: An artist friend made a comment that made me realize I need to explain 2D:4D a bit better. You can measure your index finger and your ring finger, and check which is longer. An easy summary is to take the ratio of the lengths: you divide the length of the index finger by the length of the ring finger. There are lots of sites about this, but I think this one was illustrative enough.
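In code, the measure is just a division (the lengths below are made-up millimetre measurements, purely for illustration):

```python
def digit_ratio(index_mm, ring_mm):
    # 2D:4D = index-finger (2D) length divided by ring-finger (4D) length
    return index_mm / ring_mm

print(digit_ratio(72.0, 75.0))  # 0.96 – ring finger longer (ratio below 1)
print(digit_ratio(74.0, 72.0))  # ~1.028 – index finger longer (ratio above 1)
```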

I checked my boys’ hands; they look like mine, with longer index fingers. My daughter has a longer ring finger. I know it doesn’t have discriminant validity, but I still thought it illustrated how non-discriminant it is. They all seem like fairly normal boys and girls, without being extreme.

Richard Harper told me he has read something about there being a short window where there is a tug-of-war between estrogen and testosterone – I think that was brought up towards the end of the Voracek article, suggesting paths to pursue. But, now I’m thinking that the relative lengths of the digits have multiple causes, and thus are hopelessly confounded. The reverse-inference thing again.
