Focus on collaboration, not individual fame.

Fame makes a man take things over
Fame lets him loose, hard to swallow
Fame puts you there where things are hollow (fame)
Fame, it’s not your brain, it’s just the flame
That burns your change to keep you insane (fame)

Written by Carlos Alomar, David Bowie, John Lennon • Copyright © Universal Music Publishing Group, BMG Rights Management US, LLC, Tintoretto Music

 

When I read Diederik Stapel’s autobiography, I was struck by his self-professed hunger for fame – to be one of those researchers fawned upon by grad students and colleagues at conferences. To reach fame, he crafted intriguing ideas based on the literature, but the data misbehaved. So he crafted better data.

Now, there is a section in Perspectives on Psychological Science called “Special Section on Scholarly Merit in Psychological Science”, all invited papers. The introductory summary paper, by Robert Sternberg, is called “‘Am I Famous Yet?’ Judging Scholarly Merit in Psychological Science: An Introduction”.

Some of the papers are thoughtful. Roddy Roediger shows that fame is elusive and ephemeral – you are better off working on something that interests you. Simonton summarizes the markers of eminence that can be found after eminence has already been reached, but laments that those indicators, sadly, have low predictive validity, and implores “Please don’t ask that question”. Likewise, Feist advises scientists to disregard the hope for fame and instead balance their work between intrinsic and extrinsic rewards (enthusiasm for the research and the possibility of reaping rewards). But others, like Ruscio, champion indices like the h-index, as if pursuing science were akin to fantasy football, and as if there were good metrics with which we could discern, with certainty, the champions from, presumably, the chaff of the PhDs.

All of these are aimed at the individual scientist, as if science were a lonely enterprise of hopeful geniuses, but I will take the position that focusing on the individual is mistaken. The question of assigning merit (and fame) is the wrong question if we want to have a science that is worth believing in. What is important is ideas and knowledge, which emerge collaboratively over time. How can we ensure that we have a system that promotes knowledge over fame?

Social Proof and science

Who to bet on is, of course, not a question unique to science. Recording companies, book publishers, film studios, investors, risk capitalists and gamblers also yearn to find winners, and, as anyone involved in any of these businesses knows, it is a bit of a crapshoot. Analyzing success in hindsight can’t, much as Simonton suggests, provide easy, reliable measures for predicting success beforehand.

Duncan Watts took on this question in his book “Everything Is Obvious: Once You Know the Answer”, but prior to writing his popularization, he and Salganik published a couple of experimental papers tackling the question of predictability in cultural markets (Salganik & Watts, 2008, 2009; I will focus on the 2009 paper). The question was why, if it seems so obvious in hindsight that a cultural product like Star Wars or Harry Potter would become a break-out success, it was so difficult for professionals to predict beforehand that these were good bets. After all, both were rejected by studios and publishers multiple times. In addition, they asked: if we re-ran the world, would they once again become hits?

Salganik and Watts took advantage of the internet to pursue this question. They created a platform (this was pre-Facebook and Spotify), loaded it with 48 songs from real but unknown bands, and invited 2930 participants. The participants could listen to as many songs as they liked, and as a thank-you they could download whichever songs they wanted. All they had to do was rate the songs they had listened to. The participants were sorted into several isolated “worlds”. In the control world, there were only the songs, with no information about which songs were popular. In the experimental worlds, participants were shown the real-time number of downloads for each song.

The control world could be considered a baseline, where the ratings of the songs indicated their appeal, as the authors prefer to call it. The download frequency of these songs was relatively flat. No clear hits.

The experimental worlds were different, though. As downloads accumulated, there emerged a clear top group of songs that kept becoming more popular. Intriguingly, the top songs differed between the worlds. Popularity, or Social Proof (to use Cialdini’s terminology), was a clear factor, and furthermore, it was capricious. Songs rose to the top because others had chosen them too, not because they had better appeal. But appeal was not unimportant. Songs that were rated low in the control world were never among the top hits. Social proof cannot overcome low appeal, but once appeal is there, the crowd decides, and it will be different every time.
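The dynamic described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the model or the data from Salganik and Watts: the function names, the appeal scores, and the rule that a song’s chance of being chosen grows with its visible download count are all hypothetical choices made for illustration.

```python
import random


def simulate_world(appeal, n_listeners, social_weight, seed):
    """Simulate one isolated "world" of listeners downloading songs.

    appeal: positive appeal score per song (hypothetical values).
    social_weight: 0 hides download counts (control world); larger
        values let visible download counts sway each choice (assumed rule).
    Returns the download count per song.
    """
    rng = random.Random(seed)
    downloads = [0] * len(appeal)
    for _ in range(n_listeners):
        # Choice weight = intrinsic appeal, boosted by current popularity.
        weights = [a * (1 + social_weight * d) for a, d in zip(appeal, downloads)]
        chosen = rng.choices(range(len(appeal)), weights=weights, k=1)[0]
        downloads[chosen] += 1
    return downloads


def top_share(downloads):
    """Fraction of all downloads captured by the most downloaded song."""
    return max(downloads) / sum(downloads)


if __name__ == "__main__":
    # Six equally appealing songs and four low-appeal songs (made-up numbers).
    appeal = [1.0] * 6 + [0.3] * 4
    control = simulate_world(appeal, 2000, social_weight=0.0, seed=0)
    print("control world: top share", round(top_share(control), 2))
    for world in range(1, 5):
        counts = simulate_world(appeal, 2000, social_weight=1.0, seed=world)
        winner = counts.index(max(counts))
        print("world", world, "winner: song", winner,
              "top share", round(top_share(counts), 2))
```

Running the script with different seeds is the toy equivalent of re-running the world: the winner and its share can be compared across worlds and against the control world, where download counts play no role in the choice.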

A scientific enterprise that looks for stars rather than results is likely to behave similarly. In fact, the term Matthew effect was coined by Merton to describe specifically this, and, as the Salganik and Watts papers suggest, who reaps the rewards is arbitrary.

But should science model itself on this type of business?

Rank and Yank

Diener talks with admiration about a system of rating for merit at the University of Illinois, which, he claims, resulted in very little “deadwood”. This recalls the “rank and yank” method that was wildly popular among corporations in the 1990s but has since been softened, if not discredited. Employees are evaluated on individual performance and ranked. The top rated receive bonuses and rewards. The bottom performers are let go. There is plenty of writing about the perils of this technique, but I rely, in this paper, on an article from the Economist (November 16, 2013). The logic was to introduce competition within the company, and thus spur performance, but it resulted in people being so concerned about their rank that it instead spurred secrecy and information poaching. It is considered one of the reasons for the downfall of Enron. One of the problems, as mentioned in the article, is that as you continue this culling, you start firing average workers, but lots of good work is done by the average workers. It also discourages cooperation, which is vital for a corporation to function. The competition should, properly, be between businesses, not within them. To Diener’s immense credit, his solution lies not in discarding people, but in implementing systems to help faculty develop.

Paula Stephan’s indictment

Even though the university system doesn’t explicitly use the rank-and-yank technique, its overproduction of PhDs (in the US system – Sweden does not overproduce), and its obsessive assignment of merit with the rewards falling to the top, make it a de facto Rank and Yank system.

Paula Stephan, in her book “How Economics Shapes Science” (2012), provides a scathing indictment of the tournament business model for academic research. She summarizes her arguments in a 2012 Nature commentary, and I cite the summary points:

  • Science is full of incentives that encourage bad financial choices, such as expanding labs and hiring too many temporary scientists.
  • These incentives hurt both individual scientists and society as a whole, which gets minimal return on its investment when someone is trained for a field with no career prospects.
  • The way forward is to fix incentives that are damaging the system, by considering their true social and personal cost.

Her focus is on the biomedical complex, which has additional problems, but her analysis can easily be applied to other academic fields, as some of the incentives are general.

There is an incredible waste of talent, especially of doctoral students and post-docs, and much of this is because the incentives favor paper production and citations when allocating resources, and resources are allocated to individuals (at best individual labs) in strong competition. The rewards are reaped by the universities, while workers and the public pay the price, as she writes in her final chapter:

“In one sense, U.S. universities behave like high-end shopping malls. They are in the business of building state-of-the art facilities and a reputation that attracts good students and faculty. They then turn around and “rent” the facilities to faculty in the form of indirect costs on grants and the buy-out of salary. Faculty, in turn, create research programs, staffing them with graduate students and postdocs, who contribute to the research enterprise by their labor and the fresh ideas that they bring, but who can also be easily downsized, if and when times get tough. Universities leverage these outcomes into reputation. The amount of funding universities receive, as well as the citations and prizes awarded to their faculty, determine their peer group—the club to which they belong. They also attract donations and students and affect the university’s ranking.”

 

Science as a process.

The gist of the symposium seems to be that it is of great importance to identify and credit meritorious – eminent – individuals, and that considerable time should be taken to perfect this system of credit, but is it really the eminent individuals that drive science forward?

My go-to author on philosophy of science isn’t Kuhn or Lakatos, but David Hull, and specifically his book “Science as a Process” from 1988. His thesis is that science advances in an evolutionary manner. A wealth of ideas is produced, but only some of these are selected and survive. Which ideas survive depends both on their scientific merit and on a social process that involves the production of papers, citations, and the engagement of groups of scientists. Ideas that are not interacted with will die, no matter how profound they are. Ideas that are interacted with by groups of scientists (demes, as he calls them – borrowed from evolutionary biology) will grow and change, and perhaps bring our knowledge closer to the truth. I provide two quotations, from the preface and the first chapter.

Preface:

“In the manuscript, Nelson (1973c) complained of the way that the views of Leon Croizat had been treated through the years by such authorities as G.G. Simpson and Ernst Mayr. I decided that the sort of thing Nelson was investigating with respect to Croizat was the sort of thing I would like to do in philosophy of science.  What is the relative importance in science of reason, argument, and evidence on the one hand, and power, prestige, and influence on the other? I thought that answers couched totally in terms of one sort of influence or the other were sure to be wrong and that the interplay between the two was likely to be fascinating.”

Page 3, Chapter 1

The system of cooperation and competition, secrecy and openness, rewards and punishments that has characterized science from its inception is both social and internal to science itself. The conceptual development of science would not have the characteristics it has without this social system. Quite obviously science is a social process, but it is also “social” in a more significant sense. The objectivity that matters so much in science is not primarily a characteristic of individual scientists but of scientific communities. Scientists rarely refute their own pet hypotheses, especially after they have appeared in print, but that is all right. Their fellow scientists will be happy to expose these hypotheses to severe testing. Science is so structured that scientists must, to further their own research, use the work of other scientists. The better they are at evaluating the work of others when it is relevant to their own research, the more successful they will be. The mechanism that has evolved in science that is responsible for its unbelievable success may not be all that “rational,” but it is effective, and it has the same effect that advocates of science as a totally rational enterprise prefer.

(Emphases mine).

This is very far from the focus on eminence and individual fame. Competition is a factor, sure. Some people thrive on competition, and it can be engaging to take on a theory or an experiment to expose its flaws (and merits). But cooperation is vital. Producing original research is just about always teamwork, involving co-authors, researchers, assistants, and, in psychology, participants. And, for the ideas to survive, multiple labs need to engage with them, either as champions or as severe adversarial testers. In this churning and testing of ideas, we may come closer to understanding our world. (I’m reminded of Mercier & Sperber’s (2012) theory of how reasoning is improved via argument.)

If we only focus on who may become eminent, or who is eminent, we are losing some of the power of the scientific process. The eminent scientists would be nowhere without the collaborators and the adversaries that are willing to engage with the ideas, and science is littered with solitary ideas that went nowhere. We just don’t know about them because, like failed commercial products, they disappear.

I would also argue, without much evidence, that in the focus on production, individual eminence, and protection of reputation, the argument portion – the other scientists “happy to expose these hypotheses to severe testing” – has broken down. The tendency in psychology to overwhelmingly publish only positive results, based on underpowered studies, with no clear avenue for publishing failures to confirm, means that as scientists we are not grappling with the real state of the field, and the social churning that Hull describes cannot take place (see Chris Chambers’s recent book).

On a more hopeful note, I’m reminded of the current focus on improving our methods and statistics in psychology. The complaints about business as usual are old. There are continuous reports of authors who complained about p-values, the use of NHST, etc, published 20, 40 or 50 years ago, that were never heeded. Some of these are, of course, eminent (Cohen, Tukey, Meehl), and many of us came across their complaints in graduate school, but then the business of publishing and surviving took over, we copied the social practices of those who had stayed in business (see Richerson & Boyd for a discussion on mechanisms of cultural transmission), and buried our concerns. What may be different this time is that, through interconnectivity via social media, we are no longer lone voices in the wilderness, but can build alternative demes of champions.

Considering that good scientific progress depends on a collaborative social process, it is misguided to focus on potential superstars and the accumulation of individual merit. Instead, look at how to better create a collaborative environment – at least within individual departments (or across virtual academies) – where the diverse talents can come into their own in joint efforts. Diener’s suggestions are a start, but I think we need to go further.

The obsession with publications and citations.

Diener mentions that Sternberg has published 1500 papers. Of course this is impressive. Presumably each of these papers involved an action editor and a median of two peer reviewers, as well as an army of co-authors, research assistants and (since it is psychology) participants. Elsewhere he laments the low productivity of others (1.5 papers a year), and that most papers are never cited. Is this focus on productivity a viable avenue to proceed?

We are drowning in a tsunami of papers. There are more researchers, and higher pressure to publish, and it is now impossible to get an overview of even one’s own field. With so many papers, cumulative science becomes nearly impossible. I recently rejected a paper because the introduction did not bring up theoretical and empirical papers that were crucial for the work (the crucial papers were older), and because the method the authors used did not connect with the extensive development of that method, so they used it inappropriately. This is not the first time.

Sure, high productivity can be good fodder for that selection process outlined by Hull, but, to spin further on the evolutionary idea, there seem to be roughly two strategies for passing on genes to the next generation – the r and K selection strategies. In the r-selection strategy many offspring are produced (fish, birches), little effort is invested, and most of the offspring become food. In K-selection, as in humans, few offspring are produced but they are then carefully nurtured. Both are viable strategies, but which one is favored tends to depend on whether the environment is stable or not (in evolutionary time). In fact, this is echoed in Feist’s “Prescription for a successful scientific career”, especially his figure of productive scientists as adapted from Cole and Cole. He also, importantly, points out that what, in the current system, is good for the individual may not be good for the field. (This tension between what is good for the individual and what is good for the group/field turns up frequently within those areas that take an evolutionary view of different developments, such as Evonomics and Cliodynamics.)

The problem of measurements

But the current measurement system in science does not favor the slow, nurturing type of creating and developing ideas, which is perhaps at our peril. Ruscio made much of the objectivity of the h-index as a good selective mechanism for identifying the stars. First, this presupposes that the peer review system assures that papers are reasonably solid, and that citations can be used as a reasonable proxy for quality.

But, as others have also argued, citations seem to work much more like popularity in the Salganik and Watts (2008, 2009) papers, and citations fill multiple roles in a paper, where some are more central than others.

I recently undertook a project where I looked at all of the papers that cited Srull & Wyer (1979) in the first 5 years after publication (53 in total). I extracted all of the citations from the papers (when possible). In the vast majority of the papers, Srull & Wyer were only cited once. I give you a typical example.

However, because of recent theoretical and methodological developments in cognitive psychology, considerable effort is now being made to analyze these operations (Carlston, 1980, Ebbesen, 1980; Hastie & Carlston, 1980; Srull & Wyer, 1979; Wyer & Carlston, 1979; Burnstein, & Schul, 1982.)

Note that the single citations were appropriate! When we write papers, we do a lot of single citations to indicate where we get the ideas even if we are not directly building on them, and this is incredibly useful. But this is where the popularity comes in. We need something to cite for part of the argument building up our introduction, or clarifying our conclusions, and most likely we have a series of go-to papers to cite. This is now built into citation indices.

The h-index is considered good because it presumably cannot be gamed, but one should more properly say that no one has figured out how to game it yet. But even that is not true. Dorothy Bishop uncovered an h-index boosting ring involving several authors and editors at separate journals, which she describes in a series of blog posts.
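For readers who have not computed one, the h-index is the largest number h such that an author has at least h papers with at least h citations each. The sketch below is a minimal illustration; the citation counts are made up, and the last example is a hypothetical arithmetic illustration of mutual citing, not a reconstruction of the case Bishop describes.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts for two hypothetical researchers.
    steady = [12, 9, 7, 5, 5, 3, 1]
    one_hit = [900, 5, 5, 5, 5, 2]
    print(h_index(steady), h_index(one_hit))

    # Hypothetical: six papers with four citations each, then the same
    # papers after two extra citations apiece from a circle of colleagues.
    before = [4] * 6
    after = [c + 2 for c in before]
    print(h_index(before), h_index(after))
```

The first pair shows that very different citation profiles can share the same h. The second shows how few additional citations, placed on the right papers, are needed to move the number.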

This focus on rewarding frequent spawning also opens the door to poor scientific practices such as salami slicing, corner cutting, questionable research practices and p-hacking. This has been proposed by many, but I think the paper by Smaldino and McElreath (2016), where they model the consequences, is illustrative.

A recent paper used network modeling to find when, in a career, a scientist’s most impactful work occurs, and the distribution is random! (Sinatra, Wang, Deville, & Song, 2016). The measure is still focused on the individual, as one factor is productivity, with the two other factors being a factor Q (which may indicate creativity) and a factor for luck.

As Ulrich Schimmack (among others) has pointed out, there is also no viable disincentive to publishing weak and irreproducible work. Sure, the vast majority of papers go unread and uncited, as mentioned by Diener, but they now clog the publication record like so much algae soaking up oxygen. The only predators are the predatory journals, and they simply add to the problem.

I don’t want to take away from Sternberg’s impressive productivity (surely he belongs among the eminent), but productivity varies and cannot be used as a proxy for quality. It is necessary to keep spaces open for the lower producers, and perhaps for those broader, cross-disciplinary collaborations with lower yield but, in the end, perhaps with higher and more long-lasting true impact.

Not everyone thrives on competition

My daughter recently quit her elite team in team gymnastics, a sport she has enthusiastically, almost obsessively, pursued for 10 years. As she approached the top, the stress took all of the fun out of the enterprise, and after the last competition – where her team did well – she felt the effort was no longer worth it. In many ways, she’s her mother’s daughter.

Competition can be engaging, and for some individuals it spurs them on to perform, or possibly to reach for the enhancers, but for others it can become demoralizing. Thus, it isn’t clear that competition is the sole way to go in order to maximize performance.

Some years back, I watched an intriguing colloquium by Uri Gneezy in which he and his colleagues investigated gender differences in competitiveness (the research is discussed in Gneezy, Leonard & List, 2009; the colloquium can be found on iTunes U in the “Center for Behavioral Evolution and Culture” series from UCLA). They were particularly interested in teasing apart possible cultural influences, and thus they took pains to locate a patriarchal tribe (Maasai) and a matrilineal tribe (Khasi). The task was a simple ball-throwing task with the goal of getting as many balls as possible into a bucket. The task was specifically selected because there were no gender differences in ability. Participants could select whether they wanted to do it competitively (the winner took home all the rewards) or at a piece rate (you got paid according to how many balls you got in). In the Maasai group, many more men chose to compete, and as Gneezy mentions in the talk, you would get a similar result if you tested UCLA students. In the matrilineal group, the pattern was reversed. As he mentioned, there were not enough reliable data to discern who performed better, but the focus was on choosing to compete. He also mentioned research by Dreber and Hoffman, who have investigated gender and competitiveness in children across several cultures, and there appear to be clear cultural differences that seem to co-vary with how egalitarian the culture is.

Now, if we posit that in a group of western men you will find more individuals that elect to compete than in a group of western women, setting up an enterprise so that it rewards those that are more willing to compete (fair or foul) may very well stack the deck against women, even when the sheer intellectual ability and creativity are the same. As Alice Eagly lamented, where are the women among the eminent (and surely there are some)? Perhaps it is because they don’t respond to the same incentives. We do know of instances where men have reached glory based in part on the work of women (the Watson and Crick story tends to be top of mind). Surely there are more women (and non-competitive men) onto whose shoulders the eminent have clambered, without necessarily giving credit where credit is due, distorting our perception.

Via Negativa

So, how are we going to proceed? In the book “Antifragile”, Nassim Taleb lays out a strategy for betting in an uncertain world – optionality. We know that graduate schools attract just about only the best minds for doing science. These are people who want to do scientific work, and are competent enough to be admitted. As Simonton points out, we don’t have reliable indicators of who will end up producing the runaway success that will advance a scientific field, and as Taleb points out, there isn’t a way to know. Instead, once the “low appeal” candidates have been sorted away, place an even bet on all the rest. That is, he says, how at least some venture capitalists work when funding start-ups, and it is part of his own strategy when investing. This is a non-linear enterprise, and even one wild success can pay for all the non-producing bets.

He calls this via negativa. You can quickly sort away those that seem unpromising (the non-appealing songs), but then place your bets evenly on the appealing ones.

Perhaps better still, following Hull, don’t bet on individuals. Create collaborative groups working on problems.

Science is high risk and low yield

Like so many others, I have a file-drawer of ideas that didn’t pan out. We are at the edge of knowledge, and most of our attempts are, most likely, carving out what doesn’t work. Imposing a productivity demand is, as I have pointed out, likely to distort rather than enhance.

Most of the research work at universities is taxpayer funded. Mariana Mazzucato is currently advancing the idea that the state here functions like a risk capitalist – an entrepreneur. (Her book “The Entrepreneurial State” is a must-read, but her TED talk gives a reasonable, short overview.) The internet and GPS (among others) are the results of basic projects financed by taxpayer money (frequently via the military). Like the risk capitalists I mentioned in the Via Negativa section, the state funds risky ideas of which only a few will yield dividends, most likely far in the future, and the dividends may also be reaped by private entrepreneurs, such as Apple. The pay-off for the taxpayers, she proposes, should be in the taxes that the eventually successful projects bring in.

But, in the meantime, scientists need to be allowed to pursue risky projects, with a high likelihood of failure. Treating scientists like factory workers is unlikely to be a good strategy. Scientists shouldn’t be punished for failing to find anything interesting, but perhaps for producing sloppy, unreliable work.

The myth of the eminent scientist

I have had students who wistfully said that they could never become as great as a Newton or a Darwin or a Skinner (to pick the one eminent psychologist that Roediger thought most of us knew something about). I pointed out that not even Newton or Darwin or Skinner were the same as their mythological figures. Don’t let the myths stop you from pursuing your dreams.

Hull brings up how we use mythologized older scientists (who may be dead, and thus can’t protest their image) as a rhetorical device to lend credence to our own ideas, which is one reason why we may want to mythologize some that came before.

Perhaps also our human tendency to look for individuals with prestige in order to learn from them has something to do with it, but I think this remains an interesting psychological conundrum to test. (We also have a number of mythologized experiments, as we have discovered, such as Milgram, and the story behind Kitty Genovese, which evidently the journalist responsible for the initial article polished to fit a story he wanted to tell more than the messy facts).

It may also be a way for us to cognitively sort and tag ideas. If I mention Ekman and Russell to fellow emotion researchers they know that they are the proponents of the categorical vs the dimensional theories of emotion, even though they are not the sole researchers pursuing and testing these theories. Perhaps it would be better to move towards theory names rather than researcher names.

A great antidote for the myth of the great man (because they are mostly men) is to read some good historians of science (my favorite is Thony Christie, @rmathematicus), who carefully excavate the real scientist from the bronzed mythology.

Matthew Francis recently wrote a scathing blog post against the Nobel Prize. It was anticipated that the 2016 Nobel in physics would go to work on LIGO (the Laser Interferometer Gravitational-wave Observatory). (In an upset, the Nobel instead went to work on topological phase transitions.) His objection was not that a Nobel wasn’t deserved. He thought the achievement was fantastic. His objection was that the Nobel continues to promote the mistaken notion that science is advanced by eminent individuals rather than by communities of researchers. As he says:

“The Nobel Prize is simply … a reminder that despite our advances, we still promote the idea that Science is done by the Lone White Male Genius, maybe with an adoring female assistant standing by to do the thing.”

Let’s move away from Science as an enterprise for fame, from the cultural markets model. Focusing on merit will not fix science, it will not fix the woman problem, and it will not fix the non-white-male problem. Move more towards the collaborative, cumulative work that seems to define our species (Henrich). As one eminent scientist (Newton) allegedly said, he stood on the shoulders of giants. Well, perhaps the shoulders are not those of giants, but of all the humans that came before and all the humans that collaborated. To crown a single king of the mountain is to distort what it actually takes to move science forward.

 

Coda

Let’s give Stapel the fame he craved. He wanted it, he faked it, and he came clean. He should never work in science again, but let’s enshrine him in the history of psychology, to remind us of the danger of rewarding eminence over the hard collaborative work that really advances science.

Partial references (because it is a blog…)

Ranked and Yanked. The Economist, November 16, 2013. http://www.economist.com/news/business/21589866-firms-keep-grading-their-staff-ruthlessly-may-not-get-best-them-ranked-and-yanked Pulled December 1, 2016.

http://deevybee.blogspot.se/2015/02/editors-behaving-badly.html Pulled December 1, 2016.

https://galileospendulum.org/2016/10/03/dethroning-the-nobel-prize/ October 3, 2016. Pulled December 1, 2016.

Salganik, Matthew J., & Watts, Duncan J. (2009). Web-based experiments for the study of collective social dynamics in cultural markets. Topics in Cognitive Science, 1, 439-468. DOI: 10.1111/j.1756-8765.2009.01030.x

Salganik, Matthew J., & Watts, Duncan J. (2008). Leading the herd astray: An experimental study of self-fulfilling prophecies in an artificial cultural market. Social Psychology Quarterly, 71, 338-355.

Stephan, Paula (2012). Perverse Incentives. Nature, 484, 29-31.

Stephan, Paula (2012). How Economics Shapes Science. Harvard University Press.


Where the non-famous are

Well, I and several others followed up on that Perspectives on Psychological Science symposium I critiqued in my last blog post.

I got a revise and resubmit, with a very long action letter where I, and others, were asked to show all sorts of things about eminence (are the eminent not better than the non-eminent?) or working alone (show that group work is better than solo work).

Well, a bit hard in 1500 words (which is what I edited my first 5000-word manuscript down to), but whatever.

Turns out, I wasn’t alone, and actually not alone with what I thought. So, six of us uploaded pre-prints of our papers and shared them with the world here:

Perspectives You Won’t Read in Perspectives: Thoughts on Gender, Power, & Eminence

As we say, they are in the original submission format. If you are interested in one of the reviews for mine, Bobbie Spellman put hers up

I was a reviewer

I originally wrote around 5000 words as a response, and had to do quite a bit of slashing – perhaps not always as well thought out as one would want (hence, the reviews could be quite helpful). But, before I did the slashing, I saved the long copy, thinking I might post it on the blog once this response symposium was published (whether or not my contribution was there).

I like it better – I have more time to flesh out my ideas (which, true to me, are branching rather than deep diving), although it certainly would also benefit from good feedback.

So, now I will put it up here on the blog (next page). Then we’ll see where we go from here.

 

 

 


Bully for you, Chilly for me: Scientific fame

 

 

Perspectives on Psychological Science published an invited symposium on eminence in psychology, which starts with Robert Sternberg’s introductory article, “Am I Famous Yet? Judging Scholarly Merit in Psychological Science: An Introduction”*

As Bobbie Spellman pointed out on Facebook – Only One Woman. Guess what topic?

Sure, judging scholarly merit is an interesting question (Meehl discussed it in his recorded last lecture series – along with its problems), and inquiring into why some individuals are considered eminent in a field, and others not, is certainly a legitimate area of research both in psychology and sociology (not to speak of history).

But the question – and the answers – seem ill-posed. Science is about ideas. It is about advancing knowledge. It is created by people, but most likely not by lone individuals, and the symposium authors seem to be looking for features of individuals that can predict eminence, rather than looking at systems for advancing ideas.

I’m reminded of Duncan Watts’s book “Everything Is Obvious: Once You Know the Answer”. What he claims is that before something reached fame – be it the Mona Lisa, Harry Potter, Star Wars, or your choice of famous scientist – there was nothing in particular that suggested that this piece of art or that particular scientist was especially noteworthy. (The Mona Lisa hung around for a long time until someone stole it. Harry Potter and Star Wars were rejected multiple times.) But once something or someone is famous, we tend, in hindsight, to attribute a lot of special features to it that we think are of obvious merit. Now, if these features are so obvious, how come it took so long to discover them, and why hasn’t anybody been able to come up with nice sure-fire metrics for identifying the wheat among all the chaff?

I teach two papers that he wrote with Salganik where he investigated what might be the forces that create eminence (hint – people). I’ve blogged in more detail about it here (and here), but I’ll do a short summary.

Their product of choice was the pop song. In fact, 48 pop songs by unknown bands. Participants were thousands of people who were contacted via the internet (this was prior to Facebook). They were invited to listen to and rate the songs, and as thanks they could download one of them for free. The only catch was that they would have to listen to them. (They didn’t have to listen to all 48.) The participants were divided into “worlds”. In one control world they got no information about what other people thought of the songs. But in the experimental worlds, participants got access to ratings and to popularity (number of times downloaded). The ratings in the control world could be seen as a baseline of appeal (how “good” the songs were when you are not influenced by what others think). In the worlds where participants got information about ratings and downloads, top songs emerged. Interestingly, they were different in all of the worlds (in the second paper they had 8). Clearly, people use other people’s endorsements when they decide which song to listen to and rate. The only thing that they noted, quality-wise, is that the songs rated low in the independent world never made it to the top. To put it bluntly – we know crap, but we don’t know quality.

Might it be the same in science?

And, as always, I’m reminded of Hull’s “Science as a Process”. As he states in the introduction, he was particularly interested in the interaction of the social and the knowledge-seeking aspects of research. His thesis is an explicitly evolutionary account of scientific progress. Individual scientists may have good ideas, but if nobody is engaging with them – either collaboratively or adversarially – they won’t be passed on. Now, that lots of people engage with an idea is, of course, not a guarantee that in the end it is right. But as long as it is engaged with (and allowed to morph) there will be an advance. For example, we like to, in hindsight, sneer at the idea of phlogiston. But as rmathematicus lays out in this blog post, it was a highly fertile idea that likely paved the way for the discovery of the role of oxygen in combustion.

Do any of you know the names of the Phlogiston theorists? The names of the early oxygen proponents may be more known (Kuhn brings them up), but I can only recall Scheele, because he’s Swedish. The idea lives on.

Sure, when one overproduces scientists, there may be a need for understanding better how to staff one’s university, and for making bets that may be a bit better than a coin toss. But, like grooming the next boy band, that is extreme decision making under uncertainty.

But eminence and fame shouldn’t be what to look for in science. It is ideas. And the aim might instead be how to assemble good teams that can tackle interesting questions. No eminence needed.

*I must confess, besides reading all of Sternberg’s article, and skimming through Eagly’s, I have only read the abstracts. Some of them are somewhat thoughtful, but still, to my mind, misguided.

 


Reflection on Laird and Facial Feedback.

“If the quality of emotional experience is derived from expressive behavior, then, if people were induced to express an emotion, it would be expected that they would subsequently report feeling that emotion” Laird (1974).

 

When I did my little review of studies that Laird claimed showed evidence that facial feedback results in an emotional experience (and experience is key), I didn’t pay much attention to theory. I focused on method and results. It is not because theory is uninteresting. It is more that the theoretical background may not be that well developed. It was used to justify the experiments, but I think the methods and results are more interesting at this point (in order to better chisel out new versions of theories. Theories may well be wrong, and they are subject to fashion – such as the old computer model of psychology, which I noted when I went through the Srull & Wyer trace).

But, I looked a little bit at what motivated Laird in the first place – because it does come through in the additional studies.

As the title of his paper suggests, he thinks of the emotional experience as kind of an outcome of self-attribution. Several parts go into this self-attribution as data (in his words), and these can strengthen and discount the different components.

This is grounded in the Schachter and Singer (1962) experiment, as well as the James-Lange notion that experienced emotion is the result of perception of bodily changes.

Physiological arousal serves as data about the intensity of the experience. Expectations and context can then shape this experience in different ways. One could take it as a signal of an intensity of emotional experience, but, if one has another explanation for why one feels aroused (epinephrine, coffee, high bridges, quick run) the emotion-attribution can be discounted. (I’m not afraid, I just had too much coffee).

The ambiguity or non-ambiguity of the situation also matters. If, for some unfathomable reason, I found myself riding Himmels-skippet at Tivoli, I would correctly attribute my feelings to my very reasonable fear of heights, and not to the lovely double-shot latte I would have downed shortly before, whereas if I sit in my first-floor office and feel the same thing, I might wonder if I had had one too many coffees.

But, he asks, might more information than just arousal and situation contribute to this self-attribution of emotion? If my eyebrows are raised, my mouth agape, my shoulders hunched, the feedback from my body would contribute to my emotional experience – which is what he is interested in.

So he is actually interested in whether people experience the emotion they are physically expressing.

Which, of course, means that you have to disguise this interest, considering that participants usually want to be helpful, and answer as suggested.

So much of the experiment happens “in plain sight” so to speak. The faces are posed, their emotions are assessed (frequently), but it is all very cleverly disguised by giving the cover-story (interest in perception when tensing-relaxing muscles. Need to measure what you feel, because the emotional state might influence the results), and removing those participants that guessed that the position of their face might have something to do with the emotional state (though the details of the debriefing aren’t stated – just that if at any point participants indicated a connection between the two, they were considered aware, and then excluded).

I can see why the Strack paradigm became so appreciated. The set-up is very elaborate, the cover story more complex than the cover story for Strack, and, even with all the care, all people that were aware of the connection may not have been removed.

Moving quickly into “use”

One thing that struck me is how quickly the paradigm was moved into use, rather than tested for boundary conditions and replicability. (Of course, I have a very small sample, and only the papers that Laird provided as evidence for facial feedback, but a very quick check on citations suggests that there may not have been many more that posed facial expressions at that time – this is something to probe further in the future.)

In paper two they establish an individual difference between those that are susceptible to the facial feedback and those that seem to be more susceptible to environmental cues (and, somewhat interestingly, the data on how facial feedback results in emotional experience is actually so weak in this paper that it really should be counted as a non-replication rather than a replication).

This division into those that are susceptible to feedback and those that are not is then used in several subsequent papers. The result from the manipulation (that is, the emotion rating) is often not reported directly. The method for dividing participants up is also somewhat convoluted. Participants have their faces posed, they view something of the opposite emotional meaning, and they rate their emotional state. Then the researchers combine the ratings to get a final number that indicates whether participants were more influenced by the facial position or by the emotion signal from what they viewed.

The upshot is that even if the posing method is the same for just about all of the experiments (bar one), the DV that is reported varies quite a bit, to the point that one has to be really careful interpreting the p-curve I included, because the DVs vary so much. Can you really compare the number of correctly recalled emotion-congruent items with the ratings of emotions on a check-list?

The p-curve I included in the blog-post was also the most generous p-curve (I included the DV with the highest F or t from each experiment). I played around with it a bit and did a p-curve where I only included ratings on aggression/negativity whenever it was feasible. (I kept 2 where there is no valence indicated). The curve looks flatter, does not suggest evidentiary value, but still doesn’t suggest p-hacking. (P-curve aggression/negativity)

 

[Figure: p-curve, 8 items, mainly aggression ratings]
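For readers who haven't met a p-curve before, here is a minimal sketch of the idea. Only the statistically significant p-values are kept and sorted into bins between 0 and .05. A pile-up of very small p-values (right skew) is what evidential value looks like, a flat curve suggests none, and a pile-up just under .05 hints at p-hacking. The function name and the p-values below are hypothetical placeholders, not the values that went into the figure, and the real p-curve app runs formal tests on top of this binning.

```python
def p_curve_bins(p_values, alpha=0.05, n_bins=5):
    """Count significant p-values in equal-width bins between 0 and alpha."""
    counts = [0] * n_bins
    for p in p_values:
        if 0 < p < alpha:  # non-significant results are not part of a p-curve
            idx = min(int(p / alpha * n_bins), n_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical p-values, for illustration only (not taken from the Laird papers).
example_p_values = [0.004, 0.012, 0.018, 0.033, 0.047, 0.20]
counts = p_curve_bins(example_p_values)
for upper, count in zip([0.01, 0.02, 0.03, 0.04, 0.05], counts):
    print(f"p < {upper:.2f}: {count}")
```

With so few studies, each bin holds only one or two values, which is part of why a curve like this has to be read with care.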

 

 

Emotion congruence – example of moving quickly to use

I wanted to highlight the paper on emotion-congruent recall (Laird, Wagener, Halal & Szegda, 1982). It was published the year after Bower’s Mood and Memory, which I think is considered the Ur mood and memory paper (it is the one that should be cited). They use the “individual difference” technique, and end up with a very small sample in the self-produced (that is, emotional) groups: 9 and 10. They do use repeated measures, because, as they state, there is too much heterogeneity between people when it comes to memory to safely use a between-subjects design. But the samples are very small for an induction that is rather weak.

I spent my graduate years working on emotion-cognition questions (mostly about congruent processing and emotional response categorization), and getting data on emotion congruence is – difficult. It requires quite a bit of power. We routinely started out with 30 participants in each condition, and used repeated measures designs (although we kept the emotion condition between participants). We used movies and/or music to induce emotion. Participants spent 12 minutes getting induced before doing their task, and, if at all possible, we kept the emotional music going during the task.

The induction was very effective. We always did a manipulation check, using something very similar to the MSCL – the BMIS: 16 adjectives that were rated on a 4 point scale. That the induction worked was never evident by simply looking at each participant’s BMIS ratings. It only became clear when we aggregated across adjectives, but then it was evident. In fact, I collected a lot of the emotion ratings that I still have for a paper on how we use film in psychological research to induce emotions.

But, a lot of the data I collected were from research that we never published, because on the subsequent task we either got nothing, or got results we couldn’t interpret. So, when I see the results for emotion-congruent recall for a lot fewer participants with a much weaker induction, it gives me pause. It is not that I don’t think they got what they got. I’m just not sure what it should be attributed to, especially since we don’t have their ratings of emotionality.

Reporting of information

This leads me into a brief comment on the reporting of information in old studies. It is very hit and miss. I saw no standard deviations anywhere. Lots of intermediate results missing (what were the aggregated emotion ratings after having posed the face – even though that wasn’t the main part that was interesting). It is also not very standardized (which, perhaps, wasn’t to be expected). For several of the papers, there was no information that I could use.

Repeated measures

A plus is that they do use repeated measures for the posing in many instances. Each participant is exposed to all of the conditions, and sometimes they are quite a few.

The long durations of holding the expression

But, perhaps one would question some of the long durations for holding one’s face still. The original Laird paradigm asks participants to hold their face in a pose for 15 seconds. That’s a while. Now, of course, I think many of us have experienced those situations where nothing could wipe the expression off our faces (I particularly recall youthful encounters with enchanting individuals, or times of skeptically pulling one’s eyebrows together for a longish time), but generally expressions are quick, dynamic and fleeting. Fifteen seconds is still somewhat reasonable. Seven minutes (as in the Rhodewalt & Comer paper), while writing a counter-attitudinal essay…. In fact, in a couple of papers, they employed an experimenter who watched the participant to make sure they kept their face in the pose for the duration (which was a lot longer than 15 seconds).

The high-level constructs

The initial theoretical background is, in many ways, rather high-level. There is input from the face, from the viscera and from the environment (the pictures), which then may give rise to experienced emotion. But I don’t see anything there moving closer to anything more physiological. It is all posing followed by adjective ratings, followed by classical social-psychological tasks. But, in at least one paper (one of the works comparing normal weight and underweight) the posing of the face and subsequent experience of emotion (or not) is taken as a measure of proprioception. And perhaps it is, but there seem to be a number of additional steps that one would need to check. Now, I know from my other readings that there have been experiments where they have posed the face and then measured physiological reactions, so it may be that it is just in this small sample that this is not well covered.

Individual differences.

Finally, the individual differences (which I mentioned above). The handful of papers is dominated by this individual difference between those that use facial feedback for emotion attribution, and those that use the situation. But in the small handful of papers on facial feedback published later that I have looked at, this is not mentioned. Has it been forgotten? Is it not interesting? Has this possible individual difference in sensitivity to bodily feedback been pursued elsewhere?


Posing the Face – an overview of early Laird research

Let me start with this link to a lovely blog by Lynneguist on the meaning of Frowns.

http://separatedbyacommonlanguage.blogspot.se/2016/09/frowns.html

Because, evidently, it varies! I always considered it to mean that you pull your eyebrows down in a somewhat angry expression – frowning on something that you disapprove of. But, clearly (and I had come across this), there are those that consider the frown a sad face.

Now, the term “frown” is used in the scientific literature on emotional expressions – as you will see below. And, I’ll tell you up front that in most cases it was used as a synonym for the angry face, but in one case it was used for the sad face. Let that be a warning about using folk-psychological terms, because they may indicate very different things.

Nevertheless, I won’t heed my own advice in the work below, but I will let you know if the frown means a sad face.

 

 

In all the brouhaha that the non-replication of Strack brought, someone linked to a relatively early paper by Laird, where he responded to another non-replication of facial feedback from surreptitiously posing faces.

The paper is “The real role of facial response in the experience of emotion: A reply to Tourangeau and Ellsworth, and others”, published in JPSP in 1984. On the first page, he lists a simple nose-count of papers that have replicated the face-posing effects. As we should know, at least since Meehl’s asterisk paper, simple nose-counts are not good enough evidence that an effect exists, considering that we now understand there are as many uninteresting ways to get significant results as there are uninteresting ways of failing to get a result, and only a file-drawer separates the two.

So, I figured as a warmup for a longer review I should find those nose-counted papers and look at what they say (there were not that many).

It starts with his 1974 paper “Self-attribution of emotion: The effects of expressive behavior on the quality of emotional experience”. The paper was, in part, based on his doctoral dissertation. Is this important? I don’t know. People talk so much about expertise as being a factor. It was early career work, at least research-wise.

His theoretical background (which I’m less interested in – I want to look at what was measured, how it was measured, and the results thereof – theories develop, one would hope) is grounded in Bem’s self-perception theory and Schachter’s work on arousal and external cues. He thinks that changes in physical arousal and changes in patterns of bodily expression are both parts that will change self-attribution of emotion. If one knows that there may be an external reason for arousal, one can then discount this effect.

Experiment 1

Let’s start with experiment 1. Sixty-five undergraduate males participated. Not all (as we will see) were in the experimental group, and even among those, some were excluded.

The experiment is quite elaborate: There is a cover story: Participants were told that it was about “the activity of facial muscles under various conditions”.  This was backed up by the presence of scientific looking apparatus, and by placing electrodes between the eye-brows and to the corner of their jaws. The electrodes seem to have had a function – but not as electrodes. That was a complete sham. Instead they were used to direct the participants to pose their faces so they appeared like facial expressions of emotion without letting on that that was being done. Here are the quotes from what they were told:

For the “angry” Position:

[Touching lightly the electrodes between the eyebrows] Now I’d like you to contract these muscles. [If this was unsuccessful, then ] Contract them by drawing them together and down [and if this was unsuccessful, then ] Pull your brows down and together. [Whenever the experimenter was satisfied, he said ] Good, now hold it like that. [Now touching lightly the electrodes at the corners of the jaw] Now contract these [if this was unsuccessful, then ] Contract them by clenching your teeth.

 

For the “Happy” Expression:

[Touching lightly the electrodes near the corners of the mouth] Now I’d like you to contract these muscles under here [If this was unsuccessful, then ] Contract them by drawing the corners of your mouth back and up [When satisfied, the experimenter said] Good, now hold it like that.

In the discussion he actually notes the mean number of steps in instruction to get the expressions right: 2.80 for the smiles, and 2.63 for the angry expression (but they did not differ).

But, to slightly move back. While the electrodes were placed on the face, the experimenter explained that there could be some subtle emotional changes, so after each trial, the participant would rate their emotional experience so that could be controlled for.

So, what we have here is – placing fake electrodes, explaining that emotional experience could be a confound (to justify measuring their emotional state), stating that the experiment involved tensing and relaxing facial muscles, and instruction on how to do that tensing.

We are ready for the experiment.

Once the face was positioned, participants were shown a picture for 15 seconds, before filling in the mood-adjective questionnaire. There were 4 pictures total – two of Ku Klux Klan members, and two of playing children. The participants saw all 4 pictures. A KKK and Kid picture while “smiling” and the other KKK and kid picture while looking angry.

The mood checklist

The mood adjective list was adapted from the Nowlis-Green Mood Adjective Check List (Nowlis 1968). It contained 40 mood words, and these were related to factors indicating Aggression, Anxiety, Remorse, Elation, Social Affection and Surgency. (Interesting to look at the names of the factors, actually.) The interesting set of adjectives would be those related to Aggression, Elation and Surgency (which is reasonable). Each adjective was rated on a 5 point scale ranging from “did not feel” to “strongly felt”. Then, to get an index, the ratings for all adjectives that would indicate Aggression were averaged. Fairly standard procedure (it was what I used with the BMIS when we measured emotion).

Control

He performed an interesting control for experimenter bias also. As much as possible, participants were run in pairs. One participant got his (they were all dudes) face manipulated, whereas the other one didn’t. The two subjects were separated by a screen so they could not see each other, but the experimenter could see both of them. The idea here is that if the researcher would inadvertently indicate what was intended, both participants would show this particular bias, but as only one received the manipulation, the bias could possibly be detected by looking at how similar the mood scores were between the two participants.

This pairing didn’t work perfectly. There were only 20 instances where both showed up, and 25 where the subject was alone. In total then, there were 45 manipulated participants, and 20 controls.

Seven of the manipulated participants seemed to be aware of a connection between the facial manipulation and their mood, and were then excluded from the analysis.

To recap: the point of interest is whether facial feedback results in an emotion signal, even if you don’t realize that your face is posed into an expression, and that the supposed control questionnaire is the actual dependent measure.

The emotional content of the pictures seems to have not been of a main interest here, but it is analyzed, and, not surprisingly, all people rated themselves as more aggressive after viewing the KKK pictures, and more elated after viewing the kid pictures.

This is how I translated the table of the results into graphs for the manipulated participants.

graph1

Laird reports the F-values, so I took those and the degrees of freedom and stuck them into Schimmack’s nifty R-index sheet so I could get some p-values.

Study 1                               N    F      df1   df2   p
Experimental
  Aggression
    Expression main effect            38   8.18   1     37    0.007
    Expression x picture interaction  38   4.18   1     37    0.048
  Elation
    Expression main effect            38   7.21   1     37    0.011
    Expression x picture interaction  38   4.50   1     37    0.041
  Surgency
    Expression main effect            38   5.91   1     37    0.020
Control
  Aggression
    Aggression x picture interaction  20   3.26   1     19    0.087
  Elation
    Expression main effect            20   1.66   1     19    0.213
    Expression x picture interaction  20   1.54   1     19    0.230

Note that in the control condition (where the participants didn’t screw up their faces), there were no effects expected. Laird notes down some of the F-values (that are not less than 1), so I stuck them in here just for completeness.
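If you want to check p-values like these without the spreadsheet, here is a minimal sketch using only the Python standard library. It relies on the fact that an F with 1 numerator degree of freedom is the square of a t with the same denominator df, and gets the two-sided tail area by numerically integrating the t density (Simpson's rule). The function names are my own, and this is not how the R-index sheet does it; it only covers df1 = 1, which is all that the table needs.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_from_f_df1_1(f_value, df2, steps=2000):
    """p-value for F(1, df2), computed as a two-sided t test with t = sqrt(F).

    The area under the t density between 0 and t is integrated with
    Simpson's rule (steps must be even); the p-value is the area in both tails.
    """
    t = math.sqrt(f_value)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    half_area = total * h / 3
    return 1.0 - 2.0 * half_area


# F-values and denominator df as listed in the table.
for f_value, df2 in [(8.18, 37), (4.18, 37), (3.26, 19)]:
    print(f"F(1,{df2}) = {f_value}: p = {p_from_f_df1_1(f_value, df2):.3f}")
```

The printed values should land close to the p column in the table; small differences in the last digit can come from rounding in the reported F-values.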

He also does a manipulation check comparing the experimental and observer participants (there end up being only 16 pairs), and finds that they do differ as expected, although the difference is quite weak. I won’t discuss it here. In fact, I would recommend people read his own discussion, because it is quite detailed and thoughtful.

Some commentary

He uses a complete within-subjects design, with an interesting control. He measures their emotions quite openly, but most of them think that this is not of interest. They have to hold their facial expressions for quite a long time. He actually asked if it was distracting or uncomfortable, and some said it was: three for all expressions, six for anger and four for the smile. Most of them didn’t.

Experiment 2

In experiment 2 he addresses what will happen when the situational cue is ambiguous. (the pictures in experiment 1 weren’t ambiguous). To do so, he uses cartoons. He argues that the participants will attribute the source of their emotion to the cartoons. The selected cartoons had received a moderate humor rating.

The setup was similar to that in experiment 1, but with a few differences. It was (again) a within-subjects design, but it appears there were only two repeats – one in the happy condition, one in the angry condition. The main measure was the rating of the cartoon (but that was, as in the earlier experiment, tossed off as a control measure rather than the main measure), this time on a 9 point scale going from “not at all funny” to “funniest ever”. The mood checklist had been shortened to just 6 items, 3 from aggression and 3 from elation. The same post-experimental questionnaire was used. No observer subjects this time.

32 undergraduates this time (no mention of gender). Six were excluded because they guessed the hypothesis.

And here are the results, copied from the paper, with Cohen’s d added (using Daniel Lakens’s nifty effect-size spreadsheet).

               Angry   Happy   t       p       d
Humor rating   4.42    5.50    2.80    0.010   0.55
Elation        4.11    4.42    < 1.0
Aggression     2.81    1.88    2.46    0.021   0.48
N = 26
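A quick way to sanity-check the d column: for a within-subjects comparison, one common effect size is d_z, the paired t divided by the square root of the number of participants. I am assuming this is the flavor of d in the table (the spreadsheet offers several), but it does reproduce the listed values. The function name is just for illustration.

```python
import math


def cohens_dz(t_value, n):
    """Within-subjects effect size d_z from a paired t statistic and sample size n."""
    return t_value / math.sqrt(n)


# t-values from the table, with N = 26 participants.
for label, t_value in [("Humor rating", 2.80), ("Aggression", 2.46)]:
    print(label, round(cohens_dz(t_value, 26), 2))
```

Note that d_z is not directly comparable to a between-subjects d, because it depends on how strongly the two repeated measurements correlate.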

Laird & Crosby (1974) individual differences in the self-attribution of emotion

Laird & Crosby’s work was a chapter in the book “Thought and Feeling: Cognitive Alteration of feeling states.”

I’ll focus on the results of the face-manipulation only.

They started out with 32 undergraduates, but removed 6 because they were aware of the hypothesis.

The cover story and face-manipulation were the same as in Laird 1974. The stimuli were cartoons, and for the emotion measure they used three adjectives for Elation (carefree, elated, pleased) and three for Aggression (angry, annoyed, defiant), rated on 5 point scales. The scores for each factor were summed. Then the aggression factor was subtracted from the elation factor, resulting in a single score for emotional experience.

The participants went through the procedure in two separate sessions, with a 2-3 day delay. In each session they were asked to do both poses, while presented with a cartoon.
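The scoring described above is simple enough to sketch in a few lines. The adjective lists come from the chapter as summarized here; the function name, the example ratings, and the assumption that the 5 point scale runs from 0 to 4 are mine and purely illustrative.

```python
ELATION = ("carefree", "elated", "pleased")
AGGRESSION = ("angry", "annoyed", "defiant")


def emotion_score(ratings):
    """Sum of the Elation items minus the sum of the Aggression items."""
    elation = sum(ratings[adjective] for adjective in ELATION)
    aggression = sum(ratings[adjective] for adjective in AGGRESSION)
    return elation - aggression


# Hypothetical ratings for one participant after one pose (not data from the chapter).
example = {"carefree": 3, "elated": 2, "pleased": 4,
           "angry": 1, "annoyed": 2, "defiant": 1}
print(emotion_score(example))  # positive = more elated than aggressive
```

One consequence of collapsing to a single difference score is that a participant who is high on both factors looks the same as one who is low on both.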

        Smile   Frown   t      p       p (one-tailed)   d
Day 1   2.23    2.38    ns     ns
Day 3   3.19    1.04    1.78   0.087   0.044            0.35
N = 26

 

On the first day the manipulation did not make any difference in the ratings of emotional state. And, in some ways, it didn’t on the second day either, as the difference only reaches significance with a one-tailed test. They proceed to divide people up into those that rated their emotions as negative both days, positive both days and those that switched, in order to investigate individual differences. It is interesting, but less interesting for a review on whether there is good evidence that posing the face in emotional expressions gives rise to emotional feelings. But it turns out that some of the subsequent papers use the results from this part to divide participants up into internal-cue sensitive and external-cue sensitive.

Paper 3.

The Duncan & Laird (1977) paper is very much more elaborate, but I think one can simply look at the face-manipulation part. The title of the paper is “cross-modality consistencies in individual differences in self-attribution”, and involves a very complex set-up where people are first tested on their attitudes, then about a month later are asked to do a counter-attitudinal video, which has some snags in it so, oh, by the way, could you help with this other work on perception while tensing and relaxing facial muscles.

As in the Laird & Crosby paper above, I’ll only focus on the results of posing the face.

They started out with 40 undergraduates (men and women, but, as in paper 2, they found no gender difference). In the end, they removed 14 subjects, because they were aware of aspects in the two different paradigms.

The set up for posing the face is the same as above. But, rather than pictures or cartoons, they are told the experimenters are interested in the reversing perspective of the Necker cube. They also added a neutral condition, in order to make a clearer base-line comparison.

All participants did two smile and two frown trials (and, presumably also a neutral trial), properly counterbalanced. After each trial, they filled in a mood adjective list, as always. This time it consisted of 26 descriptive adjectives from that same Nowlis-Green Mood Adjective Check list, again rating them on a 5 point scale (0-4). They used 6 items from aggression, 5 from Surgency and 4 from Elation, and some fillers. They summed the scores within each factor, and averaged them across the two trials of each type.

             Frown   Neutral   Smile   t frown vs. neutral   p frown vs. neutral   t smile vs. neutral   p smile vs. neutral
Elation      1.9     3.0       4.4     3.49                  0.002                 2.83                  0.008
Surgency     3.7     4.9       6.2     2.10                  0.022                 2.14                  0.041
Aggression   6.3     2.4       2.3     2.43                  0.011                 0.21                  0.835
N = 31

 

Paper 4

The next study, Laird, Wagener, Halal and Szegda’s “Remembering what you feel: The effects of emotion on memory” (JPSP 1982), also uses the same face-posing work, but using it in a p-curve is – a stretch. I will, but using emotion-congruent recall as a measure of facial feedback is several processing steps away.

As the title says, they were interested in emotion-congruent recall. Half of them started out reading a couple of Woody Allen anecdotes (positive stories), the other half a couple of editorials (anger inducing stories).

Then, as in the earlier studies, participants had their faces posed in frowns and smiles, there is a casual mention that felt emotion may bias the results so could they fill in this questionnaire after each pose. They actually even have a pre-measure of emotion before they start the facial poses.

The perceptual stimuli this time are four abstract paintings that have received titles with an emotion connotation: For happy “spring” and “dancing”. For angry “rip-off” and “betrayal”. And, the little twist here is that they were shown the angry-titled pictures while their faces were screwed up in smiles, and the happy titled pictures when they were frowning.

They do a rather elaborate summing of the mood scores (which I don’t want to go into). What they want to do is sort the participants into two separate groups: the self-produced cue group (the facial expression seems to dominate in the mood measure) and the situational cue group (those who take their mood cue from the pictures rather than from their faces). This results in 19 people who seem to take their cues from facial feedback, and 32 in the situational cue group.

There is no report on the results from this section. The outcome is simply used as a separator for individual differences.

Instead, they proceed to the next stage, where people again get to pose their faces, and then recall as much as they can of each story (written response). In the self-produced group, nine recalled the Woody Allen anecdotes, one anecdote while smiling and one while frowning. Ten recalled the editorials, one while smiling and one while frowning. The cell numbers for the situational cue group were 17 and 15, respectively.

Their dependent variables were the number of correctly recalled facts and the number of errors (assessed by two independent judges). Everybody recalled more from the editorials, but that was, in part, because they had more statements to recall. Thus, that is not terribly interesting.

What they were more interested in was whether there was evidence for more emotion-congruent recall for the self-producing participants when comparing them to the situational-cue group. (This was a planned comparison).

So they do a planned comparison on number of facts recalled (expression x passage x individual differences) and it reaches significance: F(1,47) = 4,31, p = .043 (per p-checker). They do the same for number of errors, and the result here is F(1,47) = 18,76, p < .000 (is that one weirdly high?).

For the situational cue group, there is no interaction between the posed expression and either correct recall or errors.

 

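Self-produced cue group

| | Woody Allen (n = 9): smile | Woody Allen (n = 9): frown | Editorials (n = 10): smile | Editorials (n = 10): frown | F (passage x expression) | p value |
|---|---|---|---|---|---|---|
| Correct recall | 3,3 | 2,2 | 6,7 | 8,3 | 9,98 | 0,0065 |
| Errors | 0,6 | 1,2 | 2,1 | 1,6 | 4,13 | 0,0602 |

df per paper: (1,15). Some participants must have dropped out, as this should have a df of (1,17).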

 

Here are the results, but, as I have noted, there are some issues. The df listed in the paper is 1,15, but they do not note any drop-outs. The df really should be 1,17. There is also a small discrepancy between the first F value as reported in the text and in the table (9,96 vs 9,98). In addition, there is a second discrepancy in the reported p-value for the errors. In the text, the p-value is reported as < 0.55, but the p-value I get from Schimmack’s r-index is higher than that (I report the one from df 1,15).

They claim the results are as expected, but somewhat ambiguous (in that we don’t know if the supposed emotion-congruent recall is due to actual congruence, or to a general positivity/negativity effect), which they then attempt to address in experiment 2.

Experiment 2

They note three major changes, and I quote

“a) to use different expressions during the memory and mood parts of the procedure, b) to employ only material and expressions of negative emotions, which were fear, anger, and sadness, and c) to manipulate expressions during the initial encounter with the material as well as during recall.”

This time, there were 22 undergraduates – two were removed for awareness.

In the first part, they went through the same expression manipulation as in experiment 1 (I think, to separate out those that do produce an emotional feeling from those that don’t)

Then they were to judge “72 different slides on a variety of emotional scales”. I actually don’t know what was on those slides, because it is not described. What was more interesting (to the researchers) were two sentences that were read prior to each slide, one read by a woman, the other by a man. The sentences were read with emotional intonation, and also had emotional content, such as “did you hear that noise?” (for fear). During this part, the participants’ faces were also manipulated, but this time into fearful, angry and sad positions. All participants had their faces placed in all three positions.

To be more precise – the sentences/pictures were presented in 6 blocks. During each block, the participant held their face in one of the three positions (so they held each position for two blocks). Each block contained 24 sentences, 8 of each emotion. So, a total of 144 sentences (which really then should be considered the trials).
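The bookkeeping for this design can be written out as a small Python sketch. The constant names are mine, and it assumes the six blocks are split evenly across the three poses, as described above.

```python
# Design constants taken from the description of Experiment 2.
BLOCKS = 6
SENTENCES_PER_BLOCK = 24
EMOTIONS = ("fear", "anger", "sadness")
FACE_POSITIONS = 3

# Each block holds an equal number of sentences per emotion.
sentences_per_emotion_per_block = SENTENCES_PER_BLOCK // len(EMOTIONS)

# Total number of sentences (the real "trials").
total_sentences = BLOCKS * SENTENCES_PER_BLOCK

# Each pose is held for the same number of blocks.
blocks_per_face = BLOCKS // FACE_POSITIONS

# Maximum possible recall in one pose-by-emotion cell.
max_per_cell = blocks_per_face * sentences_per_emotion_per_block

print(sentences_per_emotion_per_block, total_sentences, blocks_per_face, max_per_cell)
# prints: 8 144 2 16
```

The last number is the ceiling for any single pose-by-emotion cell in the recall results.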

The blocks were about 3,5 minutes long – which is a long time to hold a static facial expression. In fact, they state that an experimenter was watching them so they could be reminded to hold their face in position.

What they were really interested in was the recall of the sentences, with participants’ faces (again) positioned in the same three expressions (also within subjects). The subjects thought this was just a manipulation check; the researchers didn’t want them to spend any effort trying to memorize the sentences. The recall also took place after each block.

The recall was scored for correctness (and they were fairly generous with that).

Results

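| | Self-produced: fear | Self-produced: anger | Self-produced: sad | Situational cue: fear | Situational cue: anger | Situational cue: sad |
|---|---|---|---|---|---|---|
| Fearful | 4,9 | 3,6 | 3,2 | 3,9 | 4 | 3,7 |
| Angry | 2,9 | 5,7 | 2,6 | 3,3 | 5,6 | 4,9 |
| Sad | 2,5 | 3 | 5,5 | 2,9 | 4,1 | 5,2 |

n = 10 in each group.
Overall interaction F(4,72) = 3,68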

 

If participants were recalling everything more or less correctly, they would have gotten 16 in each cell.

I’ve posted the means here, because I can’t really make heads or tails of the df’s for the various sub-analyses. The one I post above seems correct when it comes to df’s anyway. When one throws in all of the data and analyzes it in a mixed ANOVA, with the between factor being self-produced vs situational cue participants (2), and the two within-subjects factors being posed face (3) and emotional tone of sentence (3), those make sense.

They do a planned comparison (hey, maybe there is the df problem) to check out the difference in the sentence/face interaction between the two groups, and come up with a not-significant result, F(1,72) = 3.38, p = .066, but those were the days when this was not treated as non-significant.

Then they report the self-produced and the situational cue separately, and I think they mess up on the df’s here again. There is a significant interaction between story and expression for the self-produced, F(4,72) = 2,78 – but I think it should be F(4,36), as they are only testing half the subjects here.

For the situational cue group, the same effect was not significant, F(4,72) = 1.03, ns (again, I think it should be F(4,36)).
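The df reasoning in this section can be spelled out in a few lines of Python. This is a sketch under the textbook assumptions for a balanced mixed ANOVA with no dropped participants; the helper names `effect_df` and `within_error_df` are mine.

```python
from math import prod


def effect_df(levels):
    """df for a main effect or interaction: product of (levels - 1) over its factors."""
    return prod(k - 1 for k in levels)


def within_error_df(n_subjects, n_between_cells, within_levels):
    """Error df for an effect that involves the given within-subject factors.

    n_between_cells is the number of between-subject groups in the analysis
    (1 when there is no between factor).
    """
    return (n_subjects - n_between_cells) * effect_df(within_levels)


# Experiment 2, both groups together: face (3) x sentence (3), 2 groups of 10.
print(effect_df((3, 3)), within_error_df(20, 2, (3, 3)))   # 4 72

# Experiment 2, one group analysed alone: 10 participants, no between factor.
print(effect_df((3, 3)), within_error_df(10, 1, (3, 3)))   # 4 36

# Experiment 1, self-produced group: passage (2, between) x expression (2, within), 19 people.
print(effect_df((2, 2)), within_error_df(19, 2, (2,)))     # 1 17

# Experiment 1, full sample: 51 people in 4 between cells (passage x cue group).
print(effect_df((2, 2, 2)), within_error_df(51, 4, (2,)))  # 1 47
```

On these assumptions the pooled interaction comes out at (4,72) and the single-group version at (4,36), while Experiment 1 gives (1,47) for the full sample and (1,17) for the self-produced group, which is where the reported (1,15) looks odd.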

So, what they are claiming is that there is emotion-congruent recall, which is emotion specific, but only for those that are sensitive to facial feedback.

I have no idea how I should go about coding this into my r-index and my p-curve data-sheets. At least not now.

And, I really don’t know whether this should be interpreted as a replicated instance of facial feedback. They are actually assuming that facial feedback occurs (at least in some of the participants), which then spills over into emotion-congruent recall. For both types of participants, there appears to be more correct recall for the emotion-congruent sentences. It is just not significant for the situational cue group.

But is the recall a reasonable measure of whether facial feedback works (as in, giving rise to an emotion that corresponds to the expression)? In this work, it is simply assumed that the facial feedback does exactly that. The measures where they ask participants how they feel are simply used for sorting people into two types, and in that measure they are receiving two conflicting types of information: from their face, and from the label.

Kellerman & Laird

In the last Laird paper, Kellerman & Laird, “The effect of appearance on self-perception”, there is no data to scrape! They did the facial positioning, had people rate their emotions, went through an elaborate scoring, and then used it simply to sort people into self-produced and situational cue responders. Evidently, they could do that, but it provides nothing that I can use to keep assessing whether we have decent evidence for some sort of facial feedback.

 

I now move into the papers that he cites, that he didn’t also co-author.

Rhodewalt & Comer

The first one is Rhodewalt & Comer (1979), “Induced-compliance attitude change: Once more with feeling”.

A total of 60 participants, divided across 4 conditions. Well, they started out with 69, but there were drop-outs as usual.

It is all very elaborate, in order to get to their research question, but I’ll gloss over the parts that are not directly about measures of facial feedback.

They start with a pretest session, done in groups, which is mainly about information, but where, oh by the way, another researcher needed help with filling out an opinion survey of 18 issues.

A week later they return (individually) for the experimental sessions. There are 4 groups: smile, frown, neutral and control. In the first 3, they are asked to write counter-attitudinal essays while holding the expression. In the fourth, they simply copy down some written materials.

The posing instructions are taken from Laird 1974. As with the previous studies, possible changes in mood are explained as an artifact to control for (hence the measure). Mood was measured with the Nowlis-Green Mood Adjective Check List, using 18 adjectives measuring Elation, Surgency, Social Affection, Anxiety, Remorse and Aggression (3 adjectives for each).

Each participant had 7 minutes to write the counter-attitudinal essay (the topic, of course, coming from the pre-session), while keeping their face frozen in whatever expression was assigned to them. There was an observer present to make sure they kept their face in the pose. (Boy, that is a long time!)

Results:

For the mood measure, they created a single composite score for the positive factors, and a single composite score for the negative factors. Plus, they calculated a difference score.

| | Positive mood | Negative mood | Difference |
|---|---|---|---|
| Smile | 3,8 | 1,91 | 1,89 |
| Neutral | 2,43 | 2,54 | -0,11 |
| Frown | 1,71 | 3,58 | -1,87 |
| Control | 1,84 | 1,71 | 0,13 |
| F(3,56) | 3,21 | 3,32 | 4,93 |
| p | 0,029775 | 0,026179 | 0,004148 |

n = 15 in each group.

 

I’m not reporting the attitude change data. It is just too many steps away to say anything interesting about facial feedback.

Zebrowitz McArthur et al

Next up is Zebrowitz McArthur, Solomon & Jaffe (1980), “Weight and sex differences in emotional responsiveness to proprioceptive and pictorial stimuli”.

The topic here is to investigate differences in emotional responsiveness between overweight and normal-weight participants. For this they recruit 24 overweight participants and 36 normal-weight participants.

They use the Laird paradigm – but with some changes. The smile one was the same, but here they use the “bottom mouth” meaning for the frown – they place the mouth area into a sadness expression. Here is the instruction:

Please contract your lips by drawing them together and down. Now push out your lower lip a little… Good, now hold it like that.

They also had a neutral instruction

Please relax your face, keeping your mouth closed…. Good, now hold it like that

Now, on to the set-up. Each participant went through 9 trials. In each trial, they were shown a picture for 15 seconds. The pictures themselves depicted humans in sad postures (3), animals in “happy” postures (3) and microorganisms (also 3, which I presume are considered neutral).

The first three pictures were shown while in one facial configuration, the next three in the next, and the final three in the final expression. And, of course, there was a happy, a sad and a neutral picture for each expression. Nice counter-balancing and all.

After each projection, the participant rated how they felt on a sub-set of the MACL. The target emotions were two adjectives for elated, and two adjectives for sad, and then there were 4 fillers. Instead of 5 pt scale, they used a 9 pt scale.

So, it is a within-subject manipulation, all possible combinations.

Additional part for control – they tested participants in pairs, where each individual in the pair posed a different expression – to control for possible experimenter influence. They also tried, as much as they could to counter balance the seats used by males and females, as well as by over-weight and normal weights. The participants could not see each other. (Of course, because they all were viewing the stimuli together, the order of the pictures was the same for everyone).

Results

I’ll do the matrix of mean scores first (the mean score is a composite of the elation and sadness. More negative, more sad. More positive more happy).

 

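Posed facial expression

| Picture type | Group | Smile | Neutral | Sad | All |
|---|---|---|---|---|---|
| Positive | normals | 6,661 | 5,917 | 4,806 | 5,778 |
| Positive | overweight | 6,583 | 6,75 | 6,042 | 6,458 |
| Neutral | normals | 0,361 | 0,333 | -0,083 | 0,204 |
| Neutral | overweight | 0,5 | -0,5 | 0,333 | 0,111 |
| Negative | normals | -4,722 | -7,528 | -5,917 | -5,056 |
| Negative | overweight | -4,75 | -3,625 | -3,458 | -3,944 |
| All | normals | 0,75 | 0,574 | -0,398 | |
| All | overweight | 0,778 | 0,875 | 0,972 | |

n overweight = 24 per cell, N = 72 in row and column totals.
n normal weight = 36 per cell, N = 108 in row and column totals.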
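A quick consistency check on the table as typed here, sketched in Python. It assumes that each “All” entry is the plain mean of the three cells in its row or column (cell sizes are equal within a group, so that seems reasonable, but it is my assumption). The variable and function names are mine, and decimal commas are written as points.

```python
EXPRESSIONS = ("smile", "neutral", "sad")

# Cell means by group and picture type, in EXPRESSIONS order.
cells = {
    "normal": {
        "positive": (6.661, 5.917, 4.806),
        "neutral": (0.361, 0.333, -0.083),
        "negative": (-4.722, -7.528, -5.917),
    },
    "overweight": {
        "positive": (6.583, 6.750, 6.042),
        "neutral": (0.500, -0.500, 0.333),
        "negative": (-4.750, -3.625, -3.458),
    },
}

# Reported "All" entries: per picture type (rows) and per expression (columns).
reported_row_all = {
    "normal": {"positive": 5.778, "neutral": 0.204, "negative": -5.056},
    "overweight": {"positive": 6.458, "neutral": 0.111, "negative": -3.944},
}
reported_col_all = {
    "normal": (0.750, 0.574, -0.398),
    "overweight": (0.778, 0.875, 0.972),
}


def find_mismatches(tol=0.002):
    """Return (group, kind, name, recomputed mean) where the marginal does not match."""
    mismatches = []
    for group, rows in cells.items():
        for picture, values in rows.items():
            mean = sum(values) / len(values)
            if abs(mean - reported_row_all[group][picture]) > tol:
                mismatches.append((group, "row", picture, round(mean, 3)))
        for i, expression in enumerate(EXPRESSIONS):
            mean = sum(v[i] for v in rows.values()) / len(rows)
            if abs(mean - reported_col_all[group][i]) > tol:
                mismatches.append((group, "column", expression, round(mean, 3)))
    return mismatches


for item in find_mismatches():
    print(item)
```

On the numbers as typed, this flags four marginals for the normal-weight group (the positive and negative rows, the smile and neutral columns) and none for the overweight group. Cell values of 6,611 (smile, positive) and -4,528 (neutral, negative) would reconcile all four, so two cells may have picked up typos somewhere between the paper and this post, but that is a guess until checked against the original article.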

The interesting part (for us) is the two bottom rows, because they indicate the net effect of posing the faces. Clearly for the overweight there is none (which is what they were checking), but there is some for the non-overweight.

And, luckily, they have a planned simple effects analysis for that:

Expression effect for normal weight: F(2,72) = 4.77, p = .0113

For overweight, F was less than 1, so nothing is reported.

They go further into the results, and see that this seems to be a sadness result, with most of the effect being driven by that expression. In addition, they look at the scores for the other emotions, but find no effect. So, the suggestion is that the feedback effect is expression specific. There were picture effects also, but those are not so important here.

Kleinke et al

The Kleinke & Walton (1982) paper is different from those above. The title is “Influence of reinforced smiling on affective responses in an interview”. They claim that the results support a facial feedback theory, and possibly they do, but it is rather messy. It doesn’t involve posing faces into different expressions. Instead, subjects in the experimental condition get reinforced every time they smile (they get a nice green light when they smile, and their task is to try to have as much green light as possible). They weren’t told that it was smiling that would be reinforced. In two yoked groups participants were either told to smile whenever a light came on (same schedule as the reinforcement). I won’t go deeper into the experiment, because I think there are many other possible reasons for the results than some kind of facial feedback (the paper would be important in a larger meta-analysis).

Edelman

The final article that Laird cites is Barbara Edelman’s 1984 paper, “A multiple factor study of body weight control”. And, again, there is nothing here that I can use. The facial manipulation (which, they claim, closely follows Laird & Crosby) is simply used to separate participants into self-perceivers and situational-sensitive.

Some brief comments

I think this is interesting to note. Early on, Laird and Crosby use their feedback manipulation in order to separate participants who are sensitive to facial feedback from those who are not, and in some of the subsequent articles, that is simply what they use it for, with no possibility for anyone to evaluate how strong the facial feedback effect was.

Of course, this notion of individual differences in facial feedback sensitivity was not part of the original Strack study, it was not part of the work I did with Niedenthal where we interpreted the effects of pen-holding as a mimicry disruptor, it was not considered for the Strack replication, and, as far as I have gotten in the review of Strack’s list of conceptual replications, it is not considered there either.

p-curve

[Figure: p-curve, 8 items]

I did a p-curve using one focal test from each experiment (I also did it with all tests, but that throws in repeats for the same manipulation, just slightly different ways of doing it). My shiny-app scripts are listed below.

It does suggest some evidential value, but we only have 8 data-points, and some of them are rather oblique (mood-congruent recall).

This is my first pass at this. I’ll do some better coding/clean-up. The posing is very similar for all. The measures aren’t always reported. There are several repeated measures designs, where there really are several repeats. Frequently they are asked to pose the face for a long time. 15 seconds – 7 minutes! (yikes).

In all of this, the only measure of facial feedback is the self-reported moods after each trial. There are theoretical accounts, but they are rather abstract – self-perception, perception of arousal. Situational cues.

This is just a small sliver of the literature, but I’m interested in whether there is work connecting this closer to additional biological function (physiological measures, brain-imaging etc), and that may very well exist. Or not. It is also hard to know if they really really really didn’t realize it was about emotion. It is measured, after all, even if mentioned that it is to control for unwanted affect. There are just a lot of questions, that one needs to see if the literature has answered.

 

Duncan, J.W., & Laird, J.D. (1977). Cross-modality consistencies in individual differences in self-attribution. Journal of Personality, 45, 191-206.

Edelman, B. (1984). A multiple-factor study of body weight control. Journal of General Psychology, 40, 363-369.

Kellerman, J., & Laird, J.D. (1982). The effect of appearance on self-perception. Journal of Personality, 50, 296-315.

Kleinke, C.L., & Walton, J.H. (1982). Influence of reinforced smiling on affective responses in an interview. Journal of Personality and Social Psychology, 42, 3, 557-565.

Laird, J.D. (1974). Self-attribution of emotion: The effects of expressive behavior on the quality of emotional experience. Journal of Personality and Social Psychology, 475-486.

Laird, J.D. (1984). The real role of facial response in the experience of emotion: A reply to Tourangeau and Ellsworth, and others. Journal of Personality and Social Psychology, 47, 909-917.

Laird, J.D., & Crosby, M. (1974). Individual differences in the self-attribution of emotion. In H. London & R. Nisbett (Eds.), Thinking and feeling. The cognitive alteration of feeling states. Chicago: Aldine.

Laird, J.D., Wagener, J.J., Halal, M., & Szegda, M. (1982). Remembering what you feel: The effects of emotion on memory. Journal of Personality and Social Psychology, 42, 4, 646-657.

McArthur, L. A., Solomon, M. R., & Jaffee, R.H. (1980). Weight and sex differences in emotional responsiveness to proprioceptive and pictorial stimuli. Journal of Personality and Social Psychology, 39, 308-319.

Rhodewalt, F., & Comer, R. (1979). Induced-compliance attitude change: Once more with feeling. Journal of Experimental Social Psychology, 15, 35-47.

 

 

 

 

 

 

My shiny app data – two versions. One where I throw in all, so to speak, although several are separate measures for the same thing. In the second, I cross out duplicates. That is, I only select one measure for each experiment (and, when possible, the one that seems to be the strongest).

# GO AND REPLACE THE EXAMPLES!

# Easy mode (‘#’ starts a comment)

#Paper 1 Laird 1974

#Experiment 1

F(1,37) = 8.18 #Aggression

F(1, 37) = 7.21 # Elation

F(1, 37) = 5.91 #Surgency

#Experiment 2

t(25) = 2.8 #Humorrating

t(25) = 2.46 #Aggression

#Laird Crosby 1974

t(25) = 1.78 #Day 3 effect

#Duncan Laird

t(39) = 3.49 #Elation, frown vs neutral

t(39) = 2.83 #Elation, Neutral vs. Smile

t(39) = 2.1 #Surgency, frown vs neutral

t(39) = 2.14 #Surgency, Neutral vs. Smile

t(39) = 2.43 #Aggression, frown vs neutral

t(39) = 0.21 #Aggression, Neutral vs. Smile

#Laird Wagener Halal Szegda

#Experiment 1

F(1,15) = 9.98 #correct recall, self perceivers

F(1,15) = 4.13 # Errors, self perceivers

#Experiment 2

F(4,72) = 2.78 #Interaction sentence, expression self perceivers

#Rhodewalt and Comer

F(3,56) = 3.21 #Positive index

F(3,56) = 3.32 #Negative Index

F(3,56) = 4.93 #Difference score

#zebrowitz et al

F(2,72) = 4.77 #Facial expression effect of normal weights

 

# GO AND REPLACE THE EXAMPLES!

# Easy mode (‘#’ starts a comment)

#Paper 1 Laird 1974

#Experiment 1

F(1,37) = 8.18 #Aggression

#F(1, 37) = 7.21 # Elation

#F(1, 37) = 5.91 #Surgency

#Experiment 2

t(25) = 2.8 #Humorrating

#t(25) = 2.46 #Aggression

#Laird Crosby 1974

t(25) = 1.78 #Day 3 effect

#Duncan Laird

t(39) = 3.49 #Elation, frown vs neutral

t(39) = 2.83 #Elation, Neutral vs. Smile

#t(39) = 2.1 #Surgency, frown vs neutral

#t(39) = 2.14 #Surgency, Neutral vs. Smile

#t(39) = 2.43 #Aggression, frown vs neutral

#t(39) = 0.21 #Aggression, Neutral vs. Smile

#Laird Wagener Halal Szegda

#Experiment 1

F(1,15) = 9.98 #correct recall, self perceivers

#F(1,15) = 4.13 # Errors, self perceivers

#Experiment 2

F(4,72) = 2.78 #Interaction sentence, expression self perceivers

#Rhodewalt and Comer

#F(3,56) = 3.21 #Positive index

#F(3,56) = 3.32 #Negative Index

F(3,56) = 4.93 #Difference score

#zebrowitz et al

F(2,72) = 4.77 #Facial expression effect of normal weights
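The second version of the input above is regular enough to parse, which makes it easy to see which tests are switched on. Here is a rough Python sketch; the regular expression and the `parse_app_input` helper are mine and are not part of the p-curve app.

```python
import re

APP_INPUT = """\
#Paper 1 Laird 1974
#Experiment 1
F(1,37) = 8.18 #Aggression
#F(1, 37) = 7.21 # Elation
#F(1, 37) = 5.91 #Surgency
#Experiment 2
t(25) = 2.8 #Humorrating
#t(25) = 2.46 #Aggression
#Laird Crosby 1974
t(25) = 1.78 #Day 3 effect
#Duncan Laird
t(39) = 3.49 #Elation, frown vs neutral
t(39) = 2.83 #Elation, Neutral vs. Smile
#t(39) = 2.1 #Surgency, frown vs neutral
#t(39) = 2.14 #Surgency, Neutral vs. Smile
#t(39) = 2.43 #Aggression, frown vs neutral
#t(39) = 0.21 #Aggression, Neutral vs. Smile
#Laird Wagener Halal Szegda
#Experiment 1
F(1,15) = 9.98 #correct recall, self perceivers
#F(1,15) = 4.13 # Errors, self perceivers
#Experiment 2
F(4,72) = 2.78 #Interaction sentence, expression self perceivers
#Rhodewalt and Comer
#F(3,56) = 3.21 #Positive index
#F(3,56) = 3.32 #Negative Index
F(3,56) = 4.93 #Difference score
#zebrowitz et al
F(2,72) = 4.77 #Facial expression effect of normal weights
"""

# A test line is an optional leading '#', then F(df1,df2) or t(df), '=', a number,
# and an optional trailing '#label'. Plain comment lines do not match.
TEST_LINE = re.compile(
    r"^(?P<off>#?)\s*(?P<kind>[Ft])\(\s*(?P<df1>\d+)\s*(?:,\s*(?P<df2>\d+)\s*)?\)"
    r"\s*=\s*(?P<value>\d+(?:\.\d+)?)\s*(?:#\s*(?P<label>.*))?$"
)


def parse_app_input(text):
    """Return one dict per test line, flagging whether it is commented out."""
    tests = []
    for line in text.splitlines():
        match = TEST_LINE.match(line.strip())
        if match is None:
            continue  # blank line or a heading-style comment
        dfs = tuple(int(d) for d in (match["df1"], match["df2"]) if d is not None)
        tests.append({
            "active": match["off"] == "",
            "kind": match["kind"],
            "df": dfs,
            "value": float(match["value"]),
            "label": (match["label"] or "").strip(),
        })
    return tests


tests = parse_app_input(APP_INPUT)
active = [t for t in tests if t["active"]]
print(len(tests), len(active))  # 19 9
```

Nine tests are active in this version. One of them, t(25) = 1.78, falls short of the two-tailed .05 cutoff for 25 df (about 2.06), so if the p-curve only uses significant results, as I understand it does, that would leave the 8 data points mentioned above.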

 

 

 

 


Blog post on the occasion of Strack Stepper & Martin not replicating, and thoughts about what to do next.

Strack didn’t replicate. STRACK DIDN’T REPLICATE. If you wonder which Strack (which, really, one should as he is prolific), I’m clarifying – it is the one where you stick a pen in your mouth and it makes you think a cartoon is more (or less) amusing depending on how that pen-holding is screwing up your face. Correct is to call it Strack Martin & Stepper (1988).

And, I’m a bit sad. We were going to be part of the replication effort, but last fall semester hit hard, and I had to give up. We needed to collect data before students had heard of the experiment, and we just did not get it together in time. I had predicted there would be an effect.

But, perhaps that should have been a bit moderated. I believed SOMETHING would happen, based on work that was done in the Niedenthal lab, but that effect was a little bit different.

She did a series of morphing studies, where faces changed from one expression to another, and people had to detect the change. The main exploration was whether emotional state has an effect on perception of emotional stimuli. But, in one variant she used the Strack manipulation. For good reason. You want to tease out the edges of an effect. Would it be enough with just facial feedback, or did we need to do the full-blown emotion induction? But, what seems to have happened instead was that holding the pen in the mouth disrupted mimicry – that is the published story.

But, I really think we need to put the study in context. As I wrote in my blog on my beginning trace, the paper is just one tile in the mosaic of studies investigating the role of facial feedback in emotion  processing (and I’m deliberately vague).

One can easily trace this back all the way to the James-Lange theory of emotion, which crudely (and somewhat erroneously) is portrayed in introductory books as: you feel afraid because you are running away from the bear.

But let’s narrow it a bit more – the Strack experiment was part of a much larger body of research looking at the role of facial feedback.

The facial feedback story (as I tell it to my undergraduates) goes roughly like this (admittedly with plenty of licence).

When Ekman, Friesen and Ellsworth were figuring out which facial movements could be considered primitives of expressions (the FACS), evidently they noticed that when they worked on furrowing brows, and gaping their mouths into snarly shapes, they got into more snapping and actual snarling.* Could it possibly be that screwing up your face into emotional expressions resulted in a feedback to the emotion processing areas in the brain, possibly giving rise to a faint experience of that actual emotion? And, from there, they proceeded to experiment on that notion, usually by asking people to position their face in a certain way (e.g. pull down the outer corners of your mouth; stick out your tongue; wrinkle your nose).

I don’t think they were alone pursuing facial feedback. Zajonc has worked on this. Laird has worked on this. Hess has worked on this. Alan Fridlund looked at this. Levenson, Lanzetta, and Gross (the beginning of emotion regulation work), and on and on and on.

What the Strack paradigm specifically addressed was the objection that people may figure out that they were asked to screw up their face in disgust, and, being compliant participants (which my experience says is more common than the recalcitrant) they reported more disgust or amusement etc.

It really was addressing this  particular objection in a very clever way. According to the standards of the time, it worked. And, for some reason, it became THE experiment (in textbooks etc) which demonstrated the existence of facial feedback.

Which, of course it isn’t. No single study ever is!

If someone thinks this refutes the facial feedback hypothesis, or embodiment, that person is doing the same reasoning fallacy as when someone tests a gaggle of undergraduates in Georgia, and then claims to have found evidence for some universal principle of human function.

Instead, be more precise: sticking a pen in the mouth of an individual in order to make them pose their face in a semblance of a smile or pout, without alerting them to the fact that you are interested in what happens when the face is put in different positions, seems to have no effect on how funny a (by now) fairly large sample of participants think funny cartoons are.

And, yes, after this experiment, I actually strongly believe just that – which is a very very narrow area.

If we really want to evaluate the veracity of the facial feedback thesis, we must do better than a single directed RRR, because this is a web of experiments evaluating a theory. We need to undertake a comprehensive review.

There needs to be a review of mimicry, the human tendency to mimic the facial expressions one is exposed to. There are lots of experiments. Some filming faces, some measuring EMG, some looking at brain correlates, and there are a lot of papers here (I used to read these as a doctoral student).

Next, what happens when mimicry is disrupted? Through instruction (don’t show what you feel – keep a stone face), or physical disruption (e.g. pens in mouth, botox).

Then, we need to review what we know this mimicry (or disrupted mimicry) results in for the individual. (Suggestions – mild experience of the same emotion, changes in physiological signatures, perceptual sensitivity to congruent materials, enhanced emotional reaction to other materials).

I don’t think the facial feedback hypothesis is stupid. Humans have a tendency to imitate and entrain (we think anyway), and it is a feasible first mechanism for trying to understand how we communicate, and how we understand one another. (Even my son has heard that mimicry is the basis of empathy – and he is 13 – it has face validity, but the evidence needs scrutiny). I tend to take an ecological/evolutionary view of things, which is why I think it is non-stupid.

Now, there is a lot of research on this. Why not evaluate it, see how strong (or not) it is, possibly do some very directed experiments once there is a better map (if warranted), and do it on more than undergraduates?

I think I will actually do this – but, of course, I will have to get help.

* I have no idea where I picked up this anecdote. Could be YouTube, could be the conference I went to in 2003, could be some paper.


On Brannigan (rise and fall of Social Psych), and Henrich (the secret of our success) and psychological research in general.

When I read David Hull’s “Science as a Process” (1988), I found one controversy he recounts interesting for psychology. His area was systematics: how to best classify animals and plants (the stuff of Linnaeus – science is never done). The controversy was between those who thought classification needed to have an evolutionary grounding (species have a history, and that ought to be reflected in the classification) and those who thought one needed to classify based on (more or less) observable features existing right now. (The controversy is discussed in the two chapters “Down with Darwinism – Long Live Darwinism” and “Down with Cladism – Long Live Cladism”, if I recall right.)

From my naïve outsider view I first thought that of course you want to use the evolutionary history to figure out how to classify living things, but as the opposing side pointed out, even if one doesn’t doubt the importance of evolution, the actual evidence available was so spotty that it wasn’t possible to use it as a basis for classification. Instead, one should stick to what is observable now. As Hull points out, what is observable is also not quite straightforward. (Visible traits? Genetic markers, which require a whole lot of apparatus to detect? Also, what counts as a particular trait had at one point been hotly debated; his example is what the dorsal and ventral parts of an animal are. Observation of current traits is theory-laden, which perhaps we as scientists forget.)

My mind went to social psychology/evolutionary psychology. Of course humans, and their psychology are a product of evolution, but, as many critics have pointed out, minds and behaviors don’t fossilize well, so much of the work has to be done by careful (but still in part speculative) theory applied to present day humans. Perhaps there is a real point in cataloging current humans and their traits and behaviors, before considering evolution. Or some iterative work combining the two.

This weekend I read two books (well, I’m not finished with one of them yet): Augustine Brannigan’s “The Rise and Fall of Social Psychology” (subtitle: The Use and Misuse of the Experimental Method), published 2004, and Joseph Henrich’s “The Secret of Our Success: How Culture Is Driving Human Evolution”, published 2015.

I got the tip for Brannigan’s book in a Facebook discussion (from a European student). I think it is a must read. Of course, Social Psychology doesn’t seem to be in any kind of post-experimental wasteland, although it has been in the focus of the current so-called crisis following the refusal to publish a non-replication of Bem’s ESP work, and the vast fraud of Diederik Stapel.

I think it is a must read for anyone interested in social psychology, and anyone interested in psychology as a science. Did you know that Festinger left Social psychology to take up work on perception, and then eventually “exploring prehistoric and archaeological data” (as per the Wikipedia page), evidently being disappointed in psychology?

Did you know that there have, time and again, been accomplished researchers who critiqued Social Psychology scathingly? Decades ago? I recognized none of the names, possibly because, as Hull also points out, they were alone in the wilderness with no deme advancing their position.

As a grad student, I felt frustrated that there seemed to be no larger theoretical framework from which to reason about psychology, and my adviser pointed out that, yes that is the case. The field is filled with mini-theories, but nothing over-arching. Evidently, from this book, far better scientists than I have noted this, complained about it (for example – chapters in a social psychology book could be shuffled, with no ill effect) thus, there is no cumulative understanding, no placing effects in a larger frame (e.g what can be attributed to situation, what to traits, what to larger social circumstances etc – Brannigan is a sociologist, and works in criminology).

He is especially critical of the experiment as the sine qua non of social psychology (to the exclusion of field work and other methods). This, he claims, has in part lent social psychology an air of proper hard science, which has allowed it a great deal more influence in the actual world than he thinks is warranted (e.g. work on violence in movies). But the experiments seem to be performances and demonstrations more than actual tests of theories. There are few falsifications (as we know). Positive, supportive results are the only thing presented. In effect, the experiments are de facto anecdotes that support a narrative that is already decided.

As his cases in point he uses Festinger’s dissonance theory (there are aspects of it that are absurd, like the enormous payments some of the students got; 20 dollars then was quite a bit more than it is now, something I first saw Tom Stafford point out); Muzafer Sherif’s work on the autokinetic effect, which was claimed as evidence for norms and for who one conforms to, but which is (per Brannigan) very much removed from actual social situations, and a rather minute piece of evidence on which to build a larger theoretical narrative; Zimbardo’s Prison experiment, with its ethics problems; Milgram’s work on obedience, which perhaps is not so much about obedience to authority as about the expectations of an experiment (e.g. it is an experiment, they will not allow anything bad to happen, so it is OK to comply – this is not what happened in Nazi Germany); Asch’s work on group pressure; and all the work on the horrors of TV imparting violent behavior to our children, or making guys who watch porn more OK with rape. That last body of work he deems rather shallow (there are so many more interesting questions), an expression of class (we don’t belong to those nasty unwashed TV watchers), unduly influential (bans on violence on TV, bans on porn), and neglectful of truly interesting questions such as immersion, narrative, and the separation of story from reality (although I do think there is work on this, albeit maybe not so splashy in the news).

Yes, there is more to Social Psychology, but, no, we are much too enamored of the experiment; see the earlier critiques by Paul Rozin (2001), “Social psychology and science: Some lessons from Solomon Asch”; Robert Cialdini (2009), “We have to break up”; and Martin Orne (1962), “On the social psychology of the psychological experiment”. We may not know enough about a phenomenon to actually do an experiment (it is difficult to falsify, because the chain of assumptions between how we do it and what we actually want an answer to is too long); we do it on populations that already have an idea about how to be good subjects, and who will thus behave in a manner that has to do with the experimental situation and not give us any answer to what we want to test; etc.

But, go read. Even if you don’t agree. As scientists, we need to have statements that we can use as the foundation of our critique.

Which brings me to the Henrich book. His thesis (in my interpretation) is that our special feature is our capacity for cumulative cultural learning. Moreover, that capacity is something that feeds back on genetic evolution. One example is lactose tolerance (which is fairly standard). If you have animals that give milk, there may be an advantage for those individuals who don’t shut down production of lactase (the enzyme that digests lactose) once weaning is done, since they get better nutrition and hydration from unprocessed milk (making cheese and yoghurt breaks down much of the lactose, so you don’t have to be a lactase-persistence mutant to eat those).

Another example is the human as long-distance runner – in order to track down and kill large animals. He suggests a number of adaptations – one of particular interest is how to keep cool while running long distance in a hot climate – hairlessness and sweating. But, sweating means that there must be a good supply of water that can be sweated out to cool us. Now, that is not something we are born with, unlike, for example, camels. We can only store so much water. His point is that this adaptation must have occurred after humans figured out how to externally access water: Water pouches, straws to access pools in tree-trunks, recognition of plants and other signs that indicate where water or watery plants may be, lore to keep track of water-holes, etc.

By now, I’m reading about kinship, sharing rules, food preparation rules, imitation and faith. All of these abilities must hinge on some psychological capacity, some bred in bias on where it is best to look: who to imitate, who to listen to, how to police others to do the “right thing”, and all without us necessarily understanding the why. He claims cumulative cultural evolution is smarter than us.

Many of these “biases” do show up in Cialdini’s six ways to yes in persuasion (and I’m sure Cialdini is quite aware that there is an evolutionary quality to those). We reciprocate, we look to authority (possibly more the prestige type than the dominant type), we look to the crowd, tough rituals make us more committed, and we look to those we like, who like us, and who are like us. These little biased hooks are what allow us to accumulate culture over long times.

I’m thinking, here is a more overarching theoretical framework from which to reason about psychological phenomena. It may not be right, but it is useful, and it is something one could test. There will be cultural differences – where is the underlying invariant?

I’m a big fan of ecological psychology also – but it seems like it is still best applied to perception/action (although I have seen attempts at ecological social psychology). This is also, in many ways, grounded in evolutionary thinking – minds and bodies have evolved to capitalize on our surroundings. Perhaps eventually these can be brought together (or not).

There is a point in doing research on contemporary beings (because, who else?), without necessarily using a deeper evolutionary thought. But perhaps a thought on what brought a given trait or behavior about could help guide where to look and what to attempt to falsify. I’m a little bit tired of the narratives in social psychology textbooks. The effects must be placed in the context of effects of traits/personality, class, social systems, cultural systems, etc., and I rarely see that. (Nazar Akrami has looked at the relative contribution of personality factors and more social psychological factors and found that personality dominates – but more like this is needed.)

Cialdini, Influence.

Cialdini (2009). We have to break up. Perspectives on psychological science, 4, 5-6.

Orne, Martin (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776-786.

Rozin, Paul (2001) Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5, 12-14.

Brannigan, Augustine (2004) The rise and fall of social psychology.

Henrich, Joseph (2015). The secret of our success.


Tracing Strack, the preamble

I’ve started a second trace! A bit to pursue proof of concept, and to get a feel for extendability. A trace is, after all, a bit like a case study. I’ve selected Strack, Martin and Stepper (1988), better known as the one where participants hold a pen in their mouths to get their face shaped like a smile or a frown, without really realizing this is happening. This is also a paper that is under PoPS registered replication. I was going to participate as one of the independent labs, but work hit me and my team (I became director of our international Masters), and I just had to give up. (Still feel a bit sad about that.)

It is, to cite Jens Förster (citation # 102 in my trace), a Classic.* It is so generally well known that in the instruction for the registered replication we were asked to make sure that the population we tapped was not aware of the effect – that is, get the psych undergrads before the Emotion module, or tap other undergrads (we were planning on using the film/linguistics/humanities set). The paper has a reputation! Of course we want to replicate it.

I pulled the data for the trace on the 29th of April, 2016, and at that time the paper had been cited 544 times (all from the Social Science Citation Index). Not as many as Srull & Wyer (1979), which I pulled a year before, but this was published a decade later. Not shabby.

I also decided to be more ambitious with the trace this time. Instead of the first 5 years (53 articles), I decided to trace the first 150 articles. (I am going solo so far, and I am going the artisanal way. No automatic scraping of info here!)** The first citation is in 1989, the last in 2006, so we are spanning over a decade and a half. I figure that might also be enough to find possible citation distortions (I have). This actually includes a paper where I’m co-author. We used the Pen in the Mouth technique, but it didn’t seem to work as enhancement so much as a mimicry-disruptor (Niedenthal et al., 2001), which is still some kind of effect on Facial Feedback that is interesting.

I realized early on that the trace here had a different nature from the Srull & Wyer trace. The Srull & Wyer paper was very much an origin paper for subsequent work on Social Priming***, whereas the Strack et al. paper is a relatively recent paper in the tradition of facial and bodily feedback, which can reasonably be traced back to the James-Lange theory of emotion, was under ongoing investigation by the Ekman Deme**** and the Zajonc Deme, and in many ways was an ingenious technical solution to the pesky demand objection that came from asking people to pose their faces in emotional configurations.

Classifying the papers (based on the abstracts) also was different. For the Srull & Wyer trace I classified papers as either extending, related, or oblique. My intention was to pay particular attention to those papers that extended the priming idea, whereas for other papers I would only look closer at the citation patterns. This was not so evident for the Strack paper. Yes, there were clear obliques (Emotions and God, Education, Robots – although that turned out to actually fit within extension), but it was far less obvious which papers extended the work and which were related but not extending. This is quite possibly because the Strack paper isn’t an origin paper for a particular area of research, but a paper that is mid-stream in ongoing research on bodily feedback on affective processes. So after a rough sorting, I went in and made a somewhat more fine-grained classification of the topics. The majority involve research on facial feedback (39), but there are also papers on Arm Flexion (17), emotional expression (7), Embodiment (7), Emotion regulation (6), a mix of other types of bodily feedback including head nodding (11), and effects of induced mood or emotional states (25), which all seem to be somewhat relevant and could potentially be extending.

So far, I have pulled citations from all the papers I could find without having to go too far out of my way. (I have a handful from journals like Cognition and Emotion and Cortex, for which I evidently have to request prints rather than download PDFs, and there are also a few non-English papers that I can’t get to – I included a bit more than just peer reviewed papers in this trace.)

That is 128 papers. One thing I noticed when doing my Srull & Wyer (1979) trace was that in papers that extended their work, they tended to be cited multiple times (for the theoretical and empirical background as well as for methods and in the discussion). In the related and oblique papers they tended to be cited maybe one or two times. These are the citation patterns so far for Strack et al.

Times cited in paper    Frequency
1                       90
2                       15
3                       12
4                       7
5                       0
6                       1
7                       1
8                       0
9                       1
10                      0
11                      0
12                      0
13                      0
14                      0
15                      1
Total                   128

 

As I actually go and pull the citations manually (with the help of the search function, when that works), I do get a quick feel for what is going on. The paper is highly cited, because this is an important addition in the ongoing work on bodily feedback, as it rules out demand. But, direct extensions of the technique are not that common. (The one with 15 cites most definitely did a replication).
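For anyone who wants to check the tally, the frequency table can be summarized with a minimal Python sketch like the one below. The counts are copied from the table; the names (`citation_counts`, `summarize`) and the cutoff of 5 or more cites for a "heavily citing" paper are my own assumptions for illustration, not part of the trace itself.

```python
# Frequency table from the post: number of times Strack et al. (1988) is
# cited within a paper -> number of citing papers with that count.
# Rows with a frequency of 0 are simply left out.
citation_counts = {1: 90, 2: 15, 3: 12, 4: 7, 6: 1, 7: 1, 9: 1, 15: 1}


def summarize(freq, heavy_cutoff=5):
    """Summarize a {times_cited: number_of_papers} table.

    heavy_cutoff is a hypothetical threshold chosen for illustration only.
    """
    papers = sum(freq.values())
    mentions = sum(times * n for times, n in freq.items())
    return {
        "papers": papers,
        "mentions": mentions,
        "share_cited_once": freq.get(1, 0) / papers,
        "heavy_papers": sum(n for times, n in freq.items() if times >= heavy_cutoff),
    }


summary = summarize(citation_counts)
for key, value in summary.items():
    print(key, value)
```

The point of the summary is the skew: most papers cite the source once, and only a handful cite it often enough to be candidates for direct extension.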

In the trace, I’m most of all interested in the direct extension of the source work (it doesn’t have to be like that. Lots of questions can be asked of a trace), so what I’m directly scrutinizing may be rather small in the end. But, I’m starting to look closer at the various experiments on bodily feedback to see what that can yield.

Some refs

Förster, J. (2004). How body feedback influences consumers’ evaluation of products. Journal of Consumer Psychology.

Hull, David (1988). Science as a process.

Niedenthal, P. M., Brauer, M., Halberstadt, J. B., & Innes-Ker, A. H. (2001). When did her smile drop? Facial mimicry and the influences of emotional state on the detection of change in emotional expression. Cognition and Emotion.

Strack, F., Martin, L., & Stepper, S. (1988). Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54, 768-777.

 

 

*Förster collaborated with Strack quite a bit on similar questions, so I think he spoke about it from inside this particular Deme.

** because I haven’t spent time to figure out how to.

*** Yes, I know people object to this, because there are so many different variants that this doesn’t capture what it is about. But, it is useful to distinguish it from the type of priming that seems just focused on associative networks – like the doctor-nurse, apple-orange thing, which is quite robust.

**** A deme is, in biology, a local breeding population. I also found out (looking for a definition) that it is an old Greek word for a village or district (distinct from Polis). I got it from Hull (1988) (who most likely got it from biology). In his meaning a scientific Deme is a group of scientists that work more or less cooperatively on a particular idea in science (the cooperation doesn’t need to be uncontentious).

*****Simine Vazire’s penchant for asterisks is spreading. A bit of cultural evolution (the social copying kind).


Tracing Srull & Wyer manuscript

I wrote a manuscript detailing my Srull & Wyer trace (the one I have been blogging about). If anybody cares to read and give some comment, I’d be grateful. At some point I’d like to submit it.

https://osf.io/9vkxt/


All references are equal, but should some be treated as more equal than others? On the data-set authorship discussion.

There’s been an interesting discussion on both Facebook and Twitter about data-sharing and how to properly give credit when you are using someone else’s data. Candice and Richard Morey did a nice blog-post on why sharing data should not automatically mean authorship. From talking to other researchers, that seems to be in line with what the Vancouver criteria and beyond suggest for authorship. The proper way to credit a shared data-set is to include a reference.

Authorship and references are the two traditional ways of assigning credit. For the individual scientist, authorship signals origination, and a reference signals the use other scientists find in the original work.

But, references are a strange measure of the success of an idea/work. When I was going through my Srull & Wyer (1979) trace, I collected all the places in the manuscripts where they had been cited in the first 53 articles that cited them. The reasons for citing them ranged from the peripheral to the profound. Examples of the peripheral were an opening sentence where the author cited them (along with others) as evidence that social psychologists were now interested in cognitive explanations for social phenomena, and a foot-note where the authors stated that the current paper was not interested in the priming phenomenon, but that one should look to Srull & Wyer 1979 if one was interested. In the profound, they were cited multiple times because the research essentially extended the original research.

This shouldn’t be surprising. We are trained to cite just about everything we have gotten from other researchers, be it trivial, profound or antagonistic, and this is perfectly fine. I like being able to look in the references to pursue ideas that may not be central to the present research. I even find it disconcerting when they don’t exist. I started reading William James’s “Principles of Psychology” and found it distracting that there were no references for statements that he had clearly learned from others. But, of course, in our citing practices, the cited papers will vary in their degree of centrality.

None of that is evident from a reference list.

It seems we may need to look over how we are apportioning credit, especially when authorship and references are given so much weight in important measures of success. I don’t have a clear thought on how to do this, because there are always downsides, and simply complicating things by grading the importance of a cite is something that I instinctively think can become problematic.

Perhaps abandoning the traditional ways of indexing success is the way to go (I doubt that will be the case).

But, should we distinguish between peripheral and central contributions from earlier research? Sharing stimuli or sharing data-sets or using tested paradigms, questionnaires, analysis schemes – are they “worth” more than the more peripheral citations, or do we run other risks of conflict and credit arbitrage?
