You don’t even have to have your finger on the pulse of academic news to have heard about the Lacour and Green research debacle. It’s been bouncing around in my brain because it’s related to the way we maneuver in a world of information, and it is relevant to my work as a librarian and as a researcher. In a drama-filled nerdly nutshell (with links to further reading for the details), here’s the situation:
Brief Unofficial Timeline of the Study, and Discovery of Possible Misconduct
- an important study on persuasion coauthored by a UCLA political science graduate student (Lacour) and a big-name political scientist at Columbia University (Green) was published in (and then retracted from) the peer-reviewed journal Science;
- the large-N study indicated that attitudes about same-sex marriage could be changed significantly and over the long term by a brief conversation with a canvasser who was gay;
- because this was huge news, it was picked up by NPR’s This American Life;
- because the conclusions go against most research on persuasion (in general, people tend to sway only slightly from their beliefs, and tend to “snap back” to their initial beliefs quite quickly), another graduate student tried to replicate the findings;
- Broockman, Kalla, and Aronow released a report documenting the extreme irregularities in Lacour and Green’s article;
- Lacour responded to the report critiquing his work in a 23-page document;
- another statistics hound explained why Lacour’s rebuttal is weaksauce;
- under closer scrutiny, it appears that Lacour may have faked a number of other things, including grants and teaching awards listed on his CV, research integrity documents, and perhaps data for other studies;
- not only could the study’s findings not be replicated, but it appears the study itself may have been faked. While the political canvassing was actually done, the pre- and post-surveys used to measure baseline attitudes and the aftereffects of the intervention appear to have been fabricated. (It appears that Lacour used CCAP data, an existing dataset that is not publicly available, as his baseline and made up the post-survey data.)
Whew. So, in an even smaller nutshell, a UCLA grad student and a Columbia U. Famous Faculty Dude coauthored an influential article on gay marriage that turned out to be based on data that it appears the grad student COMPLETELY MADE UP.
Why This is a Big, Hairy, Hulking Deal
Let’s start with the “so what?” question. There are many reasons why this is such a big deal, and I’m only going to articulate a few of them:
- This is the sort of political behavior research that changes how people actually approach issues and how agencies distribute grant funding.
- Talking to people changes their minds in the long run? Then political organizations will send out canvassers to speak to people instead of spending their money on television ads and paper mailings.
- Grant organizations may start shifting their funding away from projects that don’t use that methodology, since the published research suggests that Lacour’s approach (exposing people to people holding different political views) is more persuasive. That means projects grounded in the rest of the research, which finds that it is perennially difficult to get people to change their minds and to keep them changed, see less funding.
- It puts a dent in the trust we have in institutions of higher education, and in our peer-reviewed published research.
- Our whole scientific structure rests on trust. Some scientific journals are more rigorous than others and ask authors to share their data so that statistics can be verified. But who guards the realm against complete fakery?
- Will UCLA actually grant Lacour the PhD now that the cat is out of the bag that his research design was a lie? If they do, what does this say about our expectations for the highest research credential you can earn?
- This event has implications for how higher education hires new faculty.
- Lacour was hired as a new tenure-track professor at Princeton University, a plum gig for someone straight out of graduate school. Will Princeton keep his contract live now that they know about his research ethics failure? Or will they let him come in and see what he does to earn reappointment into a second year?
- Lacour claimed he had brought in over $700,000 in grant money. Had he been a tenure-track professor, all sorts of documentation would have been required before he could include that in the portfolio reviewed each year for reappointment. But because he was just a graduate student and the grant information was only on his CV, no one bothered to check his claims with the foundations themselves. What mechanisms do we have in place to catch such shameless CV-padding?
- Broockman, one of the graduate students who discovered the foul play, was repeatedly advised against publishing or discussing his concerns. There’s a lurking sense in academia that whistleblowers are not to be trusted or supported. How does this square with our purported search for Truth? What does this say about our willingness to critique our scholars’ work and give it thorough peer review?
- There are other issues, related to research integrity, data documentation, co-authorship, and academic job-seeking; expect more blog posts.
The Role of IRB, and the Problem of Data
The role of an Institutional Review Board, or IRB, is to review proposed research to make sure it will have no ill effects on the subjects being studied. For those not familiar with the process: if you are going to do research involving human subjects, it usually has to be approved by your institution’s IRB. This involves a lengthy amount of paperwork, a detailed description of the research project, and a detailed explanation of how subjects and data will be protected. (You can see my university’s IRB page and paperwork here, if interested, to get some idea. You can also see an example of an IRB application I myself submitted here.) I filled out IRB paperwork for my dissertation research back at UT-Chattanooga, and I have filled out IRB paperwork here at CSUCI for new research projects. It’s generally considered a necessary evil, a dotting of the i’s.
The Data Problem
One of Lacour’s defenses appears to be that he destroyed the raw data file, and so he cannot provide it to back up his research. I don’t know whether all IRBs have this issue, but I’ll note that my own institution’s IRB paperwork contains no option for permanent storage of anonymized data. I had to write it in on one of my IRB applications, and then re-explain it in detail, because it didn’t fit the antiquated practice of destroying all data a set number of months after the project is complete. We live in the future. Sharing our data with other researchers increases the amount of information available to study. In fact, this reminds me to ask data guru and librarian Abigail Goben about this, because I want to bring it up, along with an elegantly worded solution, to my university’s IRB committee for their forms. I think we *should* be encouraging researchers to share data.
I should note here that I am describing keeping anonymized data, where all identifying characteristics and variables have been removed. For instance, when I submit my dataset to my institutional repository so others can use it, I will remove the columns with individuals’ names, email addresses, and institutions, and I will comb through the open-ended responses to remove any identifying information that may have ended up there. Then each respondent will be given a randomly generated unique number. Nothing identifiable about the respondents remains, but now I can share the data with others interested in the phenomenon, so that they can try to replicate my work or use the data to answer their own research questions, if the data is what they need.
Anonymizing data is standard practice (and usually required). Speaking as someone who majored in Economics as an undergrad and then did doctoral-level study in political science, I would expect someone like Lacour, with a background in statistics and doing doctoral work in political science, to know this. That Lacour deleted all his original data files and kept nothing is beyond suspicious, and claiming he had a responsibility to keep certain data points confidential doesn’t excuse him from the responsibility of maintaining the data. Lest we think this a one-off, this isn’t just data used to publish in Science; this is his dissertation data. Which he claims he does not have and cannot share. How, then, can anyone discuss the merits of his dissertation? (Yes, I still do have my dissertation data. Anonymized. Which I am happy to share with any interested parties.)
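For the curious, here is a minimal sketch of the anonymization steps just described, in Python. It assumes the survey export is a CSV with identifying columns named `name`, `email`, and `institution`; those names, and the helper functions `anonymize` and `to_csv`, are hypothetical and made up for illustration (a real Qualtrics export will label things differently). The sketch drops the identifying columns and gives each respondent a randomly drawn unique number. It does not scrub open-ended answers, which still need a careful human read.

```python
import csv
import io
import random

# Hypothetical column names for identifying fields; a real survey export
# will likely label them differently.
IDENTIFYING_COLUMNS = {"name", "email", "institution"}


def anonymize(rows, identifying=IDENTIFYING_COLUMNS, seed=None):
    """Return copies of `rows` with identifying columns removed and a random,
    unique `respondent_id` added to each row.

    Open-ended (free-text) answers are NOT scrubbed here; they still need a
    careful human read for names, places, and other identifying details.
    """
    rng = random.Random(seed)  # seed only for reproducible demos
    # Draw one distinct six-digit number per respondent. The IDs are not
    # derived from names or emails, and no lookup table is saved.
    ids = rng.sample(range(100_000, 1_000_000), len(rows))
    cleaned = []
    for new_id, row in zip(ids, rows):
        kept = {col: val for col, val in row.items() if col not in identifying}
        cleaned.append({"respondent_id": new_id, **kept})
    return cleaned


def to_csv(rows):
    """Serialize anonymized rows to CSV text, e.g. for a repository upload."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    raw = io.StringIO(
        "name,email,institution,q1_rating,q2_open\n"
        "Ada,ada@example.edu,State U,4,Helpful overall\n"
        "Ben,ben@example.edu,City College,2,Too long\n"
    )
    rows = list(csv.DictReader(raw))
    print(to_csv(anonymize(rows)))
```

One design choice worth flagging: this keeps the original row order, so anyone holding both the raw file and the seed could line the two up again. Shuffling the rows, and not keeping the seed, would further weaken any link back to the raw data.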
The IRB & Outside Researcher Problem
One of the big gaffes in this whole Lacour and Green research scandal is that Green, the senior researcher and statistician, claims he did not have access to the raw data, nor did he want that access, since gaining IRB approval from his institution to work on a research project out of UCLA would have been a huge hassle. I won’t recreate this entire argument, since Scatterplot has a great post on this very issue. What I will say is that the IRB very much should have been involved, and that faculty efforts to avoid IRB at all costs (due to delays, hindrances, and paperwork) do nobody, including our institutions of higher education, any good. Still, one could argue that the Lacour and Green debacle could have happened even if Green had gotten proper IRB approval to look at the data: it still would have been Lacour’s fake data in the file he shared with Green. Would Green have recognized it as fake, the way Broockman, Kalla, and Aronow did when they really dug into the statistics? We’ll never know, but he surely would have been concerned that there were no source files in Qualtrics.
The Role of the Chair and Co-Author
Very little has been made, to date, of Green’s role in this whole debacle, or of Lacour’s dissertation advisor and what her responsibilities might have been.
First, let’s discuss the dissertation chair. Professor Lynn Vavreck at UCLA served as Lacour’s dissertation advisor, and the data for the retracted study is purported to have come from Lacour’s dissertation, which puts Vavreck in the hot seat. My dissertation advisor was all up in my data booch while I was doing my dissertation. He had access to my Qualtrics instance (the software doing the data collection), though I don’t know if he ever used that access to track progress. (For instance, with that access one could log in on any day and see how many respondents had answered my survey to date.) My chair also had me run and re-run numbers to his satisfaction, and had me address any oddities in the findings. Anything that went against decades of established research would have made him raise an eyebrow, and he would have picked away at it. I don’t know if a chair has much of a defense against straight-up data fabrication; the assumption during the dissertation phase is that the student is spending their time doing the collecting and analyzing. Still, something should have smelled fishy about Lacour’s crazily positive results, but Vavreck didn’t catch it. Should she have? Should she have checked his data against existing datasets and discovered he had co-opted the CCAP data, as Lacour’s detractors did? If a graduate student could figure it out, I’d expect the dissertation chair, who has at least as much invested, to figure it out too. *Spock eyebrow*
I’ll admit that I have some pretty serious misgivings about Green’s involvement in this whole affair. Green is a professor of political science at Columbia University and formerly taught at Yale. He’s a big name in the field. (Having a big, famous name on your article makes it much more likely that the universe, especially the academic universe in one’s discipline, will pay attention to and talk about your research.) It appears that Lacour approached Green to serve as coauthor of the Science article. Green claims he helped with the writeup but never looked at the original data. When Green saw that the results ran counter to other research in the area, he asked Lacour to replicate the experiment, and relied on Lacour’s confirmation that he had. Green applied his statistical expertise and found the same results in the data that Lacour had. Green expressed his disappointment in various statements, requested the retraction from Science, and reflected in a statement to Retraction Watch:
“Convinced that the results were robust, I helped Michael LaCour write up the findings, especially the parts that had to do with the statistical interpretation of the experimental design. Given that I did not have IRB approval for the study from my home institution, I took care not to analyze any primary data — the datafiles that I analyzed were the same replication datasets that Michael LaCour posted to his website. Looking back, the failure to verify the original Qualtrics data was a serious mistake.”
I would posit that it’s a serious mistake on a number of levels, and that Green’s statement is a declaration of absentee co-authorship: he didn’t expect to have to do much work beyond putting his name on the article. The Famous Guy gets an article for his CV, and the Up-And-Comer gets a great article in an important journal plus the halo effect and credibility boost of coauthoring with Famous Guy. In this sort of relationship, Green overtrusted Lacour, and likely figured that Lacour was just using Green’s name as leverage. Green may have re-run the statistics to be sure his results matched Lacour’s, but the issue isn’t the statistics that were run; it’s the data itself. Had the co-author been more intimately involved in the data collection process, he might have noticed Lacour’s vague explanations. As the LA Times stated,
“if close collaborators aren’t going to catch the problem, it’s no surprise that outside reviewers dragooned into critiquing the research for a journal won’t catch it either. A modern science article rests on a foundation of trust.”
How much do you trust your co-authors? Enough to not have the same access to the data that they do? I’ve actually struggled with this, and let a great research project idea die because a prospective co-author would not share the necessary instrument and data analysis. I’m not famous. I’m not even on the job hunt. But I’d never put my credibility on the line for research that I can’t vouch for from cradle to grave. Is that because I’m a librarian with an overactive imagination? Is it because my default mode is transparency? Maybe a little bit of both. And if things get squirmy at the beginning stage of discussion and IRB paperwork, one should be on alert moving forward with that project and co-author.
The Role of Replication
Broockman was repeatedly warned against discussing or publishing his findings that Lacour and Green’s study had serious problems. It appears academia is no kinder to whistleblowers in research than it is to whistleblowers in academic administration. Broockman was warned off because Green is Famous and Lacour was an Up-And-Comer. He was warned off because folks thought he might get a reputation for ‘merely replicating’ instead of developing his own research agenda. I’d like to point out that replication is crucial for research. It might not get you a PhD, but it will definitely bring out nuances in the data, and let you know whether findings are a fluke, a product of research design, or an actual phenomenon. Interestingly, I’m involved in replicating my dissertation study in slightly different populations to see if the findings hold. Replication is worthwhile, especially if done conscientiously. It was exactly that conscientiousness, Broockman’s effort to figure out why his study wasn’t producing the results reported in Lacour’s, that led to the discovery of fraud in the first place.
What it Means for Academic Job Seekers
The best way to go on the market as a newly-minted PhD is with a published article in hand, and the more of those the better, especially if you want to land at a research institution. Lacour was on his way to Princeton this July, though there’s been no word on whether or not that has changed in light of this scandal. Was the job market a stressor inducing Lacour to cheat his way to astounding, news-making results? Why don’t other new PhDs fake their data? Or DO THEY, and we just don’t know it? Who is getting advantaged in this situation? It seems that Lacour put much time and effort into creating his fictions; in my experience, it might have been less effort to actually do the research properly and avoid this whole clustersuck. I’ll be interested to see how (and whether) this shakes out into any changes in the hiring process or publication in general, such as requiring publication of datasets. Repository librarians, be ye ready! Maybe this is our inroad for discussing data storage and publication with our faculty.
Since I’m teaching a course on information in the fall, I’m intrigued by every level of this case and hope to use parts of it for my students’ reading. I wish I were teaching a methodology course; we would have so much fun with this. As a librarian and researcher, though, it just makes me angry and sad. Why the lie? Why the continued defense of the lie? And how on earth did it pass before so many sets of eyes, only to come out because Broockman couldn’t let it slide, even at the risk of his professional reputation?