The Big Hairy Deal: Research Ethics, Roles of IRBs, and Responsibilities of Chairs/Coauthors in Light of LaCour and Green

You don’t even have to have your finger on the pulse of academic news to have heard about the LaCour and Green research debacle. It’s been bouncing around in my brain since it’s related to the way we maneuver in a world of information, and it is relevant to my work as a librarian and as a researcher. In a drama-filled nerdly nutshell (with links to further reading for the details), the situation:

Brief Unofficial Timeline of the Study, and Discovery of Possible Misconduct

Whew. So, in an even smaller nutshell, a UCLA grad student and a Columbia U. Famous Faculty Dude coauthored an influential article on gay marriage that turned out to be based on data that it appears the grad student COMPLETELY MADE UP.

Why This is a Big, Hairy, Hulking Deal

First, the “so what?” question. There are many reasons why this is such a big deal, and I’m only going to articulate a few of them:

  • First, this is the sort of political behavior research that changes how people actually approach issues, and how agencies distribute grant funding.
    • Talking to people changes their minds in the long run? Then political organizations will send out canvassers to speak to people instead of spending their money on television ads and paper mailings.
    • Grant organizations start shifting their funding toward projects using that methodology, since the published research makes them think that LaCour’s way – exposing people to people holding different political views – is more persuasive. This means research projects grounded in other findings, namely that it is perennially difficult to get people to change their minds, and to keep their minds changed, see less funding.
  • It puts a dent in the trust we have in institutions of higher education, and in our peer-reviewed published research.
    • Our whole scientific structure rests on trust. Some scientific journals are more rigorous than others and ask authors to share their data so that statistics can be verified. But who guards the realm against complete fakery?
    • Will UCLA actually grant LaCour the PhD now that the cat is out of the bag that his research design was a lie? If they do, what does this say about our expectations for the highest research credential you can earn?
  • This event has implications for how higher education hires new faculty.
    • LaCour was hired as a new tenure-track professor at Princeton University, a plum gig for someone straight out of graduate school. Will Princeton keep his contract live now that they know about his research ethics failure? Or will they let him come in and see what he does to earn reappointment into a second year?
    • LaCour claimed he had brought in over $700,000 in grant money. If he had been working as a tenure-track professor, all sorts of documentation would have been required for him to include that in the portfolio reviewed each year for his reappointment. But because he was just a graduate student, and the grant information was only on his CV, no one bothered to double-check his claims with the foundations themselves. What mechanisms do we have in place to catch such shameless CV-padding?
  • Broockman, one of the graduate students who discovered the foul play, was repeatedly advised against publishing or discussing his concerns. There’s a lurking shadow in academia that whistleblowers are not to be trusted, or supported. How does this play out with the purported search for Truth? What does this say about our willingness to critique and do thorough peer-review on our scholars’ work?
  • There are other issues, related to research integrity, data documentation, co-authorship, and academic job-seeking; expect more blog posts.

The Role of IRB, and the Problem of Data

The role of an Institutional Review Board, or IRB, is to review proposed research to determine that it will have no ill effects on the people/animals/phenomena studied. For those not familiar with the process, usually if you are going to do research it has to be approved by an institution’s IRB. This involves lengthy amounts of paperwork, articulation of the research project in great detail, and detailed explanation of how subjects and data will be protected. (You can see my university’s IRB page and paperwork here, if interested, to get some idea. You can also see an example of an IRB application I myself submitted here.) I filled out IRB paperwork for my dissertation research back at UT-Chattanooga, and have filled out IRB paperwork here at CSUCI for new research projects. It’s generally considered a necessary evil, a dotting of the i’s.

The Data Problem 

One of LaCour’s defenses appears to be that he destroyed the raw data file, and so he cannot provide it to back up his research. I don’t know if all IRBs have this issue, but I’ll note that my own institution’s IRB paperwork contains no option for permanent storage of anonymized data – I had to write it in on one of my IRB applications, and then re-explain it in detail because it didn’t fit within the antiquated practice of destroying all data so many months after the project is complete. We live in the future. Sharing our data with other researchers adds to the amount of information available to study. In fact, this reminds me to ask data guru and librarian Abigail Goben about this, since I want to bring it up – with an elegantly worded solution – to my university’s IRB committee for their forms; I think we *should* be encouraging researchers to share data.

I should note here that I am describing keeping anonymized data, where all identifying characteristics and variables have been removed. For instance, when I submit my dataset to my institutional repository so others can use it, I will remove columns with names of individuals, email addresses, and their institution, as well as comb through the open-ended responses to remove identifying information that may have ended up there. Then each respondent will be given a randomly generated unique number. Nothing identifiable from the respondent remains, but now I can share the data with others interested in the phenomenon, so that they can try to replicate my work, or use the data to answer their own research questions, if the data is what they need.
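For the curious, those anonymization steps can be sketched in a few lines of pandas. The column names here (`name`, `email`, `institution`, `open_response`) are hypothetical stand-ins for whatever identifying fields a real survey export contains, not the actual fields in my dataset:

```python
import pandas as pd

def anonymize(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Drop direct identifiers, scrub emails from free text, assign random IDs."""
    # Remove the columns that directly identify a respondent.
    out = df.drop(columns=["name", "email", "institution"])

    # Scrub email addresses that respondents typed into open-ended answers.
    out["open_response"] = out["open_response"].str.replace(
        r"[\w.+-]+@[\w-]+\.[\w.-]+", "[removed]", regex=True
    )

    # Give each respondent a randomly ordered unique number, so row order
    # no longer hints at who answered when.
    ids = pd.Series(range(len(out))).sample(frac=1, random_state=seed).to_numpy()
    out.insert(0, "respondent_id", ids)
    return out.reset_index(drop=True)

# Tiny demonstration with made-up survey rows:
raw = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "email": ["asmith@example.edu", "bjones@example.edu"],
    "institution": ["CSUCI", "UCLA"],
    "open_response": ["email me at asmith@example.edu", "no comment"],
    "likert_answer": [4, 2],
})
clean = anonymize(raw)
```

The result keeps every substantive response while leaving nothing that points back to a person – which is exactly what makes the dataset safe to deposit in a repository.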

This practice of anonymizing data is common, usually required, and standard across fields. I say this as someone who was an economics major in undergrad and then did doctoral-level study in political science: I would expect LaCour, with his background in statistics and his doctoral work in political science, to know this. That LaCour deleted all his original data files and kept nothing is beyond suspicious, and claiming he had a responsibility to keep certain data points confidential doesn’t excuse him from the responsibility of maintaining the data. This is not just the data used to publish in Science, lest we think this a one-off – this is his dissertation data. Which he claims he does not have and cannot share. How, then, to discuss the merits of his dissertation? (Yes, I still do have my dissertation data. Anonymized. Which I am happy to share with any interested parties.)

The IRB & Outside Researcher Problem

One of the big gaffes in this whole LaCour and Green research scandal is that Green, the senior researcher and statistician, claims he did not have access to the raw data, nor did he want that access, since gaining IRB approval from his institution to work on the research project out of UCLA would have been a huge hassle. I won’t recreate this entire argument, since Scatterplot has a great post on this very issue. What I will say is that IRB should have very much been involved, and that faculty efforts to avoid IRB at all costs (due to delays, hindrances, and paperwork) do nobody, including our institutions of higher education, any good. Still, the argument exists that the LaCour and Green issue could have happened even if Green had gotten proper IRB approvals to look at the data – it still would have been LaCour’s fake data in the file he shared with Green. Would Green have recognized it as fake, the way Broockman, Kalla, and Aronow did when they really dug into the statistics? We’ll never know, but he surely would have been concerned at there being no source files in Qualtrics.

The Role of the Chair and Co-Author

Very little has been made, to date, of Green’s role in this whole debacle, or of LaCour’s dissertation advisor and what her responsibilities might have been.

The Chair

First, let’s discuss the dissertation chair. Professor Lynn Vavreck at UCLA served as LaCour’s dissertation advisor, and the data for the retracted study is purported to have come from LaCour’s dissertation, which puts Vavreck in the hot seat. My dissertation advisor was all up in my data business while I was doing my dissertation – he had access to my Qualtrics instance (the software doing the data collection), though I don’t know if he ever used that access to track progress. For instance, I could log in on any day and see how many respondents had answered my survey to date. My chair also had me run and re-run numbers to his satisfaction, and had me address any oddities in the findings. Anything that went against decades of established research would have been something he raised an eyebrow at, and picked away at. I don’t know if the chair has much of a defense against straight-up data fabrication; the assumption during the dissertation phase is that the student is spending their time doing the collecting and analyzing. Still, something should have smelled fishy about those crazily positive results, and Vavreck didn’t catch it. Should she have? Should she have checked his data against existing sets and discovered he had co-opted the CCAP data, as LaCour’s detractors did? If a graduate student can figure it out, I’d expect the dissertation chair to have at least as much invested. *Spock eyebrow*

The Co-Author

I’ll admit that I have some pretty serious misgivings about Green’s involvement in this whole affair. Green is a professor of political science at Columbia University, and formerly taught at Yale. He’s a known big name in the field. (Having a big, famous name on your article makes it much more likely that the universe – especially the academic universe in one’s discipline – will pay attention and talk about your research.) It appears that Green was approached by LaCour to serve as coauthor of the Science article. Green claims he helped with the writeup but never looked at the original data. When Green saw the data skewed opposite of other research in the area, he asked LaCour to replicate the experiment, and depended on LaCour’s confirmation that he had. Green applied his statistical expertise and found the same results in the data that LaCour did. Green wrote of his disappointment in various statements, requested the retraction from Science, and reflected in a statement to Retraction Watch:

“Convinced that the results were robust, I helped Michael LaCour write up the findings, especially the parts that had to do with the statistical interpretation of the experimental design. Given that I did not have IRB approval for the study from my home institution, I took care not to analyze any primary data — the datafiles that I analyzed were the same replication datasets that Michael LaCour posted to his website.  Looking back, the failure to verify the original Qualtrics data was a serious mistake.”

I would posit that it’s a serious mistake on a number of levels, and that Green’s statement is a declaration of absentee co-authorship, in that he didn’t expect to have to do much work, just to put his name on the article. The Famous Guy gets an article for his CV, and the Up-And-Comer gets a great article in an important journal plus the halo effect and credibility boost of coauthoring with Famous Guy. With this sort of relationship, then, Green overtrusted LaCour, and likely figured that LaCour was just using Green’s name as leverage. Green may have re-run the statistics to be sure his results were the same as LaCour’s, but the issue isn’t the statistics that were run; it’s the data itself. Had the co-author been more intimately involved in the data collection process, he might have noticed LaCour’s vague explanations. As the LA Times stated,

“if close collaborators aren’t going to catch the problem, it’s no surprise that outside reviewers dragooned into critiquing the research for a journal won’t catch it either. A modern science article rests on a foundation of trust.”

How much do you trust your co-authors? Enough to not have the same access to the data that they do? I’ve actually struggled with this, and let a great research project idea die because a prospective co-author would not share the necessary instrument and data analysis. I’m not famous. I’m not even on the job hunt. But I’d never put my credibility on the line for research that I can’t vouch for from cradle to grave. Is that because I’m a librarian with an overactive imagination? Is it because my default mode is transparency? Maybe a little bit of both. And if things get squirmy at that beginning stage of discussion and IRB paperwork, one should be on alert moving forward with that project and co-author.

The Role of Replication

Broockman was repeatedly warned against discussing or publishing his findings that LaCour and Green’s study had serious problems. It appears academia is no kinder to whistleblowers in research than it is to whistleblowers in academic administration. Broockman was warned off because Green is Famous and LaCour was an Up-And-Comer. He was warned off because folks thought he might get a reputation for ‘merely replicating’ instead of developing his own research agenda. I’d like to point out that replication is crucial for research. It might not get you a PhD, but it will definitely bring out nuances in the data, and let you know if findings are a fluke, a product of research design, or an actual phenomenon. Interestingly, I’m involved in replicating my dissertation study in slightly different populations to see if the findings hold. Replication is worthwhile, especially if done conscientiously. It is just that conscientiousness, and how Broockman tried to determine why his study wasn’t bringing back the results found in LaCour’s study, that led to the discovery of the fraud in the first place.

What it Means for Academic Job Seekers

The best way to go on the market as a newly-minted PhD is with a published article in hand, and the more of those the better, especially if you want to land at a research institution. LaCour was on his way to Princeton this July, though there’s been no word on whether or not that has changed in light of this scandal. Was the job market a stressor inducing LaCour to cheat his way to astounding, news-making results? Why don’t other new PhDs fake their data? Or DO THEY, and we just don’t know it? Who is getting advantaged in this situation? It seems that LaCour put much time and effort into creating his fictions; in my experience, it might have been less effort to actually do the research properly and avoid this whole clustersuck. I’ll be interested to see how (and whether) this shakes out into any changes in the hiring process or publication in general, such as requiring publication of datasets. Repository librarians, be ye ready! Maybe this is our inroad to discuss data storage and publication with our faculty.

Since I’m teaching a course on information in the fall, I’m intrigued by all levels of this case and hope to use parts of it for my students’ reading. I wish I were teaching a methodology course; we would have so much fun with this. As a librarian and researcher, it just makes me angry and sad. Why the lie? Why the continued defense of the lie? And how on earth did it pass before so many sets of eyes and only come out because Broockman couldn’t let it slide, even if it meant his professional reputation?

Ask an Expert! Or, How Statistics, Facebook and Polychoric Correlation Matrices Made Me My Own Library User

Frustrated with some data and fed up with my own inability to locate an appropriate statistical technique, I finally posted to Facebook in the hopes that a friend would commiserate with me:

“Bending my brain around ILL stats and thinking about exploratory factor analysis with categorical variables, despite the issues with it. Desperately missing [my old group of Emory PoliSci nerdbuddies and profs who were excellent at stats] and brainstorming these sorts of things.”

Five seconds later, the prof I had tagged in the post replied, “Three words: polychoric correlation matrix.” And I had four distinct reactions in rapid succession. They were as follows:

First reaction: sarcasm. Well OF COURSE polychoric correlation matrix, duh. Who WOULDN’T know that? Certainly not I. Pshaw.

Second reaction: confirmatory exploration. A quick Google search of that conglomeration of words, a quick scan of the Wikipedia description, and yep, this is much closer to what I need for what I want to do than I’ve gotten scouring statistics textbooks and incomprehensible math journal articles for two weeks. Until my eyes felt like they were bleeding, and my brain was mushy. Until all I wanted to do was curl up and cry in a corner until someone brought me a puppy. (Interestingly, my husband just got me a puppy for my birthday.)
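(For fellow stats nerds, the gist of what I found: a polychoric correlation treats each ordered categorical variable – like a Likert item – as a coarsened view of an underlying standard normal variable. You estimate cutpoint thresholds from the marginal frequencies, then pick the latent correlation that best explains the observed contingency table. Here’s a rough two-step sketch in Python; this illustrates the idea only, and is not any particular package’s implementation:

```python
import numpy as np
from scipy import optimize, stats

def polychoric(x, y):
    """Two-step polychoric correlation for two ordinal variables coded 0,1,2,..."""
    # Cross-tabulate the observed category pairs.
    table = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        table[xi, yi] += 1

    # Step 1: thresholds on each latent normal from cumulative marginal frequencies.
    def thresholds(margin):
        cum = np.cumsum(margin)[:-1] / margin.sum()
        return np.concatenate(([-10.0], stats.norm.ppf(cum), [10.0]))  # +-10 ~ +-inf

    a = thresholds(table.sum(axis=1))
    b = thresholds(table.sum(axis=0))

    # Step 2: choose rho to maximize the likelihood of the observed table,
    # where each cell probability is a rectangle under the bivariate normal.
    def neg_loglik(rho):
        cdf = lambda u, v: stats.multivariate_normal.cdf(
            [u, v], cov=[[1.0, rho], [rho, 1.0]]
        )
        ll = 0.0
        for i in range(table.shape[0]):
            for j in range(table.shape[1]):
                p = (cdf(a[i + 1], b[j + 1]) - cdf(a[i], b[j + 1])
                     - cdf(a[i + 1], b[j]) + cdf(a[i], b[j]))
                ll += table[i, j] * np.log(max(p, 1e-12))
        return -ll

    return optimize.minimize_scalar(
        neg_loglik, bounds=(-0.99, 0.99), method="bounded"
    ).x

# Sanity check on simulated data: latent correlation 0.5, chopped into 3 categories.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=2000)
x = np.digitize(z[:, 0], [-0.5, 0.5])
y = np.digitize(z[:, 1], [-0.5, 0.5])
rho_hat = polychoric(x, y)
```

The payoff is a correlation matrix of these latent rhos, which is a far more defensible input to factor analysis than Pearson correlations computed on raw category codes.)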

Third reaction: gratitude. Thank you, Jeebus (and Professor Chris Zorn) that I have a direction and didn’t have to pray I’d trip over this technique on my own. I was already stretching my husband’s patience and our booze budget due to this thing.

My fourth reaction, and the one that prompted the blog post: chagrin. We beg our students and researchers to come to us as librarians for good direction before they get mired in the research process. Why didn’t I go to the experts in the first place, the way I beg my students and faculty to do? The way the lit review section on expertise in my own darn dissertation says folks should do?

I know my reasons, and they likely echo those of my researchers. First, I thought I should be able to find the answer myself. Why didn’t I ask my local methodologist professor buddies? Well, they’re all on my dissertation committee, and I haven’t touched my dissertation in forever, so I’m doing some guilt-hermiting (in which I crawl into a dark space and don’t contact folks until I have something productive and useful to show them). I didn’t want to look stupid for not knowing something, even though that something is admittedly quite far outside my wheelhouse.

Sigh. A lesson learned for myself: if even I fall into these traps, I need to make sure I continue to let my researchers know that it is okay to ask questions, if only so they’re not making their lives harder than they have to be. My new mantra: Don’t let guilt or ignorance waste your time. Ping an expert. Do it from the beginning.