Gould’s fundamental miscalculation

[[Updated on 6 July 2012, to fix a few errors or poor phrasings in my original summary of Lewis et al.’s paper, following a productive private discussion I had with one of the authors. My fixes are in brackets or indicated by strike-throughs.

I encourage historians of science to read this paper. As I wrote in my personal notes on it: “This is a fascinating paper: it makes Gould’s ‘summer of 1977’ look rather half-hearted and inexact.” And it turns out it was even more interesting than I thought: the authors find that Morton did in fact mismeasure some of his skulls using his shot method (which Gould, and I, and many others generally assumed to be accurate), but even those mismeasurements do not appear rooted in bias. I misread this conclusion and misrepresented it in my post.

But my post was never supposed to be a full review of Lewis et al. I intended to talk about the state of the field in science studies, and I continue to argue that while Lewis et al. show here that proper measures can limit experimenter bias in measurement (which is important, and they do it well), the most interesting work in HOS and STS today is looking at the influence of culture on science in other places. Thus I still cannot agree with Lewis et al.’s most sweeping conclusion at the end of their article.

For a shorter summary of their article, see this, from the New Scientist. It is a good summary and the authors lay out their points well. The last couple lines of the piece are the only place where they lose me: “Truth is hard, but it is sometimes obtainable despite even our strongest biases. What a marvellous thing, this science.” That’s a leap too far for a discussion of the Morton case. But the rest is well worth reading.

The original post, with corrections, follows:]]

I missed the boat with this post. It should have come in June. If normal blogosphere time standards prevailed, I would remain silent. But I have faith that we in the scholarly blogging community are perfectly happy to contemplate even that which is months old. So what happened last June?

Alas, poor Yorick!

Jason E. Lewis of Stanford’s Anthropology Department (now Rutgers) and five colleagues from prominent institutions around the country published a refutation in PLoS Biology of Stephen Jay Gould’s famed indictment of Samuel George Morton. The paper takes aim most directly at Gould’s 1978 Science article (full text available, behind pay-wall), which made the case that “unconscious finagling” was the norm in science, in even such apparently objective activities as counting and measuring. It saves some space to refute Gould’s broader version of this story as it appears in The Mismeasure of Man, a book I know much better and unabashedly love. Lewis et al.’s cogent and convincing reassessment of Gould’s study demands our attention, not only because it focuses on Gould (an FHSA fellow traveler, if not a member; I don’t know if he ever belonged) and touches on a prominent figure in the history of science in America, but also for what it can tell us about the landscape of “science studies.”

In the summer of 1977, Stephen Jay Gould began working through Samuel George Morton’s two monumental works of physical anthropology: Crania Americana and Crania Aegyptiaca. Both volumes teemed with measurements of skulls accumulated by Morton and his world-wide network of collectors and were lavishly illustrated with beautiful cranial drawings (see an example above). Morton included in his works an extensive accounting of his measurements, alongside his conclusions, which used cranial capacity to lend greater weight to a five-part racial taxonomy and hierarchy. Morton adopted the racial categories of the German naturalist Johann Friedrich Blumenbach, sorting his skulls into the broad categories of American, Caucasian, Ethiopian, Malay, and Mongolian. Measuring the skulls in his collection (which reached unprecedented proportions, especially in the vastness of its “American” contents), Morton demonstrated a different average cranial capacity for each class of skulls measured and concluded that “race” rather than climate or circumstance led to these physical differences. His findings further supported a racial ordering that placed Caucasians on top, Americans in the middle, and Ethiopians on the bottom.

Gould took advantage of Morton’s commitment to objective principles and took a second look at all the data that Morton so assiduously collected and then published. Gould argued, in the end, that not only were Morton’s conclusions faulty, but so were his measurements and analysis. Morton, according to Gould’s recalculations, came to his averages by (unconsciously) choosing subsamples to include or exclude in a manner that supported his case and by ignoring variables (like the sex or stature of a skull’s original owner) that might otherwise explain differences in cranial capacity. Gould also took advantage of a sort of natural experiment in Morton’s data to look at the place of bias in measurement. Morton measured one set of skulls twice: first with mustard seeds and later with lead shot. Seeds, as Morton himself realized, produced less reliable numbers, varying from measure to measure, than did shot. Morton, who cared about his methods, settled on shot for his final measures. When Gould looked at the seed measures and the shot measures, he found that the average seed measures were lower than the average shot measures for those whom Morton considered lower in the hierarchy. Gould saw here evidence of how bias could work if scientists were not so careful as Morton to choose the best methods of measuring.

Gould also went a step further and played with Morton’s data a bit. By thinking about the variables that Morton ignored (like sex) and adopting a different stance toward subsamples, Gould found that Morton’s hierarchy dissolved into essential equality. (Well, it persisted, but the gaps became vanishingly small.)

Lewis et al. set out to remeasure Morton’s skulls (thereby going a step further than Gould) and again reconstruct Morton’s numbers. They conclude: 1) Morton’s skull measurements were quite accurate [but that even Morton’s gold-standard shot measurements had errors which did not suggest bias, see my introductory note]; 2) his subsampling had fewer problems than the new methods that Gould introduced; and 3) his seed vs. shot measures only demonstrated bias at the level of the mean, but were far more variable from skull to skull. The first conclusion ought not be surprising [although the finding of actual shot mismeasuring is surprising, see my introductory note]. Even Gould accepted Morton’s shot measurements as essentially reliable [which it turns out was a reasonable assumption, but not a perfectly correct one, see my introductory note]. The second and third are surprising and important. You can judge for yourselves, but I am convinced by the authors’ arguments and their data. Gould made some crucial errors in his subsampling analysis and, as the authors show, the charges of finagling he leveled against Morton did not always make sense. I am particularly sad to see the death of the seed-to-shot natural experiment, but I accept the authors’ claims that the sort of bias Gould proposed should show itself more consistently from skull to skull–not just at the level of the mean. If Morton packed the seed a bit tighter in his Caucasian skulls and looser in American skulls, he would have done that to some degree for all (or at least most) Caucasian and American skulls.

The authors seem most concerned with refuting Gould’s conclusion that scientists inevitably finagle. They pay little attention to the bigger picture that Gould presents in Mismeasure, beyond a sentence wherein they admit that they themselves no longer believe scientific evidence supports the idea that racial categories explain much of anything [explain that modern biological anthropology shows no connection between “race” and skull size, see my introductory note. Lewis and DeGusta put it better in the New Scientist: “Furthermore, the generally small cranial capacity differences within humans do not correlate with intelligence or much else other than hat size.”]. They adopt a strict stance toward their data and criticize Gould for making so many suppositions. They refuse, for instance, to consider the idea that sex might have played a role in Morton’s skull averages, because they have no objective way of sexing Morton’s skulls. In the end, they reject Gould’s revised and equalized cranial capacities as barely founded speculation. I think their objective purity might be getting the better of them here. They don’t prove Gould[’s revisions to be necessarily wrong, although they poke plenty of serious holes in them, see my introductory note. (It’s important to note that poking holes in this line of reasoning from Gould in no way lends support to the idea that skull size supports claims of racial difference in reality, as the authors would surely agree.)]. They prove we cannot prove Gould right, and therefore reject the entire enterprise.

Lewis et al. want us to reconsider Morton as a hero of objective science [“find that Morton’s initial reputation as the objectivist of his era was well-deserved”]. They laud his methods and his commitment to publishing all his data. In fairness, Gould offered similar praise. In his Science piece, he called on his colleagues to “cultivate, as Morton did, the habit of presenting candidly all our information and procedure, so that others can assess what we, in our blindness, cannot.” (505) But Lewis et al. go a step further. Morton, it seems, did no (or very little) wrong. [As stated in my introduction: the authors do not claim that Morton was perfect. They in fact show that Morton mismeasured some skulls. I regret the original error on my part.] When measurement errors appear [otherwise], they belong [authors leave it to Morton to blame] to his untrustworthy assistant, although the authors have no way of knowing this to be true, beyond Morton’s claim that he had a bad assistant. (I can’t believe they would let Gould get away with a similar assumption.) In contrast, Gould, they argue, offers a “stronger example of bias influencing results.”(5) I am not quite sure what to make of the clear moral distinctions being drawn between Morton and Gould. The paper is clearly dedicated to proving that bias can be limited by proper scientific methods. Such a claim would seem to make the scientist behind the measurements less important. Yet Lewis et al. behave as if proving Morton to have the right values and to be a particularly competent measurer is very important. The proper scientific method, it seems, requires a certain kind of scientific self. The man and his values still matter. And yet…

The authors clearly relish refuting Gould’s critique. The Morton case study, they write, “has served for 30 years as a textbook example of scientific misconduct” and lent credibility to the idea that scientists are inevitably affected in all aspects of their work by their “cultural contexts.” The authors note, with what reads to me like a sneer, that the cultural groundedness of science has “achieved substantial popularity in ‘science studies.’”(5) Their article ends with a ringing confirmation of the scientific method: “The Morton case, rather than illustrating the ubiquity of bias, instead shows the ability of science to escape the bounds and blinders of cultural contexts.”(6)

Here I part ways with Lewis et al.

They convinced me that Gould made two kinds of miscalculations. The first set of miscalculations involved his analysis of Morton’s data–this was what Lewis et al. wanted me to notice. The second miscalculation was more fundamental: Gould used Morton to speak to the ways that scientists’ humanity and cultural bounds can interfere with their measurements. That was never the best case to make. As these authors show and Gould suggested, appropriate methods can limit the ways in which the observer can interfere in a measurement. Objectivity can be approached asymptotically if you design the investigation right.

But what is the cost of objectivity? In Lewis et al.’s case, giving priority to objectivity meant discounting the plausible assumptions that Gould used to refute racial orderings. It also meant privileging Morton’s accurate measurements of cranial capacity, without any justification for why anyone should care to measure such a thing. Lewis et al. defended Morton’s measurements, but in doing so they ended up overlooking a much more important set of “bounds and blinders of cultural contexts.” Gould made a version of the same error. By focusing on the scientist measuring, we miss all of the intellectual baggage carried by the choice of measurements in the first place.

Two historians of science–operating near to “science studies,” if not in it–point us in a better direction for thinking about Morton (and objectivity) than either Gould or Lewis. These studies point in the direction where “science studies” has been going and they make clear that culture still matters.

First, consider John Carson’s “Minding Matter/Mattering Mind: Knowledge and the Subject in Nineteenth-Century Psychology.” (1999) For Carson, what matters about Morton is the assumption that measuring cranial capacities mattered–that it could speak to questions of racial or species difference and ultimately allow for assessing minds. As Carson puts it, “Morton’s research helped to codify a pattern of investigation that would flourish until the end of the century….the analyzing and ordering of races or groups, achieved through an investigative strategy centered around the fashioning of anonymity and its translation into numerical quantities that could be easily arrayed into linear hierarchies and aligned with mental attributes.”(358) By stripping away the particularity of the skulls and defining them only by their internal capacity, Morton made it possible to group, average, and rank skulls and thereby tied those skulls to racial distinctions and orderings.

Ann Fabian, in her wonderful new book The Skull Collectors, pays even more attention to what gets lost when skulls become numbers in a table. In a few fascinating cases where the evidence allows her to do so, Fabian painstakingly traces Morton’s skulls (and those held by his successors) back to their original possessors. In one such case, she considers a skull collected by the US Exploring Expedition. In fact, the Ex. Ex. collected a person: Veidovi, a Fijian elite taken captive in retaliation for an earlier assault on American traders. I cannot do Fabian’s story justice here. But she concludes the chapter on his skull with a characteristic worry. If we take Veidovi only as a skull that fits into Morton’s taxonomic scheme, he comes across in black and white as a racially pure Fijian. Yet the rest of Fabian’s story suggests that racial purity had little to do with early nineteenth-century Fiji: a cosmopolitan place caught up in an earlier globalized era. Throughout her book, Fabian rejects simple characterizations of Morton as a racist. She rightly fears that such characterizations have prevented scholars from interrogating Morton’s collections and collecting practices more carefully and thus ignored the wealth of fascinating cultural assumptions underlying Morton’s entire enterprise. Lewis et al. assure us that they no longer [reject] accept scientific racism and then feel free to move on to vindicating Morton’s measurements as culture-free. But Fabian demonstrates that Morton’s skulls, his questions, and his methods cannot be extricated from their historical time and place.

[Note: At least one of Lewis’ gang found Fabian’s book and was not pleased. See the review-of-sorts by David DeGusta (apparently Lewis’ former mentor; also it seems that Lewis and Fabian are neighbors at Rutgers). I have to remain uncommitted on DeGusta’s biggest contention–that Morton never really cared about establishing a hierarchy and did not think bigger brains were better in general–until I can re-read Morton more carefully.[[I now disagree with him on this, after further reading.]] But I don’t buy DeGusta’s contention that Crania Americana offered no aid to slavery’s proponents or that Morton’s idea of replacing Blumenbach’s “races” with “families” evidenced a concern for “diversity.” In both cases, DeGusta undervalues the power of a polygenist position. Morton’s families would have increased the number of separately created human races/species. But any evidence that some humans were created apart from other humans gave slavery’s proponents all they wanted: proof of a fundamental difference that could justify fundamentally different treatment.
Clearly, DeGusta is concerned that Fabian wants to destroy the basis for his discipline (physical anthropology), which explains why he wants historians to think more critically about the invasions of privacy they regularly practice. I accept that physical anthropology has value. A few historians, for instance, have joined forces with anthropologists to give historical voice to people who have no historical records. The medical analysis of old bones offers particularly valuable opportunities here. I also accept DeGusta’s call for historical self-critique.
Yet I don’t think Fabian is so dangerous to DeGusta as he fears. And I also think Fabian has much more to offer the anthropologists than they currently accept.]


4 thoughts on “Gould’s fundamental miscalculation”

  1. Mike Pettit

    Great post, Dan! I agree with your take on Gould's historiographical sensibilities and the limitations of the newer authors' approach. There has been a whole cottage industry within the history of psychology of taking issue with almost every claim made in Mismeasure of Man. Franz Samelson's original review of the book in Science is worth tracking down. I have occasionally used Gould's chapter in the classroom because it is a provocative starter of conversations. I don't think any of it is definitive. I much prefer Carson & Fabian. One thing that puzzled me when I read the methodological appendix to the Morton replication is that they didn't seem to blind the person doing the measurements. I assume I missed or misunderstood something as this strikes me as basic social science methodology and the issue at hand was unconscious bias. I am happy to be corrected on this point, but remain confused.


  2. Dan

    Hey Mike. Thanks for the note. Scanning the reactions to the PLoS piece is worth its own analysis. Physical anthropologists have been very happy, but so have a large number of bloggers who saw Gould as attacking science. (I think Gould thought he was saving science.) Also, it seems Gould drove many people to dislike him, and that much of the pleasure over this debunking comes from seeing him proven wrong. I don't have strong feelings toward the man one way or the other—aside from generally admiring his style and scholarship. Still I can see that his personality somehow fed divisive responses: Gould gave a lecture at MSU when I was an undergrad there. I don't much remember it, but I do remember my roommate (a smart and generous guy) coming back thinking Gould far too erudite and pompous for his own good.

    As for the measurements, I'm not sure about double-blinding. I would tend to think it doesn't matter: Gould never claimed that Morton's shot measurements were wrong. No one seems to catch that. He would, I think, have expected the measuring results that we get in the new study. There's no overwhelming reason to believe that bias would have much room to sneak into a shot-style measuring system. Of course, I suppose it wouldn't have hurt to do it double-blind…


  3. Mike Pettit

    I might be misremembering, but I thought one of the issues Gould raised was that Morton (when he was relying on mustard seed and unsavory research assistants) miscalculated the volumes by unconsciously over- or underpacking the seed. This was less of an issue with the more uniform shot. I thought this prevalence of unconscious bias in science was what irritated the authors of this study. I also seem to remember that the replication team used a different method altogether. Of course, I am no expert in the techniques of skull measuring.

    I had noticed something similar in the online reaction. The other interesting thing is that Gould and his generation are increasingly not so much secondary sources for historians of science but the primary objects of investigation.


  4. Dan

    Mike: you're absolutely right about the seed vs. shot measurements. I still think of that as a pretty brilliant natural experiment–even though it looks like the variation between seed and shot measurements was not so clearly biased as Gould presented it. But the premise of the experiment was Gould's belief that Morton's earlier seed measurements were wrong and his later shot measurements were right. In other words, he expected later researchers to determine that Morton had made accurate shot measurements. And that is what this study found.

    Nice observation on Gould's generation—now we all just need to write an extraordinary number and variety of scientific/historical and polemical works to give the next generation something to debate.



