Stephen Jay Gould: His Mismeasure of Baseball


by Cyril Morong





(scroll down to the end to see my Wall Street Journal Letter)


Before he died in 2002, Stephen Jay Gould was one of the world’s most famous scientists and one of the world’s most famous baseball fans. That is why his posthumous book, “Triumph and Tragedy in Mudville: A Lifelong Fan’s Passion for Baseball,” is so disappointing. He makes several statements that do not stand up to scientific scrutiny.

Let’s start with Dummy Hoy, who played outfield in the late nineteenth and early twentieth centuries. Gould writes: “Dummy Hoy belongs in the Hall of Fame by sole virtue of his excellent, sustained play over a long career (p. 128).”

No, he is not even close. He is 462nd in Win Shares (WS) per Plate Appearance and 546th in Total Player Rating or TPR (.425 per 648 plate appearances; see the links on my homepage). His career OPS (on-base percentage + slugging percentage) was .759 while the position average was .767. Bill James gives him a B+ in fielding while Pete Palmer has him with +10 in career fielding runs. His career offensive winning percentage (OWP) is .568 while the position average is .560 (this is park adjusted). Through 2002, he was tied for 306th in career runs created against the league average. He was 299th in total WS through 2001. This does not sound like a Hall of Famer to me. My sources on this are the book Win Shares by Bill James and The Baseball Encyclopedia by Pete Palmer.

Let’s move to Curt Flood, the outfielder who challenged the reserve clause in the early 1970s. Gould writes: “Curt Flood was one of the best ball players of the 1960s, a fine outfielder with nearly a .300 lifetime batting average (p. 276).”

Set aside the obvious point that batting average is not a good metric: it ignores walks and says nothing about a hitter’s power, both important elements that Gould surely learned from Bill James, whom he calls “our premier guru of baseball stats” on p. 153. Among players with at least 2500 PAs in the decade, Flood is 52nd among non-pitchers in WS per PA and 74th in OWP at .525. I think his TPR for the whole decade was 1.8. He was 21st in total Win Shares (including pitchers), but he is helped by the fact that the decade covered almost all of his career. Eleven outfielders had more. James considers 30 or more Win Shares to be an MVP season, and Flood’s best was 27. He batted high in the order of a relatively high-scoring team, so he got a lot of PAs, which helped his WS. Rarely missing games also helped his total WS; he played 150 or more games seven times. Over his three best seasons, I counted at least 20 players with more Win Shares (I think they were all non-pitchers). He does get an A+ in fielding from James. He stole 88 bases in his career and was thrown out 73 times, not a good percentage. He hit 44 triples while an average hitter would have had 50. Gould made a big deal about Mickey Mantle getting a .512 on-base percentage in 1957 (p. 89). Why not look at Flood’s OBP? Because Flood’s career OBP was just .342, while it was .334 for the average centerfielder. Flood hardly excelled here (data from the CD-ROM called “The Lee Sinins Sabermetric Encyclopedia”).
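The baserunning and on-base figures above reduce to simple arithmetic. The following is a minimal sketch in Python using only the numbers quoted in this paragraph; the function and variable names are illustrative and do not come from any of the cited sources.

```python
# Career figures for Curt Flood as quoted in the paragraph above.
FLOOD_SB = 88        # stolen bases
FLOOD_CS = 73        # times caught stealing
FLOOD_OBP = 0.342    # career on-base percentage
AVG_CF_OBP = 0.334   # average centerfielder, per the text


def steal_success_rate(stolen: int, caught: int) -> float:
    """Return the share of steal attempts that succeeded."""
    attempts = stolen + caught
    if attempts == 0:
        raise ValueError("no steal attempts recorded")
    return stolen / attempts


flood_rate = steal_success_rate(FLOOD_SB, FLOOD_CS)
obp_edge = FLOOD_OBP - AVG_CF_OBP

print(f"Steal success rate: {flood_rate:.1%}")
print(f"OBP edge over the average centerfielder: {obp_edge:.3f}")
```

Flood succeeded on a little over half of his 161 attempts, and his OBP edge over the average centerfielder was eight points.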

Then there is Catfish Hunter, who pitched for Gould’s favorite team, the Yankees. “His classy pitching was indispensable in Yankee Championships of 1977 and 1978 (p. 284).” In 1977, his record was 9-9 with an adjusted ERA of 84 (100 is average, adjusting for park effects) in 143 IP. The next year he was 12-6, but his adjusted ERA was only 101 in 118 IP. In the 1978 LCS, he started one game, allowing 7 hits, 3 walks and 3 earned runs in 6 IP, with no decision. In the 1977 World Series, he pitched a total of 4.33 innings with an ERA of 10.38 and went 0-1. In the 1978 World Series, he was 1-1 with an ERA of 4.15 in 13 IP across his starts. He did win the final game, allowing only 2 runs in 7 IP, but he was well supported, with 7 runs.

Perhaps he won some key games in September of 1977 or 1978, so I looked on the Retrosheet website. He pitched 9.2 IP in September of 1977 with an ERA of 9.31, allowing 16 hits and 4 walks. So Hunter was basically a liability to the Yankees in 1977. But in September 1978, he pitched 35 innings with an ERA of 1.80 and went 3-1. In August, he was 6-0 with an ERA of 1.64. It was in these months that the Yankees made their great comeback. So he played a big role, perhaps even an indispensable one, as Gould said. But he also pitched poorly early in the season (those games count, too). In 13.33 IP in April he had an ERA of 9.45. In 10 IP in July, he had an ERA of 8.10. As noted above, his overall ERA was just about average for 1978. So maybe it was those last two months of 1978 that Gould was thinking of.

About Ty Cobb, Gould writes: “Probably the finest player in the history of baseball.” He did add, parenthetically, that others like Ruth have their defenders but “why quibble among the paragons (p. 328)?” Cobb is 17th in TPR per 648 PA and 5th in WS per 648 PA (Bonds has probably passed him since 2001). The evidence Gould uses in support of Cobb is batting average and stolen bases. Those are not the best sabermetric measures. He never mentions caught stealing, which Cobb may have had a lot of. Gould later says flat out “Cobb was the greatest ballplayer in American history (p. 342).” Never mind that Cobb did not have to play against Black or Hispanic players. And if Cobb was so clearly the greatest, why did Bill James, the stat guru, list him as only the second best centerfielder in The New Historical Abstract?

Gould is also famous for explaining why we don’t have any more .400 hitters. He basically says it is because overall excellence has increased and there is less variation between the best and worst hitters. But again, why care about batting average? It is a misleading statistic. Why not ask why no one had a .700 slugging percentage for so long? The last time that happened before 1994 was in 1957. It seemed like no one would ever do that again. But it finally happened again, and it has happened several times since. Or even better, what about an .800 slugging percentage? We went 80 years between those (Ruth in 1921, Bonds in 2001). Or consider strikeouts. No pitcher struck out 340 or more batters between 1905 and 1945. Then Bob Feller did it in 1946. So other benchmarks besides the .400 average have been reached. Doesn’t this call into question his variation hypothesis?

But more importantly, other research has reached the opposite conclusion. The following was posted on the SABR list (the bulletin board of the Society for American Baseball Research) by Cliff Blau on May 28, 2004:

"Maybe There Were Giants, or at Least Outliers: On the .400 Batting Average Myth and the Absolute Limits of Hitting for Average in Major League Baseball" by Charles Hessenius.  This appeared in the Journal of Sport Behavior, volume 22, number 4 in 1999, pages 514-544.  Many of you have read Stephen J. Gould's article on the disappearance of the .400 hitter, in which he concludes that hitters in general have improved steadily and that variance among them has declined.  Mr. Hessenius examines the same issue and reaches the opposite conclusion.  He finds that over the years 1901-95, the range of and variance in batting averages has not changed significantly, and that "great hitters seem to be of equal absolute talent in all decades."


Another issue involves what social class players came from. According to Gould, around the turn of the century, baseball drew “professional players almost entirely from the proletarian population of agricultural and industrial workers (p. 114).” This has been called into question by Steven A. Riess. Here is what he posted on the SABR list on May 21, 2004:


“In a study of the occupations of players of major leaguers active 1900-1919 these were the results


Professionals    10.2%
Farm owners      20.9%
Other props.     27%
Clerks            7.4%
Skilled          23.7%
Semiskilled       7.4%
Unskilled         3.3%


overall, 44.6 % white collar fathers;  20.9% farming fathers; 34.4% blue collar fathers


Players in the 20s and 30s also came mainly from middle class backgrounds. 48 % came from white collar, 30% blue collar, 22% farm.


In 1940s  big change;  38.9% blue collar, 35.5% white collar,  25.6 % farmers.


In 1950s, 38.75% blue collar,   46.25% white collar,  and 15 % farmer


For further information, including methodology and analysis, see Steven A. Riess, "Touching Base: Professional Baseball and American Culture in the Progressive Era" (1999); and Steven A. Riess, "Professional Sports as an Avenue of Social Mobility in America: Some Myths and Realities." In Donald Kyle and Gary Stark, "Essays on Sport History and Sport Mythology" (1990), 83-117.”
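The white collar, farming and blue collar subtotals in the quoted post for 1900-1919 can be rebuilt from the individual rows. The sketch below assumes a grouping that the post does not spell out: professionals, other proprietors and clerks counted as white collar, and the three skill levels counted as blue collar. That grouping is an inference, adopted here only because it reproduces the quoted subtotals.

```python
# Fathers' occupations for players active 1900-1919, in percent,
# copied from the rows of the quoted post.
shares = {
    "professionals": 10.2,
    "farm owners": 20.9,
    "other proprietors": 27.0,
    "clerks": 7.4,
    "skilled": 23.7,
    "semiskilled": 7.4,
    "unskilled": 3.3,
}

# Assumed grouping (not stated in the post); chosen because it
# matches the subtotals the post reports.
groups = {
    "white collar": ["professionals", "other proprietors", "clerks"],
    "farming": ["farm owners"],
    "blue collar": ["skilled", "semiskilled", "unskilled"],
}

subtotals = {
    name: round(sum(shares[row] for row in rows), 1)
    for name, rows in groups.items()
}
total = round(sum(shares.values()), 1)

for name, value in subtotals.items():
    print(f"{name}: {value}%")
print(f"all rows: {total}%")
```

Under that grouping, fewer than 35 percent of the fathers were blue collar workers, which is hard to square with “almost entirely” proletarian origins.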


In an article originally published in The Wall Street Journal, Gould wondered why Barry Bonds was hitting so many home runs. I refuted his suggestion that Ruth hit many of his HRs off tired starters in the late innings. The paper published my letter on October 24, 2001. Here is the letter:

“Stephen Jay Gould wondered how many dingers (home runs) Babe (Ruth) hit off tired starters in late innings ("A Happy Mystery to Ponder," editorial page, Oct. 10). Checking the Sporting News Record Book and the baseball encyclopedia, we find that Babe Ruth hit 15 of his 60 home runs in 1927 after the seventh inning. Five of those were hit off pitchers who relieved more often than they started that year. The other 10 were hit off pitchers who occasionally relieved. So it might not have been very many. Furthermore, Ruth hit 16 home runs in the first inning that year. Ruth always batted in the first inning (he was third in the lineup), for 150-160 plate appearances. But that is less than one-fourth his 680 or so for the whole season, yet he hit more than one-fourth of his homers then. So he did more damage against fresh pitchers than in general. Ruth hit 14 home runs during the fifth and sixth innings (almost the same as after the seventh) off pitchers who probably were not too tired. In 1998, Mark McGwire hit 17 of his 70 after the seventh inning. This year, Barry Bonds hit 14. Those numbers are in line with what Ruth did. Finally, Mr. Gould forgot to mention that Mr. Bonds uses a maplewood bat, not ash. Maple is harder and does not break as easily. This might partially account for the high home run total.”
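The first-inning argument in the letter is a comparison of two shares: the fraction of Ruth’s 1927 home runs hit in the first inning against the fraction of his plate appearances that came in the first inning. Here is a minimal sketch of that check in Python, using the letter’s own approximate figures (150-160 first-inning plate appearances, about 680 for the season); the function name is illustrative only.

```python
TOTAL_HR = 60           # Ruth's 1927 home runs
FIRST_INNING_HR = 16    # hit in the first inning
SEASON_PA = 680         # "680 or so" in the letter, an approximation


def first_inning_split(first_inning_pa: int) -> dict:
    """Compare first-inning HR share and HR rate with the rest of the game."""
    other_pa = SEASON_PA - first_inning_pa
    other_hr = TOTAL_HR - FIRST_INNING_HR
    return {
        "pa_share": first_inning_pa / SEASON_PA,
        "hr_share": FIRST_INNING_HR / TOTAL_HR,
        "hr_per_pa_first": FIRST_INNING_HR / first_inning_pa,
        "hr_per_pa_other": other_hr / other_pa,
    }


# Check both ends of the letter's 150-160 range.
results = {pa: first_inning_split(pa) for pa in (150, 160)}

for pa, r in results.items():
    print(
        f"{pa} first-inning PA: PA share {r['pa_share']:.1%}, "
        f"HR share {r['hr_share']:.1%}, "
        f"HR/PA first {r['hr_per_pa_first']:.3f} "
        f"vs. other innings {r['hr_per_pa_other']:.3f}"
    )
```

At either end of the range, the home run share exceeds the plate appearance share, which is the letter’s point about fresh pitchers.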


Yet, Gould’s original article was reprinted word for word in this book and there was no further discussion on the Ruth issue. My sources included the Baseball Encyclopedia and the Sporting News Record Book.

Gould is also well known for analyzing the probability of someone getting a hit in 56 consecutive games. He most likely did some great analysis here. But he never raised the important question of whether or not the streak mattered. Did it help the Yankees win more games? Would the Yankees have been better off if DiMaggio had not gotten a hit in a couple of those 56 games but had a higher average overall than he actually did during those games? Would you rather have a .300 hitter who goes hitless in some games than a guy who hits .250 yet has exactly 1 hit in every game for a 162-game hitting streak? Gould, for all of his scientific know-how, does not offer us any guidance on this issue.
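The .300 versus .250 comparison can be made concrete with a rough calculation. The sketch below rests on two assumptions that are not in the text: each hitter gets exactly 4 at-bats in every game, and every at-bat is an independent trial with the same chance of a hit. It is an illustration of the trade-off, not a model of real hitters.

```python
GAMES = 162
AT_BATS_PER_GAME = 4   # assumption: exactly 4 at-bats every game
SEASON_AB = GAMES * AT_BATS_PER_GAME

# The streak hitter: exactly one hit in every game.
streak_hits = GAMES * 1
streak_avg = streak_hits / SEASON_AB

# The .300 hitter, treating each at-bat as an independent trial (assumption).
steady_avg = 0.300
steady_hits = steady_avg * SEASON_AB
p_hitless_game = (1 - steady_avg) ** AT_BATS_PER_GAME
expected_hitless_games = GAMES * p_hitless_game

extra_hits = steady_hits - streak_hits

print(f"Streak hitter: {streak_hits} hits, average {streak_avg:.3f}")
print(f".300 hitter: about {steady_hits:.0f} hits")
print(f"Chance of a hitless game for the .300 hitter: {p_hitless_game:.1%}")
print(f"Expected hitless games: {expected_hitless_games:.1f}")
print(f"Extra hits for the .300 hitter: {extra_hits:.1f}")
```

Under these assumptions the .300 hitter goes hitless in roughly a quarter of his games yet finishes with about 32 more hits than the player with the 162-game streak.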

Other disappointing, if not disturbing, quotes pop up in the book. When discussing the great home run hitting of Mark McGwire and what caused it, he writes: “I don’t care if the thin air of Colorado encourages home runs. I don’t care if expansion has diluted pitching. I don’t care if the ball is livelier and the strike zone smaller. And I especially don’t care if McGwire helps himself train by taking an over-the-counter substance regarded as legal by major league baseball (pp. 59-60).” To Gould, McGwire’s success was due to “training and study (p. 60).” Now I don’t know what the answer is. But a scientist needs to consider all factors.

But earlier in the book, when explaining why it was okay for umpire Babe Pinelli to call a high, outside pitch a strike to end Don Larsen’s perfect game in the 1956 World Series, Gould writes: “Context matters. Truth is a circumstance, not a spot (p. 48).” Then why not look at the context that McGwire played in? Gould seems inconsistent here.

There is also no index in this book, which makes it a little hard to check facts and cross-check claims. Those are important in science.
