The Problem With “Total Clutch” Hitting Statistics

by Cyril Morong

My sabermetric blog is called Cybermetrics.

Introduction

In late 2003, BusinessWeek magazine ran an article called “Ball Park Figures You Can Bet On,” which described a “new statistic” developed by Benjamin Polak and Brian Lonergan of Yale University that measures “wins contributed” by major league hitters. (Another article on their work appeared in the NY Times in Nov. 2004; see sources below.) From the online version:

“Here's how their method works: Let's say the home team is down by two runs in the bottom of the fifth inning, with no outs and a runner on second base. At that moment, the home team has a 39% chance (or 0.39 probability) that it will win. If the batter grounds out, and the runner at second fails to advance, the team's chance of winning falls to 33%. The difference between the two, -0.06, is assigned to the batter who just grounded out.”

They then add up these changes over every plate appearance of the season to get wins contributed for each player (see link in sources).
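The bookkeeping described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout and the batter label are hypothetical, and the win probabilities themselves would have to come from a table of game situations that is not reproduced here.

```python
from collections import defaultdict


def wins_contributed(plate_appearances):
    """Sum the change in win probability over each batter's plate appearances.

    Each item is (batter, win_prob_before, win_prob_after), where the
    probabilities are the batting team's chance of winning the game.
    """
    totals = defaultdict(float)
    for batter, wp_before, wp_after in plate_appearances:
        totals[batter] += wp_after - wp_before
    return dict(totals)


# The BusinessWeek example: 39% before the groundout, 33% after it.
example = [("Batter A", 0.39, 0.33)]
print(round(wins_contributed(example)["Batter A"], 2))  # -0.06
```

A full season is just a much longer list of such tuples, one for every plate appearance.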

The problem with this approach is that it is not new and that it really tells us nothing about what a ball player is worth, since in the long run this “total clutch” stat is highly correlated with normal hitting statistics, as I will demonstrate. (My critique is not new either. See the book "The Hidden Game of Baseball" by John Thorn and Pete Palmer, where they discuss what Dick Cramer had to say about the Mills brothers.) By “total clutch” stat, I mean one that takes into account every plate appearance, with each one weighted by its importance according to the score and inning. Hits when the game is late and close count for more than hits when the game is early and the score is lopsided.

History

This is definitely not a new stat. It goes back at least as far as 1970, when Eldon G. and Harlan D. Mills published their book Player Win Averages. Polak and Lonergan’s “wins contributed” stat is similar. So is the “Player Game Percentage” in the book Curve Ball by Jim Albert and Jay Bennett. So is the “Game State Victories” (or Wins) stat found at the Rhoids Sports Analysis website (see sources). So is “player's win value” by Ed Oswalt (his link is also in sources). So what Lonergan and Polak have done is definitely not new.

Analysis

Let’s start with Ed Oswalt’s measure, “player’s win value” (or PWV), since he uses thirty years of data, covering the years 1972-2002. The best hitters on his list will not surprise you, and his stat divided by plate appearances (or PA) is highly correlated with stats like on-base percentage (OBP) and slugging percentage (SLG) as well as OPS (OBP + SLG).

First, I looked at the top 100 players in plate appearances from 1972-2002. I then correlated relative OPS (each player's OPS relative to the league average) with Oswalt’s PWV/PA. The correlation was 0.948. This is very close to a perfect linear relationship. If you square this (called r-squared), you get 0.898, meaning that 89.8% of the variation across hitters in PWV/PA is explained by relative OPS. This is important because it shows that a very simple, non-clutch, non-situational, non-context stat like OPS pretty much explains a much more complex, context-dependent stat that is supposed to tell us the value of hitters.

The linear regression equation is PWV/PA = .00022*OPS - .022
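To see what the fitted line implies, here is a small Python sketch. It assumes the intercept is -.022, which is the value that makes a league-average hitter (relative OPS of 100) come out at about zero; the function name is hypothetical.

```python
def pwv_per_pa_from_ops(relative_ops):
    """Predicted PWV/PA from relative OPS (100 = league average).

    Coefficients are the one-variable regression quoted in the text,
    with the intercept read as -.022 (an assumption).
    """
    return 0.00022 * relative_ops - 0.022


# A hitter 37% better than league average, like Bonds in the table.
print(round(pwv_per_pa_from_ops(137), 4))  # 0.0081

# A league-average hitter comes out at essentially zero.
print(abs(pwv_per_pa_from_ops(100)) < 1e-9)  # True
```

The prediction for a relative OPS of 137 is close to the 0.0080 actually listed for Bonds.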

In the figure below, you can see the relationship where PWV/PA is a function of relative OPS.

In the table below, you can see each player’s PWV/PA and his relative OPS. Bonds, for example, has 137, which means his OPS was 37% higher than the league average for the 1972-2002 period. The top ten or twenty hitters will not surprise you.

Rank	Player	PWV/PA	Relative OPS
1	Barry Bonds	0.0080	137
2	Mark McGwire	0.0068	131
3	Jeff Bagwell	0.0060	127
4	Will Clark	0.0050	118
5	Mike Schmidt	0.0049	126
6	Ken Griffey Jr.	0.0048	124
7	George Brett	0.0045	119
8	Rod Carew	0.0045	118
9	Tony Gwynn	0.0042	115
10	Jack Clark	0.0041	118
11	Reggie Jackson	0.0040	118
12	John Olerud	0.0040	116
13	Fred McGriff	0.0040	119
14	Rafael Palmeiro	0.0039	119
15	Kirby Puckett	0.0038	114
16	Mark Grace	0.0038	111
17	Rickey Henderson	0.0038	114
18	Eddie Murray	0.0037	114
19	Dwight Evans	0.0037	117
20	Wade Boggs	0.0036	116
21	Ken Singleton	0.0036	111
22	Keith Hernandez	0.0036	110
23	Fred Lynn	0.0035	117
24	Darrell Evans	0.0033	110
25	Jose Canseco	0.0033	115
26	Jim Rice	0.0032	118
27	Dave Parker	0.0031	112
28	Harold Baines	0.0031	111
29	Ken Griffey Sr.	0.0031	109
30	Don Mattingly	0.0030	113
31	Toby Harrah	0.0030	108
32	Ellis Burks	0.0030	117
33	Andres Galarraga	0.0029	113
34	Tim Raines	0.0028	111
35	Lou Whitaker	0.0027	108
36	Dave Winfield	0.0027	114
37	Cecil Cooper	0.0026	111
38	Pete Rose	0.0026	107
39	Paul Molitor	0.0025	110
40	Gary Matthews	0.0025	108
41	Ted Simmons	0.0025	110
42	Dale Murphy	0.0024	114
43	Al Oliver	0.0024	112
44	Roberto Alomar	0.0024	110
45	Brian Downing	0.0024	110
46	Jose Cruz	0.0024	107
47	Craig Biggio	0.0023	110
48	Dusty Baker	0.0023	109
49	Sammy Sosa	0.0023	118
50	Wally Joyner	0.0023	111
51	Paul O'Neill	0.0022	111
52	Barry Larkin	0.0022	110
53	Bobby Grich	0.0021	112
54	Steve Garvey	0.0020	109
55	Bobby Bonilla	0.0020	115
56	Ron Cey	0.0019	111
57	Tony Phillips	0.0019	101
58	Carlton Fisk	0.0019	111
59	Ryne Sandberg	0.0019	110
60	Robin Ventura	0.0018	107
61	Andre Dawson	0.0018	111
62	Chili Davis	0.0018	108
63	Alan Trammell	0.0016	105
64	Chris Chambliss	0.0014	105
65	Robin Yount	0.0014	107
66	Cal Ripken	0.0014	106
67	Julio Franco	0.0013	106
68	Graig Nettles	0.0013	105
69	George Hendrick	0.0012	108
70	Brett Butler	0.0011	104
71	Don Baylor	0.0011	108
72	Chet Lemon	0.0009	110
73	Gary Carter	0.0008	107
74	Brady Anderson	0.0008	104
75	Bill Buckner	0.0007	101
76	Ruben Sierra	0.0007	104
77	Carney Lansford	0.0007	104
78	Buddy Bell	0.0004	104
79	Joe Carter	0.0004	104
80	Willie Randolph	0.0004	100
81	Tony Fernandez	0.0004	101
82	Todd Zeile	0.0003	103
83	B.J. Surhoff	0.0001	99
84	Jay Bell	0.0001	102
85	Steve Finley	0.0000	103
86	Willie McGee	0.0000	100
87	Gary Gaetti	-0.0003	100
88	Tim Wallach	-0.0005	101
89	Terry Pendleton	-0.0005	97
90	Dave Concepcion	-0.0006	96
91	Devon White	-0.0007	99
92	Steve Sax	-0.0009	96
93	Lance Parrish	-0.0009	103
94	Omar Vizquel	-0.0012	92
95	Willie Wilson	-0.0013	96
96	Ozzie Smith	-0.0015	92
97	Garry Templeton	-0.0016	93
98	Frank White	-0.0018	93
99	Bob Boone	-0.0025	91
100	Larry Bowa	-0.0035	87

I then ran a linear regression in which PWV/PA was the dependent variable and relative OBP and SLG were the independent variables. It used all 284 players with 5000 or more plate appearances from 1972-2002. The regression equation was

PWV/PA = -.0246 + .000149*OBP + .000097*SLG

The r-squared was .935, meaning that 93.5% of the variation in PWV/PA is explained by the model. The standard error is .00056, or about .39 wins for a 700 PA season. The correlation between OBP and SLG is about .52. Each of those has a correlation of over .8 with PWV/PA. So again, two very simple stats explain what is going on with the much more complex clutch stat.
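The two-variable equation and the conversion of the standard error into wins per season can be checked with a short Python sketch. The function names are hypothetical, and the inputs are relative OBP and relative SLG on the same scale as relative OPS (100 = league average).

```python
def pwv_per_pa_from_obp_slg(relative_obp, relative_slg):
    """Predicted PWV/PA from the two-variable regression in the text."""
    return -0.0246 + 0.000149 * relative_obp + 0.000097 * relative_slg


def per_season(value_per_pa, plate_appearances=700):
    """Scale a per-plate-appearance figure up to a full season."""
    return value_per_pa * plate_appearances


# A hitter who is league average in both OBP and SLG is predicted at about zero.
print(abs(pwv_per_pa_from_obp_slg(100, 100)) < 1e-9)  # True

# The standard error of .00056 per PA, over a 700 PA season.
print(round(per_season(0.00056), 2))  # 0.39
```

Note that one point of relative OBP is worth about one and a half times as much as one point of relative SLG in this equation (.000149 versus .000097).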

To see how each player did compared to what the above equation predicted, go to the following link. You can also see which hitters exceeded their predicted PWV/PA the most.

How Many Games Do Clutch Hitters Really Win?

Now another stat, the Game State Victories (GSV) from the Rhoids Sports Analysis website, shows the same tendencies.

I have a data set with 191 players who have 900 or more at-bats over the years 2001-2003. (This is not all of them; I'll explain later.) So I ran a regression in which each hitter's GSV for the three seasons was the dependent variable and his cumulative totals for various other stats were the independent variables. The r-squared I got was .91, meaning that 91% of the variation in GSV across hitters is explained by regular counting stats that are not at all context dependent. (Well, not quite. Again, I will explain later; it concerns SACs.) That is, these independent variables have nothing to do with the score or the inning. Yet they explain almost all of the variation in the context-dependent variable.

Here are the coefficient estimates:

SAC = -.021

SF = -.049

GIDP = -.1098

CS = -.059

SB = .0299

BB = .053

HR = .106

3B = .099

2B = .0877

1B = .0603

OUTS = -.01537

Intercept = -1.47

Outs means outs not counting the GIDP. BB includes hit by pitch. The only variables that are not statistically significant (absolute t-value less than 1.96) are SAC, SF and CS. The result for CS is a surprise. I believe that GSV or anything like it is a clutch stat. Yet non-clutch data does a very good job of explaining the variation in GSV. It is possible that the unexplained variation is due to chance and not any kind of clutch hitting ability.
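To make the coefficient list concrete, here is a Python sketch that applies it to a stat line. The coefficients are the ones listed above; the function name and the sample stat line are hypothetical, and the made-up totals are only there to show the arithmetic.

```python
# Coefficient estimates from the GSV regression (three-season totals).
GSV_COEFFICIENTS = {
    "SAC": -0.021, "SF": -0.049, "GIDP": -0.1098, "CS": -0.059,
    "SB": 0.0299, "BB": 0.053, "HR": 0.106, "3B": 0.099,
    "2B": 0.0877, "1B": 0.0603, "OUTS": -0.01537,
}
INTERCEPT = -1.47


def predicted_gsv(stats):
    """Predicted GSV for a dict of counting stats; missing stats count as 0.

    OUTS excludes GIDP and BB includes hit by pitch, as in the text.
    """
    return INTERCEPT + sum(
        coef * stats.get(name, 0) for name, coef in GSV_COEFFICIENTS.items()
    )


# Invented three-season totals, not a real player.
sample = {"1B": 300, "2B": 90, "3B": 9, "HR": 75, "BB": 200,
          "SB": 20, "CS": 8, "GIDP": 40, "SF": 15, "SAC": 0, "OUTS": 1100}
print(predicted_gsv(sample))
```

Subtracting this prediction from a player's actual GSV gives the part of his GSV that the ordinary counting stats do not explain.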

Now the data. The data I was able to download from the Rhoids website listed other stats, but not walks, so I could not set a cutoff in plate appearances. I wanted a convenient cutoff for which players to count, and I chose 300 at-bats for individual years, figuring that anyone who gets 400 plate appearances probably has at least 300 at-bats. Over three years that is 900. I also used the data from Doug Steele's website to get walks, GIDP, etc.

The data problems. In some years, players who obviously did very well had zero for their GSV. Very often they were rookies who also had a zero listed for their salary, and the Rhoids people wanted to do something with runs per dollar. Maybe that is why a zero is given, since you cannot divide by zero. Also, some players simply had an "NA" listed. Others clearly had the wrong number, like Juan Sosa getting the same GSV as Sammy Sosa one year. Some other players were just not listed. I did not see Edgar Martinez in the "data dump" for this year. I could not always tell which Alex Gonzalez I was looking at, and even when I could, he was not listed for all years. Some player names were not spelled the same way each year (I went through and made the necessary corrections to allow for doing subtotals in Excel). So I did not have all of the players with 900 or more at-bats from this period. I think about 30 got left out.

The correlation between GSV per plate appearance for players with 300 or more at bats in both 2001 and 2002 was .5. But the correlation between OPS in 2001 and GSV per PA in 2002 was actually higher, at .519. So if you wanted to predict a player's GSV per PA in 2002, his OPS in 2001 would do a slightly better job than his GSV per PA in 2001.

I also calculated a predicted GSV per PA for these players in both 2001 and 2002 using the coefficient values from the regression which used 1Bs, 2Bs, 3Bs, HRs, BBs, SBs, CSs, Outs and GIDPs. Then I calculated the difference between the actual GSV per PA and the predicted GSV per PA in each year. Then I found the correlation between these differences, or residuals, for the two years, and it was .081. That seems very low. I think this means that players who were especially good in the clutch in 2001 (who had a higher GSV per PA than predicted) were not especially likely to have a higher GSV per PA than predicted again in 2002. (I think this is the kind of analysis that Dick Cramer performed on the Player Win Average of the Mills brothers.)
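The residual test can be sketched in Python as follows. The helper names are hypothetical, and the numbers in the example are invented purely to show the mechanics; they are not the data behind the .081 figure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(actual, predicted):
    """Actual minus predicted, player by player."""
    return [a - p for a, p in zip(actual, predicted)]


# Invented GSV per PA values for four players in two seasons (not real data).
actual_2001 = [0.012, 0.008, 0.003, -0.001]
predicted_2001 = [0.010, 0.009, 0.002, 0.000]
actual_2002 = [0.009, 0.010, 0.001, 0.002]
predicted_2002 = [0.010, 0.008, 0.003, 0.001]

res_2001 = residuals(actual_2001, predicted_2001)
res_2002 = residuals(actual_2002, predicted_2002)
print(pearson(res_2001, res_2002))
```

If clutch hitting were a persistent skill, the same players would tend to beat their predictions in both years and this correlation would be well above zero.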

Sources

"Ballpark Figures to Bet On," Nov. 21, UPFRONT section BusinessWeek magazine. Author was Brian Hindo.

“What's a Ball Player Worth?” can be found at:

http://www.businessweek.com/print/bwdaily/dnflash/nov2003/nf2003115_2313_db016.htm?db

Player Win Averages by Eldon G. and Harlan D. Mills. 1970. A.S. Barnes, publisher.

Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett. Revised 2003. Copernicus Books.

Rhoids Sports Analysis: http://www.rhoids.com/

Ed Oswalt’s site is at: http://www.livewild.org/bb/playervalues/index.html

The Nov. 7, 2004 NY Times article is at

http://query.nytimes.com/gst/abstract.html?res=F30A1EFA39580C748CDDA80994DC404482

But you will probably have to pay to read all of it.

Other sites where you might find it are

http://www.iht.com/articles/2004/11/07/sports/base.html

http://redsox.mostvaluablenetwork.com/wp-content/sites/schwarzWRAP.html