RBIs, Opportunities and Power Hitting
Opportunities significantly affect a hitter’s RBI totals
by Cyril Morong
(This is an expanded and corrected version of my paper that appeared in the 2002 Baseball Research Journal (Volume 31), published by SABR, the Society for American Baseball Research. The results are different than those found in the article because I found a discrepancy in the data I used. How I corrected this is explained at the end of the paper in Part 4 (doing so only strengthened the results and conclusions). Part 1 is the revised article. Part 2 discusses an additional regression that shows that strikeouts probably don’t have a big impact on RBIs. Part 3 shows a regression that uses players who had between 3000 and 6000 plate appearances during the 1987-2001 period.)
Thanks to Ted Lukacs and Jerry Wachs for their comments on an early draft of the paper.
Part 1
RBIs have long been one of the staples of measuring a hitter’s contribution to his team’s success. Sometimes a player is said to be “a good RBI guy.” Newspapers and record books list the annual RBI leaders, scoreboards and broadcasters tell us how many RBIs a hitter has, almost as if getting them is a special skill, separate from power hitting or hitting for average. But RBIs are also often criticized as being misleading since all hitters don’t get the same number of opportunities to drive in runs. One hitter might get more RBIs than another because he had more opportunities and not because he is somehow better at driving in runs. So the important question is: “Exactly how much difference do RBI opportunities make?”
They make a big difference and exactly how big can be learned from statistical analysis. The following equation, derived using the linear regression technique, explains a hitter’s RBIs per at bat and the value of opportunities:
(1) RBI/AB = .187*OPP + .196*AVG + .468*ISO - .303
where
OPP = a hitter’s number of RBI opportunities per at bat, with each man on base being an opportunity as well as the batter
AVG = a hitter’s batting average
ISO = a hitter’s isolated power which is slugging percentage minus batting average
How does this work? Equation (1) predicts that Juan Gonzalez would get .204 RBIs per at bat because:
.187*(1.72) + .196*(.297) + .468*(.271) - .303 = .204
Gonzalez actually had .205 RBIs per at bat. The equation is also generally very accurate (I explain the statistical results and the data below).
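The calculation can be reproduced with a few lines of Python. This is only a sketch: the function and constant names are illustrative, and the coefficients are the rounded values printed in equation (1), so results may differ in the last digit from figures computed with unrounded coefficients.

```python
# Coefficients as printed in equation (1); they are rounded to three decimals.
B_OPP, B_AVG, B_ISO, INTERCEPT = 0.187, 0.196, 0.468, -0.303


def predicted_rbi_per_ab(opp, avg, iso):
    """Predicted RBIs per at bat from equation (1)."""
    return B_OPP * opp + B_AVG * avg + B_ISO * iso + INTERCEPT


# Juan Gonzalez's career inputs as quoted in the text.
gonzalez = predicted_rbi_per_ab(opp=1.72, avg=0.297, iso=0.271)
print(round(gonzalez, 3))        # 0.204
print(round(gonzalez * 600, 1))  # 122.2 RBIs per 600 at bats
```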
But first, what does the equation mean from a baseball perspective? With the coefficient on OPP being .187, two players who differ by, say, .082 OPP (49 RBI opportunities for a 600 at bat season), will end up with a 9.17 difference in RBIs over a 600 at bat season (9.17 = .082*.187*600). This is significant in baseball terms as well as statistically. Why look at a .082 difference in OPP? This study includes all players (61) who had 6000 or more plate appearances during the 1987-2001 seasons and whose situational statistics were listed on the CNN/SI website.1 Juan Gonzalez had the highest OPP/AB at 1.724 for his career. More than half of the other players were at least .082 less than this, including other power hitters like Rafael Palmeiro, Gary Sheffield, and Barry Bonds. Bonds was even lower at 1.587. Gonzalez would get 15.338 (or .137*.187*600) more RBIs than Bonds solely as a result of having more opportunities.
For a single season, the differences in OPP can be even greater. In 1995, for example, Paul O’Neill was the leader at 1.85 while Barry Bonds had 1.61. Everything else being equal, O’Neill would get about 26.93 more RBIs over a 600 at bat season. So opportunities play a big role in RBI totals.
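A short sketch of this arithmetic follows. The function name is hypothetical, and the rounded coefficient .187 is assumed, which is presumably why the career comparison comes out a few hundredths above the 15.338 quoted in the text.

```python
B_OPP = 0.187  # rounded OPP coefficient from equation (1)


def rbi_gain_from_opportunities(opp_gap, at_bats=600):
    """Extra RBIs attributed to a gap in opportunities per at bat."""
    return B_OPP * opp_gap * at_bats


# 1995 single-season example: Paul O'Neill (1.85) versus Barry Bonds (1.61).
season_gain = rbi_gain_from_opportunities(1.85 - 1.61)
print(round(season_gain, 2))  # 26.93

# Career example: Juan Gonzalez (1.724) versus Barry Bonds (1.587).
career_gain = rbi_gain_from_opportunities(1.724 - 1.587)
print(round(career_gain, 2))  # 15.37
```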
Hitters vary quite a bit in RBI opportunities. For example, the two lowest in OPP/AB were Rickey Henderson and Kenny Lofton, at 1.46 and 1.47, respectively. The two highest were Juan Gonzalez and Ruben Sierra at 1.72 and 1.71, respectively. Of course, Henderson and Lofton are both primarily leadoff men while Gonzalez (usually fourth) and Sierra (usually third through sixth) have been largely used in the middle of the lineup. But the difference between Henderson and Gonzalez (.26 OPP) would be 156 more RBI opportunities over the course of a 600 at bat season. Just about half of that, say .13, would mean about 78 more.
An actual example supports the importance of opportunities. Juan Gonzalez has a career AVG of .297 and an ISO of .271. Ken Griffey Jr. had .296 and .270, almost identical numbers. Yet Gonzalez had .205 RBI/AB or 123 RBIs over a 600 at bat season. Griffey had .187 RBI/AB or 112 over a 600 at bat season. The difference results largely from Gonzalez having 1.72 opportunities per at bat while Griffey had 1.64.
As for the data, opportunities include one for every time at bat and one for each runner that was on base during an at bat. This means that OPP does not include opportunities from plate appearances when the batter walked. (A regression was run that included these opportunities and the results were similar).2 Isolated power is a hitter’s slugging percentage minus his batting average and is a better measure of power hitting since it only includes bases on hits beyond singles.
As for the statistical results, the r-squared is .974, which means that 97.4% of the difference in RBIs per at bat across players is explained by equation (1). The standard error, which measures dispersion in the equation’s predicted RBI/AB for each player, is .005651 or just 3.39 RBIs for a 600 at bat season (600*.005651 = 3.39). The numbers in front of the variable abbreviations are referred to as coefficient estimates. So, for example, a .010 increase in batting average means a .00196 increase in RBI/AB (.196*.010 = .00196). That is 1.18 RBIs for a 600 at bat season. A .010 increase in ISO would add 2.81 RBIs for a 600 at bat season.
The T-values, which indicate statistical significance, are:
OPP = 13.58
AVG = 5.02
ISO = 28.25
This says that the three variables are all significant at the 1% level (or lower), meaning that there is less than a 1 in 100 chance of getting the coefficient estimates in equation (1) if their true value were zero.
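For readers who want to see the mechanics of a linear regression like this one, here is a minimal ordinary least squares sketch in plain Python. It is not the software used for the paper. The eight (OPP, AVG, ISO) rows are taken from Table 1, but the response values are generated from equation (1) itself rather than from the actual RBI data, so the fit simply recovers the published coefficients. The helper names `solve` and `ols` are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    x = [list(r) + [1.0] for r in rows]  # append the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# (OPP, AVG, ISO) for eight hitters from Table 1.
hitters = [
    (1.67, 0.291, 0.172),  # Harold Baines
    (1.46, 0.274, 0.140),  # Rickey Henderson
    (1.65, 0.263, 0.327),  # Mark McGwire
    (1.72, 0.297, 0.271),  # Juan Gonzalez
    (1.58, 0.342, 0.127),  # Tony Gwynn
    (1.59, 0.295, 0.299),  # Barry Bonds
    (1.58, 0.317, 0.117),  # Wade Boggs
    (1.71, 0.254, 0.240),  # Jay Buhner
]
# Illustrative response only: generated from equation (1), not the real data.
y = [0.187 * o + 0.196 * a + 0.468 * i - 0.303 for o, a, i in hitters]

coefs = ols(hitters, y)  # order: OPP, AVG, ISO, intercept
print([round(c, 3) for c in coefs])
```

With real RBI/AB values in place of `y`, the same routine would return estimates that differ from the published ones unless all 61 hitters and the unrounded inputs were used.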
Equation (1) also shows the bigger role played by power hitting in driving in runs. Consider Players A and B who have the following statistics:
Player | AB | HITS | 2B | 3B | HR | AVG | SLG | ISO
A | 600 | 192 | 40 | 8 | 16 | .320 | .493 | .173
B | 600 | 162 | 20 | 4 | 32 | .270 | .477 | .207
Who will drive in more runs? Using equation (1) and assuming they each get 1.6 OPP, Player A will drive in 83.93 runs while Player B will drive in 87.6 runs. Player B’s edge in home run power gives him the edge in RBIs despite a much lower batting average and a deficit in doubles and triples. For Player A to get up to 87.6 RBIs, his average would have to jump to .351 (assuming all additional hits are singles). If Player A had just 20 doubles and 4 triples, along with a .320 AVG, his ISO would fall to .127 and equation (1) would give him only about 71 RBIs. To get up to 87.6 RBIs, he would then have to raise his AVG to about .461 (again, assuming all additional hits are singles).
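The Player A and Player B comparison can be checked with the sketch below. It assumes the rounded coefficients of equation (1) and rounds AVG and ISO to three decimals, as in the table above; the function names are illustrative.

```python
def rates(ab, hits, doubles, triples, homers):
    """Return (AVG, ISO), each rounded to three decimals as in the table."""
    avg = hits / ab
    iso = (doubles + 2 * triples + 3 * homers) / ab  # bases beyond a single
    return round(avg, 3), round(iso, 3)


def predicted_rbi(opp, avg, iso, at_bats=600):
    """Equation (1), scaled to a season of `at_bats` at bats."""
    return (0.187 * opp + 0.196 * avg + 0.468 * iso - 0.303) * at_bats


avg_a, iso_a = rates(600, 192, 40, 8, 16)
avg_b, iso_b = rates(600, 162, 20, 4, 32)
rbi_a = predicted_rbi(1.6, avg_a, iso_a)
rbi_b = predicted_rbi(1.6, avg_b, iso_b)
print(round(rbi_a, 2), round(rbi_b, 2))  # 83.93 87.6

# Extra singles raise AVG but leave ISO unchanged, so only the AVG term moves.
needed_avg = avg_a + (rbi_b - rbi_a) / (0.196 * 600)
print(round(needed_avg, 3))  # 0.351
```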
But are all RBI opportunities of the same quality? No, a runner on third is better than a runner on first. So a runner on third was counted as a four-point opportunity, a runner on second as a three-point opportunity, a runner on first as a two-point opportunity and the batter as a one-point opportunity. I then ran another linear regression with points per at bat replacing opportunities per at bat.
The following equation shows the results:
(2) RBI/AB = .069*POINTS + .206*AVG + .475*ISO – .189
The r-squared is .973. The standard error is .005771 or just 3.46 RBIs for a 600 at bat season. This result is about as good as the one summarized in equation (1). Notice that the value of ISO is still much greater than the value of AVG, so power hitting is still the dominant force. The three variables were all statistically significant at the 1% level. Also, a regression was run that included opportunities from walks, as converted into points, with similar results.3
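The point system can be sketched as follows. The code assumes that the points for an at bat are the simple sum of the batter's point and the points for each occupied base, which is how the description above reads. The at-bat counts in `sample` are hypothetical and are not taken from the study's data.

```python
# Point values from the text: batter 1, runner on first 2, second 3, third 4.
POINTS = {"batter": 1, "1B": 2, "2B": 3, "3B": 4}


def points_for_at_bat(runners):
    """Points available in one at bat, given the occupied bases."""
    return POINTS["batter"] + sum(POINTS[base] for base in runners)


def points_per_at_bat(at_bats_by_situation):
    """Average points per at bat from {occupied bases: at-bat count}."""
    total_ab = sum(at_bats_by_situation.values())
    total_points = sum(points_for_at_bat(bases) * n
                       for bases, n in at_bats_by_situation.items())
    return total_points / total_ab


# Hypothetical at-bat counts, not taken from the study's data.
sample = {
    (): 300,
    ("1B",): 100,
    ("2B",): 50,
    ("1B", "3B"): 30,
    ("1B", "2B", "3B"): 20,
}
print(points_for_at_bat(("1B", "3B")))      # 7
print(round(points_per_at_bat(sample), 2))  # 2.42
```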
A hitter’s RBIs are determined by his ability to hit for average, hit for power and the quality and quantity of his opportunities. There probably is no special “RBI ability.” The vast majority of hitters will get about the number of RBIs predicted by their general hitting ability and opportunities. Any deviations are probably just random chance. That would be consistent with the well-known research on clutch hitting.
Table 1: Predicted RBIs vs. Actual RBIs
Player | AVG | ISO | RBI Opportunities per AB | RBI per 600 AB* | Predicted | Difference
Harold Baines | .291 | .172 | 1.67 | 94.28 | 87.35 | 6.93
Wally Joyner | .289 | .149 | 1.64 | 84.48 | 77.85 | 6.63
Delino DeShields | .270 | .109 | 1.49 | 52.73 | 47.52 | 5.20
Tino Martinez | .274 | .207 | 1.70 | 103.93 | 99.13 | 4.80
Frank Thomas | .319 | .258 | 1.66 | 117.90 | 113.67 | 4.23
Jeff Bagwell | .303 | .251 | 1.65 | 113.06 | 108.93 | 4.13
Andres Galarraga | .291 | .219 | 1.65 | 102.89 | 99.01 | 3.88
Tim Raines | .288 | .134 | 1.54 | 65.42 | 61.61 | 3.82
David Justice | .280 | .227 | 1.65 | 103.19 | 99.40 | 3.80
B.J. Surhoff | .281 | .135 | 1.64 | 75.98 | 72.23 | 3.75
Robin Ventura | .271 | .176 | 1.67 | 90.57 | 86.85 | 3.72
Dante Bichette | .299 | .200 | 1.67 | 99.95 | 96.46 | 3.50
Jose Canseco | .268 | .252 | 1.68 | 111.68 | 108.35 | 3.33
Mark McLemore | .260 | .080 | 1.58 | 51.36 | 48.68 | 2.68
Mark Grace | .307 | .140 | 1.61 | 76.18 | 73.86 | 2.31
Rickey Henderson | .274 | .140 | 1.46 | 55.40 | 53.57 | 1.83
Kenny Lofton | .302 | .123 | 1.47 | 55.16 | 53.33 | 1.83
Mark McGwire | .263 | .327 | 1.65 | 127.26 | 125.99 | 1.27
Gary Sheffield | .295 | .226 | 1.61 | 98.15 | 97.07 | 1.08
Juan Gonzalez | .297 | .271 | 1.72 | 123.21 | 122.18 | 1.03
Greg Vaughn | .245 | .232 | 1.67 | 100.19 | 99.20 | .99
Todd Zeile | .267 | .162 | 1.65 | 80.47 | 79.73 | .74
Marquis Grissom | .270 | .134 | 1.54 | 60.58 | 59.92 | .66
Tony Gwynn | .342 | .127 | 1.58 | 71.58 | 70.93 | .65
Will Clark | .304 | .196 | 1.65 | 93.75 | 93.23 | .52
Cal Ripken | .271 | .163 | 1.64 | 79.41 | 78.98 | .43
Tony Fernandez | .286 | .112 | 1.58 | 60.07 | 59.86 | .20
Paul O'Neill | .288 | .182 | 1.71 | 94.80 | 94.60 | .20
Gregg Jefferies | .289 | .132 | 1.58 | 66.30 | 66.63 | -.32
Eric Karros | .268 | .194 | 1.66 | 89.70 | 90.06 | -.36
Craig Biggio | .291 | .145 | 1.49 | 59.81 | 60.21 | -.40
Ken Griffey Jr. | .296 | .270 | 1.64 | 112.30 | 112.79 | -.49
John Olerud | .300 | .176 | 1.66 | 87.83 | 88.35 | -.51
Jay Bell | .267 | .153 | 1.54 | 64.50 | 65.27 | -.77
Sammy Sosa | .277 | .265 | 1.64 | 108.32 | 109.11 | -.79
Jay Buhner | .254 | .240 | 1.71 | 106.52 | 107.32 | -.80
Fred McGriff | .287 | .228 | 1.66 | 100.76 | 101.56 | -.80
Chuck Knoblauch | .293 | .118 | 1.49 | 51.53 | 52.46 | -.93
Matt Williams | .269 | .222 | 1.68 | 99.05 | 99.98 | -.93
Dave Martinez | .279 | .114 | 1.55 | 55.81 | 56.83 | -1.02
Travis Fryman | .278 | .171 | 1.69 | 86.69 | 87.88 | -1.19
Ron Gant | .256 | .212 | 1.61 | 86.37 | 87.66 | -1.29
Omar Vizquel | .274 | .077 | 1.58 | 47.01 | 48.42 | -1.41
Brady Anderson | .257 | .170 | 1.51 | 63.65 | 65.15 | -1.50
Edgar Martinez | .319 | .211 | 1.65 | 97.80 | 99.38 | -1.58
Ken Caminiti | .272 | .175 | 1.68 | 85.88 | 87.52 | -1.64
Larry Walker | .315 | .257 | 1.63 | 107.72 | 109.37 | -1.65
Roberto Alomar | .306 | .149 | 1.58 | 71.11 | 72.90 | -1.78
Ray Lankford | .274 | .209 | 1.60 | 86.15 | 87.97 | -1.82
Devon White | .264 | .156 | 1.55 | 64.56 | 66.85 | -2.28
Barry Larkin | .300 | .155 | 1.55 | 68.58 | 70.98 | -2.40
Luis Gonzalez | .286 | .198 | 1.63 | 87.50 | 90.16 | -2.66
Barry Bonds | .295 | .299 | 1.59 | 111.72 | 114.49 | -2.78
Rafael Palmeiro | .295 | .225 | 1.64 | 96.95 | 99.77 | -2.82
Bernie Williams | .305 | .194 | 1.68 | 93.49 | 96.43 | -2.94
Bobby Bonilla | .280 | .200 | 1.67 | 90.35 | 93.97 | -3.63
Ruben Sierra | .270 | .184 | 1.71 | 89.29 | 93.54 | -4.26
Benito Santiago | .260 | .151 | 1.67 | 71.44 | 78.15 | -6.71
Steve Finley | .275 | .164 | 1.58 | 66.49 | 73.81 | -7.32
Wade Boggs | .317 | .117 | 1.58 | 56.42 | 64.67 | -8.25
Ellis Burks | .292 | .220 | 1.68 | 93.29 | 102.72 | -9.43
Explanatory notes: The * indicates that RBIs from sacrifice flies and bases-loaded walks are not included. The Predicted column is based on equation (1).
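As a rough check on Table 1, the Predicted column can be recomputed from the AVG, ISO and opportunity columns. The sketch below does this for five rows, assuming the rounded coefficients of equation (1) and the rounded inputs shown in the table, so small gaps (under one RBI here) between the listed and recomputed values are to be expected.

```python
def predicted_per_600(opp, avg, iso):
    """Equation (1) scaled to 600 at bats, using the rounded coefficients."""
    return 600 * (0.187 * opp + 0.196 * avg + 0.468 * iso - 0.303)


# (player, AVG, ISO, OPP, actual RBI per 600 AB, Predicted) from Table 1.
rows = [
    ("Harold Baines", 0.291, 0.172, 1.67, 94.28, 87.35),
    ("Mark McGwire", 0.263, 0.327, 1.65, 127.26, 125.99),
    ("Juan Gonzalez", 0.297, 0.271, 1.72, 123.21, 122.18),
    ("Jay Buhner", 0.254, 0.240, 1.71, 106.52, 107.32),
    ("Ellis Burks", 0.292, 0.220, 1.68, 93.29, 102.72),
]

gaps = []
for name, avg, iso, opp, actual, listed in rows:
    recomputed = predicted_per_600(opp, avg, iso)
    gaps.append(abs(recomputed - listed))
    print(f"{name:14s} listed {listed:7.2f}  recomputed {recomputed:7.2f}  "
          f"difference {actual - listed:6.2f}")
```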
Part 2
I also ran a regression in which all RBIs per plate appearance were a function of the following variables:
H/PA = hits per plate appearance
XB/PA = extra bases per plate appearance
PTS/PA = points per plate appearance
K/PA = strikeouts per plate appearance
RBIs from sacrifice flies are included in this regression. Walks are included in plate appearances and points from walks are included. But intentional walks and points from intentional walks are excluded. I included strikeouts to see if players who struck out more hurt their RBI totals by perhaps driving in fewer runs with sacrifice flies or groundouts. Of course, many players who strike out often also hit many home runs, so that would increase their RBIs.
Here are the results:
RBI/PA = .163*(H/PA) + .492*(XB/PA) + .068*(PTS/PA) – .033*(K/PA) – .165
The r-squared was .968. The standard error was .005681 or 3.98 RBIs per 700 plate appearances.
The t-values were:
H/PA = 3.16
XB/PA = 20.86
PTS/PA = 13.09
K/PA = -1.33
Now, notice that the t-value is not significant for K/PA (it needs to be 1.96 or more in absolute value). Strikeouts do not significantly affect RBIs. In fact, their coefficient value is the smallest, too. The highest K/PA was about .24, for Jay Buhner. The average for this group was about .144. The difference between Buhner and the average hitter would be about 2.24 RBIs over 700 PA.
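The significance check and the size of the strikeout effect can be sketched as below. The names are illustrative, and the inputs are the rounded figures quoted above (.24, .144 and .033), so the result comes out near, but not exactly at, the 2.24 given in the text.

```python
T_CRITICAL = 1.96  # threshold for significance quoted in the text
t_values = {"H/PA": 3.16, "XB/PA": 20.86, "PTS/PA": 13.09, "K/PA": -1.33}
significant = {name: abs(t) >= T_CRITICAL for name, t in t_values.items()}
print(significant)


def strikeout_cost(k_pa, baseline_k_pa, plate_appearances=700):
    """RBIs lost to extra strikeouts, holding the other variables fixed."""
    return 0.033 * (k_pa - baseline_k_pa) * plate_appearances


# Jay Buhner (about .24 K/PA) versus the group average (about .144).
buhner_cost = strikeout_cost(0.24, 0.144)
print(round(buhner_cost, 2))  # 2.22
```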
Part 3
This part shows a regression that uses players (N = 114) who had between 3000 and 6000 plate appearances during the 1987-2001 period. The regression equation is
RBI/AB = .198*OPP + .227*AVG + .445*ISO - .323
The r-squared was .962. So the results are similar to equation (1) in terms of the coefficient estimates:
(1) RBI/AB = .187*OPP + .196*AVG + .468*ISO - .303
I wanted to do this to see if the analysis worked for another group of players. The standard error was higher in this case, .00663, but that is expected since the players have fewer plate appearances. Randomness will play a bigger role. When I applied equation (1) to this group of players, it predicted all of them to within ten RBIs of their actual total for a 600 at bat season. 87 were predicted to within five RBIs.
Part 4
I came across a discrepancy in the data I used. I used data from the CNN/SI site for each player and I calculated each player's RBI opportunities. For example, I added up the at-bats that CNN/SI reported for Rafael Palmeiro for each situation where runners might be on base:
Runner on 1B-1262
Runner on 2B-714
Runner on 3B-236
Runners on 1B, 2B-614
Runners on 1B, 3B-249
Runners on 2B, 3B-142
Bases Loaded-163
This adds up to 3380. But the discrepancy comes in where, in a separate line, they report that he had 3852 at-bats with runners on base, not 3380. With no runners on, they give him 4521. Adding this to 3852, you get 8373, the same total that he has in the Lee Sinins sabermetric encyclopedia (for 1987-2001). So the 3852, not the 3380, must be right.
In my study I looked at opportunities per at-bat. So I had to calculate opportunities for each hitter. For Palmeiro it was:
1262*2=2524
714*2=1428
236*2=472
614*3=1842
249*3=747
142*3=426
163*4=652
If you add that up, you get 8091. The reason you multiply 1262 by 2 is that an at-bat with a man on first is two opportunities, the man on first and the batter. Adding the 8091 to the at-bats with none on (which is one opportunity each time), 4521, you get 12612. That is, Palmeiro had 12612 opportunities. To get this per at-bat, you divide 12612 by his total at-bats, 8373, and you get 1.51.
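The same tally, written out as a short script. The at-bat counts are the ones listed above for Palmeiro; the variable names are illustrative.

```python
# (runners on base, at bats) for each situation listed for Rafael Palmeiro.
situations = [
    (1, 1262),  # runner on 1B
    (1, 714),   # runner on 2B
    (1, 236),   # runner on 3B
    (2, 614),   # runners on 1B, 2B
    (2, 249),   # runners on 1B, 3B
    (2, 142),   # runners on 2B, 3B
    (3, 163),   # bases loaded
]
AB_NONE_ON = 4521
TOTAL_AB = 8373

ab_runners_on = sum(ab for _, ab in situations)
# Each at bat is one opportunity for the batter plus one per runner.
opp_runners_on = sum((runners + 1) * ab for runners, ab in situations)
total_opp = opp_runners_on + AB_NONE_ON

print(ab_runners_on)                   # 3380
print(opp_runners_on, total_opp)       # 8091 12612
print(round(total_opp / TOTAL_AB, 2))  # 1.51
```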
Now this is not right, because I did not have the right numbers for each of those base situations. I don't know why CNN/SI had the discrepancy. I discovered it on August 1, 2003.
Then I looked at how many runners-on-base at-bats were missing for Palmeiro. That would be 3852 - 3380 or 472. The 3852 is 13.96% higher than the 3380. I assumed that Palmeiro got those other 472 at bats and that the seven base situations came up with the same frequency as they did for the at-bats listed. So I raised the at-bats with a runner on first by 13.96%, the at-bats with a runner on second by 13.96%, and so on. So Palmeiro gets more opportunities and his opportunities per at-bat go up to 1.64. I did this for all the hitters and re-ran the first regression from the paper. The results were actually more accurate, with the coefficients on ISO and AVG changing a little, but the coefficient on OPP (RBI opportunities) went up to .187 from .125. That is quite a jump and it means that opportunities might be a lot more important than I thought. In fact, I had done a study on the 1995 season a couple of years ago, and the coefficient on OPP was .174. The standard error per 600 at bats fell from 5.03 to 3.39.
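A sketch of this proportional correction for Palmeiro, assuming (as described above) that every runners-on situation is scaled by the same factor. The variable names are illustrative.

```python
# (runners on base, at bats) for each situation listed for Rafael Palmeiro.
situations = [
    (1, 1262), (1, 714), (1, 236),   # one runner: 1B, 2B, 3B
    (2, 614), (2, 249), (2, 142),    # two runners: 1B-2B, 1B-3B, 2B-3B
    (3, 163),                        # bases loaded
]
AB_NONE_ON = 4521
AB_RUNNERS_ON_REPORTED = 3852  # from the separate "runners on base" line

listed = sum(ab for _, ab in situations)   # 3380
scale = AB_RUNNERS_ON_REPORTED / listed    # about 1.1396

# Scale every runners-on situation by the same factor, then count
# one opportunity for the batter plus one per runner.
scaled_opp = sum((runners + 1) * ab * scale for runners, ab in situations)
total_ab = AB_RUNNERS_ON_REPORTED + AB_NONE_ON  # 8373
opp_per_ab = (scaled_opp + AB_NONE_ON) / total_ab

print(round((scale - 1) * 100, 2))  # 13.96
print(round(opp_per_ab, 2))         # 1.64
```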
If you went to the CNN/SI site to look for these discrepancies, you would not be able to find them. They no longer list all of the base-out situations. They leave one out, the one with a runner on first. So you can’t check the total at bats from the seven on-base situations against the total given in the separate “runners on base” line. Since they had a discrepancy before, it might still exist and you could not detect it.
I think the corrections I have made are very reasonable. They result in an average number of opportunities per at bat of 1.62. Before the corrections, it was just 1.52. The 1.62 is more in line with what I got in a study of the 1995 season (I used data from the STATS, INC. Scoreboard book for that study) and also with the frequency of the different base-out situations (see Tom Ruane’s website, the last one listed below). There was quite a range of discrepancies. Two hitters were missing more than 20% of their at bats with runners on base while two were missing less than 1%. That is, the discrepancies varied quite a bit across players.
In my 1995 study, I subtracted two opportunities per sacrifice fly for each batter (since the Scoreboard book only gave total opportunities; they were not broken down by base situation). I looked at the 134 hitters from 1995 that had at least 400 at bats. The regression equation was:
RBI/AB = .174*OPP + .289*AVG + .465*ISO - .303
The r-squared was .867. That is not as high, probably because it is just one season of data and randomness plays a stronger role. Anyone who has the 1996 Scoreboard book could verify what I got. The results are similar to what I now have after the corrections, especially the value of OPP: .174 is much closer to .187 than .125 is. I don’t know why there is a big difference in the coefficient for average.
End
Notes
1. Some outstanding hitters of recent times, Manny Ramirez and Mike Piazza, for example, were not in the study since they had not achieved 6000 plate appearances through the 2001 season. Both were high in opportunities per at bat at 1.71 and 1.68, respectively. Ramirez had 4.89 more RBIs per 600 at bats than expected and Piazza had about 5.91 more.
2. RBIs from sacrifice flies are also not included. Neither are opportunities that were available when the player hit an SF. For the average player in this study, sacrifice flies make up less than 1% of his plate appearances and no more than 1.5% for any one player. So excluding SF’s matters very little. RBIs from bases-loaded walks were not included in the equation (1) or equation (2) results. They were included in the unreported regressions that included opportunities from walks. In those regressions, all variables were divided by plate appearances rather than at bats. HBPs were also included in those cases. But again, the results were similar with basically the same meanings as the two regressions reported here.
3. If I used walks, plate appearances and the point system, the regression results show that opportunities alone would give Juan Gonzalez 11.67 more RBIs than Barry Bonds over a 660 plate appearance season. That is less than 15.338, but still very high.
Data Sources
Individual player links at the CNN/SI website
Various editions of the STATS, INC. Player Profiles books and The Great American Baseball Stat Book
Harold Brooks, “The Statistical Mirage of Clutch Hitting,” Baseball Research Journal, 1989
Tom Conlon, “Or Does Clutch Ability Exist?” By the Numbers, March 1990
Richard D. Cramer, “Do Clutch Hitters Exist?” Baseball Research Journal, 1977
Gary Gillette, “Much Ado About Nothing,” Sabermetric Review, July 1986
Tom Hanrahan, “Clutch Teams in 1999,” By the Numbers, May 2000
Tom Hanrahan, “What Makes a ‘Clutch’ Situation?” By the Numbers, February 2001
Keith Karcher, “The Power of Statistical Tests,” By the Numbers, June 1991
Eldon G. Mills and Harlan D. Mills, Player Win Averages, 1970. A.S. Barnes.
Cyril Morong, “Clutch Hitting and Experience,” By the Numbers, November 2000
Pete Palmer, “Clutch Hitting One More Time,” By the Numbers, March 1990
Willie Runquist, Baseball by the Numbers, 1995. McFarland.
Willie Runquist, “Clutch Hitters and Other Mythological Animals,” By the Numbers, March 1994
Rob Wood, “Clutch Ability: Myth or Reality?” By the Numbers, December 1989
Note: By the Numbers is the newsletter of the SABR Statistical Analysis Committee. The Baseball Research Journal is also published by SABR.
Web Sites
http://nexus.sscl.uwo.ca/economics/faculty/jpalmer/Eco182/Clutch/Clutch.html
http://math.la.asu.edu/~grabiner/
http://math.la.asu.edu/~grabiner/fullclutch.txt
http://math.la.asu.edu/~grabiner/risp91.txt
http://www.diamond-mind.com/articles/neyerclutch.htm
http://www.baseballstuff.com/btf/scholars/ruane/articles/situational_hitting.htm