Title: RBIs, Opportunities and Power Hitting

Opportunities significantly affect a hitter’s RBI totals

by Cyril Morong

(This is an expanded and corrected version of my paper that appeared in the 2002 Baseball Research Journal (Volume 31), published by SABR, the Society for American Baseball Research. The results are different than those found in the article because I found a discrepancy in the data I used. How I corrected this is explained at the end of the paper in Part 4 (doing so only strengthened the results and conclusions). Part 1 is the revised article. Part 2 discusses an additional regression that shows that strikeouts probably don’t have a big impact on RBIs. Part 3 shows a regression that uses players who had between 3000 and 6000 plate appearances during the 1987-2001 period.)

Thanks to Ted Lukacs and Jerry Wachs for their comments on an early draft of the paper.

Part 1

RBIs have long been one of the staples of measuring a hitter’s contribution to his team’s success. Sometimes a player is said to be “a good RBI guy.” Newspapers and record books list the annual RBI leaders, and scoreboards and broadcasters tell us how many RBIs a hitter has, almost as if getting them is a special skill, separate from power hitting or hitting for average. But RBIs are also often criticized as misleading, since not all hitters get the same number of opportunities to drive in runs. One hitter might get more RBIs than another because he had more opportunities and not because he is somehow better at driving in runs. So the important question is: “Exactly how much difference do RBI opportunities make?”

They make a big difference and exactly how big can be learned from statistical analysis. The following equation, derived using the linear regression technique, explains a hitter’s RBIs per at bat and the value of opportunities:

(1) RBI/AB = .187*OPP + .196*AVG + .468*ISO - .303

where

OPP = a hitter’s number of RBI opportunities per at bat, with each man on base being an opportunity as well as the batter

AVG = a hitter’s batting average

ISO = a hitter’s isolated power which is slugging percentage minus batting average

How does this work? Equation (1) predicts that Juan Gonzalez would get .204 RBIs per at bat because:

.187*(1.72) + .196*(.297) + .468*(.271) - .303 = .204

Gonzalez actually had .205 RBIs per at bat. The equation is also generally very accurate (I explain the statistical results and the data below).
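
The arithmetic for Gonzalez can be reproduced with a few lines of Python. This is only a sketch of equation (1) as printed above; the function name `predicted_rbi_per_ab` is made up for illustration, and the inputs are the rounded career figures quoted in the text.

```python
# Coefficients exactly as printed in equation (1).
OPP_COEF, AVG_COEF, ISO_COEF, INTERCEPT = 0.187, 0.196, 0.468, -0.303


def predicted_rbi_per_ab(opp, avg, iso):
    """Predicted RBIs per at bat from equation (1)."""
    return OPP_COEF * opp + AVG_COEF * avg + ISO_COEF * iso + INTERCEPT


# Juan Gonzalez: rounded career OPP, AVG and ISO from the text.
gonzalez = predicted_rbi_per_ab(opp=1.72, avg=0.297, iso=0.271)
print(round(gonzalez, 3))        # 0.204
print(round(gonzalez * 600, 1))  # 122.2 RBIs per 600 at bats
```

Multiplying by 600 puts the prediction on the per-season scale used throughout the rest of the paper.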

But first, what does the equation mean from a baseball perspective? With the coefficient on OPP being .187, two players who differ by, say, .082 OPP (49 RBI opportunities for a 600 at bat season), will end up with a 9.17 difference in RBIs over a 600 at bat season (9.17 = .082*.187*600). This is significant in baseball terms as well as statistically. Why look at a .082 difference in OPP? This study includes all players (61) who had 6000 or more plate appearances during the 1987-2001 seasons and whose situational statistics were listed on the CNN/SI website.¹ Juan Gonzalez had the highest OPP/AB at 1.724 for his career. More than half of the other players were at least .082 less than this, including other power hitters like Rafael Palmeiro, Gary Sheffield, and Barry Bonds. Bonds was even lower at 1.587. Gonzalez would get 15.338 (or .137*.187*600) more RBIs than Bonds solely as a result of having more opportunities.

For a single season, the differences in OPP can be even greater. In 1995, for example, Paul O’Neill was the leader at 1.85 while Barry Bonds had 1.61. Everything else being equal, O’Neill would get about 26.93 more RBIs over a 600 at bat season (.24*.187*600 = 26.93). So opportunities play a big role in RBI totals.
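
The 1995 comparison is a one-line calculation: the gap in OPP times the OPP coefficient times the number of at bats. A small sketch, where the helper name `rbi_gain_from_opportunities` is purely illustrative:

```python
def rbi_gain_from_opportunities(opp_gap, at_bats=600, opp_coef=0.187):
    """Extra RBIs implied by a gap in opportunities per at bat,
    holding AVG and ISO equal, using the OPP coefficient from equation (1)."""
    return opp_gap * opp_coef * at_bats


# 1995: Paul O'Neill (1.85 OPP) versus Barry Bonds (1.61 OPP).
gap = 1.85 - 1.61
print(round(gap * 600))                            # 144 extra opportunities
print(round(rbi_gain_from_opportunities(gap), 2))  # 26.93 extra RBIs
```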

Hitters vary quite a bit in RBI opportunities. For example, the two lowest in OPP/AB were Rickey Henderson and Kenny Lofton, at 1.46 and 1.47, respectively. The two highest were Juan Gonzalez and Ruben Sierra at 1.72 and 1.71, respectively. Of course, Henderson and Lofton are both primarily leadoff men while Gonzalez (usually fourth) and Sierra (usually third through sixth) have been largely used in the middle of the lineup. But the difference between Henderson and Gonzalez (.26 OPP) would be 156 more RBI opportunities over the course of a 600 at bat season. Just about half of that, say .13, would mean about 78 more.

An actual example supports the importance of opportunities. Juan Gonzalez has a career AVG of .297 and an ISO of .271. Ken Griffey Jr. had .296 and .270, almost identical numbers. Yet Gonzalez had .205 RBI/AB or 123 RBIs over a 600 at bat season. Griffey had .187 RBI/AB or 112 over a 600 at bat season. The difference results largely from Gonzalez having 1.72 opportunities per at bat while Griffey had 1.64.

As for the data, opportunities include one for every time at bat and one for each runner that was on base during an at bat. This means that OPP does not include opportunities from plate appearances when the batter walked. (A regression was run that included these opportunities and the results were similar).² Isolated power is a hitter’s slugging percentage minus his batting average and is a better measure of power hitting since it only includes bases on hits beyond singles.

As for the statistical results, the r-squared is .974, which means that 97.4% of the variation in RBIs per at bat across players is explained by equation (1). The standard error, which measures dispersion in the equation’s predicted RBI/AB for each player, is .005651 or just 3.39 RBIs for a 600 at bat season (600*.005651 = 3.39). The numbers in front of the variable abbreviations are referred to as coefficient estimates. So, for example, a .010 increase in batting average means a .00196 increase in RBI/AB (.196*.010 = .00196). That is 1.18 RBIs for a 600 at bat season. A .010 increase in ISO would add 2.81 RBIs for a 600 at bat season.
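
The per-season figures in this paragraph all come from scaling a per-at-bat quantity by 600. A short sketch of that conversion, using only numbers stated above (the helper name `rbi_change` is an assumption for illustration):

```python
AT_BATS = 600
STANDARD_ERROR = 0.005651  # standard error of RBI/AB from equation (1)


def rbi_change(coefficient, delta, at_bats=AT_BATS):
    """RBIs gained over a season when a variable rises by `delta`."""
    return coefficient * delta * at_bats


print(round(STANDARD_ERROR * AT_BATS, 2))  # 3.39 RBIs per 600 at bats
print(round(rbi_change(0.196, 0.010), 2))  # 1.18 for +.010 in AVG
print(round(rbi_change(0.468, 0.010), 2))  # 2.81 for +.010 in ISO
```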

The T-values, which indicate statistical significance, are:

OPP = 13.58

AVG = 5.02

ISO = 28.25

This says that the three variables are all significant at the 1% level (or lower), meaning that there is less than a 1 in 100 chance of getting the coefficient estimates in equation (1) if their true value were zero.

Equation (1) also shows the bigger role played by power hitting in driving in runs. Consider Players A and B who have the following statistics:

Player	AB	HITS	2B	3B	HR	AVG	SLG	ISO
A	600	192	40	8	16	.320	.493	.173
B	600	162	20	4	32	.270	.477	.207

Who will drive in more runs? Using equation (1) and assuming they each get 1.6 OPP, Player A will drive in 83.93 runs while Player B will drive in 87.6 runs. Player B’s edge in home run power gives him the edge in RBIs despite a much lower batting average and a deficit in doubles and triples. For Player A to get up to 87.6 RBIs, his average would have to jump to .351 (assuming all additional hits are singles). If Player A had just 20 doubles and 4 triples, along with a .320 AVG, he would drive in just 76.55 runs. To get up to 87.17 RBIs, he would then have to raise his AVG to .461!! (Again, assuming all additional hits are singles.)
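
The first comparison between Players A and B can be checked from their raw counts. The sketch below derives AVG and ISO from hits, doubles, triples and home runs, rounds them to three decimals as in the table, and then applies equation (1) with 1.6 OPP. The function names are illustrative, not part of the original study.

```python
def avg_and_iso(at_bats, hits, doubles, triples, homers):
    """Batting average and isolated power, rounded to three decimals."""
    extra_bases = doubles + 2 * triples + 3 * homers  # bases beyond a single
    return round(hits / at_bats, 3), round(extra_bases / at_bats, 3)


def rbi_per_600(opp, avg, iso):
    """Equation (1), scaled to a 600 at bat season."""
    return 600 * (0.187 * opp + 0.196 * avg + 0.468 * iso - 0.303)


avg_a, iso_a = avg_and_iso(600, 192, 40, 8, 16)  # Player A: .320, .173
avg_b, iso_b = avg_and_iso(600, 162, 20, 4, 32)  # Player B: .270, .207

rbi_a = rbi_per_600(1.6, avg_a, iso_a)
rbi_b = rbi_per_600(1.6, avg_b, iso_b)
print(round(rbi_a, 2), round(rbi_b, 2))  # 83.93 87.6
```

Player B's extra 20 bases from home runs outweigh Player A's 30 additional hits because the ISO coefficient is more than twice the AVG coefficient.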

But are all RBI opportunities of the same quality? No, a runner on third is better than a runner on first. So I counted a runner on third as a four-point opportunity, a runner on second as a three-point opportunity, a runner on first as a two-point opportunity and the batter as a one-point opportunity. I then ran another linear regression with points per at bat replacing opportunities per at bat.

The following equation shows the results:

(2) RBI/AB = .069*POINTS + .206*AVG + .475*ISO – .189

The r-squared is .973. The standard error is .005771 or just 3.46 RBIs for a 600 at bat season. This result is about as good as the one summarized in equation (1). Notice that the value of ISO is still much greater than the value of AVG, so power hitting is still the dominant force. The three variables were all statistically significant at the 1% level. Also, a regression was run that included opportunities from walks, as converted into points, with similar results.³
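
To make the point system concrete, here is a small sketch of how points per at bat could be tallied from at bats by base situation. The weights are the ones described above; the sample counts are invented for a hypothetical hitter and are not taken from the study's data.

```python
# Point weights described in the text: batter 1, runner on first 2,
# runner on second 3, runner on third 4.
BATTER_POINTS = 1
RUNNER_POINTS = {1: 2, 2: 3, 3: 4}


def points_for_state(occupied_bases):
    """Points available in one at bat with the given bases occupied."""
    return BATTER_POINTS + sum(RUNNER_POINTS[base] for base in occupied_bases)


def points_per_at_bat(at_bats_by_state):
    """Weighted opportunity points divided by total at bats."""
    total_ab = sum(at_bats_by_state.values())
    total_points = sum(points_for_state(state) * ab
                       for state, ab in at_bats_by_state.items())
    return total_points / total_ab


# HYPOTHETICAL at bats by base state (keys are tuples of occupied bases).
sample = {(): 300, (1,): 100, (2,): 50, (3,): 20,
          (1, 2): 40, (1, 3): 20, (2, 3): 10, (1, 2, 3): 10}

print(points_for_state((1, 2, 3)))          # 10 points with the bases loaded
print(round(points_per_at_bat(sample), 3))  # 2.655
```

The resulting value plays the role of POINTS in equation (2).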

A hitter’s RBIs are determined by his ability to hit for average, hit for power and the quality and quantity of his opportunities. There probably is no special “RBI ability.” The vast majority of hitters will get about the number of RBIs predicted by their general hitting ability and opportunities. Any deviations are probably just random chance. That would be consistent with the well-known research on clutch hitting.

Table 1: Predicted RBIs vs. Actual RBIs

Player	AVG	ISO	RBI Opportunities per AB	RBI per 600 AB*	Predicted	Difference
Harold Baines	.291	.172	1.67	94.28	87.35	6.93
Wally Joyner	.289	.149	1.64	84.48	77.85	6.63
Delino DeShields	.270	.109	1.49	52.73	47.52	5.20
Tino Martinez	.274	.207	1.70	103.93	99.13	4.80
Frank Thomas	.319	.258	1.66	117.90	113.67	4.23
Jeff Bagwell	.303	.251	1.65	113.06	108.93	4.13
Andres Galarraga	.291	.219	1.65	102.89	99.01	3.88
Tim Raines	.288	.134	1.54	65.42	61.61	3.82
David Justice	.280	.227	1.65	103.19	99.40	3.80
B.J. Surhoff	.281	.135	1.64	75.98	72.23	3.75
Robin Ventura	.271	.176	1.67	90.57	86.85	3.72
Dante Bichette	.299	.200	1.67	99.95	96.46	3.50
Jose Canseco	.268	.252	1.68	111.68	108.35	3.33
Mark McLemore	.260	.080	1.58	51.36	48.68	2.68
Mark Grace	.307	.140	1.61	76.18	73.86	2.31
Rickey Henderson	.274	.140	1.46	55.40	53.57	1.83
Kenny Lofton	.302	.123	1.47	55.16	53.33	1.83
Mark McGwire	.263	.327	1.65	127.26	125.99	1.27
Gary Sheffield	.295	.226	1.61	98.15	97.07	1.08
Juan Gonzalez	.297	.271	1.72	123.21	122.18	1.03
Greg Vaughn	.245	.232	1.67	100.19	99.20	.99
Todd Zeile	.267	.162	1.65	80.47	79.73	.74
Marquis Grissom	.270	.134	1.54	60.58	59.92	.66
Tony Gwynn	.342	.127	1.58	71.58	70.93	.65
Will Clark	.304	.196	1.65	93.75	93.23	.52
Cal Ripken	.271	.163	1.64	79.41	78.98	.43
Tony Fernandez	.286	.112	1.58	60.07	59.86	.20
Paul O'Neill	.288	.182	1.71	94.80	94.60	.20
Gregg Jefferies	.289	.132	1.58	66.30	66.63	-.32
Eric Karros	.268	.194	1.66	89.70	90.06	-.36
Craig Biggio	.291	.145	1.49	59.81	60.21	-.40
Ken Griffey Jr.	.296	.270	1.64	112.30	112.79	-.49
John Olerud	.300	.176	1.66	87.83	88.35	-.51
Jay Bell	.267	.153	1.54	64.50	65.27	-.77
Sammy Sosa	.277	.265	1.64	108.32	109.11	-.79
Jay Buhner	.254	.240	1.71	106.52	107.32	-.80
Fred McGriff	.287	.228	1.66	100.76	101.56	-.80
Chuck Knoblauch	.293	.118	1.49	51.53	52.46	-.93
Matt Williams	.269	.222	1.68	99.05	99.98	-.93
Dave Martinez	.279	.114	1.55	55.81	56.83	-1.02
Travis Fryman	.278	.171	1.69	86.69	87.88	-1.19
Ron Gant	.256	.212	1.61	86.37	87.66	-1.29
Omar Vizquel	.274	.077	1.58	47.01	48.42	-1.41
Brady Anderson	.257	.170	1.51	63.65	65.15	-1.50
Edgar Martinez	.319	.211	1.65	97.80	99.38	-1.58
Ken Caminiti	.272	.175	1.68	85.88	87.52	-1.64
Larry Walker	.315	.257	1.63	107.72	109.37	-1.65
Roberto Alomar	.306	.149	1.58	71.11	72.90	-1.78
Ray Lankford	.274	.209	1.60	86.15	87.97	-1.82
Devon White	.264	.156	1.55	64.56	66.85	-2.28
Barry Larkin	.300	.155	1.55	68.58	70.98	-2.40
Luis Gonzalez	.286	.198	1.63	87.50	90.16	-2.66
Barry Bonds	.295	.299	1.59	111.72	114.49	-2.78
Rafael Palmeiro	.295	.225	1.64	96.95	99.77	-2.82
Bernie Williams	.305	.194	1.68	93.49	96.43	-2.94
Bobby Bonilla	.280	.200	1.67	90.35	93.97	-3.63
Ruben Sierra	.270	.184	1.71	89.29	93.54	-4.26
Benito Santiago	.260	.151	1.67	71.44	78.15	-6.71
Steve Finley	.275	.164	1.58	66.49	73.81	-7.32
Wade Boggs	.317	.117	1.58	56.42	64.67	-8.25
Ellis Burks	.292	.220	1.68	93.29	102.72	-9.43

Explanatory notes: The * indicates that RBIs from sacrifice flies and bases-loaded walks are not included. The Predicted column is based on equation (1), and Difference is actual minus predicted RBIs per 600 at bats.

Part 2

I also ran a regression in which all RBIs per plate appearance were a function of the following variables:

H/PA = hits per plate appearance

XB/PA = extra bases per plate appearance

PTS/PA = points per plate appearance

K/PA = strikeouts per plate appearance

RBIs from sacrifice flies are included in this regression. Walks are included in plate appearances, and points from walks are included, but intentional walks and points from intentional walks are excluded. I included strikeouts to see if players who struck out more hurt their RBI totals by perhaps driving in fewer runs with sacrifice flies or groundouts. Of course, many players who strike out often also hit many home runs, so that would increase their RBIs.

Here are the results:

RBI/PA = .163*(H/PA) + .492*(XB/PA) + .068*(PTS/PA) – .033*(K/PA) – .165

The r-squared was .968. The standard error was .005681 or 3.98 RBIs per 700 plate appearances.

The t-values were:

H/PA = 3.16

XB/PA = 20.86

PTS/PA = 13.09

K/PA = -1.33

Now, notice that the t-value is not significant for K/PA (it needs to be 1.96 or more in absolute value). Strikeouts do not significantly affect RBIs. In fact, their coefficient value is the smallest, too. The highest K/PA was about .24, for Jay Buhner. The average for this group was about .144. The difference between Buhner and the average hitter would be only about 2.24 RBIs over 700 PA.
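
The size of the strikeout effect can be sketched directly from the K/PA coefficient. With the rounded rates quoted above the result comes out near 2.2 RBIs, close to the 2.24 in the text, which presumably used unrounded rates. The function name is illustrative only.

```python
K_COEF = -0.033  # coefficient on K/PA from the Part 2 regression


def rbi_effect_of_strikeouts(k_rate, baseline_k_rate, plate_appearances=700):
    """Change in RBIs from striking out more often than a baseline hitter,
    holding hits, extra bases and points per plate appearance equal."""
    return K_COEF * (k_rate - baseline_k_rate) * plate_appearances


# Jay Buhner (about .24 K/PA) against the group average (about .144 K/PA).
print(round(rbi_effect_of_strikeouts(0.24, 0.144), 2))  # -2.22
```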

Part 3

This part shows a regression that uses players (N = 114) who had between 3000 and 6000 plate appearances during the 1987-2001 period. The regression equation is

RBI/AB = .198*OPP + .227*AVG + .445*ISO - .323

The r-squared was .962. So the results are similar to equation (1) in terms of the coefficient estimates:

(1) RBI/AB = .187*OPP + .196*AVG + .468*ISO - .303

I wanted to do this to see if the analysis worked for another group of players. The standard error was higher in this case, .00663, but that is expected since the players have fewer plate appearances. Randomness will play a bigger role. When I applied equation (1) to this group of players, it predicted all of them to within ten RBIs of their actual total for a 600 at bat season. 87 were predicted to within five RBIs.

Part 4

I came across a discrepancy in the data I used. I used data from the CNN/SI site for each player and I calculated each player's RBI opportunities. For example, I added up the at-bats that CNN/SI reported for Rafael Palmeiro for each situation where runners might be on base:

Runner on 1B-1262
Runner on 2B-714
Runner on 3B-236
Runners on 1B, 2B-614
Runners on 1B, 3B-249
Runners on 2B, 3B-142
Bases Loaded-163

This adds up to 3380. But the discrepancy comes in where, in a separate line, they report that he had 3852 at-bats with runners on base, not 3380. With no runners on, they give him 4521. Adding this to 3852, you get 8373, the same total that he has in the Lee Sinins sabermetric encyclopedia (for 1987-2001). So the 3852, not the 3380, must be right.
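
The consistency check described here is easy to reproduce. The sketch below uses only the Palmeiro figures quoted above; the variable names are chosen for illustration.

```python
# At bats by base situation, as listed by CNN/SI for Rafael Palmeiro.
at_bats_by_situation = {
    "1B": 1262, "2B": 714, "3B": 236,
    "1B,2B": 614, "1B,3B": 249, "2B,3B": 142, "loaded": 163,
}
REPORTED_RUNNERS_ON = 3852   # the separate "runners on base" line
REPORTED_BASES_EMPTY = 4521  # at bats with no runners on

listed_total = sum(at_bats_by_situation.values())
missing = REPORTED_RUNNERS_ON - listed_total
career_at_bats = REPORTED_RUNNERS_ON + REPORTED_BASES_EMPTY

print(listed_total, missing, career_at_bats)  # 3380 472 8373
```
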
In my study I looked at opportunities per at-bat. So I had to calculate opportunities for each hitter. For Palmeiro it was

1262*2=2524
714*2=1428
236*2=472
614*3=1842
249*3=747
142*3=426
163*4=652

If you add that up, you get 8091. The reason you multiply 1262 by 2 is that an at-bat with a man on first is two opportunities, the man on first and the batter. Adding the 8091 to the at-bats with none on (4521, at one opportunity each) gives 12612. That is, Palmeiro had 12612 opportunities. Dividing 12612 by his total at-bats, 8373, gives 1.51 opportunities per at-bat.

Now this is not right, because I did not have the right numbers for each of those base situations. I don't know why CNN/SI had the discrepancy. I discovered it on August 1, 2003.

Then I looked at how many runners on base at-bats were missing for Palmeiro. That would be 3852 - 3380 or 472. The 3852 is 13.96% higher than the 3380. I assumed that Palmeiro got those other 472 at bats and that the seven base situations came up with the same frequency as they did for the at-bats listed. So I raised the at-bats with a runner on first 13.96%, the at-bats with a runner on second 13.96%, and so on. So Palmeiro gets more opportunities and then opportunities per at-bat goes up to 1.64. I did this for all the hitters and re-ran the first regression from the paper. The results were actually more accurate. The coefficients on ISO and AVG changed a little, but the coefficient on OPP (RBI opportunities) went up to .187 from .125. That is quite a jump and it means that opportunities might be a lot more important than I thought. In fact, I had done a study on the 1995 season a couple of years ago, and the coefficient on OPP was .174. The standard error per 600 at bats fell from 5.03 to 3.39.
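
The before-and-after calculation for Palmeiro can be sketched as follows. It assumes, as described above, that every runners-on situation is scaled up by the same factor; the function and variable names are illustrative.

```python
# (at bats, runners on base) for each situation listed for Palmeiro.
SITUATIONS = [(1262, 1), (714, 1), (236, 1),
              (614, 2), (249, 2), (142, 2), (163, 3)]
REPORTED_RUNNERS_ON = 3852
BASES_EMPTY = 4521
TOTAL_AT_BATS = REPORTED_RUNNERS_ON + BASES_EMPTY  # 8373


def opportunities_per_at_bat(scale=1.0):
    """Opportunities per at bat, scaling every runners-on situation by `scale`.
    Each at bat counts one opportunity for the batter plus one per runner."""
    on_base = sum(ab * scale * (runners + 1) for ab, runners in SITUATIONS)
    return (on_base + BASES_EMPTY) / TOTAL_AT_BATS


listed = sum(ab for ab, _ in SITUATIONS)  # 3380
scale = REPORTED_RUNNERS_ON / listed      # uniform correction factor

print(round((scale - 1) * 100, 2))                # 13.96 percent
print(round(opportunities_per_at_bat(), 2))       # 1.51 before correction
print(round(opportunities_per_at_bat(scale), 2))  # 1.64 after correction
```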

If you went to the CNN/SI site to look for these discrepancies, you would not be able to find them. They no longer list all of the base-out situations. They leave one out, the one with a runner on first. So you can’t check the total at bats from the seven on-base situations with the total given in the separate “runners on base” line. Since they had a discrepancy before, it might still exist and you could not detect it.

I think the corrections I have made are very reasonable. They result in an average number of opportunities per at bat of 1.62. Before the corrections, it was just 1.52. The 1.62 is more in line with what I got in a study of the 1995 season (I used data from the STATS, INC. Scoreboard book for that study) and also with the frequency of the different base-out situations (see Tom Ruane’s website, the last one listed below). There was quite a range of discrepancies. Two hitters were missing more than 20% of their at bats with runners on base while two were missing less than 1%. That is, the discrepancies varied quite a bit across players.

In my 1995 study, I subtracted two opportunities per sacrifice fly for each batter (since the Scoreboard book only gave total opportunities-it was not broken down by base situations). I looked at the 134 hitters from 1995 that had at least 400 at bats. The regression equation was:

RBI/AB = .174*OPP + .289*AVG + .465*ISO - .303

The r-squared was .867. It is probably not as high because it is just one season of data and randomness plays a stronger role. Anyone who has the 1996 Scoreboard book could verify what I got. The results are similar to what I now have after the corrections, especially the value of OPP. .174 is much closer to .187 than .125 is. I don’t know why there is a big difference in the coefficient for average.
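
For readers who want to re-run a regression of this form on their own data, here is a minimal ordinary least squares sketch in plain Python. It is not the software used for the paper. The hitters below are SYNTHETIC, generated from the 1995 coefficients plus random noise purely to show that the fitting code recovers coefficients of the right size; the helper names `solve`, `fit_ols` and `r_squared` are made up for this example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients; the last one returned is the intercept."""
    x = [list(r) + [1.0] for r in rows]  # append a constant column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by the fitted equation."""
    fitted = [sum(c * v for c, v in zip(coefs, list(r) + [1.0])) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# SYNTHETIC (OPP, AVG, ISO) values for 134 made-up hitters; not real data.
random.seed(0)
rows = [(random.uniform(1.45, 1.85), random.uniform(0.24, 0.34),
         random.uniform(0.08, 0.33)) for _ in range(134)]
y = [0.174 * o + 0.289 * a + 0.465 * i - 0.303 + random.gauss(0, 0.005)
     for o, a, i in rows]

coefs = fit_ols(rows, y)  # order: OPP, AVG, ISO, intercept
print([round(c, 3) for c in coefs], round(r_squared(rows, y, coefs), 3))
```

With real data, `rows` would hold each hitter's OPP, AVG and ISO and `y` his RBI/AB. A statistics package would also report the standard errors and t-values discussed in Part 1.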

End Notes

1. Some outstanding hitters of recent times, Manny Ramirez and Mike Piazza, for example, were not in the study since they had not achieved 6000 plate appearances through the 2001 season. Both were high in opportunities per at bat at 1.71 and 1.68, respectively. Ramirez had 4.89 more RBIs per 600 at bats than expected and Piazza had about 5.91 more.

2. RBIs from sacrifice flies are also not included. Neither are opportunities that were available when the player hit an SF. For the average player in this study, sacrifice flies make up less than 1% of his plate appearances and no more than 1.5% for any one player. So excluding SF’s matters very little. RBIs from bases-loaded walks were not included in the equation (1) or equation (2) results. They were included in the unreported regressions that included opportunities from walks. In those regressions, all variables were divided by plate appearances rather than at bats. HBPs were also included in those cases. But again, the results were similar with basically the same meanings as the two regressions reported here.

3. If I used walks, plate appearances and the point system, the regression results show that opportunities alone would give Juan Gonzalez 11.67 more RBIs than Barry Bonds over a 660 plate appearance season. That is less than 15.338, but still very high.

Data Source

Individual player links at the CNN/SI website

Various editions of the STATS, INC. Player Profiles books and The Great American Baseball Stat Book

Harold Brooks, “The Statistical Mirage of Clutch Hitting,” Baseball Research Journal 1989

Tom Conlon, “Or Does Clutch Ability Exist?” By the Numbers, March 1990

Richard D. Cramer, "Do Clutch Hitters Exist?" Baseball Research Journal 1977

Gary Gillette, “Much Ado About Nothing,” SABERMETRIC REVIEW, July 1986

Tom Hanrahan, “Clutch Teams in 1999” By the Numbers May, 2000

Tom Hanrahan, “What Makes a “Clutch” Situation?” By the Numbers February, 2001

Keith Karcher, “The Power of Statistical Tests,” By the Numbers, June 1991

Eldon G. Mills and Harlan D. Mills, Player Win Averages, 1970. A.S. Barnes

Cyril Morong, “Clutch Hitting and Experience,” By the Numbers November 2000

Pete Palmer, “Clutch Hitting One More Time,” By the Numbers, March 1990

Willie Runquist, Baseball by the Numbers, 1995. McFarland.

Willie Runquist, “Clutch Hitters and Other Mythological Animals,” By the Numbers, March 1994

Rob Wood, “Clutch Ability: Myth or Reality?” By the Numbers, December 1989

Note: By the Numbers is the Newsletter of the SABR Statistical Analysis Committee. The Baseball Research Journal is also published by SABR.

Web Sites

http://nexus.sscl.uwo.ca/economics/faculty/jpalmer/Eco182/Clutch/Clutch.html

http://math.la.asu.edu/~grabiner/

http://math.la.asu.edu/~grabiner/fullclutch.txt

http://math.la.asu.edu/~grabiner/risp91.txt

http://www.diamond-mind.com/articles/neyerclutch.htm

http://www.baseballstuff.com/btf/scholars/ruane/articles/situational_hitting.htm