This paper was published in the August, 2002 issue of By the Numbers, the statistical newsletter of the Society for American Baseball Research (SABR).
The Impact of Lineup Balance on Scoring, 1920-89
by Cyril Morong
This paper analyzes whether a more balanced lineup has, in the past, increased run scoring when overall team batting rates (on-base percentage or OBP, slugging percentage or SLG, etc.) are held constant.
Team balance (BAL) is the sample standard deviation (SD) of a hitting statistic such as OBP or SLG, calculated across a team's eight most frequently used position players. Only teams that had eight players with at least 400 at-bats, one at each non-pitching position, according to the 8th edition of the Macmillan Baseball Encyclopedia, were included in the study. This ensures a fairly constant lineup throughout the season. There were 50 such teams in this period. In one case, team balance was instead measured as the range of OPS (OBP + SLG) from the highest to the lowest player in the starting lineup.[1]
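As a minimal sketch of how BAL can be computed, assume one OBP and one SLG value per regular. The eight-player values below are made up for illustration and do not come from any team in the study. Python's `statistics.stdev` uses the n - 1 denominator, which matches "sample standard deviation"; the OPS-range variant is simply the highest player OPS minus the lowest.

```python
import statistics

# Hypothetical OBP and SLG for one team's eight regulars (illustrative values,
# not taken from any team in the study).
obp = [0.390, 0.365, 0.350, 0.340, 0.330, 0.320, 0.310, 0.295]
slg = [0.560, 0.480, 0.450, 0.420, 0.400, 0.380, 0.360, 0.330]


def balance(values):
    """BAL: sample standard deviation (n - 1 denominator) across the lineup."""
    return statistics.stdev(values)


def ops_range(obp_values, slg_values):
    """Alternative BAL: highest player OPS minus lowest player OPS."""
    ops = [o + s for o, s in zip(obp_values, slg_values)]
    return max(ops) - min(ops)


print(f"BAL/OBP   = {balance(obp):.4f}")
print(f"BAL/SLG   = {balance(slg):.4f}")
print(f"OPS range = {ops_range(obp, slg):.3f}")
```

A smaller BAL means a more balanced lineup; a larger one means production is concentrated in a few hitters.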
The question was analyzed using ordinary least-squares regression. In the first regression, the dependent variable was team runs per game (R/G), and the independent variables were team OBP, team SLG, and the error rate (ER). The error rate is simply 1 minus the fielding percentage for the entire league in the year in which a given team played. ER is included because it varied over the period and affects scoring: it was much higher in the 1921 AL (.035) than in the 1982 AL (.020). I assumed that the error rate committed against every team in a league in a given season is the same. This is not perfect, but it is better than leaving errors out altogether.
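The ER definition and the league-wide assumption can be sketched as a simple lookup. The table name `LEAGUE_FIELDING_PCT` and the helper functions are hypothetical; the two fielding percentages are just the values implied by the ER figures quoted above (.035 and .020), not independently sourced data.

```python
def error_rate(league_fielding_pct):
    """ER: 1 minus the league-wide fielding percentage for that season."""
    return 1.0 - league_fielding_pct


# Hypothetical lookup keyed by (season, league). The two fielding percentages are
# simply those implied by the ER values quoted in the text (.035 and .020).
LEAGUE_FIELDING_PCT = {(1921, "AL"): 0.965, (1982, "AL"): 0.980}


def team_error_rate(season, league):
    """Every team in a league-season is assigned the same, league-wide ER."""
    return error_rate(LEAGUE_FIELDING_PCT[(season, league)])


for season, league in LEAGUE_FIELDING_PCT:
    print(f"{season} {league}: ER = {team_error_rate(season, league):.3f}")
```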
The results for the first regression, which used the 50 teams in the study:
(1) R/G = -7.34 + 24.97*OBP + 8.39*SLG + 11.01*ER
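To make equation (1) concrete, here is a small evaluation for a hypothetical team with a .340 OBP, a .420 SLG, and a league error rate of .025. These inputs are illustrative assumptions, not a team from the data set.

```python
def equation_1(obp, slg, er):
    """Predicted team runs per game from equation (1) (50 teams, 1920-89)."""
    return -7.34 + 24.97 * obp + 8.39 * slg + 11.01 * er


# Hypothetical team: .340 OBP, .420 SLG, league error rate of .025.
rg = equation_1(0.340, 0.420, 0.025)
print(f"{rg:.2f} R/G, or about {rg * 162:.0f} runs over 162 games")
```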
The r-squared was .918, meaning that 91.8% of the variation across teams in R/G is explained by the independent variables. The T-values for the 3 independent variables were 10.42, 5.97, and 1.44, respectively. When this regression was run for all teams from 1920-98, the equation was:
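The r-squared and T-values reported above come from an ordinary least-squares fit. Below is a minimal, dependency-free sketch of such a fit (normal equations, coefficient standard errors from the diagonal of the inverse of X'X). Because the 50-team data set is not reproduced here, it runs on synthetic teams generated from equation (1) plus random noise; the OBP, SLG, and ER ranges and the noise level are assumptions. In practice a statistics package would normally be used instead.

```python
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def ols(X, y):
    """OLS via the normal equations; each row of X starts with 1.0 (intercept).

    Returns (coefficients, t_values, r_squared).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # unbiased residual variance
    t = [b[i] / (sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, t, 1.0 - sse / sst


# Synthetic stand-in for the 50 teams: R/G generated from equation (1) plus noise.
rng = random.Random(42)
X, y = [], []
for _ in range(50):
    obp, slg, er = rng.uniform(0.31, 0.37), rng.uniform(0.36, 0.46), rng.uniform(0.020, 0.035)
    X.append([1.0, obp, slg, er])
    y.append(-7.34 + 24.97 * obp + 8.39 * slg + 11.01 * er + rng.gauss(0.0, 0.05))

b, t, r2 = ols(X, y)
for name, coef, tval in zip(["const", "OBP", "SLG", "ER"], b, t):
    print(f"{name:>5}: coef {coef:7.2f}   t {tval:6.2f}")
print(f"r-squared {r2:.3f}")
```

Because the synthetic noise is small, the fit recovers coefficients close to those of equation (1). With real data, the T-values depend on how much OBP, SLG, and ER actually vary across teams.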
(2) R/G = -5.87 + 17.63*OBP + 10.7*SLG + 13.51*ER
The r-squared was .922. In this case ER had a much higher T-value (11.11). The other big difference was the OBP coefficient, which at 17.63 was much lower than the 24.97 estimated for the 50 teams studied here.
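One way to see how much the OBP coefficient differs between the two samples is to convert a hypothetical .010 gain in team OBP into runs per 162-game season under each equation. The helper name is illustrative only.

```python
GAMES = 162


def runs_from_obp_gain(obp_coef, obp_gain=0.010, games=GAMES):
    """Extra runs per season from a gain in team OBP, other terms held fixed."""
    return obp_coef * obp_gain * games


# OBP coefficients reported for equation (1) (50 teams) and equation (2) (all teams).
for label, coef in [("equation (1)", 24.97), ("equation (2)", 17.63)]:
    print(f"{label}: +.010 OBP -> {runs_from_obp_gain(coef):.1f} runs per season")
```

The same .010 of OBP is worth about 40 runs a season under equation (1) but under 29 under equation (2).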
When the balance variables were added to the first regression, the results were:
(3) R/G = -7.04 + 25.40*OBP + 6.96*SLG + 11.60*ER - 3.54*BAL/OBP + 3.30*BAL/SLG
The r-squared was .923, not much higher than for equation (1), so adding variables that represent balance adds little to our ability to explain R/G. Neither BAL variable was significant, with T-values of -0.89 and 1.58. Taken at face value, though, the negative coefficient on BAL/OBP means that being more balanced in OBP increased scoring: the higher the standard deviation, the less balanced the team is in OBP and the lower its scoring. BAL/SLG shows the opposite; it helped to be less balanced in SLG.
The mean standard deviation of OBP across the eight players on each team (that is, the mean of BAL/OBP) is .036. The highest and lowest values both differed from it by about .020. Multiplying .020 by the -3.54 coefficient from equation (3) gives -.07 R/G, or about 11 runs over 162 games. But 38 teams were within .010 of the .036 mean, so for those teams the difference is about six runs or less per season. The difference between the most and least balanced teams is about .040, or roughly 23 runs a season. The standard deviation of BAL/OBP was .0137. Multiplying this by -3.54 gives -.048 R/G, or about -7.86 runs per season. So increasing balance by one standard deviation (lowering BAL/OBP by .0137) added about 7.86 runs per season. This does not seem to be a large effect.
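The arithmetic in the paragraph above is the same each time: coefficient times change in BAL times 162 games. A short sketch reproduces the OBP cases using the -3.54 coefficient on BAL/OBP from equation (3); the helper name `runs_per_season` is illustrative.

```python
GAMES = 162
BAL_OBP_COEF = -3.54  # coefficient on BAL/OBP in equation (3)


def runs_per_season(coef, change_in_bal, games=GAMES):
    """Season run change implied by a change in a BAL variable."""
    return coef * change_in_bal * games


# Changes in BAL/OBP discussed in the text; positive = less balance (higher SD).
for label, delta in [("extreme team vs. mean", 0.020),
                     ("edge of the +/- .010 band", 0.010),
                     ("least vs. most balanced", 0.040),
                     ("one SD of BAL/OBP", 0.0137)]:
    print(f"{label:>26}: {runs_per_season(BAL_OBP_COEF, delta):+6.2f} runs")
```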
The mean standard deviation of SLG (the mean of BAL/SLG) is .070. The highest value exceeded it by about .053. Multiplying .053 by the 3.30 coefficient gives about .175 R/G, or about 28.33 runs over 162 games. So, all else equal, the model credits the team that was least balanced in SLG with about 28 more runs than a team of average balance. The most balanced team was about .025 below the mean, which cost it about 13 runs a season. But 39 teams were within .020 of the .070 mean, so for those teams the difference is 10.69 runs or less per season. All this says, though, is that it paid to be unbalanced in SLG. The standard deviation of BAL/SLG was .041. Multiplying this by 3.30 gives about .135 R/G, or about 21.9 runs per season. So decreasing balance by one standard deviation added about 21.9 runs per season. This seems like a large effect.
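The SLG cases follow the same recipe with the positive 3.30 coefficient on BAL/SLG from equation (3). Because the sign is positive, a move toward less balance (higher BAL/SLG) adds runs and a move toward more balance subtracts them. The helper name is illustrative.

```python
GAMES = 162
BAL_SLG_COEF = 3.30  # coefficient on BAL/SLG in equation (3)


def runs_per_season(coef, change_in_bal, games=GAMES):
    """Season run change implied by a change in a BAL variable."""
    return coef * change_in_bal * games


# Deviations of BAL/SLG from its .070 mean; positive = less balanced in SLG.
for label, delta in [("least balanced team", 0.053),
                     ("most balanced team", -0.025),
                     ("edge of the +/- .020 band", 0.020),
                     ("one SD of BAL/SLG", 0.041)]:
    print(f"{label:>26}: {runs_per_season(BAL_SLG_COEF, delta):+6.2f} runs")
```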
Regression (3) was also run with isolated power, total bases divided by at-bats, and extra bases divided by at-bats, each in place of SLG. The results were generally similar: adding the BAL variables did not increase the r-squared very much, and the signs on the coefficients were the same. None of the BAL variables had a T-value of 2 or more (or even close), so none was significant. But the coefficient on BAL/OBP in the regression that used isolated power was much larger in magnitude than in the other regressions, at -5.32. Multiplying that by the .0137 standard deviation of BAL/OBP gives -.073 R/G, or about 11.81 runs a season. So in that regression, increasing balance by one standard deviation added 11.81 runs per season.
One regression used OPS instead of OBP and SLG. Again, BAL had little impact on the model; in fact, less balance was associated with more R/G. The range of OPS from the highest to the lowest hitter on each team was also tried as a BAL variable. It too had little impact on the model, and its coefficient was positive, meaning that the bigger the range in OPS from the highest to the lowest hitter, the higher the R/G.
One problem with the 50 teams in the data set is that they generally scored more runs than the average team in their league: 6% more, on average. Only 16 teams were below their league average in R/G. So I ran a regression using the 32 lowest scoring teams, so that an equal number of teams were above and below their league average R/G. In that regression, the independent variables were team OPS, ER, and BAL/OPS. As in the analysis above, the BAL variable had little impact on the r-squared and was not significant. Its coefficient was negative, but only -.29. This would produce a difference of less than .5 runs per season between the most and least balanced teams in OPS.
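The subsample selection can be sketched as follows. This assumes "lowest scoring" is measured relative to each team's league average, which is what guarantees the 16-above, 16-below split described above; the `Team` record and the six teams below are hypothetical placeholders, not the study's data.

```python
from typing import NamedTuple


class Team(NamedTuple):
    name: str  # hypothetical label
    runs_per_game: float
    league_runs_per_game: float


def relative_scoring(team):
    """Team R/G as a ratio to its league's average R/G (1.06 = 6% above)."""
    return team.runs_per_game / team.league_runs_per_game


def lowest_scoring(teams, n):
    """The n teams with the lowest league-relative scoring."""
    return sorted(teams, key=relative_scoring)[:n]


teams = [
    Team("A", 4.40, 4.60), Team("B", 4.90, 4.60), Team("C", 5.10, 4.70),
    Team("D", 4.50, 4.40), Team("E", 3.90, 4.10), Team("F", 5.60, 4.80),
]
subset = lowest_scoring(teams, 4)
below = sum(relative_scoring(t) < 1.0 for t in subset)
print([t.name for t in subset], f"{below} below league average")
```

In this toy set, two teams are below their league average, so taking the four lowest gives two below and two above, mirroring the 16/16 split of the 32-team sample.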
Then I ran a regression on those same 32 teams with the same specification as (3). The results were:
(3A) R/G = -7.33 + 28.79*OBP + 5.98*SLG + 6.16*ER - 7.21*BAL/OBP + 0.12*BAL/SLG
For these 32 lowest scoring teams, the coefficient on BAL/OBP is much stronger. But 24 of the 32 teams are within .010 of the mean BAL/OBP of .034, and a .010 difference amounts to 11.68 runs per season. The standard deviation of BAL/OBP was .022. Multiplying this by -7.21 gives -.159 R/G, or about -25.7 runs per season. So increasing balance by one standard deviation would add about 25.7 runs per season. This shows a big advantage for making a lineup more balanced in OBP. The coefficient on BAL/SLG is very small, indicating that balance in SLG has little effect one way or the other.
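A side-by-side sketch puts the one-standard-deviation effect of OBP balance in the full 50-team regression next to the 32-team regression. The coefficients are entered as negative, as the text's negative run figures imply, and each sample uses its own reported SD of BAL/OBP (.0137 and .022). The helper name is illustrative.

```python
GAMES = 162


def runs_from_one_sd_more_balance(bal_obp_coef, sd_of_bal_obp, games=GAMES):
    """Season runs from lowering BAL/OBP by one standard deviation (more balance)."""
    return -bal_obp_coef * sd_of_bal_obp * games


# (coefficient on BAL/OBP, SD of BAL/OBP) as reported for each sample.
for label, coef, sd in [("equation (3), 50 teams", -3.54, 0.0137),
                        ("equation (3A), 32 teams", -7.21, 0.022)]:
    print(f"{label}: {runs_from_one_sd_more_balance(coef, sd):.2f} runs per season")
```

Both the larger coefficient and the larger spread of BAL/OBP in the 32-team sample contribute to the roughly threefold difference.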
To conclude, a more balanced lineup generally seems to have had little positive impact on scoring. The only analysis that supports the importance of balance is the last one, which used a limited number of teams and applied only to OBP.
Cyril Morong, 723 W. French Place, San Antonio, TX 78212, email@example.com
1. Seven teams that played under the DH rule were included; each had nine players with 400 or more at-bats. Two non-DH teams, the 1937 Pirates and the 1971 Tigers, also had nine players with 400 or more at-bats. They were not used.