# Turn and Face the Strange Changes: Does Throwing Changeups Help Pitchers Sustain Lower BABIP?

Earlier today (or yesterday, depending on which timezone you're in and when you're reading this), woodman663 posted a really interesting article demonstrating that changeup specialists may have a predilection for sustaining low babip.  Many of the pitchers he looked at (for example, Ted Lilly) were extreme flyball pitchers.  Since flyballs are more likely to turn into outs than grounders, we need to disentangle these two factors from one another.

Woodman and I have talked back and forth on the piece a bit and I suggested that we run some statistical analyses so that we could tease out whether the changeup effect was truly meaningful or if it was just an artifact of the flyball effect.  I said much of what is in this post in the comments section, but here it is full-blown and with the output (which is important, in case I'm making mistakes here -- please let me know if you notice any).

I included all starting pitchers with 300+ innings since 2009 and used R v2.12.1 to fit a linear model for babip to fixed effects of flyball-rate, strikeout-rate, changeup frequency, total value by linear weights of all changeups, and value by linear weights per changeup. At Woodman's suggestion (and as justified in the body of the post), I included splitters as changeups.
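For anyone who wants to play along at home, here is a minimal sketch of how models like these can be fit in R. The data frame below is synthetic (I am not reproducing the real pitcher table here), the column names simply mirror the ones in the output further down, and the numbers used to generate the fake data are invented for illustration. Passing bare column names with `data = babip` is an equivalent, slightly cleaner way to write the `babip$...` formulas.

```r
# Synthetic stand-in for the pitcher table; NOT the real data.
set.seed(1)
n <- 125
babip <- data.frame(
  fly    = runif(n, 0.25, 0.50),  # flyball rate
  K      = runif(n, 0.12, 0.28),  # strikeout rate
  chfreq = runif(n, 0.00, 0.30),  # changeup (plus splitter) frequency
  chperc = rnorm(n, 0, 1.5)       # linear-weights value per changeup (scale invented)
)
# Total changeup value, loosely tied to per-pitch value and usage (invented relationship).
babip$chtot <- babip$chperc * babip$chfreq * 30 + rnorm(n, 0, 3)
# Fake response: depends on flyball rate and per-changeup value plus noise.
babip$BABIP <- 0.334 - 0.115 * babip$fly - 0.003 * babip$chperc +
  rnorm(n, 0, 0.0135)

# Full model with all five fixed effects.
fit1 <- lm(BABIP ~ fly + K + chfreq + chtot + chperc, data = babip)
# Reduced model with only flyball rate and value per changeup.
fit2 <- lm(BABIP ~ fly + chperc, data = babip)

summary(fit1)                  # coefficients, p-values, R-squared
summary(fit2)$adj.r.squared    # adjusted R-squared of the reduced model
```

Because the data are simulated, the printed numbers will not match the output in this post; the point is only the shape of the calls.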

Keep in mind that the p-values tell us how likely we would be to see an effect at least this large by chance if the factor had no real effect (the lower the p-value, the more confident we can be that the effect is real) and the R-squared values tell us what fraction of the variance in babip the model explains (the higher the R-squared value, the better the description).

The model accounted for about one quarter of the variance in pitcher babip.  After testing the significance of effects, I also used the Lindeman, Merenda and Gold (lmg) method to determine the relative importances of contributions from each factor.  Here is the output:

```
> summary(fit1)

Call:
lm(formula = babip$BABIP ~ babip$fly + babip$K + babip$chfreq +
    babip$chtot + babip$chperc, data = babip)

Residuals:
      Min        1Q    Median        3Q       Max
-0.040995 -0.007808 -0.000659  0.008768  0.032989

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.322e-01  8.910e-03  37.283  < 2e-16 ***
babip$fly    -1.213e-01  2.342e-02  -5.179  9.2e-07 ***
babip$K       1.609e-02  3.379e-02   0.476   0.6348
babip$chfreq  9.532e-03  2.125e-02   0.448   0.6546
babip$chtot  -5.992e-05  1.570e-04  -0.382   0.7034
babip$chperc -2.969e-03  1.544e-03  -1.924   0.0568 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01358 on 119 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.2564,  Adjusted R-squared: 0.2251
F-statistic: 8.206 on 5 and 119 DF,  p-value: 1.100e-06
```

The "1 observation deleted due to missingness" (what a great word, by the way) was Tommy Hanson, who has zero changeups and splitters on record, presumably leaving his value per changeup undefined.  Anyway, what we find is that the effect of flyball-rate is highly significant (p = 9.2 × 10^-7). The effect of value per changeup is marginally significant (p = 0.0568). None of the other effects (including K%!) were significant.  A model including only those two factors explains almost exactly as much variance (R-squared of 0.254 versus 0.2564) and does slightly better once the number of predictors is penalized (adjusted R-squared of 0.2418 versus 0.2251) than the initial model, which also included K-rate, changeup frequency and total changeup value.  Here is the output for that model:

```
> summary(fit2)

Call:
lm(formula = babip$BABIP ~ babip$fly + babip$chperc, data = babip)

Residuals:
      Min        1Q    Median        3Q       Max
-0.040973 -0.008040 -0.001156  0.008629  0.032851

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.3340417  0.0078308  42.657  < 2e-16 ***
babip$fly    -0.1159427  0.0213364  -5.434 2.87e-07 ***
babip$chperc -0.0031551  0.0008792  -3.589  0.00048 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01344 on 122 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.254,  Adjusted R-squared: 0.2418
F-statistic: 20.77 on 2 and 122 DF,  p-value: 1.726e-08
```
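A quick aside on why the smaller model can look better even though its plain R-squared is a hair lower. Adjusted R-squared penalizes each extra predictor. With n observations and p predictors, the standard formula is:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
$$

Plugging in the rounded values from the two outputs (n = 125 in both):

$$
\begin{aligned}
\text{five-factor model } (p = 5): &\quad 1 - (1 - 0.2564)\,\tfrac{124}{119} \approx 0.225 \\
\text{two-factor model } (p = 2): &\quad 1 - (1 - 0.254)\,\tfrac{124}{122} \approx 0.242
\end{aligned}
$$

These agree with the reported adjusted values (0.2251 and 0.2418) up to rounding of the inputs. Dropping three predictors costs only about 0.002 of raw R-squared, so the penalty term more than makes up for it.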

Next, I used the Lindeman, Merenda and Gold (1980) (lmg) method to describe the relative importances of each factor.  Here is the output for the first model:

```
> calc.relimp(fit1,type=c("lmg","last","first","pratt"), rela=TRUE)
Response variable: babip$BABIP
Total response variance: 0.0002381301
Analysis based on 125 observations

5 Regressors:
babip$fly babip$K babip$chfreq babip$chtot babip$chperc
Proportion of variance explained by model: 25.64%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

                    lmg        last      first       pratt
babip$fly    0.66176163 0.862561629 0.48939293  0.72622533
babip$K      0.02597355 0.007290960 0.04968005 -0.02154839
babip$chfreq 0.05245325 0.006468146 0.10981283 -0.03433426
babip$chtot  0.08818158 0.004685193 0.14603247  0.05044396
babip$chperc 0.17162998 0.118994071 0.20508171  0.27921336

Average coefficients for different model sizes:

                        1X           2Xs           3Xs           4Xs
babip$fly    -0.1141989299 -0.1126788278 -0.1150599466 -1.190098e-01
babip$K      -0.0518129769 -0.0316319422 -0.0157599046  1.072460e-03
babip$chfreq -0.0425860063 -0.0276718376 -0.0142778771 -1.016283e-03
babip$chtot  -0.0002423128 -0.0001693865 -0.0001124105 -7.509733e-05
babip$chperc -0.0030463088 -0.0028538654 -0.0028884462 -3.005977e-03
                       5Xs
babip$fly    -0.1213197877
babip$K       0.0160889358
babip$chfreq  0.0095322944
babip$chtot  -0.0000599228
babip$chperc -0.0029691976
```

In terms of relative importance, flyball-rate was most important (about 66% by lmg) but the changeup inputs made important contributions to the model as well (about 31% combined). K-rate made the least important contribution (under 3% relative importance).  We can either combine the relative contributions of the changeups here or use this method to calculate relative importances for our second model.  Here is the output for the second model:

```
> calc.relimp(fit2,type=c("lmg","last","first","pratt"), rela=TRUE)
Response variable: babip$BABIP
Total response variance: 0.0002381301
Analysis based on 125 observations

2 Regressors:
babip$fly babip$chperc
Proportion of variance explained by model: 25.4%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

                   lmg      last     first     pratt
babip$fly    0.7004244 0.6963282 0.7046952 0.7005284
babip$chperc 0.2995756 0.3036718 0.2953048 0.2994716

Average coefficients for different model sizes:

                       1X          2Xs
babip$fly    -0.114198930 -0.115942720
babip$chperc -0.003046309 -0.003155121
```

Basically, this tells us that flyball-rate accounts for about 70% of the usefulness of the model and value per changeup accounts for about 30% of its usefulness.
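To make the lmg number less of a black box: with only two regressors, it boils down to averaging each factor's contribution to R-squared over the two possible orders of entry. The sketch below does that by hand in base R on synthetic data (the data frame `d` and the numbers used to generate it are made up for illustration). It is meant to show the idea, not to replace `calc.relimp`, which is what produced the output above.

```r
# Synthetic two-regressor data; NOT the real pitcher table.
set.seed(2)
n <- 125
d <- data.frame(fly = runif(n, 0.25, 0.50), chperc = rnorm(n, 0, 1.5))
d$BABIP <- 0.334 - 0.115 * d$fly - 0.003 * d$chperc + rnorm(n, 0, 0.0135)

# Helper: R-squared of a model fit to d.
r2 <- function(formula) summary(lm(formula, data = d))$r.squared

r2_fly  <- r2(BABIP ~ fly)           # fly entered alone ("first")
r2_ch   <- r2(BABIP ~ chperc)        # chperc entered alone ("first")
r2_both <- r2(BABIP ~ fly + chperc)  # full two-factor model

# "last" is the gain in R-squared when a factor enters after the other one.
# lmg averages "first" and "last", i.e. both orders of entry.
lmg <- c(
  fly    = (r2_fly + (r2_both - r2_ch))  / 2,
  chperc = (r2_ch  + (r2_both - r2_fly)) / 2
)

lmg_rel <- lmg / sum(lmg)  # normalized to sum to 1, like rela = TRUE
print(lmg)
print(lmg_rel)
```

The unnormalized lmg values always add up to the full model's R-squared, which is what makes it reasonable to talk about each factor's "share" of the explained variance.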

Overall, according to the two-factor model described above, flyball-rate accounts for about 17.8% of pitcher babip variability and value per changeup accounts for about 7.6%. In the five-factor model, the combined contributions of per-pitch changeup value, total changeup value, and changeup frequency come to about 8.0% of pitcher babip variability.
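The conversion from relative importance to share of total variance is just the normalized lmg share multiplied by the model's R-squared. Checking against the two-factor output:

$$
\begin{aligned}
\text{flyball-rate:} &\quad 0.7004 \times 0.254 \approx 0.178 \\
\text{value per changeup:} &\quad 0.2996 \times 0.254 \approx 0.076
\end{aligned}
$$

The same arithmetic on the five-factor output gives 0.6618 × 0.2564 ≈ 0.170 for flyball-rate and (0.0525 + 0.0882 + 0.1716) × 0.2564 ≈ 0.080 for the three changeup inputs combined.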

Of course, this method has a critical flaw.  The problem with using linear weights pitch value data is that those linear weights values are affected by BABIP, so they aren't independent of one another.  However, changeup frequency should be independent of babip, so we can use a simple model that looks only at flyball-rate and changeup frequency.  Here is the output:

```
> summary(fit2)

Call:
lm(formula = babip$BABIP ~ babip$fly + babip$chfreq, data = babip)

Residuals:
      Min        1Q    Median        3Q       Max
-0.039630 -0.009275 -0.000403  0.009383  0.032987

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.333697   0.008176  40.813  < 2e-16 ***
babip$fly    -0.107534   0.022798  -4.717 6.44e-06 ***
babip$chfreq -0.024325   0.017948  -1.355    0.178
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01402 on 122 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.1875,  Adjusted R-squared: 0.1742
F-statistic: 14.08 on 2 and 122 DF,  p-value: 3.158e-06
```

As we can see, taking the linear weight values out of the model and using only changeup frequency weakens the model quite a bit (R-squared drops from 0.254 to 0.1875) and does not demonstrate as clear a relationship between changeups and babip.  We can also look at the relative importance of each factor in the model using the lmg method described earlier:

```
> calc.relimp(fit2,type=c("lmg","last","first","pratt"), rela=TRUE)
Response variable: babip$BABIP
Total response variance: 0.0002381301
Analysis based on 125 observations

2 Regressors:
babip$fly babip$chfreq
Proportion of variance explained by model: 18.75%
Metrics are normalized to sum to 100% (rela=TRUE).

Relative importance metrics:

                   lmg       last     first     pratt
babip$fly    0.8625042 0.92373356 0.8167360 0.8801962
babip$chfreq 0.1374958 0.07626644 0.1832640 0.1198038

Average coefficients for different model sizes:

                      1X         2Xs
babip$fly    -0.11419893 -0.10753360
babip$chfreq -0.04258601 -0.02432455
```

These values merely confirm that flyball-rate is still far more important than changeup frequency (which may still be a somewhat important factor).

So this more conservative approach, which excludes linear weights, does not demonstrate a statistically significant relationship between changeup frequency and babip (p = 0.178).  Taken at face value, it suggests that changeup frequency may account for about 2.6% of pitcher babip variance (roughly 14% of the 18.75% the model explains).

Overall, K-rate is extremely unlikely to be a significant factor and, even if it were, it would be an extremely unimportant one, accounting for well under 1% of pitcher babip variance (about 2.6% of the 25.64% explained by the first model).  As a side-note, this also serves as further evidence that there are serious flaws in the calculation of SIERA.  I propose that SIERA should be reconstructed so as to include the effects of flyball-rate, NOT K-rate, on babip.  Essentially, the only reason it works slightly better than xFIP or FIP is that it uses K-rate as a proxy for flyball-rate.  Since flyball-rate is easily measured and batted ball data are readily available, there's no reason to proxy flyball-rate.

So what do you all think?  What are some other factors we can test for effects on babip?

Thanks to David Bowie and Woodman663!