Turn and Face the Strange Changes: Does Throwing Changeups Help Pitchers Sustain Lower BABIP?
Earlier today (or yesterday, depending on which timezone you're in and when you're reading this), woodman663 posted a really interesting article suggesting that changeup specialists may have a predilection for sustaining a low babip. Many of the pitchers he looked at (for example, Ted Lilly) were extreme flyball pitchers. Since flyballs are more likely than grounders to turn into outs, we need to disentangle these two factors from one another.
Woodman and I have talked back and forth on the piece a bit and I suggested that we run some statistical analyses so that we could tease out whether the changeup effect was truly meaningful or if it was just an artifact of the flyball effect. I said much of what is in this post in the comments section, but here it is full-blown and with the output (which is important, in case I'm making mistakes here -- please let me know if you notice any).
I included all starting pitchers with 300+ innings since 2009 and used R v2.12.1 to fit a linear model of babip on flyball-rate, strikeout-rate, changeup frequency, total linear-weights value of all changeups, and linear-weights value per changeup (fly, K, chfreq, chtot and chperc, respectively, in the output below). At Woodman's suggestion (and as justified in his post), I included splitters as changeups.
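For anyone curious what fitting a model like this involves, here is a minimal Python sketch of ordinary least squares with an intercept, solved through the normal equations using only the standard library. It is an illustration, not the R code behind the analysis (that was lm()); the function name fit_ols is hypothetical, and the rows in the usage check are made-up numbers generated from a known rule so we can confirm the fit recovers it. Like lm() with R's default settings, it drops any row with a missing value before fitting.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations.

    rows: one list of predictor values per pitcher, e.g. [fly, chperc]
    y:    one response value (babip) per pitcher
    Rows containing None are dropped first (complete-case analysis).
    Returns [intercept, b1, b2, ...].
    """
    data = [(list(r), yi) for r, yi in zip(rows, y)
            if yi is not None and None not in r]
    X = [[1.0] + r for r, _ in data]          # prepend the intercept column
    ys = [yi for _, yi in data]
    k = len(X[0])
    # Augmented matrix [X'X | X'y] for the normal equations.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up rows (NOT real pitcher data) built from a known linear rule,
# so the fit should recover the rule's coefficients.
rows = [[0.30, 1.0], [0.35, -0.5], [0.40, 0.2], [0.45, 2.0], [0.38, -1.2],
        [0.42, None]]  # last row has a missing value and is dropped
y = [0.334 - 0.116 * f - 0.003 * c if c is not None else 0.290
     for f, c in rows]
coefs = fit_ols(rows, y)
print([round(b, 6) for b in coefs])   # [0.334, -0.116, -0.003]
```

In practice you would let lm() (or a numerical library) do this; solving the normal equations directly is fine for a handful of predictors but can lose precision when the predictors are highly collinear.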
Keep in mind that a p-value is the probability of seeing an effect at least as large as the one observed if the true effect were zero (so the lower the p-value, the more confident we can be that the effect is real), and that R-squared is the share of the variance in babip that the model explains (the higher the R-squared, the better the description).
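To make those two ideas concrete, here is a small Python sketch (standard library only; the function names are hypothetical) that recomputes a two-sided p-value from an estimate and its standard error, plus the adjusted R-squared and F-statistic, using numbers copied from the model summaries in this post. The p-value integrates the Student t density numerically, which is a stand-in for the exact routine R uses, and because the inputs are the rounded printed values, results can differ from R's in the last digit or so.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic with df degrees of freedom.

    Integrates the Student t density from 0 to |t| with Simpson's rule
    (steps must be even) and returns 1 - 2 * that area.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    area = density(0.0) + density(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2.0 * area * h / 3)


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F statistic, on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Numbers copied from the fit1 and fit2 summaries (n = 125 after the deletion).
p_chperc = t_two_sided_p(-2.969e-03 / 1.544e-03, 119)  # R reports 0.0568
p_fly = t_two_sided_p(-1.213e-01 / 2.342e-02, 119)     # R reports 9.2e-07
adj_fit1 = adjusted_r2(0.2564, 125, 5)                 # R reports 0.2251
adj_fit2 = adjusted_r2(0.254, 125, 2)                  # R reports 0.2418
f_fit1 = f_statistic(0.2564, 125, 5)                   # R reports 8.206
print(round(p_chperc, 3), round(adj_fit1, 4), round(adj_fit2, 4), round(f_fit1, 2))
```

Adjusted R-squared penalizes each extra predictor, which is why a model with fewer inputs can score higher on it even though its plain R-squared is a bit lower.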
> summary(fit1)
Call:
lm(formula = babip$BABIP ~ babip$fly + babip$K + babip$chfreq +
babip$chtot + babip$chperc, data = babip)
Residuals:
Min 1Q Median 3Q Max
-0.040995 -0.007808 -0.000659 0.008768 0.032989
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.322e-01 8.910e-03 37.283 < 2e-16 ***
babip$fly -1.213e-01 2.342e-02 -5.179 9.2e-07 ***
babip$K 1.609e-02 3.379e-02 0.476 0.6348
babip$chfreq 9.532e-03 2.125e-02 0.448 0.6546
babip$chtot -5.992e-05 1.570e-04 -0.382 0.7034
babip$chperc -2.969e-03 1.544e-03 -1.924 0.0568 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01358 on 119 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.2564, Adjusted R-squared: 0.2251
F-statistic: 8.206 on 5 and 119 DF, p-value: 1.100e-06
The "1 observation deleted due to missingness" (what a great word, by the way) was Tommy Hanson, who has zero changeups and splitters on record. Anyway, what we find is that the effect of flyball-rate is highly significant (p = 9.2 × 10^-7). The effect of value per changeup is marginally significant (p = 0.0568, just above the conventional 0.05 cutoff). None of the other effects (including K%!) were significant. A model including only those two factors actually fit the data slightly better (adjusted R-squared of 0.2418 vs. 0.2251) than the initial model, which also included K-rate, changeup frequency and total changeup value. Here is the output for that model:
> summary(fit2)
Call:
lm(formula = babip$BABIP ~ babip$fly + babip$chperc, data = babip)
Residuals:
Min 1Q Median 3Q Max
-0.040973 -0.008040 -0.001156 0.008629 0.032851
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3340417 0.0078308 42.657 < 2e-16 ***
babip$fly -0.1159427 0.0213364 -5.434 2.87e-07 ***
babip$chperc -0.0031551 0.0008792 -3.589 0.00048 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01344 on 122 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.254, Adjusted R-squared: 0.2418
F-statistic: 20.77 on 2 and 122 DF, p-value: 1.726e-08
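To put the two-factor model's coefficients on a babip scale, here is a small Python sketch that plugs them into the fitted equation. It assumes flyball-rate is entered as a fraction between 0 and 1 (the size of the coefficient suggests this, but the post does not say) and treats chperc in whatever units the underlying data used; the pitchers being compared are hypothetical, and predicted_babip is an illustrative name.

```python
# Coefficients copied from the summary(fit2) output above.
INTERCEPT = 0.3340417
B_FLY = -0.1159427      # per unit of flyball-rate (assumed to be a 0-1 fraction)
B_CHPERC = -0.0031551   # per unit of changeup value per pitch (chperc)


def predicted_babip(fly, chperc):
    """Fitted babip from the two-factor model (fit2)."""
    return INTERCEPT + B_FLY * fly + B_CHPERC * chperc


# Two hypothetical pitchers that differ only in flyball-rate (35% vs 45%).
gap_fly = predicted_babip(0.35, 0.0) - predicted_babip(0.45, 0.0)
# Two hypothetical pitchers that differ only by one unit of chperc.
gap_ch = predicted_babip(0.40, 0.0) - predicted_babip(0.40, 1.0)
print(round(gap_fly, 4), round(gap_ch, 4))   # 0.0116 0.0032
```

So, under those assumptions, ten extra percentage points of flyballs go with roughly 12 points of babip, and one extra unit of value per changeup with roughly 3 points.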
Next, I used the Lindeman, Merenda and Gold (1980) method (lmg in the output), via calc.relimp from R's relaimpo package, to describe the relative importance of each factor. Here is the output for the first model:
> calc.relimp(fit1,type=c("lmg","last","first","pratt"), rela=TRUE)
Response variable: babip$BABIP
Total response variance: 0.0002381301
Analysis based on 125 observations
5 Regressors:
babip$fly babip$K babip$chfreq babip$chtot babip$chperc
Proportion of variance explained by model: 25.64%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg last first pratt
babip$fly 0.66176163 0.862561629 0.48939293 0.72622533
babip$K 0.02597355 0.007290960 0.04968005 -0.02154839
babip$chfreq 0.05245325 0.006468146 0.10981283 -0.03433426
babip$chtot 0.08818158 0.004685193 0.14603247 0.05044396
babip$chperc 0.17162998 0.118994071 0.20508171 0.27921336
Average coefficients for different model sizes:
1X 2Xs 3Xs 4Xs
babip$fly -0.1141989299 -0.1126788278 -0.1150599466 -1.190098e-01
babip$K -0.0518129769 -0.0316319422 -0.0157599046 1.072460e-03
babip$chfreq -0.0425860063 -0.0276718376 -0.0142778771 -1.016283e-03
babip$chtot -0.0002423128 -0.0001693865 -0.0001124105 -7.509733e-05
babip$chperc -0.0030463088 -0.0028538654 -0.0028884462 -3.005977e-03
5Xs
babip$fly -0.1213197877
babip$K 0.0160889358
babip$chfreq 0.0095322944
babip$chtot -0.0000599228
babip$chperc -0.0029691976
In terms of relative importance, flyball-rate was the most important factor, but the changeup inputs made important contributions to the model as well. K-rate made the least important contribution (just 2.6% relative importance by lmg). We can either combine the relative contributions of the changeup inputs here or use this method to calculate relative importances for our second model. Here is the output for the second model:
> calc.relimp(fit2,type=c("lmg","last","first","pratt"), rela=TRUE)
Response variable: babip$BABIP
Total response variance: 0.0002381301
Analysis based on 125 observations
2 Regressors:
babip$fly babip$chperc
Proportion of variance explained by model: 25.4%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
lmg last first pratt
babip$fly 0.7004244 0.6963282 0.7046952 0.7005284
babip$chperc 0.2995756 0.3036718 0.2953048 0.2994716
Average coefficients for different model sizes:
1X 2Xs
babip$fly -0.114198930 -0.115942720
babip$chperc -0.003046309 -0.003155121
Basically, this tells us that flyball-rate accounts for about 70% of the variance the model explains and value per changeup accounts for about 30% of it.
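The lmg metric credits each predictor with the increase in R-squared it produces when it enters the model, averaged over every order in which the predictors could be added. Here is a short Python sketch of that definition (the function name lmg is hypothetical; this is not relaimpo's code). The single-predictor R-squared values it needs are not printed directly, so the ones below (about 0.175 for flyball-rate alone and 0.073 for value per changeup alone) are approximate back-calculations from the normalized first and last columns of the fit2 output, together with its R-squared of 0.254.

```python
from itertools import permutations


def lmg(r2_of, predictors):
    """LMG relative importance: each predictor's R-squared gain, averaged
    over every order in which the predictors could enter the model.

    r2_of: maps a frozenset of predictor names to that sub-model's R-squared
           (the empty set must map to 0.0).
    """
    shares = {p: 0.0 for p in predictors}
    orders = list(permutations(predictors))
    for order in orders:
        entered = frozenset()
        for p in order:
            with_p = entered | {p}
            shares[p] += r2_of[with_p] - r2_of[entered]
            entered = with_p
    return {p: s / len(orders) for p, s in shares.items()}


# Sub-model R-squared values for fit2. The full-model value is from the output;
# the single-predictor values are approximate back-calculations, not raw data.
r2_of = {
    frozenset(): 0.0,
    frozenset({"fly"}): 0.17526,
    frozenset({"chperc"}): 0.07344,
    frozenset({"fly", "chperc"}): 0.254,
}
imp = lmg(r2_of, ["fly", "chperc"])
total = sum(imp.values())   # lmg shares add up to the full-model R-squared
print({p: round(v / total, 3) for p, v in imp.items()})  # {'fly': 0.7, 'chperc': 0.3}
```

With two predictors, the average over orderings reduces to (R-squared alone + R-squared gained when added last) / 2 for each predictor, which is why, in raw terms, lmg sits between the first and last measures.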
Overall, according to the methods and models described above, flyball-rate accounts for about 17.8% of pitcher babip variability (70.0% of the second model's R-squared of 25.4%), and value per changeup accounts for about 7.6% (30.0% of 25.4%). In the first model, the combined contributions of per-pitch changeup value, total changeup value, and changeup frequency come to about 8.0% (31.2% of 25.64%).
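Because the output uses rela=TRUE, each lmg value is a share of the model's R-squared rather than of the total babip variance. Multiplying by the model's R-squared converts it back; this short Python sketch does that with values copied from the two calc.relimp outputs above (the helper name variance_share is hypothetical).

```python
# Normalized lmg shares (rela=TRUE) and model R-squared, copied from the output.
fit1_r2 = 0.2564
fit1_lmg = {"fly": 0.66176163, "K": 0.02597355, "chfreq": 0.05245325,
            "chtot": 0.08818158, "chperc": 0.17162998}
fit2_r2 = 0.254
fit2_lmg = {"fly": 0.7004244, "chperc": 0.2995756}


def variance_share(lmg_rel, r2):
    """Convert normalized lmg shares into shares of total babip variance."""
    return {k: v * r2 for k, v in lmg_rel.items()}


s1 = variance_share(fit1_lmg, fit1_r2)
s2 = variance_share(fit2_lmg, fit2_r2)
changeups_fit1 = s1["chfreq"] + s1["chtot"] + s1["chperc"]
print(f"fit2: fly {s2['fly']:.1%}, chperc {s2['chperc']:.1%}")
print(f"fit1: all changeup inputs {changeups_fit1:.1%}, K {s1['K']:.1%}")
# fit2: fly 17.8%, chperc 7.6%
# fit1: all changeup inputs 8.0%, K 0.7%
```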
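Because a changeup's linear-weights value is partly built from what happens when it is put in play, a good changeup value may partly reflect a low babip rather than cause it. As a more conservative check, here is a model that drops the linear-weights inputs and keeps only changeup frequency (note that it reuses the name fit2):
> summary(fit2)
Call:
lm(formula = babip$BABIP ~ babip$fly + babip$chfreq, data = babip)
Residuals:
Min 1Q Median 3Q Max
-0.039630 -0.009275 -0.000403 0.009383 0.032987
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.333697 0.008176 40.813 < 2e-16 ***
babip$fly -0.107534 0.022798 -4.717 6.44e-06 ***
babip$chfreq -0.024325 0.017948 -1.355 0.178
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01402 on 122 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1875, Adjusted R-squared: 0.1742
F-statistic: 14.08 on 2 and 122 DF, p-value: 3.158e-06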
So this more conservative approach, which excludes linear weights, does not demonstrate a significant relationship (p = 0.178 for changeup frequency). Taken at face value, this non-significant result suggests that changeup frequency may account for about 2.5% of pitcher babip variance.
Overall, K-rate is extremely unlikely to be a significant factor and, even if it were, it would be an extremely unimportant one, accounting for only about 0.7% of pitcher babip variance (its 2.6% lmg share of the first model's 25.64% R-squared). As a side note, this also serves as further evidence that there are serious flaws in the calculation of SIERA. I propose that SIERA should be reconstructed so as to include the effects of flyball-rate, NOT K-rate, on babip. Essentially, the only reason it works slightly better than xFIP or FIP is that it uses K-rate as a proxy for flyball-rate. Since flyball-rate is easily measured and batted ball data are readily available, there's no reason to proxy flyball-rate.
So what do you all think? What are some other factors we can test for effects on babip?
Thanks to David Bowie and Woodman663!
48 comments
Comments
Basically
I have no idea what you just did.
Good job though.
Sad, Drunk, And Poorly
My friends, love is better than anger. Hope is better than fear. Optimism is better than despair. So let us be loving, hopeful and optimistic. And we'll change the world. - JL
by Pikachu on Sep 29, 2011 11:05 PM EDT
Ha, I tried to understand.
But I uh…ya I get the conclusion drawn though.
A day that will live in infamy: August 4th, 2011
7 pissed off members of the Aaron Hill fanclub
I've sort of said this already
but I wouldn’t think that more changeups, a pitch that is known to cause more swinging strikes than other pitches, would cause a lower BABIP. I’d be interested in correlating movement to BABIP, because I’d guess that more movement would lead to worse contact and therefore fewer hits.
I’d also be secondarily interested in correlating the number of quality pitches to BABIP, because I’d guess that the more quality pitches in a pitcher’s arsenal, the lower the quality of contact.
Granted, both of these are pure speculation. But if pitchers have some control over batted ball types, it stands to reason that they may well have some limited control over BABIP. Of course, if it comes at the expense of more fly balls, it doesn’t necessarily make them better pitchers.
"Let us go forth awhile, and get better air in our lungs. Let us leave our closed rooms... The game of ball is glorious." - Walt Whitman
But how would you describe movement?
Horizontal/vertical movement of the four-seamer? Horizontal movement of the cutter compared to the four-seamer, or in absolute terms? Etc.
the best proxy for "movement"
is whiff rate. swinging strikes are the ultimate form of “weak contact”.
on a related note, what we’re trying to test here is the effect of a “good changeup” on BABIP, and we’re using changeup % as a proxy for “good changeup”. the better the relationship between “changeup quality” and “changeup %”, the better a correlation we’re going to get with BABIP (assuming there is a real effect). i think we should try to think of a better way to classify “good changeups”. if we can find a relatively straightforward way to do that we could run the model again with that instead of changeup %, and it might show even more significant results.
Right.
I don’t know where to access a dataset that would have every pitcher’s whiff-rate by pitch type.
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
You can try Texas Leaguers
Though, that would be a very long and tedious process.
"We are all agreed that your theory is crazy. The question that divides us is whether it is crazy enough to have a chance of being correct."
- Niels Bohr
Yeah, I know they keep track of it
but I’m not searching for the changeup whiff-rates for the past 3 seasons of 100 or so pitchers. I may not be The Most Interesting Man in the World, but even I have better things to do.
if we're okay using perhaps tenuous proxies
you could maybe use every pitcher with a positive wCH score? too small a sample?
sorry, meant to add
because it’s probably likely that the guys with positive-value changeups get lots of SwStrs
Well, remember
we regressed it on weighted changeup value and found that babip was lower for pitchers with good changeups but a lot of that may be tied up in the fact that the changeups looked good because babip was lower. Because pitchers throw so few changeups (so not that many are put into play), you’d need a really large sample to remove the element of luck (or random variance, if you’re Ben Kenobi).
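To put a rough number on that sample-size point, here is a Python sketch that treats each ball in play as an independent trial with the same hit probability (a simplification), so the standard error of an observed babip is sqrt(p(1 - p) / n). The .290 babip and 150 balls in play are hypothetical inputs, not figures from the post, and the function names are made up for illustration.

```python
import math


def babip_standard_error(babip, balls_in_play):
    """Binomial standard error of a babip observed over n balls in play."""
    return math.sqrt(babip * (1 - babip) / balls_in_play)


def balls_needed(babip, target_se):
    """Balls in play needed before the binomial noise shrinks to target_se."""
    return math.ceil(babip * (1 - babip) / target_se ** 2)


# Hypothetical inputs: a .290 babip on 150 changeups put in play.
print(round(babip_standard_error(0.290, 150), 3))   # about 0.037
print(balls_needed(0.290, 0.010))                  # roughly 2,060
```

Under those assumptions, a pitcher's changeups would need somewhere around two thousand balls in play before the typical luck-driven swing in their babip shrinks to about .010.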
good point
hadn’t thought about that. I guess we’d need a FIP-type weighted pitch type value or something
they keep track of a lot of that information.
I just don’t know where it is conveniently summarized
I prefer Ben Ki Moon
I'm more than a little jealous of Grantland's ability to use footnotes rather than excessively long bracketed statements.
I'm not so sure about that
sinkers have plenty of movement, certainly more than four seamers, but don’t tend to induce as many strikeouts.
Right
if we were looking at the effects of a different pitch (say a two-seamer or a cutter), whiff% might not be as good of a proxy.
As an aside, I’d think the reason changeups miss bats has more to do with the change in velocity than the movement.
Well done!
BBB is getting hard core.
Follow me @BBBMinorLeaguer | 2011 Jays record while in attendance: 12-12 (.500)
You should work for Fangraphs
This is incredible work! Well done!
haha, thank you
I much prefer the BBB community! this is way more writer-reader interactive and way less full of douchebags. some great (and some less-than-great) work is done at FanGraphs but that doesn’t mean that other sites can’t do interesting work, too.
It is true that this is a fan site and a lot of this stuff is really more MLB-wide, but I don’t think that’s necessarily a bad thing. We aren’t all completely Blue Jays-centric here after all.
I'm always kind of surprised (though I shouldn't be)
at how many complete asshats there are on FanGraphs
Man
did you read the comments in the “Women in Baseball” piece(s)? Terrible.
I've seen things
You people wouldn’t believe.
/b/
Believe it or not
Worse.
how
that’s unpossible
This is the Internet
It’s seemingly limitless
exactly how I feel
My advanced statistical analysis professor in University was Swedish; he reminded me of the Swedish Chef and made me laugh every time he talked.
I don’t know how I passed.
I think you'll find I'm universally recognised as a mature and responsible adult.
Twitter is the thing with all the tweets...
My first year math prof could barely speak English (he was from China), so most of the students (including myself) had trouble understanding what he was saying. When he did say things we understood, he sounded a lot like Borat for some reason. It was funny.
My first year calculus prof was from England
He had a very monotone voice. Combine that with calculus and it was the only class I actually fell asleep in. I barely passed that course.
Hic sunt fortuna dracones
I remember having a high school history teacher who was so monotonous, so boring, that the entire class would fall asleep during each lecture. Think Ben Stein’s character in Ferris Bueller.
"voodoo economics"
Hah, my first year calc prof had a very distinctive voice and I couldn’t figure out where I had heard it before. It was driving me nuts that I couldn’t figure out who he sounded like.
Mid semester in the very middle of the class I snapped my fingers and went ALF!
The guy in front of me turned around looking completely relieved and said Oh thank you!
Apparently he had been trying to figure that out all year as well. Great times, we went and got drunk afterwards to celebrate the discovery.
by JohnnyG on Sep 30, 2011 1:59 PM EDT
Where's Mylegacy?
He needs to see this.
Likely three or four fingers
deep in a bottle of Lagavulin
Since flyball-rate is easily measured and batted ball data are readily available, there’s no reason to proxy flyball-rate.
It’s easily measured, but not with a sufficiently high degree of precision.
Fly ball vs liner classifications are frequently skewed by whether the outfielder gets to the ball or not (or how long it takes him to get there after it lands) rather than being based on the angle of the flight path.
Maybe one day they’ll accidentally leak Hitfx…
And, herp derp I like this post, well done.
While that is true to a certain extent
the central limit theorem dictates that we don’t necessarily have to be concerned with those mistakes in classifications over large samples because they should be normally distributed (and, thus, cancel the effects of one another out).
Also, remember: SIERA doesn’t deliberately use K-rate as a proxy for fly-rate, it assumes that K-rate is a significant and important factor on BABIP and this only works because pitchers with high K-rates also tend to have high flyball-rates (partially because most pitchers with high flyball rates couldn’t get by in the majors without high K-rates).
but GB%
is a lot less dependent on scorers, right? So you could just use the complement of GB% (100% minus GB%), which is something like BallInAir%. I much prefer HR/BIA as provided by Statcorner over HR/FB, especially when using a regression like in xFIP.
