Hi everyone. If you're somewhat new around here, there's a good chance you don't know who I am. Don't worry, you're not missing out. Those of you who do know me (biblically or otherwise) likely know that I'm interested in how much pitchers control whether batted balls become hits or outs (the seemingly ubiquitous "batting average on balls in play", or BABIP). Indeed, I'm kind of a one-trick pony in that regard, so there's a good chance you're already familiar with some of my musings on the subject. Anyway, this isn't just a cheap trick to get you to click on articles I wrote a year ago (though I will admit that it is mostly one). In fact, I'll summarize a lot of those articles for you in one sentence: most of the variance in pitcher BABIP each season seems to be random, though a good bit of it can be attributed to how many groundballs vs. flyballs that pitcher tends to induce. However, a lot of folks who are a lot smarter than me have done a lot of work looking at a lot of pitcher seasons and found that pitchers have a lot more control over their BABIP than the numbers I've crunched suggest. So, today, I decided to crunch the numbers a bit differently: instead of looking at seasonal BABIP variance, I looked at variance over pitcher careers.
To make sure that pitchers had long enough careers to be included, I set an arbitrary minimum of 1,500 innings. Since previous studies suggested that batted ball types strongly influence BABIP, I wanted to be able to include groundball-rate in the model, so I included only seasons since 2002 (as far back as batted ball data go, at least on Fangraphs). That left a total of 37 pitchers. To determine whether hit suppression is actually a skill, rather than happenstance resulting from flyball-yielding tendencies, I decided to compare large samples of each pitcher's career to one another. To do so, I split each pitcher's career into two separate metaseasons: even years (2002, 2004, 2006, etc.) and odd years. Using metaseasons comprising odd vs. even seasons helps prevent biases resulting from pitchers whose performance levels have changed dramatically over time (see, for example, Cliff Lee). To avoid overweighting performance in shortened seasons, I used a weighted mean when aggregating pitcher BABIP and GB-rate for each metaseason.
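For anyone who wants to follow along, here is a minimal Python sketch of the odd/even split and the weighted mean for a single pitcher. The post doesn't say what the weights were, so weighting by balls in play is purely an assumption here, and the season lines, the `seasons` list, and the `metaseasons` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical season lines for one pitcher (made-up numbers):
# (year, balls_in_play, babip, gb_rate)
# ASSUMPTION: balls in play is the weight; the post does not specify it.
seasons = [
    (2002, 600, 0.290, 0.45),
    (2003, 200, 0.320, 0.47),  # a shortened season gets less weight
    (2004, 400, 0.300, 0.44),
    (2005, 600, 0.280, 0.46),
]


def metaseasons(rows):
    """Split seasons by year parity and return weighted means per metaseason."""
    totals = defaultdict(lambda: {"weight": 0.0, "babip": 0.0, "gb": 0.0})
    for year, weight, babip, gb_rate in rows:
        key = "even" if year % 2 == 0 else "odd"
        totals[key]["weight"] += weight
        totals[key]["babip"] += weight * babip
        totals[key]["gb"] += weight * gb_rate
    return {
        key: {
            "babip": t["babip"] / t["weight"],
            "gb_rate": t["gb"] / t["weight"],
        }
        for key, t in totals.items()
    }


result = metaseasons(seasons)
for key in ("even", "odd"):
    print(key, round(result[key]["babip"], 4), round(result[key]["gb_rate"], 4))
```

A plain average of the two odd seasons above would treat the 200-ball 2003 the same as the 600-ball 2005; the weighted version lets the fuller season count three times as much.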
Next, I built linear models using the open-source statistical software R. The first model fit a pitcher's even metaseason BABIP to his odd metaseason BABIP, his even metaseason GB-rate, and their interaction. Since the interaction was not significant (p = 0.457), I eliminated it and rebuilt the model without it. That model suggests that the effect of odd metaseason BABIP was highly significant (p < 0.001), whereas the effect of groundball-rate during even seasons was not significant (p = 0.692; Figure 1). Without the term for groundball-rate, the model was highly significant and suggested that a pitcher's BABIP in odd seasons accounted for about half of the variance in his BABIP in even seasons (F = 34.17, R² = 0.480, p < 0.0001; Figure 2). In the reverse model, with odd metaseason BABIP as the response, the effect of groundball-rate was marginally significant (p = 0.0516) and the effect of even metaseason BABIP was highly significant (p < 0.001).
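The final one-predictor model is simple enough to sketch by hand. The Python below is not the R code behind the numbers above (R's `lm` would do all of this in one call); it is a rough illustration of regressing even metaseason BABIP on odd metaseason BABIP and pulling out the slope, R², and F statistic. The five data points and the `simple_ols` helper are hypothetical, not the 37 pitchers from the study.

```python
def simple_ols(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of variance in y explained by x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    # F statistic for one predictor, with 1 and n - 2 degrees of freedom
    f_stat = (r_squared / (1 - r_squared)) * (n - 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "f_stat": f_stat,
    }


# Hypothetical metaseason BABIPs, one entry per pitcher (made-up numbers)
odd_babip = [0.280, 0.290, 0.300, 0.310, 0.295]
even_babip = [0.285, 0.288, 0.302, 0.305, 0.300]

fit = simple_ols(odd_babip, even_babip)
print({name: round(value, 3) for name, value in fit.items()})
```

The fuller models with GB-rate and the interaction term need multiple regression, which this sketch deliberately leaves out.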
So this suggests that pitchers do exhibit quite a bit of control over BABIP over large samples. Considering what I've found previously, I will admit that I'm surprised at how much stronger the effects of BABIP were than the effects of batted ball type. Nonetheless, it is important to remember that the effects found here are not present in smaller samples, so teams should not be making personnel decisions based on hit suppression unless the pitcher has a large body of work behind him. Even then, and even considering only very large samples, we should still be regressing previous BABIP almost halfway to league average to find a pitcher's true talent. Thus, we should be careful not to read too much into any one pitcher's results, lest we risk eternal damnation for telling a post-hoc narrative. Disagree with the premise, methods, or conclusions? Have any advice for future stuff? Just want to be recognized for getting through all this nonsense? Let me know in the comments.
Thanks to They Might Be Giants for today's post title. That one goes way back to John Henry. Oh, and if you think I only wrote this so I could put in that last link, you're right.