Baby, There's No Guidance When Random Rules: Autocorrelation in Pitcher BABIP from 2010 to 2011
Our collective interest in the nature of BABIP is no secret around these parts. A few months ago, we quantified a fairly weak but highly significant link between BABIP and flyball-rate. As a long-delayed follow-up, I wanted to look at the actual correlation of a pitcher's BABIP from one year to the next.
I constructed a simple linear model attempting to fit 2011 pitcher BABIP to 2010 pitcher BABIP. I excluded pitchers with fewer than 170 IP in either season, which left a total of 57 pitchers in the sample. The model did not incorporate batted-ball profiles, K-rates, or anything else that might be correlated with pitcher BABIP.
A significant relationship was not established (F = 1.085, df = 55, p = 0.302, R**2 = 0.02). With larger samples, I bet we would see a significant relationship, but I don't think the correlation would be any stronger (R**2 = 0.02 is extremely weak).
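For anyone who wants to try a fit like this at home, here is a minimal Python sketch of a one-predictor least-squares fit and its F statistic. The function names (`simple_ols`, `f_from_r2`) and the BABIP values are made up for illustration; they are not the data or the software behind the numbers above. The p-value is left out because it needs an F-distribution routine from a statistics library.

```python
def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, with R^2 and F."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    df_resid = n - 2  # one slope and one intercept are estimated
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "df_resid": df_resid, "f": f_from_r2(r2, df_resid)}


def f_from_r2(r2, df_resid):
    """F statistic for a one-predictor model; assumes r2 < 1."""
    return r2 / (1.0 - r2) * df_resid


# Hypothetical BABIP values for eight pitchers (not real data).
babip_2010 = [0.270, 0.285, 0.301, 0.292, 0.278, 0.310, 0.265, 0.295]
babip_2011 = [0.290, 0.281, 0.288, 0.300, 0.275, 0.284, 0.293, 0.279]

fit = simple_ols(babip_2010, babip_2011)
print(fit)
```

As a rough consistency check, `f_from_r2(0.02, 55)` comes out to about 1.12, which is in line with the reported F = 1.085 once you allow for R**2 having been rounded to two decimals.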
You'll likely notice that the "perfect correlation" line is slightly off. That's because of a very slight decrease in BABIP leaguewide in 2011, relative to 2010. The "no correlation" line is a horizontal line at the 2011 league-average BABIP (0.28815). Essentially, a strong correlation would be much more closely aligned with the "perfect correlation" line than with the "no correlation" line. Because (a) the points are not clustered around the actual correlation line and (b) the actual correlation line is quite similar to the horizontal, the correlation between a pitcher's BABIP in 2011 and his BABIP in 2010 is extremely weak.
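To make the reference lines concrete, here is a rough Python sketch of how they might be computed. It assumes the "perfect correlation" line is a slope-1 line shifted by the change in league-average BABIP, which the paragraph above implies but does not spell out. The 2010 league average used here (0.290) is a placeholder and not a figure from this post, and all function names are hypothetical.

```python
LEAGUE_BABIP_2011 = 0.28815  # league-average 2011 BABIP quoted in the post
LEAGUE_BABIP_2010 = 0.290    # placeholder value, NOT from the post


def perfect_line(babip_2010,
                 league_2010=LEAGUE_BABIP_2010,
                 league_2011=LEAGUE_BABIP_2011):
    """Slope-1 line shifted by the league-wide change (assumed construction)."""
    return babip_2010 + (league_2011 - league_2010)


def no_correlation_line(babip_2010, league_2011=LEAGUE_BABIP_2011):
    """Horizontal line: every pitcher is predicted at the 2011 league average."""
    return league_2011


def mean_abs_distance(xs, ys, line):
    """Average vertical distance between the points and a reference line."""
    return sum(abs(y - line(x)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical pitchers (not real data): compare the fit of both lines.
xs = [0.270, 0.300, 0.310]
ys = [0.292, 0.285, 0.290]
print(mean_abs_distance(xs, ys, perfect_line))
print(mean_abs_distance(xs, ys, no_correlation_line))
```

If the points sit closer to the horizontal line than to the shifted slope-1 line, that is the pattern the figure is meant to show.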
So what does this mean for predicting BABIP in 2012? Personally, I think we can probably throw a pitcher's 2011 BABIP out the window and concentrate on his flyball-rate instead. In fact, after incorporating a pitcher's 2010 GB-rate into the model, the R**2 value increased to 0.08 and the relationship became significant at the 0.05 level (F = 3.447, df = 54, p = 0.039). Unsurprisingly, the relative importance of 2010 GB-rate (92%) in fitting the model was far greater than the relative importance of 2010 BABIP (8%). It is commonly held that the longer a pitcher has pitched, the better read we have on his hit-suppressing tendencies. So the next topic I want to look at is whether a pitcher's single-season batted-ball profile is a better predictor of his next season's BABIP than his career BABIP.
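The post does not say how the relative-importance percentages were computed. One common approach is the LMG method, which averages each predictor's sequential contribution to R**2 over both orderings of the predictors, and the Python sketch below assumes that method purely for illustration. It also includes the standard identities linking F, R**2, and adjusted R**2. All function names and the toy data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def lmg_two_predictors(y, x1, x2):
    """Return (full R^2, share of x1, share of x2) for y ~ x1 + x2.

    Assumes x1 and x2 are not perfectly correlated and the full R^2 is > 0.
    """
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    full = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    # Average of "entered first" and "entered second" contributions.
    part1 = 0.5 * (r1 ** 2 + (full - r2 ** 2))
    part2 = 0.5 * (r2 ** 2 + (full - r1 ** 2))
    return full, part1 / full, part2 / full


def r2_from_f(f, df_model, df_resid):
    """Unadjusted R^2 implied by an overall F statistic."""
    return df_model * f / (df_model * f + df_resid)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Toy data (not real): y depends twice as strongly on x1 as on x2.
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
y = [3, -1, 1, -3]
print(lmg_two_predictors(y, x1, x2))
```

Plugging the reported F = 3.447 with 2 and 54 degrees of freedom into `r2_from_f` gives an unadjusted R**2 of about 0.113, and `adjusted_r2` with n = 57 and k = 2 then gives about 0.08. That suggests, without confirming, that the 0.08 quoted above may be an adjusted value.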
What do you all think would likely be the better predictor?
Thanks, by the way, to the Silver Jews for today's post title.
Comments
As usual, very interesting article
I find the results not surprising at all.
I would guess, a priori, that the single season batted ball profile would be more predictive for the next season, but I’m thinking there could be serious correlation problems here. Career batted ball profile should line up with career BABIP pretty well, and single season batted ball profile should, in aggregate, line up with career batted ball profile. Which means I would think career BABIP and single season batted ball profile should be fairly correlated. Anyway, I’ll be interested in seeing the results.
BTW – that graph is really helpful, having the three lines that you included. The lack of correlation is really clear when you can explicitly compare perfect to no correlation.
Right, there's definitely the problem that
pitchers change over time and the pitchers that we’d be looking at have actually had the most time to change. As such, previous year’s batted ball profile could likely be more indicative of the pitcher’s ability the next season than his career in aggregate.
On the other hand, I’d think that information would be really meaningful, wouldn’t it? Think about how much ink could be spilled trying to determine how good a pitcher might be in his next season based on how good he’s looked over his career. If simply looking at his recent batted-ball profile does the same job just as well, I’d think we’d be saving a lot of time and headaches.
Thanks for reading — and for the kind words, by the way
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
A Wild Jessef Appears!
Rent this for cheap!!
by Bowling_Guy25 on Dec 4, 2011 2:57 PM EST reply actions 2 recs
Great analysis
you should really submit these to a more widely read SABR-type blog (BtB, HBT, BP, etc) rather than only having us Jays fans read them.
by SuckaMD on Dec 4, 2011 3:03 PM EST reply actions 1 recs
thanks
while a lot of this work is more applicable in a league-wide context (and probably might appeal to a broader fanbase), I kind of prefer the feedback and constructive criticism from the readers here. For all of its faults, SBNation does have an excellent interface and I really do think we have built something of a community.
A chart is fine, too...
"We are all agreed that your theory is crazy. The question that divides us is whether it is crazy enough to have a chance of being correct."
- Niels Bohr
Excellent analysis
Just out of curiosity
I did a regression analysis between the 2010-2011 BABIPs of pitchers (Min. IP: 350; N = 60 pitchers) and their career BABIPs using Excel. The R^2 value I got was 0.68, and it was statistically significant (F = 123.7; p = 0.016). Some pitchers with young careers may have skewed the data (e.g., Ricky Romero).
Glove tap to Fangraphs for the numbers.
using Excel and MyStat
Fix’d
Yeah, I'd think that the young pitchers would throw that off quite a bit
I’m doing one now where I randomly select a year and look at how well it correlates with that pitcher’s career BABIP, for pitchers with 1500+ IP since 2002 (since that’s when FanGraphs’ batted-ball data start).
Yeah
That’s probably a better model than the one I put together.
This is interesting
Min 350 IP between the two seasons combined, regardless of a minimum in each season?
You have n = 60, jessef has n = 57. Given his criteria meant a minimum of 340 IP between two seasons, if I understand your criteria correctly, I find it almost impossible that a pitcher from his sample didn’t make it into yours. Which would mean just 3 more pitchers in yours, which shouldn’t mean such radically different results.
Am I missing something?
That’s just the number of pitchers I got from Fangraphs.
When I stated the conditions (340 IP; 2010-2011), it got me 60 starting pitchers.
Oh, I think I understand what you're talking about now
I’m not using the same variables that jessef used. I’m looking at the relationship (if that’s the best term to use) between a pitcher’s career BABIP and his total BABIP across the 2010 and 2011 seasons.
Now that I'm straightened out
Yeah, it makes sense that young pitchers could really bias that. Out of curiosity, would it be easy to add a screen to only include pitchers whose career IP / 2010-11 IP > 2 (this is arbitrary; you could go higher, I wouldn’t go lower), and then re-run the correlation analysis with only those pitchers?
I would
But I need to study for my final exams. =P
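For what it's worth, the screen suggested above is only a few lines in Python. This is a minimal sketch with made-up pitchers and hypothetical field names (`career_ip`, `ip_2010_11`); the threshold of 2 is the arbitrary one from the comment.

```python
def passes_screen(career_ip, recent_ip, min_ratio=2.0):
    """True if career IP is more than min_ratio times the 2010-11 IP."""
    return recent_ip > 0 and career_ip / recent_ip > min_ratio


# Hypothetical pitchers (not real data).
pitchers = [
    {"name": "Pitcher A", "career_ip": 1450.0, "ip_2010_11": 420.0},
    {"name": "Pitcher B", "career_ip": 600.0, "ip_2010_11": 410.0},
    {"name": "Pitcher C", "career_ip": 840.0, "ip_2010_11": 420.0},
]

# Keep only pitchers with enough career innings outside 2010-11.
screened = [p for p in pitchers
            if passes_screen(p["career_ip"], p["ip_2010_11"])]
print([p["name"] for p in screened])
```

The correlation would then be re-run on `screened` only. Note that a ratio of exactly 2 does not pass, since the comment asks for strictly greater than 2.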
What you’re calling a “perfect correlation” is actually a slope of 1 – it’s possible to have a best-fit line with that slope and have any magnitude of correlation. The entirety of the graph is wholly misleading, in fact – the slope does not determine the magnitude of the correlation.
by cwyers on Dec 5, 2011 4:32 PM EST reply actions 1 recs
right, it isn't perfect in the sense that it describes all of the variance
The figure is not “wholly misleading.” If you read the article, I make it a point to explain that the slope is only part of the valuable information; it is obviously the clustering around the line that determines how strong the correlation is.
And, while the slope does not determine the “magnitude of the correlation,” it is what determines how strongly the values are autocorrelated in any meaningful sense. That the slope approximates zero suggests that one season’s worth of BABIP data is essentially useless in predicting the next season’s BABIP. The farther that slope is away from 1, the less autocorrelation there is in the data from one season to the next.
by jessef on Dec 5, 2011 7:29 PM EST up reply actions 1 recs
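The exchange above turns on the identity slope = r × (s_y / s_x), where s_x and s_y are the standard deviations of the two seasons' values. Here is a small Python sketch with made-up numbers showing two data sets that share a best-fit slope of exactly 1 but have very different correlations. The function name `slope_and_r` is hypothetical and the data are not BABIP values.

```python
import math


def slope_and_r(x, y):
    """Return (least-squares slope of y on x, Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
noise = [1.0, -1.0, -1.0, 1.0]  # mean zero and uncorrelated with x

# Same underlying slope of 1, with small and large scatter around the line.
tight = [xi + 0.1 * e for xi, e in zip(x, noise)]
loose = [xi + 3.0 * e for xi, e in zip(x, noise)]

print(slope_and_r(x, tight))
print(slope_and_r(x, loose))
```

When the spread of the two variables is about the same, as one might expect for the same statistic in consecutive seasons, the slope and r end up close to each other, so a slope near zero and a weak correlation tend to go together even though they are not the same thing.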
