It's no secret that we are interested in determining what factors allow some pitchers to sustain lower ERAs than their strikeout, walk, and groundball rates would suggest. Hence this, another installment in our endless series on why some pitchers outperform their peripherals. This time, we'll be looking at whether pitchers with more diverse arsenals are able to keep hitters off-balance and thereby induce weaker contact. I have long considered this a possible reason that Shaun Marcum, who is notable for his array of pitches and his willingness to use any pitch in any count, has been able to outperform his peripherals for so long.
Before we can start looking at possible effects of arsenal diversity, we need to quantify the diversity of each pitcher's arsenal. For this, I chose the Shannon-Wiener index, a measure commonly used to estimate biodiversity in ecological communities. My sample was every pitcher who threw 100+ innings in 2011 (a total of 145 pitchers). I exported their pitch type data from FanGraphs and used the vegan package in R to calculate Shannon-Wiener diversity. Essentially, pitchers were analogous to communities and pitch types were analogous to species. The index takes into account both the number of different pitch types a pitcher throws and the evenness of his usage of those pitches. The evenness of the pitch distribution is an important factor here: a "see-me" changeup used two or three times a start should add less to the index than a changeup thrown five or six times as frequently. The index scales from zero (a pitcher who uses the same pitch 100% of the time) to the natural logarithm of the number of different pitches a pitcher throws. As an example, a realistic maximum might be a pitcher who throws seven different pitches and uses them all equally. His arsenal would have a diversity index of ln(7) = 1.946, so we can say that the index (for pitchers) scales from roughly 0 to 2.
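For readers who want to see the arithmetic, here is a minimal Python sketch of the index, H = -Σ p_i ln(p_i), where p_i is the share of a pitcher's pitches that are of type i. This is not the R code used for the article (that was vegan, whose `diversity()` function defaults to the Shannon index with natural logs); the function name and the usage numbers below are made up for illustration and are not real FanGraphs data.

```python
import math


def shannon_diversity(usage):
    """Shannon-Wiener index for a dict of pitch type -> usage (counts or %)."""
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("usage must contain at least one positive value")
    h = 0.0
    for amount in usage.values():
        if amount > 0:  # a pitch that is never thrown contributes nothing
            p = amount / total
            h -= p * math.log(p)  # natural log, so the maximum is ln(#pitches)
    return h


# Hypothetical arsenals (illustrative only, not real pitchers).
one_pitch = {"FC": 100.0}
seven_even = {name: 1.0 for name in ["FA", "FT", "FC", "SL", "CU", "CH", "FS"]}
see_me_change = {"FA": 60.0, "SL": 38.0, "CH": 2.0}
real_change = {"FA": 60.0, "SL": 28.0, "CH": 12.0}

for label, arsenal in [
    ("one pitch", one_pitch),
    ("seven pitches, even usage", seven_even),
    ("see-me changeup", see_me_change),
    ("regular changeup", real_change),
]:
    print(f"{label}: {shannon_diversity(arsenal):.3f}")
```

The one-pitch arsenal scores zero, the evenly used seven-pitch arsenal scores ln(7), and the arsenal with the regularly used changeup scores higher than the one with the "see-me" changeup, even though both contain three pitch types.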
Ever wonder which pitchers have the most diverse arsenals? Well, at the top of the list is actually our old friend Shaun Marcum, at 1.525 (mean diversity = 1.084; see the end of the article for the entire list of pitchers). Remember that these values are calculated on a log scale, so Shaun Marcum has a much more diverse arsenal than the average pitcher. At the bottom of the list, as you might have guessed, you'll find extreme one-pitch specialists like Tim Wakefield and Justin Masterson. Because of the innings minimum, Mariano Rivera is not included, but his diversity score is 0.407. If you were unconvinced about the method before, I hope seeing Marcum at the top and these other pitchers near the bottom has assuaged your fears.
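One way to make the log-scale point concrete is to exponentiate the index. In ecology, exp(H) is often read as the "effective number of species": the number of equally used pitch types that would produce the same index. This conversion was not part of the original analysis; the sketch below simply applies it to the three scores quoted above, and the function name is hypothetical.

```python
import math


def effective_pitch_count(h):
    """Number of equally used pitch types that would give Shannon index h."""
    return math.exp(h)


# Diversity scores quoted in the text.
marcum = effective_pitch_count(1.525)
average_index = effective_pitch_count(1.084)
rivera = effective_pitch_count(0.407)

print(f"Marcum:        {marcum:.1f} effective pitches")
print(f"Average index: {average_index:.1f} effective pitches")
print(f"Rivera:        {rivera:.1f} effective pitches")
```

Read this way, Marcum pitches as if he had about 4.6 evenly used pitch types, a pitcher with the average index about 3.0, and Rivera about 1.5.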
Now that we have figured out a way to estimate a pitcher's arsenal diversity, we need to figure out a way to evaluate that pitcher's outperformance of his peripherals. I chose to create an "Unluckiness Index," which is simply that pitcher's xFIP subtracted from his ERA (ERA - xFIP). A negative value means the pitcher's ERA was lower than his xFIP, that is, he outperformed his peripherals. If our original hypothesis (that pitchers with more diverse arsenals would be better equipped to outperform their peripherals) was correct, we should see a negative correlation between the Unluckiness Index and the Diversity Index. Unfortunately, we don't. Although the best-fit line does have a negative slope, the relationship is not statistically significant (R² = 0.005, df = 143, p = 0.579), so I chose not to include it:
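For anyone who wants to try this at home, here is a rough Python sketch of the same kind of test: compute ERA - xFIP for each pitcher, then fit a least-squares line against the diversity index. The pitchers and numbers below are invented for illustration (they are not the 2011 sample), and the function names are my own; the actual analysis was run in R.

```python
def unluckiness(era, xfip):
    """Unluckiness Index: ERA minus xFIP (negative = outperformed peripherals)."""
    return era - xfip


def least_squares(xs, ys):
    """Simple linear regression; returns (slope, intercept, r_squared, df)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared, n - 2  # df = n - 2 for one predictor


# Hypothetical pitchers: (name, diversity index, ERA, xFIP). Not real data.
pitchers = [
    ("Pitcher A", 1.50, 3.40, 3.90),
    ("Pitcher B", 1.20, 4.10, 3.80),
    ("Pitcher C", 1.05, 3.70, 3.75),
    ("Pitcher D", 0.90, 4.50, 4.00),
    ("Pitcher E", 0.60, 3.20, 3.60),
    ("Pitcher F", 0.45, 4.80, 4.40),
]

diversity = [p[1] for p in pitchers]
unlucky = [unluckiness(p[2], p[3]) for p in pitchers]

slope, intercept, r_squared, df = least_squares(diversity, unlucky)
print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}, df = {df}")
```

A negative slope would be consistent with the hypothesis, but the slope alone says nothing about significance. Getting a p-value requires the t-distribution; in Python, `scipy.stats.linregress` reports one alongside the slope and r-value.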
Nonetheless, these results do not necessarily mean that there is nothing to our hypothesis. I may rerun these analyses using a population of pitchers with 750 innings over the past four seasons (or something to that effect), which should weed out much of the single-season luck, good or bad, present in one year of ERA. Any other suggestions? Should I use a different index of diversity? Should I use a different index to estimate outperformance of peripherals? Is the hypothesis just completely off-base?
Thanks to Woody Guthrie for today's post title, from "She Came Along to Me," a song later set to music and recorded by Billy Bragg and Wilco.
Pitcher Arsenal Diversities