In an excellent piece this past week, hugo looked at some of the difficulties Brandon Morrow has been having this season. He noted that Morrow seems to have had a lot of trouble stranding runners, possibly because he yields more flyballs with runners on, which may or may not be related to problems locating his pitches at the bottom of the strike zone when pitching from the stretch. The article got me thinking -- what factors actually control a pitcher's strand-rate?
First, very little is actually known about strand-rate. To be honest, not only did I not know the method used to calculate it, I didn't even know what it actually purports to measure. Does strand-rate even attempt to account for the effect of homeruns automatically scoring runners who are already on? While the more sabermetrically-inclined of y'all may not need this review, I certainly did. Per Hardball Times, the stat is calculated as: (H+BB+HBP-R)/(H+BB+HBP-(1.4*HR)). In words, this is the number of baserunners a pitcher strands (the number he lets on minus the number he allows to score) divided by the number of baserunners he could potentially strand (total number of baserunners minus the estimated number that score on homeruns). Note that the method does not perfectly describe the probability that a pitcher strands any given baserunner, because it does not include hitters who reach on errors and it uses a general formula to estimate how many of a pitcher's runs scored on HR (essentially, it assumes each homerun scores 1.4 runners). While this is not perfect in small samples, the law of large numbers (a good friend to every statistician) means that most of the error in that approximation averages out in large samples, so it's certainly good enough to allow us to make inferences.
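For anyone who wants to try the formula on a stat line, here is a minimal sketch in Python. The function name, argument names, and the sample season totals are all made up for illustration; only the formula itself comes from the Hardball Times definition quoted above.

```python
def strand_rate(h, bb, hbp, r, hr, hr_weight=1.4):
    """Strand-rate per the formula above: (H+BB+HBP-R) / (H+BB+HBP-1.4*HR)."""
    baserunners = h + bb + hbp
    stranded = baserunners - r                # runners allowed on minus runs allowed
    strandable = baserunners - hr_weight * hr # runners on minus estimated HR runs
    if strandable <= 0:
        raise ValueError("not enough baserunners to compute a strand-rate")
    return stranded / strandable


# Hypothetical season totals, not any real pitcher's line.
example = strand_rate(h=180, bb=60, hbp=5, r=90, hr=20)
print(f"{example:.1%}")  # prints 71.4%
```

The 1.4 is left as a parameter only so the assumption is visible; changing it would no longer match the published stat.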
To determine how closely strand-rate was correlated with (not affected by!) a number of factors, I performed separate simple linear regression analyses.
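As a rough illustration of what one of these analyses involves, here is a sketch of an ordinary least-squares fit of strand-rate against a single factor, written in plain Python. The function name and the data are hypothetical (the numbers are invented, not taken from the 2011 data set), and this is not necessarily the software or exact procedure used for the post.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson's r."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient, between -1 and 1
    return slope, intercept, r


# Invented K% and strand-rate values for six imaginary pitchers.
k_pct = [0.15, 0.18, 0.20, 0.22, 0.25, 0.28]
lob_pct = [0.69, 0.71, 0.72, 0.71, 0.75, 0.77]
slope, intercept, r = simple_linear_regression(k_pct, lob_pct)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Running the same function once per factor (K%, BB%, BABIP, and so on) is what "separate" regressions means here: each factor is tested on its own rather than all together in one model.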
Peripheral stats and fastball pitching:
I expected that K-rate affects strand-rate (positive correlation) more than any other single factor. I also expected that BB% would be weakly positively correlated with strand-rate. Upon reaching base on a walk, the hitter is at first base. Since non-HR hits can go for doubles and triples (and thus those baserunners are inherently more likely to score), it makes sense that a pitcher who has a higher proportion of his baserunners reach via the walk should have a better strand-rate. Additionally, walks are less likely to drive in runs than hits. It is also possible that pitchers who throw harder would have better strand-rates. As estimates of how hard pitchers throw, I used fastball velocity, fastball % (of pitches thrown), and weighted fastball value (per 100 fastballs) by linear weights.
BABIP, ERA, and ERA Estimators:
Strong negative correlations (as x increases, y decreases) should exist with BABIP (BABIP should drive strand-rate) and ERA (strand-rate should drive ERA). As BABIP is likely driven by LD-rate, I expected strand-rate should also be correlated with LD-rate. Since defence-independent (DIPS) ERA estimators/predictors take K% into account, I also predicted weaker, but still significant, negative correlations with DIPS metrics, such as FIP, xFIP, SIERA, and tRA.
Batted Ball Types:
Since linedrives are much more likely to become hits, I expected a negative correlation between linedrive-rate (LD-rate) and strand-rate. I did not expect to see a correlation with batted ball types besides linedrive-rate because, although groundballers are more likely to induce double plays and less likely to allow extra base hits, they are also more likely to allow singles. Over extremely large samples, there must be an effect of pitcher batted-ball splits, but -- outside of LD-rate -- I expected that effect to be very small and obscured in one-year samples.
For those of you unfamiliar with what p- and R-squared values mean, here is an extremely quick and relatively painless explanation. The p-value addresses whether there is evidence of a relationship between the two variables at all: it is the probability of seeing a correlation at least as strong as the one observed if no real relationship existed. A general rule of thumb is that a p-value below 0.05 counts as a statistically significant correlation, while a p-value greater than 0.05 means the evidence does not strongly support there being one. The R-squared value refers to how strong the relationship between the two variables is. An R-squared value of 0.5 means that about 50% of the variation in y (say, strand-rate) is related to variation in x (say, K%) and the other 50% is related to other factors or random statistical noise.
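To make those two numbers a little more concrete, here is a small sketch that computes R-squared for a pair of variables and then estimates a p-value by shuffling. Note the swap: regression software normally gets its p-value from a t-test, whereas this uses a permutation test, which is easier to follow and needs nothing beyond the standard library. The function names and the data are hypothetical.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Share of the variation in y that is linearly related to x (0 to 1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Estimate how often randomly re-paired data look at least this correlated."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between x and y
        if r_squared(x, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Invented K% and strand-rate values for eight imaginary pitchers.
k_pct = [0.14, 0.16, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28]
lob_pct = [0.68, 0.70, 0.69, 0.72, 0.73, 0.72, 0.75, 0.77]
print(f"R-squared = {r_squared(k_pct, lob_pct):.3f}")
print(f"p-value   = {permutation_p_value(k_pct, lob_pct):.4f}")
```

The shuffling step is the "if no real relationship existed" scenario in action: if almost none of the scrambled data sets match the real correlation, the p-value is small.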
[Regression results were presented here in three groups: Peripherals and Fastballs; BABIP, ERA, and ERA Estimators; Batted Ball Types]
Discussion and Conclusions
Peripherals and Fastballs
As expected, pitchers with higher strikeout-rates tend to strand more runners. This makes sense intuitively and has been discussed previously. I expected walk-rate to be positively correlated with strand-rate, but the results show very little evidence that a correlation exists. It is likely that any positive effect of walking more batters on strand-rate is obscured by other factors associated with wildness (such as wildness in the strike zone, more wild pitches, or more trouble holding runners on base). Fastball value was strongly correlated with strand-rate, but that does not mean that fastball-pitchers are better at stranding runners. All it means is that pitchers who throw good fastballs strand runners better. In fact, not only does throwing more fastballs not increase pitcher strand-rates (I wouldn't necessarily expect it would), but increases in fastball velocity are not correlated with increases in strand-rate either. Unexpectedly, simply throwing harder shows no detectable bearing on whether a pitcher will be better or worse at stranding runners than his peers.
BABIP, ERA, and DIPS
Again, as expected, both BABIP and ERA are strongly negatively correlated with strand-rate. The most likely relationship is that increases in BABIP drive decreases in strand-rate, and those drops in strand-rate are in turn directly associated with increases in ERA. DIPS ERA estimators and predictors are all weakly negatively correlated with strand-rate. The relative strengths of these correlations are likely due in large part to the relative weights of the inputs for each of these ERA estimators. tRA uses actual batted ball data, which include a linedrive-rate component, and is the most strongly correlated with strand-rate among DIPS stats. SIERA uses K% as an estimator of BABIP. Although this may cause it to overestimate the influence K% has on ERA, it does make SIERA more tightly coupled to strand-rate than FIP or xFIP.
Batted Ball Types
As predicted, pitcher linedrive-rate is correlated with strand-rate (presumably through BABIP). Somewhat unexpectedly, there is some evidence that groundball pitchers can maintain higher strand-rates than flyball pitchers. On the other hand, both the significance (p-value = 0.098, above the 0.05 rule of thumb) and strength (R-squared = 0.0285) of this correlation suggest that it is not all that meaningful. At the same time, since groundball-pitchers tend to have higher BABIP than flyball-pitchers, the fact that we see even a slight positive effect of groundballs is surprising and likely stems from groundballers being better able to suppress extra-base hits and induce double plays. There is virtually no correlation between HR/fly-rate and strand-rate.
Thanks to FanGraphs for 2011 pitching statistics, the Velvet Underground for the post title, and hugo and benk for giving me the idea to do this.