In the background part of this mini-series on Statcast data, I ran through some data on exit velocity (EV), launch angle (LA), and how the latter is arguably even more important than just raw exit velocity. Having thought about this a fair bit, I came up with a simple tweak for improving the usefulness of exit velocity data.
But let’s start with the Statcast xwOBA model, which estimates expected production for each combination of launch angle and exit velocity. The overall results are further improved by adjusting for batter speed for balls on or near the ground. My main quibble is it still doesn’t account for spray angle, which really matters for fly balls between 300 and 400 feet (I’m still vexed by what was going on with Kendrys Morales and fly balls).
But in the grand scheme, it’s the gold standard. How well does it perform? Running actual production on each ball from 2020 against the expectation, the correlation is very strong at +0.69 and the linear regression model has an R-squared of 0.47. So on any given ball, just under half the variance in actual results can be predicted from xwOBA (and in turn launch angle/exit velocity). Considering all the factors at play and inherent randomness, that’s a very good model.
But this output lacks the intuition of exit velocity: we know that a ball hit 100+ MPH is hit very hard, and one hit 70 MPH very poorly. So what about the predictive power of just exit velocity? Running raw exit velocity against actual results gives a moderate correlation of +0.31 and R-squared of 0.09, so exit velocity alone explains less than 10% of the variance in results on a ball. That’s not so good, and far inferior to xwOBA (just 20% as good by R-squared).
One caveat is that, as demonstrated before, batted ball results only take off when exit velocity gets above 90 MPH. Below that level they’re really poor, but essentially uniformly poor regardless of EV.
[Chart: wRC+ by exit velocity, annotated]
If we tweak the data to treat all balls below 90 MPH the same and only measure the change above that level (subtract 90 MPH from each ball, with a floor of 0), the model is meaningfully improved. The correlation rises to +0.42 and the R-squared to 0.18, but that’s still quite lackluster compared to the xwOBA model.
My goal was to create a simple statistic that was a single number, first and foremost easy and intuitive to understand and use, but incorporating launch angle into exit velocity. It’s necessarily not going to be as good as xwOBA, but that’s not the point: it’s like the difference between wOBA/wRC+ and OPS/OPS+. The first is the gold standard, using linear weights to precisely value different offensive outcomes in relation to each other.
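The floor adjustment described above can be sketched in a few lines of Python. This is only an illustration of the "subtract 90 with a 0 floor" idea; the function name `ev_above_threshold` and the sample exit velocities are made up for the example and are not taken from the actual data set.

```python
def ev_above_threshold(exit_velocity_mph: float, threshold_mph: float = 90.0) -> float:
    """Return how many MPH a ball was hit above the threshold, floored at 0.

    Every ball at or below the threshold gets the same value (0), so the
    regression only "sees" differences above the takeoff point.
    """
    return max(exit_velocity_mph - threshold_mph, 0.0)


# Hypothetical exit velocities, chosen only to show the floor in action.
for ev in (70.0, 85.0, 90.0, 95.0, 105.0):
    print(f"EV {ev:5.1f} MPH -> adjusted {ev_above_threshold(ev):4.1f}")
```

A 70 MPH ball and an 85 MPH ball both come out as 0, while a 105 MPH ball comes out as 15, which is the behavior the adjusted regression relies on.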
By contrast, OPS doesn’t weight things properly (and is strictly speaking a math error, adding together two things with different denominators). But if you’re trying to introduce metrics to a casual fan, OPS is really intuitive in a way that “linear weights” is...not (to put it mildly). It’s just taking two familiar stats and adding them together, with the underlying value of not making outs and power being obvious. OPS+ then just puts that on a scale that’s easier to compare across players and time (with a few adjustments).
More importantly, while OPS+ is less accurate than wRC+ and thus inferior to it, the correlation between the two is very strong, and both correlate very strongly with runs. wRC+ is a little better than OPS+, but OPS+ is a lot better than batting average and RBIs. It’s somewhat like the Dow Jones Industrial Average, which continues to be a widely quoted index despite significant theoretical shortcomings as a price-weighted index of just 30 companies (which is the reason it exists: it was practical to calculate before computers), but in the end it’s highly correlated with better, broader indexes (such as the S&P 500).
My result is what I have dubbed Effective Exit Velocity (EEV).
As shown in the first part, offensive production peaks at launch angles from 10 to 30 degrees, and declines on either side. So, splitting the difference between the two, use 20 degrees as the “optimal” launch angle, and then subtract one MPH from exit velocity for each degree of deviation from that.
Effective Exit Velocity = Exit Velocity minus absolute difference of launch angle and 20
Basic EEV = EV - |(LA-20)|
So the idea is a ground ball hit 100 MPH but deeply downward at -20 LA (expected batting average of about .125) results in an effective exit velocity of 60 by subtracting 40. Likewise, a big league can of corn hit that hard but upward at a 55 LA would be rated similarly (EEV of 65). In both cases, that’s much more representative of the quality of contact.
On the flip side, a solid line drive hit 85 MPH at 15 degrees still scores very well, as the EEV of 80 (85-5) still ranks at the 65th percentile. It’s not crushed, but it doesn’t have to be at that angle to do damage.
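For anyone who would rather see the mental math as code, here is a minimal sketch of the basic formula in Python. The function name `basic_eev` and its parameter names are illustrative choices; the three balls are the worked examples from the text above.

```python
def basic_eev(exit_velocity_mph: float, launch_angle_deg: float,
              optimal_la_deg: float = 20.0) -> float:
    """Basic EEV = EV - |LA - 20|: one MPH off per degree away from 20."""
    return exit_velocity_mph - abs(launch_angle_deg - optimal_la_deg)


# The three examples discussed in the text: (description, EV, LA).
examples = [
    ("hard ground ball", 100.0, -20.0),
    ("can of corn", 100.0, 55.0),
    ("soft line drive", 85.0, 15.0),
]

for label, ev, la in examples:
    print(f"{label:17s} EV {ev:5.1f}, LA {la:5.1f} -> EEV {basic_eev(ev, la):5.1f}")
```

The results are 60, 65 and 80 respectively, matching the figures quoted above.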
So it’s a simple little bit of mental math that ends up on roughly the same scale as exit velocity. But how does it stack up? Regressing actual production against EEV results in a correlation of +0.40 and R-squared of 0.16, which is meaningfully better than EV alone.
[Chart: wRC+ by basic EEV]
However, again, we see that same trend where wRC+ is pretty flat and negligible until the “takeoff range”, in this case an EEV of about 60 MPH. When the data’s adjusted for this as was done above for exit velocity, the correlation rises to +0.53 and the R-squared to 0.28. Both adjusted and unadjusted, EEV is significantly better than raw EV.
There’s nothing special about that formula, and I played around with the numbers, changing the “optimal LA” as well as subtracting more or less than 1 for each degree of deviation. But nothing more than minutely improved the model (21 or 22 degrees is the slightest bit more efficient), so for simplicity I’m sticking with those numbers.
There is one tweak that makes EEV a bit more involved, but it did meaningfully improve the model, and I actually think it better reflects the data, so I’m going to present a second “advanced” version.
Going back to the chart of production by launch angle, batting average peaks with LA in the low teens, whereas power peaks in the high 20s. In between, they run counter to each other, but the sharp increase in power dominates. The “dual peak” is really a distortion caused by outfield positioning; fundamentally, the distribution is more triangular and non-symmetric:
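As a quick sanity check on the figures quoted so far: for a simple linear regression with one predictor, R-squared is just the square of the correlation coefficient. The short Python sketch below squares each reported correlation and compares it with the reported R-squared. The labels are my own shorthand, and because the inputs are rounded to two decimals the match is only approximate.

```python
# (label, reported correlation, reported R-squared), as quoted in the text.
reported = [
    ("xwOBA", 0.69, 0.47),
    ("raw EV", 0.31, 0.09),
    ("EV above 90 MPH", 0.42, 0.18),
    ("basic EEV", 0.40, 0.16),
    ("EEV above 60", 0.53, 0.28),
]

for label, r, r_squared in reported:
    # With a single predictor, R^2 equals r * r (up to rounding of the inputs).
    print(f"{label:16s} r = {r:.2f}  r^2 = {r * r:.3f}  reported R^2 = {r_squared:.2f}")
```

Every pair agrees to within rounding, including the last one, where +0.53 squares to roughly 0.28.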
[Chart: production by launch angle, annotated]
The rise is gradual, as first batting average increases quickly with no power, then power starts increasing but batting average quickly falls off, then finally when power output reverses both move together and production collapses. Put another way, it takes almost 40 degrees of LA for wRC+ to rise from 0 (around -12 LA) to the absolute peak (around 28 LA), but only about 12 degrees for it to fall back to 0 (at 40 LA).
Given that, it doesn’t really make sense to have the same penalty on both sides. If a batter hits a ball 15 degrees away from 20, it’s much better to be 5 rather than 35 degrees. The tradeoff is the calculation gets more complex, because it’s now a “piecewise” formula: if it’s less than X do Y, if greater than X do Z. But it is more faithful to the nature of the data.
I played around with the numbers and got two formulas that are basically equally good, and that improve the R-squared a little, to 0.29:
Advanced EEV = EV - [(25-LA) if LA<25; 2*(LA-25) if LA>25]
Advanced EEV = EV - [(28-LA) if LA<28; 3*(LA-28) if LA>28]
In both cases, the “optimal” LA is moved higher: in the first case to 25 degrees, with double the penalty for each degree above it as for each degree below; in the second case to 28 degrees, with a triple penalty. They both work out about the same; given the relative rates of rise and fall, the second is probably technically the better specification.
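The piecewise formulas can be sketched as a single Python function with two knobs. The function name `advanced_eev` and the parameter names are illustrative. One small assumption: the formulas above use strict inequalities and don't say what happens when LA exactly equals the optimal angle, but the penalty is zero either way, so the sketch simply folds that case into the lower branch.

```python
def advanced_eev(exit_velocity_mph: float, launch_angle_deg: float,
                 optimal_la_deg: float = 28.0, over_multiplier: float = 3.0) -> float:
    """Advanced EEV: 1 MPH off per degree below the optimal LA,
    `over_multiplier` MPH off per degree above it."""
    if launch_angle_deg <= optimal_la_deg:
        penalty = optimal_la_deg - launch_angle_deg
    else:
        penalty = over_multiplier * (launch_angle_deg - optimal_la_deg)
    return exit_velocity_mph - penalty


# Same example balls as in the basic version: (description, EV, LA).
examples = [
    ("hard ground ball", 100.0, -20.0),
    ("can of corn", 100.0, 55.0),
    ("soft line drive", 85.0, 15.0),
]

for label, ev, la in examples:
    v25 = advanced_eev(ev, la, optimal_la_deg=25.0, over_multiplier=2.0)
    v28 = advanced_eev(ev, la, optimal_la_deg=28.0, over_multiplier=3.0)
    print(f"{label:17s} 25/double: {v25:5.1f}   28/triple: {v28:5.1f}")
```

The two versions stay close for balls hit below the optimal angle (55 vs. 52 for the ground ball, 75 vs. 72 for the line drive) and diverge most on steep fly balls, where the can of corn drops to 40 under the double penalty and 19 under the triple.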
A final thing to recommend effective exit velocity is that for all versions, the scale just happens to line up really well with the familiar grading scheme for intuition. Essentially, anything over 90 represents A+ contact (excellent), mid-80s A contact (very strong), low 80s A- contact (very good), 70s B contact (okay), 60s C contact (mediocre/poor), 50s D contact (very poor); below 50 is totally failing.
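For illustration, the grading scale could be written as a small lookup in Python. The exact cutoffs are assumptions on my part, since the scale above is only given in rough bands: the sketch puts the line between "low 80s" and "mid-80s" at 84, and treats everything from 84 up to and including 90 as A. The function name `contact_grade` is likewise just a placeholder.

```python
def contact_grade(eev: float) -> str:
    """Map an EEV value to a letter grade.

    Cutoffs of 84 (A- vs. A) and the handling of 87-90 are assumptions;
    the text only describes the bands loosely.
    """
    if eev > 90:
        return "A+"  # excellent
    if eev >= 84:
        return "A"   # very strong (assumed to run from 84 through 90)
    if eev >= 80:
        return "A-"  # very good
    if eev >= 70:
        return "B"   # okay
    if eev >= 60:
        return "C"   # mediocre/poor
    if eev >= 50:
        return "D"   # very poor
    return "F"       # totally failing


for eev in (95, 86, 80, 75, 65, 55, 40):
    print(f"EEV {eev:3d} -> {contact_grade(eev)}")
```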