Hit Probability
For the 2017 season, MLB has begun releasing Hit Probability (HP%) data for every Batted Ball Event (BBE). More info on the HP% metric can be found here, but essentially it estimates how often a ball would be expected to fall for a hit, based on how balls with the same exit velocity and launch angle have fared historically. Data for each game is publicly available on BaseballSavant’s gamefeed page.
Comparing each ball in play to historical data limits the effect that good or poor defense and ballparks have on a hitter’s results. The BaseballSavant Hit Probability Breakdown page shows results from a sample of 240,595 Batted Ball Events (balls in play plus foul outs), so there is a large sample to compare against.
Limitations
- Foot speed - a fast player’s ability to beat out soft ground balls may let him reach base on lower HP% balls
- Players who consistently hit into extreme shifts - they may be hitting the ball hard to very defined areas, which misleads the model into treating the ball as a good opportunity for a hit, while in reality the defense can position itself to take that hit away
- Some events, such as bunts and the occasional hit or out in play, are left off of Gamefeed, which is addressed in the model
- Power can’t effectively be calculated, as the results for how often a certain ball turns into more than a single are not publicly available
How it Was Calculated
I extracted the data from BaseballSavant’s gamefeeds after every game. From there, I created a model that would give a player credit for the probability of a hit from their batted ball event.
For example, a hit with 75% hit probability would count for 0.75 of a hit, and an out with 45% hit probability would count for 0.45 of a hit. While the results show that the batter achieved one hit in two attempts, the batter actually did a little better (1.20 expected hits), and over a larger sample would likely come up with more hits if those two events were repeated. A strikeout would count as a true 0-for-1.
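The crediting rule in the example above can be sketched in a few lines of Python. This is only an illustration, not the model's actual code: the `BattedBall` class, the function names, and the added strikeout are hypothetical, and hit probabilities are assumed to be expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    """One batted ball event (hypothetical structure, for illustration only)."""
    hit_probability: float  # HP% as a fraction between 0 and 1
    was_hit: bool           # what actually happened on the field


def expected_hits(events):
    """Credit each batted ball with its hit probability, whatever the result."""
    return sum(e.hit_probability for e in events)


def actual_hits(events):
    """Count the batted balls that actually fell for hits."""
    return sum(1 for e in events if e.was_hit)


# The two events from the example, plus one strikeout worth zero credit.
events = [BattedBall(0.75, True), BattedBall(0.45, False)]
strikeouts = 1
at_bats = len(events) + strikeouts

print(f"actual average:   {actual_hits(events) / at_bats:.3f}")
print(f"expected average: {expected_hits(events) / at_bats:.3f}")
```

A strikeout never enters the list of events, so it adds an at bat to the denominator without adding any expected hits.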
The Model and What It Can Do
The model was built in an attempt to determine which players have been making solid contact with quality at-bats but not getting the results they deserve, as well as which players have been making weak contact but still getting results. It could be used to predict future performance and to see who might be due for regression. It could also help determine whether player improvements are real, or just a matter of ‘BABIP luck’.
How the Numbers were Generated
To account for the few events that gamefeed missed, I calculated each player’s average hit probability on hits and, separately, their average hit probability on batted ball events that weren’t hits. I then multiplied the average on hits by the player’s actual batting average, since in theory they have only truly ‘earned’ that portion of a hit on their hits so far. The same was then done with the non-hit data: the average on non-hits was multiplied by the percentage of the time a batted ball event didn’t result in a hit.
Combining these two values gives the batting average the player would be expected to have, based on hit probability. I decided to add walks into the final formula to create an adjusted on-base percentage, so that it can be used both to compare expected production to actual results and to view total expected production on its own.
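One way to read the two steps above is sketched below in Python. The exact formula is not given in this article, so several things here are assumptions: the two values are combined by adding them, the non-hit rate is measured per at bat (at bats minus hits minus strikeouts), and the adjusted OBP uses only at bats and walks, ignoring hit-by-pitches and sacrifice flies. All function names and the sample numbers are hypothetical.

```python
def expected_average(avg_hp_on_hits, avg_hp_on_non_hits, hits, strikeouts, at_bats):
    """Expected batting average from average hit probabilities (assumed formula)."""
    batting_average = hits / at_bats
    # Assumption: batted balls that were not hits, as a share of at bats.
    non_hit_rate = (at_bats - hits - strikeouts) / at_bats
    earned_on_hits = avg_hp_on_hits * batting_average
    earned_on_non_hits = avg_hp_on_non_hits * non_hit_rate
    return earned_on_hits + earned_on_non_hits


def adjusted_obp(expected_avg, at_bats, walks):
    """Adjusted OBP with walks added in (assumption: HBP and sac flies ignored)."""
    expected_hits = expected_avg * at_bats
    return (expected_hits + walks) / (at_bats + walks)


# Made-up season line, not a real player's numbers.
avg = expected_average(0.70, 0.25, hits=50, strikeouts=40, at_bats=200)
obp = adjusted_obp(avg, at_bats=200, walks=20)
print(f"expected average: {avg:.3f}")
print(f"adjusted OBP:     {obp:.3f}")
```

Scaling average hit probabilities by the player's actual rates, rather than summing every event, is what lets the calculation tolerate the few events missing from gamefeed.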
The Results
The Lucky:
Ezequiel Carrera - Zeke is getting on base at a rate of 0.336, slightly above his career rate of 0.315, which is roughly what we should expect from Carrera in terms of OBP. The problem is that he’s walking at a lower rate (4.3%, compared to 6.4% for his career), so this jump in OBP could largely be due to the low HP% hits he’s been accruing this year. His actual OBP is 0.045 higher than his Adj. HP% OBP, the largest such gap on the team. However, due to his foot speed, there is a chance that some of his reaching on low HP% balls is sustainable.
Kevin Pillar - Pillar is getting on base at a rate of 0.357, which is much higher than his career rate of 0.310. It’s obvious that Pillar has made some adjustments which have seen his average jump up significantly this year, and he’s walking at a much higher rate (7.1% compared to 4.5%). The ‘luckiness’ associated with Pillar is not extreme (-0.013 difference between actual OBP and adjusted OBP), and could simply be a product of his plus foot speed.
Right Where They Should Be:
Russell Martin - Russ has easily been the most fun player on the list to track as the season went on. At the beginning of the year, he wasn’t getting on base at all, and the model projected him to make a serious jump, given his good contact. Well, the regression worked out in his favour and he became one of Toronto’s most productive hitters before going down with an injury, with a 0.365 OBP. The sample has only 45 BBEs to judge from, largely due to his high strikeout rate and injury time, but that was enough for Russell to regress to near what the model says he should be hitting for the year.
Darwin Barney - Barney has only 56 BBEs to judge from, but his OBP of 0.311 is right near what the model says he should have, and just above his career OBP.
The Unlucky:
On a team that began the year with astronomically bad numbers compared to their expected true talent, there was bound to be some bad luck associated with that. To little surprise, this list is quite long.
Devon Travis - The other fun player to follow throughout the season, Devon has been surprisingly underwhelming at the plate so far. He leads the team in the difference between adjusted and actual OBP at 0.074, but that gap was nearly three times as large at some points this year. It seemed like he was making solid contact but not getting hits. Lately, there have been plenty of hits falling for Travis, and his actual OBP has been rising closer to what the model projects. His adjusted number is 0.328, which is close to his career number of 0.325, so it is entirely possible that some positive regression and luck will get Travis back on track for the rest of the season if he keeps up what he’s been doing so far.
Kendrys Morales - Morales has been getting on base well below his career numbers so far this season (0.286, compared to 0.329 for his career). However, the model expected him to have reached base at a much higher rate of 0.354. His abysmal baserunning speed and his strong tendency to hit into a shift are certainly factors when it comes to Kendrys, so his true expected results are probably somewhere between the two numbers.
Justin Smoak - Even with Smoak’s hot start, the model projects him for even better results. Surely some of this is due to his inability to beat out softly hit balls, but it is encouraging nonetheless, and at the very least it puts aside the notion that Smoak’s start has been lucky; he’s been mashing the ball. He has the highest average HP% across all BBEs on the Jays so far this season, and the model expects a 0.050 jump in his OBP.
Steve Pearce - Pearce has been getting on base at a rate of 0.256 since he was brought in to Toronto, which is much less than expected. The positive is that the model says he should have reached base at a significantly higher rate of 0.290. He would be far from our best hitter, but if he continues at this rate and gets some better luck with his hits, it would be a decent improvement on the Jays’ investment in him.
Ryan Goins - Even with his recent hot streak at the plate, Goins is hovering near his career numbers in most categories right now. The positive side is that he’s earning these hits, so the hot streak can’t simply be attributed to BABIP luck. His actual OBP is 0.257, which is lower than his career OBP, so he could be due for an even better statline in the future (the model assesses him at an expected 0.298 OBP) if he keeps up this performance.
Jose Bautista - Jose has had an underwhelming start to the season, and while his OBP of 0.330 is lower than his career rate of 0.367, his adjusted OBP numbers so far this season give hope that there might be some positive regression due, with a difference of 0.030 thus far.
As a Team:
The rest of the Jays hitters don’t have enough BBEs for their numbers to be statistically significant yet. As a team, though, they may be considered unlucky, as their OBP so far has been 0.302, compared to an adjusted OBP of 0.330. That adjusted figure would be good for 5th in the AL if it were their actual OBP. Let’s hope that this is a sign for the future, and the Jays are due for some positive regression and good luck soon.