Hate and War, the Only Things We Got Today: Does Replacement Level Properly Scale to Average?
There are probably still some folks out there who have never heard of WAR, but they're increasingly few and far between, and very few of them are likely at this site. However, what actually constitutes this "replacement player"? It's been fairly widely cited that a replacement player is roughly twenty runs per 600 plate appearances worse than a league-average player. In fact, that's supposedly what our "replacement level" is based on -- not who is actually readily available, but what "league-average" production is.
Thus, we say that a 2.5-win player is "league average." Of course, note the tautology here. We're saying that a 2.5-win player is league average because he's worth 2.5 wins more than a replacement player, but we're also defining a replacement player by how many wins he is worth below league average. However, it's entirely possible that the relationship between WAR and wins actually changes from year to year because there's no perfect way to define "average." Even though WAR is based on the statistical mean of player performance in any given year, the distribution of that performance may change. This is particularly true when looking at the game across long timescales. Consider for a moment how much the base level of play to be considered "replacement level" changed during wartime, league expansion, integration, etc.
Part of the reason that Honus Wagner was able to accrue four consecutive 10-WAR seasons was that he was an incredible ballplayer but another part of it was how many players were systematically excluded from baseball at that time. Thus, there was a smaller talent pool available, leading to a lot of players in the league who would not be in a similarly-sized league today. Of course, the leagues have expanded, so we kind of assume these things even out, but they don't necessarily.
To make matters even more confusing, the two most frequently cited sites for WAR (fangraphs, fWAR, and baseball-reference, rWAR) calculate it differently. In fact, not only do they calculate WAR differently, they calculate replacement level differently. But which method actually scales more closely with actual numbers of wins? I decided to try and figure out what the relationship between fWAR, rWAR and wins has been the past three seasons.
Before we get started, I'm not going to go into details about the differences between the two metrics because other folks have done that already. If you're interested, there are plenty of folks who'd be more than happy to explain in the comments section.
So, first, I downloaded the past three seasons-worth of team WAR across MLB (a total of 90 team-seasons). Then, using R, I constructed linear models to determine the relationship between wins above replacement-level and actual wins.
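The models here were fit in R and the code isn't shown. As a rough sketch of the same idea in Python, an ordinary least-squares line (plus its r-squared) can be computed by hand as below. The function name `fit_line` and the three toy data points are placeholders made up for illustration; they are not the 90 team-seasons used in this post.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus r-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Toy placeholder data (NOT real team-seasons): team WAR and team wins
toy_war = [20.0, 30.0, 40.0]
toy_wins = [70.0, 80.0, 90.0]
intercept, slope, r_squared = fit_line(toy_war, toy_wins)
print(intercept, slope, r_squared)
```

With real data, `xs` would be each team-season's WAR total and `ys` the matching win total; the intercept is then the model's estimate of how many games a zero-WAR team wins.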
Here is the relationship between wins and fWAR:
and here is the relationship between wins and rWAR:
These graphs might look similar at first glance, but note the differences in scale on the x-axis (!). As expected, the slopes are similar (one more WAR should mean one extra team win), but where that relationship begins is very different depending on whether we're looking at fWAR or rWAR. So a main reason we see a big difference between fWAR and rWAR is that rWAR assumes that replacement-level is higher than fWAR assumes it is. How much higher?
Well, the y-intercept of these graphs should tell us how much a replacement-level team would win (since team WAR would be equal to zero).
The linear model describing the relationship between rWAR and wins is:
Wins = 53.8 + 0.88(rWAR)
So a replacement-level team would win about 54 games (only very slightly more than the 52 games that baseball-reference purports replacement-level teams should win, which could be due simply to variance, since we only used 90 separate team-seasons). We see the model doesn't quite get us to a 1:1 ratio between WAR and wins, but it is pretty close.
The linear model describing the relationship between fWAR and wins is:
Wins = 45.2 + 0.93(fWAR).
In this case, the replacement-level team would win only 45 games. Again, the model is pretty close to a 1:1 ratio between WAR and wins (and actually somewhat better than the rWAR model). So we see that the Fangraphs "replacement-level" player is likely significantly worse than the Rally "replacement player." This might not necessarily be a major concern when comparing between the two because replacement players don't really exist. However, average teams do exist. So how big is the difference?
Well, if we assume that 81 wins is how many games an "average team" should win, we see that it would take 38.5 fWAR or 30.9 rWAR. The ratio for average teams is 1.25 fWAR : 1.00 rWAR, so either 1) fangraphs is overestimating players or 2) rally is underestimating them.
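Those two team totals come from solving each fitted line for WAR at 81 wins. A minimal sketch of that step, using only the intercepts and slopes quoted above (the helper name `war_for_wins` is just an illustrative choice):

```python
def war_for_wins(target_wins, intercept, slope):
    """Invert Wins = intercept + slope * WAR to get the WAR needed for target_wins."""
    return (target_wins - intercept) / slope


# Fitted lines quoted in the post
fwar_avg = war_for_wins(81, intercept=45.2, slope=0.93)
rwar_avg = war_for_wins(81, intercept=53.8, slope=0.88)
ratio = fwar_avg / rwar_avg

print(round(fwar_avg, 1), round(rwar_avg, 1), round(ratio, 2))
```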
What does this mean when we try to determine how valuable an "average player" should be (in terms of fWAR or rWAR)? Well, we need to break this model down a bit further -- to account for the fact that we have to look at pitchers and position players separately. According to fangraphs, of 3445.2 WAR the past three seasons, position players have accounted for 59.3% (2043.3) and pitchers have accounted for 40.7% (1401.9). Of 2779 total rWAR the past three seasons, position players have accounted for 57.5% (1597.6) and pitchers have accounted for 42.5% (1181.4).
We should also separate out starter value from bullpen value. Unfortunately, we're unable to do that with rWAR, though we can do it for Fangraphs. This muddles up calculations with rWAR because they include leverage in pitching WAR -- which means that bullpen arms get a bonus for pitching in high pressure situations. No worries, we will use the fangraphs value ratio and then convert that based on average leverage index afterwards. According to fangraphs, this season, on average, starters were worth 81.1% and bullpens were worth about 18.9% of pitching runs above replacement.
Over the three seasons I looked at, the average team accrued something like 1445 innings (1445.6). Starters pitched 970 and bullpens pitched about 475 of those innings. Each team also accrued something like 6200 plate appearances (6197).
So to calculate the value of an average position player per plate appearance, our formula would be:
(WAR produced by an average team) * (proportion of that WAR that is produced by position players) / (6200 plate appearances per team)
So, for fangraphs WAR, we work it out to be:
38.5 team WAR * (0.593pos player WAR / 1 team WAR) / 6200 PA = 0.00368 WAR / plate appearance.
Over 650 plate appearances, that works out to:
(0.00368 WAR/PA) * (650 PA) = 2.4 fWAR / 650 PA
Now let's work out rally WAR:
30.9 team WAR * (0.575pos player WAR / 1 team WAR) / 6200 = 0.00287 WAR / plate appearance.
Over 650 plate appearances that works out to:
(0.00287 WAR/PA) * (650 PA) = 1.9 rWAR / 650 PA
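For anyone who wants to check the position-player arithmetic, here is a small sketch of the same formula. The function name `war_per_650_pa` is made up for this example; all the inputs are the figures quoted in the post.

```python
TEAM_PA = 6200  # approximate plate appearances per team-season, from the post


def war_per_650_pa(team_war, pos_share, team_pa=TEAM_PA, player_pa=650):
    """Return (WAR per PA, WAR per player_pa) for an average position player."""
    per_pa = team_war * pos_share / team_pa
    return per_pa, per_pa * player_pa


f_per_pa, f_650 = war_per_650_pa(38.5, 0.593)  # fWAR inputs
r_per_pa, r_650 = war_per_650_pa(30.9, 0.575)  # rWAR inputs

print(round(f_per_pa, 5), round(f_650, 1))
print(round(r_per_pa, 5), round(r_650, 1))
```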
Now let's work out the starting pitchers. This should work similarly, except we will replace the 6200 plate appearance scale with a 970 inning scale.
First, fWAR:
38.5 team WAR * (0.407 pitchwar / 1 team WAR) * (0.811 starterWAR / 1 pitching WAR) / 970 IP = 0.0131 WAR / IP
Over a 185 inning season, this works out to:
(0.0131 WAR / IP) * (185 IP) = 2.4 WAR
Next, rWAR:
30.9 team WAR * (0.425 pitchWAR / 1 team WAR) * (0.811 starterWAR / 1 pitching WAR) / 970 IP = 0.0110 WAR / IP
Over a 185 inning season, this works out to:
(0.0110 WAR / IP) * (185 IP) = 2.0 WAR
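The starter calculation can be checked the same way. This is only a sketch of the arithmetic described above, with a made-up helper name (`starter_war`); it applies the fangraphs starter share (81.1%) to both metrics, as the post does.

```python
STARTER_IP = 970       # starter innings per team-season, from the post
STARTER_SHARE = 0.811  # starters' share of pitching value (fangraphs figure)


def starter_war(team_war, pitch_share, season_ip=185):
    """Return (WAR per IP, WAR over season_ip) for an average starter."""
    per_ip = team_war * pitch_share * STARTER_SHARE / STARTER_IP
    return per_ip, per_ip * season_ip


f_per_ip, f_season = starter_war(38.5, 0.407)  # fWAR inputs
r_per_ip, r_season = starter_war(30.9, 0.425)  # rWAR inputs

print(round(f_per_ip, 4), round(f_season, 1))
print(round(r_per_ip, 4), round(r_season, 1))
```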
Now, the relief pitchers.
fWAR:
38.5 team WAR * (0.407 pitchwar / 1 team WAR) * (0.189 relieverWAR / 1 pitching WAR) / 475 IP = 0.0062 WAR / IP
Over a 70 inning season, this works out to:
(0.0062 WAR / IP) * (70 IP) = 0.4 WAR
rWAR:
30.9 team WAR * (0.425 pitchwar / 1 team WAR) * (0.189 relieverWAR / 1 pitching WAR) / 475 IP = 0.0052 WAR / IP
However, there is also a leverage index of 1.27, which means that, per inning, an average reliever would be worth:
0.0052 * 1.27 = 0.0066 rWAR
Over a 70 inning season, this works out to:
(0.0066 WAR / IP) * (70 IP) = 0.5 WAR
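And the reliever version, including the leverage adjustment on the rWAR side. Again, this is just a sketch of the steps above; `reliever_war` is a hypothetical helper name and the inputs are the numbers quoted in the post.

```python
BULLPEN_IP = 475       # bullpen innings per team-season, from the post
BULLPEN_SHARE = 0.189  # bullpen share of pitching value (fangraphs figure)


def reliever_war(team_war, pitch_share, leverage=1.0, season_ip=70):
    """Return (WAR per IP, WAR over season_ip) for an average reliever.

    leverage scales the per-inning value; 1.0 means no leverage adjustment.
    """
    per_ip = team_war * pitch_share * BULLPEN_SHARE / BULLPEN_IP * leverage
    return per_ip, per_ip * season_ip


f_per_ip, f_season = reliever_war(38.5, 0.407)                 # fWAR, no leverage
r_per_ip, r_season = reliever_war(30.9, 0.425, leverage=1.27)  # rWAR with leverage

print(round(f_per_ip, 4), round(f_season, 1))
print(round(r_per_ip, 4), round(r_season, 1))
```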
So, interestingly, we see that treating a 2.5 WAR season as average is actually somewhat off, depending on the metric. Considering a 2.5 fWAR season as average production for position players and starting pitchers estimates their performance correctly in general (about 2.4 fWAR per 650 PA or per 185 IP). It is a good benchmark by fWAR, but a poor one by rWAR, where the corresponding figures are about 1.9 and 2.0. Comparing relief pitchers by WAR is quite difficult to do, since not every pitcher gets equal opportunity to produce. However, it is worthwhile to note that, due to leverage index, 1 relief pitcher rWAR is actually slightly less valuable than 1 relief pitcher fWAR.
So, now that we've discussed what these mean for our "average" players, it comes time to attempt to answer the question of which metric is "better." As we can tell from the slopes, wins scale slightly better with fWAR than with rWAR, suggesting the fWAR method is better. The r-squared values, however, suggest that neither method is superior. The fWAR model has an r-squared value of 0.7748 while the rWAR model has an r-squared value of 0.7755. A model that incorporates both (r-sq = 0.825) is actually slightly better than including only one of them, which suggests that using both methods of calculation (i.e., FIP vs. RA and UZR vs. TZ) does give us a better picture of a player's actual contribution to team wins. Additionally, the relative importances of each metric to that model are essentially the same (rWAR = 0.50045, fWAR = 0.49955), so the model thinks they're both equally useful.
In summation, I'd say that neither version of WAR is necessarily more useful (or "better") than the other. However, it is important to keep in mind that the methods do scale differently. Consider this for a moment: we talk about a 7 WAR player being a likely MVP candidate. Well, add a 7-WAR player to a team full of fangraphs replacement players and they're likely to win no more games than a team full of baseball-reference replacement players without that 7 WAR player. So what do you all think about that?
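For the curious, the 7-WAR comparison falls straight out of the two fitted lines. A quick sketch, assuming the intercepts and slopes quoted earlier in the post (`predicted_wins` is a made-up helper name):

```python
def predicted_wins(team_war, intercept, slope):
    """Predicted team wins from the fitted line Wins = intercept + slope * WAR."""
    return intercept + slope * team_war


# fangraphs replacement-level team plus one 7-fWAR player
f_team_with_star = predicted_wins(7, intercept=45.2, slope=0.93)
# baseball-reference replacement-level team with nobody added
r_replacement_team = predicted_wins(0, intercept=53.8, slope=0.88)

print(round(f_team_with_star, 1), round(r_replacement_team, 1))
```

By these models the first team still comes out a couple of wins short of the second.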
Comments
and . . . you're right
even the article I linked suggested that it’s 20 runs per 600 plate appearances . . . about 2.3 or 2.4 runs per 650. Thanks for catching that . . .
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
Jessef isn't really Hugo's brother
I think he’s a computer.
Hic sunt fortuna dracones
Haha, just kidding Jessef
great stuff, as usual.
Screw rWAR!
WOOOOOOOOOOOOOOOOOO
Sad, Drunk, And Poorly
My friends, love is better than anger. Hope is better than fear. Optimism is better than despair. So let us be loving, hopeful and optimistic. And we'll change the world. - JL
No, screw fWAR!
Follow me @BBBMinorLeaguer | 2011 Jays record while in attendance: 12-12 (.500)
by Minor Leaguer on Oct 4, 2011 4:24 PM EDT
Will somebody think of the children!?
"We are all agreed that your theory is crazy. The question that divides us is whether it is crazy enough to have a chance of being correct."
- Niels Bohr
tl;dr
Actually just kidding. Really nice work there jesse, and good eye by benk! I just wish SBNation will include an equation form that lets you type out formulae a la Microsoft Equation instead of having to do this linear * and / stuff.
I'll be entirely honest
I just skimmed the article while in class. I posted the original comment thinking that jessef’s post proved my initial thinking wrong, not that my thought was actually correct
haha, it did prove your line of thinking wrong
from the perspective of using rWAR. just not from the perspective of using fWAR.
In any event, at least now when we’re throwing around what we call “average” players, etc., we know that the math adds up. That was what I was really hoping to do. Well, that and rectify why it seemed that rWAR always seems to be a bit lower.
I should actually partially credit DavidLondon on this one. My interest in this actually began with the conversation we had with him regarding Kelly Johnson. I noticed that fangraphs seemed to credit him with more batting runs above average than baseball-reference did and couldn’t figure out why. I’m not sure how I never noticed the difference before that conversation, but I didn’t.
Thanks for the mention, jessef, but you did all the work. And it’s terrific. I do have a question. As I’ve mentioned before, I’m skeptical about the defensive metrics used by either method, since they vary greatly from year to year. That is, we have values for them for each player, but the inherent error is enormous. Given that, the inherent error on the WAR attributed to any position player is significant (even assuming that the oWAR is fine). Of course, you did not include any errors in your fit, but if you did, do you think that you could even tell the difference between fWAR and rWAR?
I don't think you're correct
in saying that defensive metrics are wrong because of inherent error. they’re far from perfect, yes, but there are two reasons I can give that support random variation in defensive data: look at offense. players have wild 40 point swings in wRC+ all the time (or in the case of Adam Dunn, 75 point swings). second, it’s important to remember that while single-season data tells you somewhat accurately what a player did in that one year it’s pretty likely that the one year is not representative of that player’s true talent level. so when Scott Rolen was worth “only” 8 runs above an average third baseman in 2003, it means that Rolen was probably about that valuable that year (though using DRS and TZL in addition can clarify things) but it says little to nothing about Rolen’s true talent – he posted plus-23 in 2002, and plus-21 in 2004, so his likely true talent around that time was plus-17 runs or so – plus-8 runs is still excellent, but a far cry from the nearly two wins that three-year window would suggest.
needing more data to represent a player’s true talent doesn’t necessarily indicate unreliability (especially when the data says something completely different from the eye test, e.g. Adam Lind’s plus-6 UZR in half a season in 2007). it does however fairly reliably tell you about what did happen that year based on the defensive metric’s idea of value. for what it’s worth (practically nothing; I’m a not-that-smart commenter on an Internet blog) I think WAR should use FanGraphs’ Aggregate Defensive Ratings stat (which encompasses UZR, DRS, TZL and a crowdsourced “eye test”) in their calculation of UZR.
Yeah
I’m of the belief that 1-year samples of defensive metrics aren’t very good at predicting true talent. Doesn’t mean that 1-year samples should be ignored. It still shows how good the player’s defense was in that one season
Don’t be so hard on yourself, you’re an honours student!
I believe the reason they don’t use ADR is because some of the metrics only put out numbers at the end of the season (like Tango’s FSR, for example, which is only voted on from August onwards). I suppose they could update the formula once all the ratings are released, though I assume they just don’t want the added confusion.
Agree entirely on the defensive-metrics-in-WAR issue — WAR is a descriptive stat, it is meant to quantify what happened that season, not how good the player actually is. Single season UZR/DRS/TZ does that.
I'm more than a little jealous of Grantland's ability to use footnotes rather than excessively long bracketed statements.
Hold on. I’m not talking about evaluating a player’s true level of talent. I’m talking about evaluating what he has done. The point is: if we have a model to describe a set of events, there is an uncertainty inherent in the model. For example, consider wRC+. Now, I don’t know how this is calculated, so the details of what I am about to say are certainly wrong, but the conclusion holds. We write
RS = a*1B + b*2B + c*3B + d*HR + e*SB + …
Then, we look at each of the teams, and by correlating the number of runs scored to the number of singles, doubles, etc., we can get the best values of a, b, c, … This is all good. However, there are uncertainties on these values. Say we find that b=2. In fact, this is only the “average” value – there is an uncertainty, say it is +/- 0.2. All of these values have errors associated with them, and this leads to an error associated with wRC+. This is simply the error associated with the model itself.
In principle, this error should be kept when quoting stats like wRC+. That is, when we say that player X has wRC+ = 104, it might really be 104 +/- 7, and this error is due to the model. Similarly, the error is transmitted to oWAR. Now, different methods of calculating oWAR might give different numbers, but, in general, they roughly agree. That is, if one metric says that player X is good, another metric will also say that he is good. They may disagree on whether he is 3rd- or 9th-best in the league, but they agree that he is good.
My problem with defensive metrics is not that they vary from year to year, or that they are not good predictors. My problem is that different metrics don’t agree on which players are good. For example, player X might have UZR=+9, but DRS=-5. So is he good, or is he bad? More to the point, this indicates that the error in the model must be quite large, and that’s why I’m skeptical. And if this error is large, the error in WAR must also be significant. That’s what I was asking jessef about.
it depends which metric you like better
AVG, SLG, OBA, HR, RBI, wOBA, wRC+… those are all metrics, and they very frequently disagree about a player’s value. AVG would have you believe that Michael Young was the third best hitter in baseball in 2011. wOBA would have you believe that that was Ryan Braun.
sorry, for a better example
take Adam Dunn in 2010. AVG says he’s a bad player. wOBA says he’s a good player. two different metrics, disagreeing wildly.
except wOBA properly scales the relative value of events (in a context-neutral manner)
whereas average doesn’t even attempt to do that
wRC+ doesn't work like that
There is zero uncertainty in wOBA/wRC since they are derivatives of linear weight formulae. They are entirely context-neutral, and evaluate a player’s batting contribution solely on his output, not on the actions of other players on the team.
wRC+ is precisely as accurate at measuring what it is meant to measure as is OBP or HR.
Here’s the formula for wOBA: ((0.72 × NIBB) + (0.75 × HBP) + (0.90 × 1B) + (0.92 × RBOE) + (1.24 × 2B) + (1.56 × 3B) + (1.95 × HR)) / PA
(I’m aware that wasn’t the point of what you’re saying, but it’s important from a

standpoint.)
Regarding the actual point: As long as you are consistent in using the same metrics year after year, that would eliminate at least part of the issue. It won’t increase your confidence in comparing players to one another (though as WAR currently exists, it doesn’t purport to be a definitive ranking – the generally accepted error bars are about +/-0.5), but it at least makes comparing a given player’s year-over-year output reliable since any systematic errors will be consistent.
Also, that there are differences between different defensive metrics does not indicate that “error in the model must be quite large,” it indicates that there are different methods used by each metric. This is similar to my comparing wRC+ to OBP or HR above – the fact that each one produces different rankings of players absolutely cannot be used to imply that any particular one has errors inherent to its measurement technique, just that they attempt to measure offensive output in different ways. You’ll note that this has nothing to do with the usefulness of each metric from an analysis standpoint, just the accuracy in the measurement of each one. Some may be better or worse than others at illustrating the actual talent of a player in a specific facet, but none of those metrics can be said to measure what they attempt to measure with anything less than 100% accuracy.* Calibration issues with, say, UZR or pitchf/x are errors in the model. That different methods show different results does not necessarily mean that any one of them are systematically flawed.
Note that this isn’t to say that I think defensive metrics are perfectly accurate – I don’t believe that at all, and I doubt they’ll ever be as accurate as hitting metrics since hitting is a closed system with a fixed number of potential outcomes while fielding has a near-infinite number of tiny permutations- just that, in the general case, differences in outputs of any two metrics cannot be used to imply that there exists systematic errors in one of the measurement techniques.
*@ jessef: I’m aware this seems tautological, but think about something like SLG – SLG attempts to measure power, but fails to accurately account for the fact that doubles vs triples are more dependent on the batter’s speed and/or whether the ball is hit to LF, CF or RF and/or a defensive miscue (and has a number of other obvious flaws).
OK, some questions/comments:
“There is zero uncertainty in wOBA/wRC”: I see what you’re saying. However, then the question is: why are they of interest? It must be because they correlate well with runs scored (unlike AVG, etc.). I would guess that the stat that correlates best would be the one we would choose to evaluate the offensive production of players. (And given that the correlation isn’t perfect, there may be more than one stat.) Which ones are best?
“WAR: the generally accepted error bars are about +/-0.5”: this is great! First time I’ve seen it, and it’s very good to know.
“that there are differences between different defensive metrics does not indicate that "error in the model must be quite large," it indicates that there are different methods used by each metric.”: Absolutely. However, these metrics purport to entirely measure the defensive contribution of a player. What you’re saying is that they don’t – they only measure a piece of that contribution. In that case, I don’t see why they are used at all in WAR. Or, given that they are, their differences should be taken into account in comparing fWAR and rWAR.
to the third part, not really
they do purport to measure a player’s whole defensive ability, but they do it in different ways. I guess a good comparison (best I can think of, anyway) is comparing wOBA to OPS – they both measure a player’s hitting ability, but they do so in different ways: wOBA uses linear weights, and OPS uses OBA+SLG. just as in UZR vs. DRS vs. TZL, each system can underrate or overrate certain players (OPS underrates high-OBA, low-SLG guys, though I don’t know what wOBA underrates) but they are still trying to show the same thing.
if WAR wants to be a tell-all stat – that is, to fully encompass a player’s production – defense has to be included somehow, and somehow objectively. as I said on a different thread, it wouldn’t be helpful to compare Franklin Gutierrez and Jonny Gomes because Franklin Gutierrez gets so much of his value from defense, and Gomes’ defense (or lack thereof) severely depresses his ability to generate value for a team.
the reason you can't think of a good comparison to offensive stats
is because there isn’t one. we’ve basically figured out exactly how to measure offense (well, the value of the average single/double/etc that is).
the only analogy would be if we tried to measure offense by instead of looking at singles/doubles/etc, we look at “hard line drives a foot to the right of 1B”, “soft grounders down the third base line”, etc and then tried to assign estimated average results from those hits (ie maybe the first one is 10% double, 50% single, 40% out)
So, linear weights based on hitf/x and field/fx data? That would be very cool. Let’s hope that someone gets paid a ridiculous amount of money to one day put that together (and then unwittingly leaks it to the public).
essentially
linear weights assigned to every possible batted ball type (ie angle+velocity)
Just trying to think of the number of elements they'd need on a matrix for this
is mind-boggling.
very quick estimate:
X: 1 per horizontal degree from LF to RF
Y: 1 per vertical degree from 0 to somewhere beyond 180 (since there’s foul ground behind the plate)
Z: 1 per mph from [whatever the observed minimum to date is, minus, say, 10] up to 110
90 (horizontal) x 190 (? vertical) x 90 (? mph)
=1539000 elements!
I suppose you could (and would probably have to) double the size of each bin…
=192375 elements
and then if you're going that far
why not include the pitch type and location from the pitcher.
and the positioning of the fielders.
etc.
the more you think about this stuff at a fundamental level, the more you realize just how insanely difficult it is to separate the contributions of the hitter, the pitcher, the fielders and just plain luck even in theory.
Just realized
I included foul ground for vertical but not horizontal, so tack another few thousand onto that initial estimate.
I don’t think you would want to control for anything related to the incoming pitch in the final WAR total since then you’d turn all pitchers into the “average pitcher”, which doesn’t better illustrate what the batter managed to do. That said, it would be a useful thing to track for the “splits” page.
If it’s essentially non-outcome-based linear weights, then the positions of the fielders wouldn’t actually matter to the equation since it’s based on the flightpath of the ball, not whether it lands in for a hit or not.
One thing they would have to better quantify is the batter’s speed, and build a function to add bonuses on certain types of outfield hits and slow rollers to the left side of the infield.
Anyway, I know you’re more musing aloud – as I am here – than making specific suggestions.
If baseball, the closed-system-iest sport of them all, is this impossible to model, I’d be amazed if something like hockey or soccer could ever reach even the point we’re at now. I know both are starting to embrace advanced stats nowadays, but there are so many moving parts and entangled variables that I’m not sure how much progress can be made without violating cost-benefit logic.
OK when yall are finished discussing
I’d appreciate an executive summary :-)
by Minor Leaguer on Oct 5, 2011 12:01 AM EDT
executive summary
the more complicated stats get in order to tell you “the whole story”, the less useful they actually become. not only do you get diminishing returns (the “market inefficiency” resulting from moving from AVG/HR/RBI to OPS was much larger than from OPS to wOBA) but eventually your returns become negative
i'd say the executive summary is more:
when you look at 1B, 2B, HR, etc, you’re not really looking at “what happened”. really, what happened is: the hitter hit a ball at a certain angle and certain velocity and a fielder got to that ball in a certain amount of time and then threw it to a certain location.
but the problem with looking at the game in that way is that it gets much much harder to distribute credit/debit for the play among the catcher, pitcher, hitter and fielders.
this isn’t really a problem with evaluating hitters if you have a big sample size, since after about 2 years the 1B, 2B, etc outcomes have pretty much stabilized to their true talent. but it is a problem in small sample sizes, because if a guy’s gone 6-14 recently with 5 2B, that doesn’t actually tell us if he’s been hitting well – it only tells us that he’s been getting good outcomes. we’d need to look at the batted ball level to really tell if he’s been hitting well.
regarding fielding not mattering: that’s only true if you think the batter has no control over where he hits the ball. if you think he has control, then he must be taking into account where the fielders are positioned. and not only that, but the particular ranges in each direction of each fielder.
same with the pitchers – if we assume that hitters are changing their approach on a pitch to pitch/PA to PA/pitcher to pitcher basis, then we really need to look at count, who the pitcher is, sequencing up until that point, what the hitter is thinking/guessing going into the pitch, what we expect the pitcher is thinking/guessing, etc.
as soon as you move from outcomes (1B, 2B, etc) to the underlying processes, the logical extreme is really to take everything into account, which really isn’t possible.
Agreed re: fielders. Good point.
I think re: pitchers you’d include all of that stuff in pitcher WAR, but not in batter WAR. The pitcher is the one fully in control (or at least is the only one with any direct control) of the primary variables in that situation – pitch type, location. It’s the batter’s job to take the given pitch and hit it well, but he doesn’t directly select* the incoming pitch.
*I know his career/season line vs. a given pitcher or pitch type would make a difference, but I think that the onus should be on the pitcher to adjust accordingly (and credit given as such) rather than breaking a PA down into a per-pitch basis for the batter.
but the batter does impact the pitch selection through the game theory of the pitcher-batter matchup.
also, isn’t it normally the catcher, not the pitcher who selects the pitch? so really we want to credit/debit the catcher for pitch selection, the pitcher for execution of the pitch, and the hitter for what he does with the pitch.
or do we want to credit the hitter with what we’d expect him to do that pitch given his mental state going into the pitch/.1 seconds before he starts his swing/etc.
like, if you give the exact same pitch to the hitter in the exact same situation many times, his result is going to be different each time. do we want to assign him the average of those results to each one, or what actually happened?
fielding addendum
I don’t think you should include fielders’ range – it’s a bit unreasonable to expect a batter to account for minuscule differences in most players’ range factors and penalize him for hitting a scorching liner that a +5 range 2B gets to but a +4 wouldn’t have reached.
If you did include it, it would probably just have to use fairly large bins of range factors, with the “average” one containing all players between something like -10 and +10
maybe the difference between +5 and +4 isn't that big
but +5 and -5 is. you don’t think hitters would be more likely to try to hit a ball in the direction of a poor fielder than a good fielder?
or a more concrete example – if you have a runner on 3rd with less than two outs, the batter is probably aiming away from the outfielder with the strongest arm (so that the runner has a higher chance of scoring on an outfield fly).
I somewhat doubt that
I’d think the average batter would be much better off trying to hit the ball hard than he would trying to aim it.
Sure, there may be some guys who are known for bat control (Ichiro, for example), who would do that, but I’d think the average hitter would just be trying to hit the ball hard somewhere
According to plenty of players, there’s a lot more intent in the location they try to drive the ball to than most people tend to think. The classic example is always ‘the ground ball to the right side’ following a leadoff double, and I doubt that people factor in anything beyond notably bad defense. But still, there are lots of examples of people taking advantage of defensive shifts, positioning and men on base to try and find the holes.
I think that’s been the shame of the bunt slowly losing its effectiveness. Used to be the third baseman had to play the line closer and shallower, which is why light-hitting but all-glove shortstops were closer to the norm.
that's not really the point though
I talked a little bit about it in my BABIP primer. you’re correct, it’s not luck/random variation where the ball ends up, but players aren’t nearly as good at “hitting ‘em where they ain’t” as we thought ten or fifteen years ago (and pitchers are much less capable of limiting hits than we thought)
I don’t see how you can use aggregate stats to make a judgement about a situational hitting approach.
if players were good enough to hit where fielders are
then they would. but they don’t. players can say “I’m going to try to hit a fly ball to right field” but they usually can’t say “I’m going to hit a fly ball to the gap in right field.” and of course, they often fail in their attempts to do stuff (no one says “I’m going to hit a popup to the second baseman”)
I think you’re having a different discussion than I am. I’ve never once mentioned ‘hit ’em where they ain’t’. I was contending that the idea of the average player just trying to hit the ball hard somewhere ignores situational hitting, where the goal is to locate the ball, or try to put the ball on the ground/in the air, etc. The sheer number of variables involved in first identifying those situations, defining what is considered a successful outcome versus an unsuccessful outcome, and then applying it across a large enough body of data makes it daunting, and it certainly is not dismissible by an aggregate stat like BABIP.
ohhh
I completely missed what you were trying to say, sorry. I’m not sure if I agree with you (and kind of Jono) or with jessef, though
It’s one of those things that personal experience will skew, I guess. As a not very good batter at 17, I still knew how to slap the ball down the right hand side of the infield and what pitches I could do it with. It doesn’t mean I was always successful or that it worked all the time or that I always got a pitch that I could work with, but if I still had that level of control, I can’t imagine that major league hitters are essentially stuck solely with an uncontrolled binary outcome at the plate.
I think you took my comment to be way more specific than I meant it
I was just trying to point out that — in the general sense — taking the quality of fielders into account seems like it’s overkill. Just because something makes a model fit better doesn’t mean that it makes it a better model.
From what you’re saying re: identifying situations where that’s likely to be the case, defining what’s a successful vs. unsuccessful outcome, etc., I’d imagine that we likely agree on this.
Basically, my position is that the number of situations in which a batter’s approach would be affected by the quality of the fielders is likely to be so far exceeded by the number of situations in which it wouldn’t be that identifying it as anything other than random noise would be essentially impossible.
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
i definitely agree that we probably can't measure the effect
but i’m talking more about in theory, how do we want to divide up responsibility for outcomes. and in theory, we certainly want to take into account the fielders’ abilities and positioning.
whether we can do it in practice isn’t the question i’m interested in at the moment. because until we know what we want to do in theory, we can’t build a practical model.
"It must be because they correlate well with runs scored "
No that’s not right. wOBA works because it’s linear weights. it does what it purports to do perfectly (measure the context-neutral impact of a player’s batting outcomes). it doesn’t work because it correlates well with run scoring. it works because of how linear weights are derived – looking at the average change in run expectancy that results from each type of offensive event over the course of a season. it goes right to the heart of how baseball works.
no this is wrong
there is no regression involved in wOBA/wRC+. it’s all linear weights. their values are exactly right (as in the value they assign to a single is exactly the average run impact of all singles over the course of that season).
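for anyone who wants to see what "average run impact" means mechanically, here's a rough sketch. everything in it is made up for illustration: the run-expectancy numbers, the state labels, and the five plays are hypothetical, not real league data, and the function names are mine. the only point is that each event's weight is an average of (run expectancy after - run expectancy before + runs scored), with no regression anywhere.

```python
from collections import defaultdict

# Hypothetical run-expectancy table: (outs, runners) -> expected runs
# scored from that state to the end of the inning. Illustrative values only.
RUN_EXP = {
    (0, "---"): 0.50,
    (0, "1--"): 0.90,
    (1, "---"): 0.27,
    (1, "1--"): 0.53,
    (2, "---"): 0.10,
}

# Made-up plays: (event, state before, state after, runs scored on the play).
# A state of None means the inning ended, so run expectancy after is 0.
PLAYS = [
    ("single", (0, "---"), (0, "1--"), 0),
    ("single", (1, "---"), (1, "1--"), 0),
    ("out", (0, "---"), (1, "---"), 0),
    ("out", (1, "---"), (2, "---"), 0),
    ("out", (2, "---"), None, 0),
    ("home_run", (0, "1--"), (0, "---"), 2),
]


def run_value(play):
    """Run value of one play: RE after - RE before + runs scored."""
    _event, before, after, runs = play
    re_after = RUN_EXP[after] if after is not None else 0.0
    return re_after - RUN_EXP[before] + runs


def linear_weights(plays):
    """Average run value per event type (no regression involved)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for play in plays:
        totals[play[0]] += run_value(play)
        counts[play[0]] += 1
    return {event: totals[event] / counts[event] for event in totals}


for event, weight in linear_weights(PLAYS).items():
    print(f"{event}: {weight:+.3f} runs")
```

with a real season you'd feed in every plate appearance and an empirical run-expectancy table, but the arithmetic is the same averaging.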
re: effects of UZR on the model
significance and LMG relative importance of each fWAR component to the overall model
Batting Runs Above Average: p = 5.49 * 10**-12; importance = 0.384
Pitching Runs Above Average: p = 1.98 * 10**-13; importance = 0.421
Defensive Runs Above Average: p = 8.49 * 10**-6; importance = 0.128
Baserunning Runs Above Average: p = 0.0203; importance = 0.068
So, in spite of the uncertainty inherent in UZR it is still a pretty important factor in making team WAR correlate well with actual wins.
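In case the "importance" numbers look like magic, here is a minimal sketch of the idea, assuming the metric is LMG (the average increase in R^2 from adding a predictor, taken over every possible order of adding the predictors). The team numbers below are invented purely to make the code run, the function names are mine, and this is not the code that produced the figures above. The shares it returns add up to the full model's R^2; dividing each by that total gives shares that sum to 1.

```python
from itertools import permutations


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit (with intercept) of y on columns."""
    if not columns:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Centering the data is equivalent to fitting an intercept.
    xc = [[v - sum(col) / n for v in col] for col in columns]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in xc] for ci in xc]
    xty = [sum(p * q for p, q in zip(ci, yc)) for ci in xc]
    beta = solve(gram, xty)
    explained = sum(b * v for b, v in zip(beta, xty))
    return explained / sum(v * v for v in yc)


def lmg_importance(predictors, y):
    """Average each predictor's incremental R^2 over all entry orders."""
    names = list(predictors)
    shares = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        used, previous = [], 0.0
        for name in order:
            used.append(name)
            current = r_squared([predictors[k] for k in used], y)
            shares[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in shares.items()}


# Invented runs-above-average totals for eight imaginary teams.
TEAM_RUNS = {
    "batting": [45, -20, 10, -35, 25, -5, 60, -50],
    "pitching": [15, 30, -25, -10, -40, 20, 35, -5],
    "defense": [5, -12, 18, -3, 7, -20, 10, 2],
    "baserunning": [3, -1, 4, -6, 0, 2, -3, 5],
}
WINS = [93, 82, 79, 70, 77, 83, 96, 71]

for name, share in lmg_importance(TEAM_RUNS, WINS).items():
    print(f"{name}: {share:.3f}")
```

Brute-forcing every ordering is fine for four components (24 orderings) but grows factorially, so a real analysis would lean on an existing statistics package instead.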
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
Math I get
YUSS!
A day that will live in infamy: August 4th, 2011, Sept. 28, 2011 wasn't bad either.
7 members of the Aaron Hill fanclub cheering for Arizona.
Very nice....
What a wicked, detailed explanation. I really enjoyed this post.
"What's so special about Lou Gehrig? Shouldn't EVERY Yankee have a disease named after him? They all make me sick"
"Well, add a 7-WAR player to a team full of fangraphs replacement players and they're likely to win no more games than a team full of baseball-reference replacement players without that 7 WAR player. "
Right. But a B-R replacement level player would be like +.25 fWAR, so there’s really no inconsistency. They set the baseline at slightly different places (and use a different defensive metric), but other than that there’s no real difference. As long as you don’t try to compare one player’s fWAR with another player’s rWAR, you should be fine.
Yet again, great work
"We are all agreed that your theory is crazy. The question that divides us is whether it is crazy enough to have a chance of being correct."
- Niels Bohr
