Recently, we've had some excellent posts (courtesy MjwW) and really interesting discourse (courtesy the comments sections) here on the site regarding valuation of projected and produced wins. I'm not going to reiterate the whole discussions but, to sum up (and likely oversimplify), MjwW has been statistically investigating the potential linearity of the $ / WAR relationship and yours truly has been conceptually analysing it. Now, there are really two separate questions here. The first question is whether $ / projected WAR scales linearly (basically, whether teams are willing to pay more per win for better players). The second question is how to determine whether players have been overpaid in retrospect. It's easy to get lost in a big dataset, which is why I think it's important to explicitly state the questions we're asking before we try to answer them.
Now, to address the first question, whether $ / projected WAR scales linearly. Since it is the teams doing the paying, it's their projection systems -- not ours -- that we should be using to derive the answer to this question. Of course, since we don't have access to their projections, we have to use our own. As long as our datasets are large enough and our projections aren't systematically different from theirs, the errors should tend to average out, so this may not be a problem, but it's still something we need to consider. Another potential pitfall (and possibly a greater one) is the simple fact that it is always in the team's best interests to pay players as little as possible. Thus, the $ / projected WAR we actually observe should range from something less than what teams are willing to pay up to what they are willing to pay. Look, for example, at the Phillies' extension for Roy Halladay and then look at their extension for Ryan Howard. I can think of only four ways to evaluate these contracts from the perspective that the free market is dictating what's going on:
1. The Phillies thought Howard was better than Halladay. Maybe I'm biased, but I don't think this argument holds any water. Sure, hitters may be less likely to bust and Howard had an MVP under his belt, but Halladay had won a Cy Young Award of his own and had placed top-5 in Cy Young voting the previous four seasons. Even more recently, he was coming off, arguably, the two best seasons of his career and may have been the consensus pick for best pitcher in baseball at the time (and likely still would be), coming off a season where he was fourth in fWAR amongst pitchers with 7.4 (rWAR: 6.8). Howard was coming off a very good season, but he was not even the best player at his own position and was 36th in fWAR amongst position players with 4.6 (rWAR: 4.4).
2. The Phillies thought Howard was a better bet to age well than Halladay. When you look at the terms and average annual values of the contracts, I think this explanation is insufficient. Doc was signed to just a three-year deal through age 35 (with an option for his age 36 season). Howard was extended for five years through age 36. Considering the contracts they gave to Cliff Lee, who seems like a good bet to age similarly to Doc, and Jonathan Papelbon, who is a reliever partially because of worries about his arm and may be a good bet to age worse than Doc, the Phillies also don't seem terribly worried about pitchers aging.
3. The Phillies' strategy changed at some point between the Halladay and Howard extensions. Given that Ruben Amaro was the GM at both times and the Phillies had previously signed Ibanez at right around what we might consider "market value", I seriously doubt they changed their valuation much. Again, taking the Cliff Lee and Jonathan Papelbon contracts into account, the Phillies don't seem to be pinching their pennies.
4. Ryan Howard was in a better bargaining position than Roy Halladay. This argument is also false. If anything, Halladay was actually in a better bargaining position than Howard. Howard was under contract for a few more years at the time, while the Phillies would have been unable to trade for Halladay unless they had already agreed to an extension.
So I think the problem here is pretty evident. How do we use the data at hand to establish the proper baseline? Halladay's extension almost certainly should not be included in an analysis that tries to get at what teams are willing to pay per projected win because we can be almost certain that teams would have been willing to pay more than he actually got. However, if we only use the contract with the highest average annual $ / projected WAR, we've only got one datapoint, which essentially tells us nothing. Also, what one team is willing to do is not necessarily the same as what another team is willing to do. Perhaps a different way to approach the question of whether $ / projected WAR scales nonlinearly is to look at the highest average annual values for elite players, the highest average annual values for average players, and the highest average annual values for fringe players. Unfortunately, this greatly reduces our sampled population, which not only reduces the power of the study but also may cause us to violate the assumption that we're properly estimating teams' player projections.
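To make the tiered comparison concrete, here is a minimal sketch of one way it could be set up. Everything in it is an assumption for illustration: the contracts are made-up numbers (not real players or deals), and the tier cutoffs of 4.0 and 1.5 projected WAR are arbitrary placeholders. It picks the highest-AAV contract in each tier and reports that contract's $ / projected WAR.

```python
# Hypothetical contracts: (label, projected WAR per season, average annual value in $M).
# These are invented numbers for illustration only, not real players or deals.
contracts = [
    ("elite_a", 6.0, 20.0),
    ("elite_b", 5.5, 25.0),
    ("average_a", 2.5, 8.0),
    ("average_b", 2.0, 10.0),
    ("fringe_a", 0.8, 2.0),
    ("fringe_b", 1.0, 4.0),
]


def tier(projected_war):
    """Bucket a player by projected WAR; the cutoffs are arbitrary assumptions."""
    if projected_war >= 4.0:
        return "elite"
    if projected_war >= 1.5:
        return "average"
    return "fringe"


def top_contract_by_tier(contracts):
    """For each tier, keep the highest-AAV contract and its $M per projected WAR."""
    best = {}
    for label, war, aav in contracts:
        t = tier(war)
        if t not in best or aav > best[t][1]:
            best[t] = (label, aav, aav / war)
    return best


best = top_contract_by_tier(contracts)
for t in ("elite", "average", "fringe"):
    label, aav, rate = best[t]
    print(f"{t}: {label}, AAV ${aav}M, ${rate:.2f}M per projected WAR")
```

With only one contract surviving per tier, the small-sample worry is easy to see: each tier's estimate rests on a single deal and a single (our own) projection.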
Another related (but different) question (and, if memory serves correctly, the one MjwW initially gathered his quite-impressive dataset for) is whether big contracts for free agents result in overpaid players. Now, I think most readers here would probably agree that, even if $ / projected WAR scales linearly, the $ value of already-produced WAR does not scale linearly. In fact, given our understanding of risk management, I'd think it could be argued that, if $ / projected WAR scales linearly, $ / produced WAR must scale nonlinearly. If that's the case, to determine whether or not free agents were overpaid in retrospect, we first need to figure out the proper scale by which to judge their produced WAR. I think this would be a really good question but it's also a bit daunting. If anyone is interested, think of this as a crowdsourcing of sorts; let's try to tackle this question together. It could take a lot of work but the results might be worth it. If you're interested in this, please read on.
Since, as has been pointed out, two 3 WAR players produce 6 WAR together, the question we're trying to answer here really just looks at playing time: how much better is some combination of players who accumulates 6 WAR in 650 plate appearances (basically one player-season) than a player who accumulates 6 WAR in 1300 plate appearances (basically two player-seasons)? Now, it would not be difficult to answer this question if we knew how good the player taking up the remaining 650 plate appearances is: assuming he's replacement-level, there's no difference. Assuming he's average (2.5 fWAR / 650 plate appearances), the first player (or group of players) is about 2.5 fWAR better per season. So the next step should be to figure out how good the actual replacement should be. Now, because different players are available for different salaries and teams operate on different payrolls, I think the quality of that replacement should be contingent on the team. A team that can afford to keep an average player lying around should derive a lot more benefit from making one big upgrade rather than a few little ones.
So, when considering already-produced WAR and assuming that the quality of the replacement is contingent on team payroll (not a perfect assumption but one that I think makes some sense), the Yankees or Red Sox (who might have a 2 WAR replacement) could derive greater benefits from the same WAR in less playing time than a mid-market team like the Blue Jays (who might have a 1 WAR replacement) would. A team like the Marlins or Pirates might be giving that extra playing time to a replacement- (or sub-replacement(!)-) level player, which would mean that they wouldn't derive any benefit (and could potentially accrue a cost(!)) from having one 6 WAR player rather than two 3 WAR players. Anyway, I think MjwW has already pretty much done a study looking directly into this question, though he may be trying to answer it from a different angle. For now, maybe we should wait and see what he comes up with and maybe we can work together to integrate his findings into this conceptual model.
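The playing-time arithmetic can be sketched in a few lines. This is only an illustration under the assumptions already stated in the post: 650 plate appearances per player-season, 6 WAR produced either in 650 or in 1300 plate appearances, and a fill-in player whose quality depends on the team. The function names are my own, and the -0.5 WAR "sub-replacement" figure is an invented placeholder; the 2 WAR and 1 WAR fill-in levels are the "might have" guesses from the paragraph above, not measured values.

```python
PA_PER_SEASON = 650  # roughly one full player-season of plate appearances


def total_war(produced_war, pa_used, pa_budget, fill_in_war_per_650):
    """WAR from the produced total plus whatever the fill-in adds in leftover PA."""
    remaining_pa = pa_budget - pa_used
    return produced_war + fill_in_war_per_650 * remaining_pa / PA_PER_SEASON


def concentration_benefit(fill_in_war_per_650, produced_war=6.0):
    """Extra WAR from getting produced_war in 650 PA rather than in 1300 PA."""
    concentrated = total_war(produced_war, 650, 1300, fill_in_war_per_650)
    spread_out = total_war(produced_war, 1300, 1300, fill_in_war_per_650)
    return concentrated - spread_out


# Fill-in quality per 650 PA; these levels are guesses, not measurements.
fill_in_levels = {
    "replacement-level fill-in": 0.0,
    "average fill-in": 2.5,
    "big-payroll team (2 WAR fill-in)": 2.0,
    "mid-market team (1 WAR fill-in)": 1.0,
    "sub-replacement fill-in (hypothetical)": -0.5,
}

for description, level in fill_in_levels.items():
    print(f"{description}: {concentration_benefit(level):+.1f} WAR")
```

Under these assumptions the benefit of concentrating the same WAR into less playing time is simply the fill-in's WAR over the freed-up plate appearances, which is why it comes out to zero for a replacement-level fill-in and negative for a sub-replacement one.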
Thanks to MjwW and the rest of y'all for the discussions and to the Mountain Goats' "Prowl Great Cain" for today's post title.