In December, when news of the R.A. Dickey trade broke, the reaction here and elsewhere in Jaysland was largely one of enthusiasm. But not for me; my reaction was one of ambivalence. The immediate reason for that was the sticker shock: d'Arnaud AND Syndergaard AND an expensive lottery ticket in Wuilmer Becerra?!?!? It was a hell of a price to pay. But that's only one side of the equation, and the Jays did get the reigning Cy Young Award winner to add to a suddenly veteran team that was right in the thick of it talent-wise in the AL East. Which led to the larger reason for my ambivalence: how to properly assess this trade-off of future wins for current wins to evaluate if it was, strategically, a good idea?
One way to do this is to try and project the number of games the Blue Jays would win, with and without Dickey. So, maybe the Jays were something like an 85 win team before Dickey, Dickey adds a net of three wins, and now they're an 88 win team. I saw some commenters doing this, and it's a solid approach. The marginal value of wins in terms of generating revenue is highest between about 83 and 93 wins, as some research has shown and basic intuition should make plain. So, you look at the expected revenue payoff, the expected value of the prospects, and make a judgment.
However, any expected revenue payoffs are going to be highly approximate since little information about MLB revenues is known. Moreover, since the Jays were already expected to be very competitive before, this move was basically about bolstering the playoff likelihood (and getting the revenue payoff), and AA said exactly that. So then, what did the trade do for the Jays' playoff chances? How much did it increase them? It seems to me that to pass judgment on this trade, this question really needs to be answered.
Which leads into the bigger issue: you'll notice that we're talking about probabilities, likelihoods and chances. In other words, we're talking about uncertainty. But those projections I referenced were single numbers - a point estimate of the mean (weighted average) or median (50th percentile outcome). There is no estimate of the uncertainty around them. And this to me is a critical fact. In my view, projections are relatively useless unless you give me both the point estimate and the uncertainty in the estimate.
Let me illustrate my point with a practical example. Suppose I made the following offer: we flip a fair coin 100 times. If it comes up heads between 51 and 55 times (inclusive), I pay you $25. Anything else, you pay me $10. Should you take me up on the offer? We know that each flip has a 50/50 chance, which means the expectation is for 50 heads and 50 tails. But that's subject to uncertainty. In order to decide whether it's a good deal, we need to know about the distribution of possible outcomes.
With a coin flip that's easy, since a coin flip is a binary event (heads/tails), so the range of outcomes follows the binomial distribution. It turns out that for the above scenario, theoretically 32.5% of the time the coin should come up heads (or tails) 51 to 55 times. So, the expectation for the proposed bet is to win $25 about one third of the time, and lose $10 about two-thirds of the time. That works out to a positive expected value of about $1.36 per bet (0.325 × $25 - 0.675 × $10), or a return of about 13.6% on the $10 at risk. So, yes, it's a good bet.
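For anyone who wants to check the coin-flip bet, here is a minimal sketch that sums the binomial probabilities exactly and then computes the expected value. The function and variable names are just illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_flips = 100
payout_win, payout_loss = 25.0, 10.0

# Probability that heads comes up 51 to 55 times (inclusive) in 100 fair flips.
p_win = sum(binom_pmf(k, n_flips, 0.5) for k in range(51, 56))

# Expected value per bet: win $25 with probability p_win, lose $10 otherwise.
expected_value = p_win * payout_win - (1 - p_win) * payout_loss

print(f"P(51-55 heads) = {p_win:.4f}")
print(f"Expected value = ${expected_value:.2f} per bet")
```

The point is that the expected value depends entirely on how much probability sits inside the payoff window, which a point estimate of 50 heads can't tell you.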
Basically, that scenario is roughly the scenario that AA faced, though not quite so binary obviously. If Dickey adds three wins over the guy he displaces, and the Jays were an 82 to 89 win team beforehand, adding Dickey has a pretty big payoff. Below or above those levels, the payoff is much smaller. So the distribution of outcomes is quite significant in the analysis. But most projections (ZiPS/Steamer/Marcel/Oliver etc) only give you the point estimate.
Before Opening Day last year, I tried to take a stab at this, and produced a post on Opening Day with projections for the starting position players. Basically, I took a mix of ZiPS and Steamer projections, and added uncertainty. I ran out of time to do pitchers, though part of that was I couldn't think of a way to do it at the time. So after the Dickey trade, I wanted to dust off those files and update them, develop the same for pitching and see what we got. Neither ZiPS nor Steamer projections for the Jays were out, so I was just going to use Marcel projections (5/4/3 weighting for position players, 3/2/1 for pitching).
But before that, I had to make important changes. The truth is, those distributions had some significant flaws in how they were put together. I don't think it was wrong to post them, since they gave a pretty good idea and had far more right than wrong. But there were basically four elements (playing time, wOBA, defence, baserunning), and there were some issues with all. For applying uncertainty to playing time, I had the right idea, but took a couple shortcuts to save time. Not a big deal, but something to fix. For applying uncertainty to wOBA, I used a formula to apply uncertainty for random variation. I didn't account for non-random variation in skill (i.e., between projected true talent and actual true talent). For defence and baserunning, there were no quantitative projections from ZiPS or Steamer, so I made my own quickly using Marcel weights. But I didn't regress them, and my estimates of uncertainty were quite inexact. So that needed to be overhauled.
As I went about this, working through projecting defensive performance from the metrics, as well as uncertainty, I realized a few things. By this point, ZiPS was out, and they had quantitative defensive and baserunning projections. I didn't want to build my own projections, since my goal was not to build a better mousetrap, but to add a new feature to the existing mousetrap. But I realized that to properly introduce uncertainty, I needed to see behind the projections and understand how many years of data were used, the degree of regression, etc.
The final product was my own projection system, with fully regressed estimates based on linear weights that I derived, adjusted for park, league and age, with uncertainty to build a distribution. For both pitchers and position players.
So, over the next couple days, I'm going to roll out my projections for the 2013 Blue Jays, showing both point estimates I have developed and more importantly, the distribution of expected outcomes. I'm not aware of anywhere else where you'll get the latter (at least freely available anyway), so I hope I'm doing something that adds some value to our understanding of expectations for 2013. If all goes according to plan, I'll start that tomorrow with the pitchers at an individual level, and continue Saturday with pitching at the team level. Sunday and Monday will be the same for position players, and I'll finish Tuesday by trying to answer the question I originally set out to answer: how much did R.A. Dickey change the Blue Jays' chances of making the playoffs?
But before getting around to that, I'm going to set out my methodology. If someone else were doing this, the first thing I'd want to know is the underlying assumptions being used, at least at a high level. Because at the end of the day modeling boils down to a simple concept: garbage in, garbage out. I realize not everyone is interested in these details, so rather than trying to cram it alongside the actual projection post and making them very long and unreadable, I'm going to lay it out here. That way, I can keep the actual projection posts nice and clean with no big blocks of text.
Before that, I want to make a few acknowledgements. Thanks to our own jessef, who indulged me by not ignoring several long, rambling e-mails and instead responding with significant insight, especially as I was trying to get started and get my head around several concepts. Thanks to Jared Cross (of the Steamer projections), for responding to a query on the boards at The Book Blog with a link that was critical (and thanks to Tango for running that place). Some individual studies that were critical building blocks: Tom Tango's Marcels, Colin Wyers at The Hardball Times on weighted correlation using harmonic mean, Kincaid's blog post on estimating true talent using Bayesian updating and the Beta Distribution, and Jeff Zimmermann at Beyond the Boxscore on aging curves weighted by harmonic mean. And of course, FanGraphs for all the data in easily exportable form, as well as Baseball-Reference, Baseball Prospectus, the Lahman database and MLBTR for various pieces of information. One of my favourite quotes is that of Newton: "If I have seen further it is by standing on the shoulders of giants". I won't claim to have seen further, but any insights I might be able to offer are only possible with the works of a lot of others.
What is Being Projected?
For position players, I project performance based on playing time (PA), offence (wRAA, based on projecting wOBA), fielding (aggregate of DRS/150 and UZR/150, as well as positional adjustment), and baserunning (SB and UBR). In total, 6 variables. I chose to just project wOBA for offence rather than individually projecting BABIP, 1B, 2B+3B, HR, BB and K. It's much simpler (avoids covariance between a lot of those variables), and wOBA is still a very strong predictor even if individual components might have a little more explanatory power in total. Again, I'm not trying to invent the most accurate projection system, just add uncertainty to point estimates.
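As a quick illustration of how a projected wOBA turns into wRAA, here is a sketch using the standard FanGraphs definition, wRAA = (wOBA - league wOBA) / wOBA scale × PA. The league wOBA and scale values below are placeholder assumptions for illustration, not the constants used in these projections.

```python
def wraa(woba: float, pa: float, league_woba: float, woba_scale: float) -> float:
    """Weighted runs above average implied by a wOBA over a number of PA."""
    return (woba - league_woba) / woba_scale * pa


# Assumed, illustrative constants (they change slightly every season).
LEAGUE_WOBA = 0.315
WOBA_SCALE = 1.25

# A hypothetical hitter projected for a .350 wOBA over 600 PA.
runs = wraa(0.350, 600, LEAGUE_WOBA, WOBA_SCALE)
print(f"Projected wRAA: {runs:.1f}")
```

Because wRAA scales with PA, uncertainty in playing time and uncertainty in wOBA both flow through to the run estimate.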
For pitchers, I project performance based on playing time (TBF), and RA/9. RA/9 is based off of K%, nBB% (BB+HBP-IBB), GB%, HR/(FB+LD)% aka HR/(1-GB)%, BABIP and LOB%. LOB and BABIP have a very strong relationship, so LOB% is predicted based off BABIP. So there are 6 variables projected. One thing to note is that the pitching will not take defence into account (that is attributable to position players). The reason for using HR/(FB+LD) ratio rather than HR/FB ratio is that my research showed that GB% (and therefore FB+LD%) stabilizes a little bit quicker than FB%, and that HR/(LD+FB) and HR/FB are basically equivalent (they both have almost equally low correlations). So, since it's the same amount of effort to get a home run rate regardless, I prefer the one with a little more explanatory power even though both get hugely regressed back toward average. BABIP too largely gets regressed back to average.
How the Data is Weighted
I was actually going to write a detailed post last week about the analysis I did on weighting data and how I got to the weights that I'm using, partially since jessef wrote a piece about it in December. I got about halfway through writing it and realized it was probably not something that many were going to be interested in, so I decided not to. Instead, I've summarized them in the charts below. It's also worth noting that all variables are PA weighted in addition to recency. The weights are based on 2003-12 data, and all WAR numbers are fWAR (from before the recent changes, but that shouldn't affect anything since it's just rebaselining).
For position players, I ran the data on all players to determine weights, and then also on everyday players only. I defined an everyday player as one who had averaged 500 weighted PA for however many years were being considered:
Note: wCorr is the weighted correlation between the weighted data and the next season for all players between 2003-12. So if the weighting was 3/2/1, that would be the correlation between the weighted previous three seasons and the next season, for all players who had PA in four consecutive seasons. The correlation is weighted by the harmonic mean of the weighted PA in the previous seasons and the PA in the subsequent season.
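To make the wCorr idea concrete, here is a rough sketch of a weighted Pearson correlation where each player's weight is the harmonic mean of his prior weighted PA and his next-season PA. The sample numbers are made up, and the exact definition of "weighted PA" used here is an assumption on my part.

```python
from math import sqrt


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative sample sizes."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / sqrt(var_x * var_y)


# Made-up data: (weighted prior wOBA, prior weighted PA, next-season wOBA, next-season PA)
players = [
    (0.340, 620, 0.332, 650),
    (0.305, 410, 0.318, 300),
    (0.360, 580, 0.351, 610),
    (0.290, 150, 0.325, 90),
    (0.325, 500, 0.310, 540),
]

prior = [p[0] for p in players]
following = [p[2] for p in players]
weights = [harmonic_mean(p[1], p[3]) for p in players]

print(f"wCorr = {weighted_corr(prior, following, weights):.3f}")
```

The harmonic mean is dominated by the smaller of the two sample sizes, so a player with 600 PA one year and 50 the next counts for very little.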
TBU - I haven't quite finished this part of the analysis and will update when I've finished it.
For pitchers, I also ran the data on all pitchers, and then on "full-time" SP and RP separately. A full-time SP was a pitcher who had made at least 80% of his weighted appearances as a SP, with at least 500 weighted TBF per season (about 125 IP). A full-time RP was a pitcher who made no more than 10% of his weighted appearances as a SP, with a weighted average of at least 160 TBF (about 40 IP).
Finally, here are a few other things that aren't directly used in my projections, but that I tested out of interest:
For playing time, because catchers are different than most other position players in that it is rare for them to play more than 80% of games even in a fully healthy season, I separate them out from all other position players. The factors that are significant in predicting playing time are historical playing time, historical performance (fWAR/PA or fWAR/IP), as well as performance in the actual year. This intuitively makes sense since players who do well get more time and players who do poorly get benched. Obviously, we don't know how players will actually perform, so when playing time is projected, it includes the covariance with 2013 projected performance in a given simulation.
Regression
The first thing that I want to reinforce is that while it's popular to refer to "regression to the mean (of all players)", what is actually being done is regression to a prior expectation. In the case of baseball, that means to a group of similar players. An everyday player should not be regressed back towards the average of all players, because everyday players get everyday playing time because they are better than all players. So what constitutes a group of similar players? At its most complicated, you can look at position, age, talent level, career path and then build a list of the most similar historical players, similar to what Baseball-Reference displays on a player page. For each player, I build a list of the 100 most similar player seasons from 2003-10 based on playing time and the variable being projected. The one exception is that for R.A. Dickey and BABIP, I use a list of knuckleballers that I could compile, because it has been shown that they have influence over BABIP that is completely different than for non-knuckleballers. He must be regressed to a group of similar players on that attribute.
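Here is a minimal sketch of regressing to a prior built from a comparison group. It uses the common shrinkage form, which is also the posterior mean you get from Beta-distribution updating. The regression constant (the number of "phantom" opportunities) and the comparison-group values are hypothetical, and this is not necessarily the exact calculation behind these projections.

```python
from statistics import mean


def regress_to_prior(observed_rate: float, n: float, prior_mean: float, k: float) -> float:
    """Blend an observed rate over n opportunities with a prior worth k opportunities."""
    return (observed_rate * n + prior_mean * k) / (n + k)


# Hypothetical comparison group: next-season K% of the most similar player-seasons.
comparison_group_k_rates = [0.19, 0.21, 0.20, 0.23, 0.18, 0.22]
prior_mean = mean(comparison_group_k_rates)

# A pitcher with a 26% weighted K% over 700 weighted TBF,
# regressed with an assumed constant of 300 TBF.
projected = regress_to_prior(0.26, 700, prior_mean, 300)
print(f"Prior mean: {prior_mean:.3f}, regressed K%: {projected:.3f}")
```

The choice of comparison group matters as much as the amount of regression: pulling an everyday player toward the all-player average would drag him toward a level he has already shown he is above.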
Uncertainty
The first step in introducing uncertainty into the projection is figuring out the variance of the variable being estimated. The total observed variance of performance in baseball players is made up of random variation, and non-random variation. I will refer to non-random variation of a player's performance from here on as skill variation, even though it would also pick up things including weather, park, pitchers faced, etc (as jessef pointed out to me).
Most of the variables being projected are binomial, in that the outcomes are binary. For K%, a pitcher either strikes out a batter or he doesn't. For nBB%, the batter either reaches first on a walk or HBP, or he doesn't. For these variables, the amount of random variation can be calculated. And then since we can calculate the total variation of the given population, the difference between the two is the variation due to skill. For wOBA, it's similar, but instead of being a binomial variable, it's a multinomial variable, but the random variation can still be calculated (I actually have to cheat a little to do this since I'm not projecting the individual components, but it's small enough to be immaterial).
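The split between random and skill variation for a binomial variable can be sketched as follows. The random variance of a rate p over n opportunities is p(1-p)/n, and whatever is left of the observed variance is attributed to skill. The pitcher data is made up, and averaging the random variance across players is a simplifying assumption.

```python
from statistics import mean, pvariance


def binomial_random_variance(p: float, n: float) -> float:
    """Variance of an observed rate due purely to chance over n opportunities."""
    return p * (1 - p) / n


def split_variance(rates, opportunities):
    """Return (total, random, skill) variance for a population of observed rates."""
    total_var = pvariance(rates)
    # Population rate, weighted by opportunities.
    pop_rate = sum(r * n for r, n in zip(rates, opportunities)) / sum(opportunities)
    random_var = mean(binomial_random_variance(pop_rate, n) for n in opportunities)
    skill_var = max(total_var - random_var, 0.0)
    return total_var, random_var, skill_var


# Made-up K% and TBF for six pitchers.
k_rates = [0.15, 0.18, 0.20, 0.22, 0.25, 0.28]
tbf = [620, 580, 700, 540, 660, 600]

total_var, random_var, skill_var = split_variance(k_rates, tbf)
print(f"total={total_var:.5f} random={random_var:.5f} skill={skill_var:.5f}")
```

For a stat like K%, most of the spread survives as skill; for something like BABIP the random piece eats up most of the total.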
However, there are a few variables where it's not feasible to calculate the random variation, such as UZR, DRS, and UBR. I'm sure there is a way to calculate it, but the mechanics of those metrics are black boxes at this point. So instead, I analyzed the data and estimated the breakdown of total variation into random and non-random. It's not perfect, but there's no better way I found or could think of. Finally, for playing time, there is no meaningful random component. I simply take the projected playing time, look at players with similar projections in the past, and then look at projected/actual playing time, and build a distribution accordingly (playing time has to be treated differently, since it's a non-symmetric distribution due primarily to injuries).
Having calculated the variance of the variables being projected, uncertainty can now be introduced via Monte Carlo simulation. For each variable for each player, a number of simulations (1,000) are performed using random numbers by applying both random and non-random variation (separate random numbers for each). That means that for each position player, 11,000 random numbers are used and for each pitcher 13,000 are used.
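A stripped-down sketch of one such simulation for a single binomial variable is below. Each of the 1,000 runs uses two random numbers: one for skill variation (where true talent actually sits relative to the projection) and one for random variation around that talent. The normal approximations, the skill standard deviation and the fixed number of opportunities are all simplifying assumptions for illustration.

```python
import random
from statistics import mean, quantiles


def simulate_rate(projected, skill_sd, n_opps, n_sims=1000, seed=0):
    """Simulate observed season rates around a projected rate."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_sims):
        # Random number 1: non-random (skill) variation around the projection.
        talent = rng.gauss(projected, skill_sd)
        talent = min(max(talent, 0.0), 1.0)
        # Random number 2: binomial luck around that talent (normal approximation).
        random_sd = (talent * (1 - talent) / n_opps) ** 0.5
        outcomes.append(rng.gauss(talent, random_sd))
    return outcomes


# Hypothetical pitcher: projected 20% K rate, assumed skill SD of 2 points, 800 TBF.
sims = simulate_rate(projected=0.20, skill_sd=0.02, n_opps=800)

deciles = quantiles(sims, n=10)  # nine cut points: 10th, 20th, ..., 90th percentile
print(f"mean={mean(sims):.3f}")
print(f"10th pct={deciles[0]:.3f} median={deciles[4]:.3f} 90th pct={deciles[8]:.3f}")
```

The output of interest is the whole spread of outcomes, not just the mean, which by construction lands close to the point estimate.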
Park, League & Aging Adjustments
This is actually the simplest part. To neutralize park effects for players who accumulated data used in the projections with teams in other parks, I applied FanGraphs park factors to the relevant variables so that they would be as if accumulated with the Jays' Park Factor. Similarly, for players who played in the NL, I looked at the difference for the variables between the AL and NL (for example, pitcher K% is higher in NL due to facing the pitcher rather than DH, among other things) and then adjusted the raw numbers so that it would be like they were accumulated in the AL.
I created aging curves for each variable, and applied them to the raw projection. It was a little complicated, because the projection is based on several years of data, and aging is occurring during that time. For example, if a wOBA projection is based on 3 years of data with a 3/2/1 weight, and aging is 5 points of wOBA a year, then the aging factor has to be more than 5 points because the projection is 50% based on 1 year ago (5 points aging), 33% on two years ago (10 points) and 17% on three years ago (15 points), so the aging from the projection to the following year is 8.33 points of wOBA. Moreover, the regression element is based on comparing historical data of similar players to what they did in the next year, which will include an aging effect. However, this is not going to be material, so I ignore this last effect.
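The 8.33-point figure in the example above can be reproduced with a few lines. This sketch assumes a constant aging rate per year and ignores PA weighting, so it only covers the recency weights.

```python
def aging_adjustment(recency_weights, aging_per_year):
    """Weighted average number of aging points between the data and the projected year.

    recency_weights[0] applies to one year ago, recency_weights[1] to two years ago, etc.
    """
    total = sum(recency_weights)
    return sum(
        (w / total) * aging_per_year * years_ago
        for years_ago, w in enumerate(recency_weights, start=1)
    )


# 3/2/1 weighting with 5 points of wOBA aging per year.
adjustment = aging_adjustment([3, 2, 1], 5)
print(f"Aging adjustment: {adjustment:.2f} points of wOBA")
```

With a single year of data the adjustment collapses back to the plain 5 points, which is a handy sanity check.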
Possible Future Adjustments
I thought I'd end with a couple things I think might be worth doing or exploring if I were to do this again next year:
- In determining weights, I used raw data that was unadjusted for park, league, year and age. Since the data is 2003-12 (after the steroid era), the overall run scoring environment is pretty consistent, but averages have seen important changes over that time.
- If I were to move away from the idea of Marcel type estimates, I'd look into customized linear weights for each player. That would entail building a comparison group and then estimating weights based off that group alone rather than all players or all starters, etc.
- In terms of offensive stats, I'd look at going more granular and projecting things like BABIP, 1B, 2B+3B, HR, BB+HBP and K rather than just wOBA.
- I'd look at adding things like FB velocity to predict K rate, since it's obviously an important factor in explaining variation of the variable.
- Look into building more detailed comparison groups with more factors.
- One thing I'm interested in is whether past variability in performance or playing time is a potentially important factor in predicting future variability. For example, does the fact that Romero had a great year in 2011 and then an awful year in 2012 tell us that his range in outcomes is higher in 2013? Intuitively, one would think so, but I'd be interested in seeing the data.