
Opinions Were Like Kittens, I Was Giving Them Away: Should WAR Be Based on xFIP?

This morning in the comments, we discussed the differences in calculating WAR between the baseballprojection (rWAR or Rally WAR) method and the fangraphs (fWAR) method.  In gauging position players, the two systems tend to be fairly similar, differing mostly in how they determine the value of a player's fielding contributions -- the Rally system uses Sean Smith's TotalZone (TZ) and fangraphs uses Mitchel Lichtman's Ultimate Zone Rating (UZR).  For some players, the fielding metrics diverge quite a bit but, on the whole, the systems largely agree on position players.  For pitchers, on the other hand, the systems diverge because of a distinct difference in methodology -- Rally determines a pitcher's value (on a per-inning basis) from the number of runs he actually yields, whereas fangraphs determines a pitcher's per-inning value from how many runs he would be expected to yield according to his Fielding Independent Pitching (FIP) statistic, which is based solely on strikeouts, walks, and homeruns.  There are benefits and drawbacks to each method but I prefer the fangraphs method, as it does not reward pitchers for contributions made by exceptional fielders or fortuitous bounces on balls in play.

However, while I do think the fangraphs (FIP-based) method is preferable to the Rally method, I think it could be significantly improved by modeling WAR on xFIP (which, as discussed earlier this week, normalizes a pitcher's HR/fly-rate to league-average) because most pitchers seem to have very little control over whether or not flyballs become homeruns.  Unfortunately, fangraphs does not allow you to query WAR based on xFIP.  After the jump, I will outline a quick-and-dirty method for those of us interested in doing so.


The very first thing to do is determine the relationship between Wins Above Replacement and Runs Above Replacement.  While it's usually around 10 runs = 1 win, it depends on how many runs are being scored across the league, and MLB-wide offence is down in 2011.  Go to the pitcher value leaderboards and divide an AL pitcher's Runs Above Replacement by his WAR (e.g., Ricky Romero, 17.4 RAR / 1.9 WAR = 9.16 runs per win).
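For anyone who would rather script this than punch it into a calculator, here is a minimal Python sketch of the runs-per-win step. The function name `runs_per_win` is made up for illustration; the only inputs are Romero's RAR and WAR quoted above.

```python
def runs_per_win(rar: float, war: float) -> float:
    """Runs Above Replacement divided by Wins Above Replacement."""
    if war == 0:
        # A pitcher sitting at exactly zero WAR gives no usable ratio.
        raise ValueError("WAR is zero; pick a different pitcher")
    return rar / war


# Ricky Romero's values from the fangraphs value leaderboard quoted above.
romero_runs_per_win = runs_per_win(17.4, 1.9)
print(round(romero_runs_per_win, 2))
```

Because the leaderboard only shows WAR to one decimal place, the ratio is noisy for pitchers with small WAR totals, so it is probably safer to take it from a pitcher with a large RAR.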

The next step is exporting data from fangraphs to your spreadsheet program (most Windows users probably have MS Excel, though -- and I know this makes me sound prickish -- I just have OpenOffice Calc), a fairly easy task.  To do this, simply query the fangraphs pitcher WAR leaderboard and click "export to Excel" or "export to CSV."  Once you have the data, you can estimate league-adjusted replacement-level FIP from whichever pitcher is closest to zero RAR (runs above replacement) in the league you want (today, I used Wade Davis's 4.94).  Now that you know what replacement-level FIP is, you're halfway there.
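If you exported to CSV, picking out the pitcher closest to zero RAR can be sketched in a few lines of Python. The column names `Name`, `FIP` and `RAR` are assumptions about the export, so adjust them to whatever headers your file actually has. In the sample text below, only Romero's RAR and the three FIP values come from this post; the other RAR values are placeholders invented purely so the example runs.

```python
import csv
import io


def replacement_level_fip(csv_text, name_col="Name", fip_col="FIP", rar_col="RAR"):
    """Return (name, FIP) of the pitcher whose RAR is closest to zero.

    Column names are assumptions about the exported file.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no pitcher rows found")
    closest = min(rows, key=lambda row: abs(float(row[rar_col])))
    return closest[name_col], float(closest[fip_col])


# Placeholder data: the RAR values for Davis and Carmona are NOT real.
sample_export = """Name,FIP,RAR
Ricky Romero,3.69,17.4
Fausto Carmona,4.80,3.0
Wade Davis,4.94,0.2
"""

repl_name, repl_fip = replacement_level_fip(sample_export)
print(repl_name, repl_fip)
```

Feed it only the rows for the league you care about, since the replacement-level estimate is meant to be league-specific.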

Divide that FIP by nine to determine FIP replacement-level runs per inning.  Now, depending on how you want to deal with this, you can either convert this FIP replacement-level runs per inning to an xFIP scale or just use it as a proxy for xFIP replacement-level.  If you want to convert it, determine how much average xFIP differs from average FIP and scale by that ratio.  Today, for example, the ratio of xFIP : FIP is 1.01, so multiply your per-inning replacement-level estimate by 1.01.  I found that:

Replacement-Level = [(4.94) / 9] (1.01) = 0.554 runs per inning
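The same replacement-level calculation as a small Python sketch, with the xFIP : FIP ratio as an optional argument so you can skip the conversion and use FIP as a straight proxy. The function and parameter names are my own, not anything fangraphs provides.

```python
def replacement_runs_per_inning(repl_fip: float, xfip_to_fip_ratio: float = 1.0) -> float:
    """Replacement-level runs per inning, optionally rescaled to xFIP.

    Leave xfip_to_fip_ratio at 1.0 to use FIP as a proxy for xFIP.
    """
    return repl_fip / 9 * xfip_to_fip_ratio


converted = replacement_runs_per_inning(4.94, 1.01)  # scaled to xFIP
proxy = replacement_runs_per_inning(4.94)            # FIP used as-is
print(round(converted, 3), round(proxy, 3))
```

The two options differ by only about 0.005 runs per inning here, which works out to roughly half a run over 100 innings.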

Now you can take whichever pitcher you're interested in (as an example, we'll use Ricky Romero) and divide his xFIP by nine to estimate how many runs per inning he is likely to give up by xFIP.  We find that, by xFIP, Romero is likely to yield:

(3.5) / 9 = 0.389 runs per inning

Now, some very simple arithmetic gets you where you need to be.  Subtract Romero's per-inning xFIP from the replacement-level per-inning xFIP estimated during the previous step (0.554 - 0.389 = 0.165).  This is the number of runs Romero saves above replacement-level per inning.  Finally, multiply the difference by the number of innings Romero has pitched [ (0.165) (118) ] and you have your xFIP-based RAR (19.47).  Convert RAR to WAR based on the very first step by dividing xFIP RAR by RAR/WAR (19.47 RAR / 9.16 RAR/WAR = 2.13) and you're done!  By xFIP-based WAR, Romero has been worth 2.13 wins -- slightly better than he appears by FIP-based WAR.
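Putting the steps together, a rough Python sketch of the whole calculation might look like the following. All names are illustrative. The `ip_to_innings` helper assumes the usual box-score convention where ".1" and ".2" mean one and two outs (thirds of an inning), which matters for pitchers like Halladay at 127.1 IP.

```python
def ip_to_innings(ip_text: str) -> float:
    """Convert box-score innings ("127.1" = 127 and 1/3) to a decimal."""
    whole, _, outs = ip_text.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"unexpected innings notation: {ip_text!r}")
    return int(whole) + extra_outs / 3


def xfip_war(xfip, innings, repl_fip, xfip_to_fip_ratio, runs_per_win):
    """Return (xFIP-based RAR, xFIP-based WAR) following the steps above."""
    repl_per_inning = repl_fip / 9 * xfip_to_fip_ratio
    pitcher_per_inning = xfip / 9
    rar = (repl_per_inning - pitcher_per_inning) * innings
    return rar, rar / runs_per_win


# Ricky Romero, using the numbers from the worked example.
romero_rar, romero_war = xfip_war(
    xfip=3.5,
    innings=ip_to_innings("118"),
    repl_fip=4.94,
    xfip_to_fip_ratio=1.01,
    runs_per_win=17.4 / 1.9,
)
print(round(romero_rar, 1), round(romero_war, 2))
```

Carrying unrounded intermediate values gives an RAR of about 19.5 rather than the 19.47 above, which comes from rounding the per-inning figures to three decimals first. The WAR still rounds to 2.13 either way.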

What do you guys think -- is an xFIP-based method better or worse than an FIP-based method?  What about the Rally method?  Anyone think we should periodically run the numbers to determine league leaders based on xFIP WAR?

 

UPDATE:

Because WAR is park-adjusted, the best choice for estimating replacement-level FIP is another pitcher from the same team; just make sure it is a starting pitcher if you're comparing him to a starter.

UPDATE 2:

After exporting all the data, I created the fields necessary to query xFIP-based WAR (xWAR).  I estimated replacement-level FIP by league and estimated xFIP by separating out the ratios by league.  I estimated RAR/WAR by the values given on player value pages (so, while FIP is not park-adjusted, RAR/WAR is).

 

Name IP FIP fWAR xFIP xWAR
Roy Halladay 127.1 2.15 4.5 2.4 4.04
James Shields 128.2 3.07 2.9 2.8 3.48
Justin Verlander 135.2 2.88 3.6 2.98 3.41
Cole Hamels 116 2.34 3.8 2.58 3.36
Cliff Lee 122 2.55 3.6 2.75 3.22
David Price 118 2.68 3.3 2.89 3.1
Clayton Kershaw 116.2 2.49 3.4 2.73 3.08
Felix Hernandez 129 2.82 3.3 3.11 3.08
CC Sabathia 129.2 2.66 3.9 3.23 2.91
Dan Haren 116.2 2.69 3.3 3.07 2.81
Tim Lincecum 112.1 2.73 3 2.83 2.78
Jered Weaver 123.1 2.45 4 3.47 2.46
Jaime Garcia 105.1 2.87 2.5 3.03 2.34
Tim Stauffer 106 3.02 2 3 2.33
Anibal Sanchez 109 3 2.5 3.05 2.32
C.J. Wilson 117.2 3.25 2.7 3.45 2.23
Chris Carpenter 114.2 3.26 2.1 3.26 2.14
Edwin Jackson 99.2 3.12 2.5 3.29 2.12
Ricky Romero 118 3.69 1.9 3.5 2.11
Jon Lester 110.1 4.02 1.5 3.47 2.07
Madison Bumgarner 98.2 2.54 2.9 3.15 2.06
Justin Masterson 113.2 3.12 2.6 3.54 2.06
Scott Baker 105.2 3.44 2 3.41 2.05
Matt Garza 84 3.07 2 2.84 2.02
Daniel Hudson 116 2.69 3.4 3.46 1.97
Erik Bedard 90 3.54 1.4 3.24 1.94
Bud Norris 105 3.44 1.7 3.27 1.91
Ian Kennedy 116.2 3.34 2.4 3.44 1.9
Ryan Dempster 106.1 3.94 1.5 3.33 1.89
Tommy Hanson 89.1 3.14 1.8 3.06 1.87
Tim Hudson 100 3.19 2 3.29 1.86
Matt Cain 113.1 2.97 2.7 3.5 1.83
Jonathon Niese 104 3.51 1.6 3.37 1.8
Jhoulys Chacin 104.2 4.18 1.3 3.32 1.8
Gio Gonzalez 102 3.29 2 3.56 1.79
Michael Pineda 102 3.08 2.2 3.54 1.77
Yovani Gallardo 110.1 3.77 1.4 3.46 1.76
Trevor Cahill 112.1 3.84 1.4 3.65 1.76
Ricky Nolasco 110.1 3.43 2 3.49 1.75
Brandon Morrow 75 -- 2.3 3.2 1.71
Derek Lowe 101.2 3.49 1.6 3.4 1.67
Shaun Marcum 99.2 3.24 1.9 3.4 1.67
Doug Fister 110.1 3.19 2.2 3.78 1.61
Ervin Santana 110.1 4.28 1 3.78 1.61
Chris Narveson 91.2 3.3 1.7 3.38 1.57
Carlos Carrasco 94 3.46 1.7 3.65 1.51
Josh Tomlin 102.2 3.99 1.2 3.74 1.51
Alexi Ogando 97.2 3.34 2.1 3.71 1.5
Hiroki Kuroda 108.2 3.77 1.3 3.61 1.47
Josh Beckett 98 3.24 2.3 3.79 1.47
Chad Billingsley 100.1 3.3 1.8 3.58 1.45
Brett Anderson 83.1 3.88 1 3.5 1.45
Jordan Zimmermann 102.2 2.71 2.8 3.68 1.44
Max Scherzer 102.2 4.06 1.2 3.8 1.42
Jair Jurrjens 104.2 3.07 2.2 3.66 1.42
Gavin Floyd 101 3.91 1.5 3.82 1.39
Zach Britton 98.2 3.83 1.5 3.82 1.38
Wandy Rodriguez 91 3.86 1 3.52 1.35
Derek Holland 100 4.28 1.1 3.91 1.31
A.J. Burnett 106.2 4.43 0.9 3.91 1.3
Kyle Lohse 110 3.67 1.4 3.76 1.3
Fausto Carmona 102.1 4.8 0.3 3.98 1.27
Jason Marquis 99.1 3.7 1.4 3.73 1.24
Jason Vargas 113.1 3.58 1.7 4.05 1.24
Nick Blackburn 101.1 4.5 0.6 3.9 1.22
Kyle McClellan 85 4.63 0.1 4.11 1.21
Jeremy Guthrie 110 4.1 1.3 4.04 1.2
Philip Humber 96.2 3.5 1.9 3.98 1.18
Chris Capuano 90.2 4.07 0.7 3.64 1.15
Jeff Francis 103.1 3.98 1.4 4.06 1.13
Luke Hochevar 110.2 4.75 0.6 4.16 1.11
Mike Leake 83.1 3.77 1.2 3.69 1.1
Mat Latos 87 3.57 1 3.74 1.09
R.A. Dickey 102.2 4.19 0.7 3.85 1.09
Colby Lewis 100 4.81 0.5 4.04 1.08
Ubaldo Jimenez 91 3.68 1.7 3.82 1.08
John Danks 94 4.13 1.2 3.98 1.07
Charlie Morton 91.2 3.56 1.4 3.82 1.04
Carl Pavano 108.1 3.91 1.4 4.17 1.04
Brett Myers 109.2 5.06 -0.3 3.93 1.03
Jeff Karstens 91.2 4.7 0.1 3.78 0.98
Brian Duensing 86.1 3.69 1.4 4.06 0.94
Livan Hernandez 108.2 3.65 1.6 4.05 0.93
Mark Buehrle 106 4.02 1.5 4.25 0.91
Ted Lilly 96 4.44 0.4 3.97 0.9
Clay Buchholz 82.2 4.26 0.9 4.07 0.87
Chris Volstad 93.1 4.72 0.2 3.76 0.86
Matt Harrison 90 4.13 1.1 4.18 0.84
Ivan Nova 91.2 4.41 0.8 4.2 0.83
Freddy Garcia 85 4.06 1.1 4.18 0.82
Paul Maholm 108 3.79 1.3 4.11 0.81
Jake Westbrook 93.1 4.09 0.7 3.98 0.81
Randy Wolf 105.1 4.17 0.8 4.11 0.79
Jake Arrieta 88 4.57 0.6 4.2 0.77
Rick Porcello 83.2 4.35 0.7 4.18 0.77
Kevin Correia 107 4.15 0.9 4.17 0.77
Carlos Zambrano 112 3.85 1.6 4.18 0.76
Bronson Arroyo 101.2 5.55 -0.5 4.1 0.72
Jeremy Hellickson 96.1 4.13 0.9 4.38 0.7
John Lannan 98.1 4.22 0.8 4.2 0.67
Brad Penny 103.2 4.39 0.9 4.44 0.66
Jonathan Sanchez 89.2 3.93 1 4.18 0.6
Jo-Jo Reyes 88.2 4.37 0.7 4.39 0.58
Clayton Richard 94.2 4.13 0.4 4.28 0.53
Joe Saunders 100 4.92 0.3 4.32 0.51
Dustin Moseley 93.2 4.01 0.5 4.28 0.49
Travis Wood 93.1 4.1 1 4.34 0.47
Mike Pelfrey 100.2 4.75 0 4.42 0.4
J.A. Happ 87.2 4.35 0.4 4.39 0.33
Jason Hammel 100.2 4.31 1.1 4.49 0.31
Tyler Chatwood 94 4.35 0.7 4.74 0.26
James McDonald 85.2 4.69 0.2 4.56 0.23
Javier Vazquez 83.1 4.58 0.3 4.63 0.1
Wade Davis 98.2 4.94 0 5.1 -0.13

Comments

What about..

Clayton Kershaw, Dallas Braden, Jair Jurrjens, Josh Johnson, Anibal Sanchez, Matt Cain, Tim Lincecum, Jered Weaver, Cliff Lee, Ubaldo Jimenez?

by Woodman663 on Jul 2, 2011 12:29 PM EDT

I had been thinking about this too

we’ve talked about the predictive vs. actual statistical value of FIP, but if you’re not going to use RA – which is, of course, the real measure of how effective a pitcher is (despite its lack of predictive value) – the best predictor we have, which is xFIP, should be used

by benk on Jul 2, 2011 12:40 PM EDT

I think FIP>xFIP for this purpose

xFIP is a better predictor for small sample sizes, but when calculating WAR, you’re actually looking at HISTORY, not THE FUTURE.

We have 100% of the data we need to analyze history, there is no need to factor in regression to any league norm when assigning a value to a historical performance. You might be missing the point if that’s the case.

I’m not sure I made that completely clear, but what I’m trying to say is that introducing regression has no place in analysis of historical performance. It’s a tool used only when making future predictions. The difference in xFIP which is to introduce the league mean HR/FB rates is meant to be a tool to predict regression (both positive or negative) to the mean, not to give a more accurate description of what a pitcher actually did. WAR strives to be the most accurate tool to describe what a pitcher actually did.

Also, over larger sample sizes, pitchers do tend to normalize their HR rates to their true talent levels. Pitchers like Sabathia who have lower numbers over many years (some for over 2 decades) would be absolutely shafted by basically calling the skill they were able to master over many years luck. Really, it’s one of the main things that separates them from lesser pitchers.

by Sivvi on Jul 2, 2011 3:14 PM EDT

Hmmm

FIP may be better over a pitcher’s career, but do you really think in-season HR/fly is based on much more than luck? We’re a half-season in and C.C. Sabathia’s HR/fly-rate is roughly half of what his career rate is. Does this make sense to you? It doesn’t to me. How about Jered Weaver, whose HR/fly-rate is roughly 1/3 of his career’s? Bronson Arroyo’s so far is almost twice his career-rate. Do you really think that’s because he’s suddenly become that much worse?

Finally, the reason xFIP is a better predictor for small samples is that it tells the story of what the pitcher actually did. FIP may be more strongly correlated with ERA but that’s simply because a pitcher who is unlucky HR-wise is also likely to be unlucky ERA-wise.

If you want to argue that FIP tells you “history,” why not argue that Run Average tells you history? I’m sure that there are some pitchers who really can bear down with runners on base or who hold runners on so well that they prevent runs from scoring.

The point of regressing all these things to league-average is that there’s more variation due to random effects than there is to fixed effects. I’m pretty sure that a pitcher’s in-season HR/fly-rate varies more from his “true talent” level (random effect) than pitcher’s true talent HR/fly-rates vary from league average (fixed effect).

"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"

by jessef on Jul 2, 2011 3:56 PM EDT

Regarding your first point

What if you average a player’s current HR/FB rate with the average rate?
Alternatively, perhaps you could base it off their career average rate (assuming they have at least [some number] of IP), or off some weighted average of current rate, career rate, and average rate (maybe .25/.45/.3).

Obviously it makes it more difficult to calculate off-hand, but fangraphs has the data available.

The holy crusader of internal logical consistency

by Gerse on Jul 2, 2011 9:07 PM EDT
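The weighted-average idea in the comment above could be sketched like this in Python. The default weights are the .25/.45/.3 split Gerse floats for current, career, and league-average rates. The three HR/FB rates in the example are hypothetical numbers chosen only to show the arithmetic, not any real pitcher's figures.

```python
def blended_hr_per_fb(current, career, league, weights=(0.25, 0.45, 0.30)):
    """Weighted average of current-season, career, and league HR/FB rates."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    w_current, w_career, w_league = weights
    return w_current * current + w_career * career + w_league * league


# Hypothetical pitcher: 5% HR/FB this season, 10% career, 10% league average.
blended = blended_hr_per_fb(current=0.05, career=0.10, league=0.10)
print(round(blended, 4))
```

Setting the weights to (0, 0, 1) reproduces the full regression to league average that xFIP uses, while (1, 0, 0) leaves the pitcher's in-season rate untouched.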

You could

There is a slight trouble that goes with park effects, though. For example, even though Sabathia pitches in Yankee Stadium (should be lots of homeruns), he is a lefty, so it’s a lot easier on him. He ends up benefiting from a park effect that doesn’t affect him as much.

I think a large portion of why some pitchers seem to be able to control HR/fly-rates has a lot to do with that. People talk about how great SF Giants pitchers are but I think a large part of that is they’re almost all righthanded and it’s incredibly difficult for lefthanded batters to hit the ball out of Pac Bell.

I do think there is some way you can avoid completely regressing the whole league to the same value, but I also think that people are still pretty quick to credit pitchers with things that they aren’t actually controlling.

"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"

by jessef on Jul 2, 2011 10:47 PM EDT

FWIW

The top 5 based on an average of FIP and xFIP are:
Doc: 4.18
Hamels: 3.55
Lee: 3.46
Verlander: 3.38
Kershaw: 3.36

The top 10 don’t change much whether it’s 25-75, 50-50, or 75-25.

The holy crusader of internal logical consistency

by Gerse on Jul 2, 2011 11:20 PM EDT
