Comparing AA and MLB hitting production for AA batters from 1995 to 2002
I put together a spreadsheet of all batters who hit in AA between 1995 and 2002, along with their MLB stats (minimum 100 MLB PA). I only included a batter's MLB production if he met the following criteria: he was less than 26 years of age and had a minimum of 400 PA in his AA season(s). This is the spreadsheet, built using numbers from B-R and Fangraphs (glove tap to those two):
https://docs.google.com/spreadsheet/ccc?key=0AnAFMTj7pea8dFBtVnZBSUMzYzQyYXM0UWx3a3BkZmc#gid=0
I chose 100 MLB PA as the minimum because I didn't know what minimum people would want, so I went with 100 and left the choice of a higher cutoff to the reader. I chose fWAR/100 games, since it was more convenient to use fWAR than rWAR. Just to note, a player's wOPS+ is 1.8*OBP + SLG, which was then league adjusted (though not park adjusted). I thought wOPS+ was a good proxy for wRC+, since Fangraphs doesn't show MiLB wRC+ prior to the 2006 season. I also estimated a player's contact rate using the following formula: estContact% = (AB-K)/AB.
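For anyone who wants to reproduce the two derived stats, here is a minimal sketch of the formulas above. The post does not say exactly how the league adjustment was done, so the ratio-to-league-average scaling (100 = league average) is an assumption, and the season line is made up for illustration.

```python
def wops(obp, slg):
    """Raw weighted OPS as defined in the post: 1.8*OBP + SLG."""
    return 1.8 * obp + slg


def wops_plus(obp, slg, lg_obp, lg_slg):
    """League-adjusted wOPS on a 100 = league-average scale.

    ASSUMPTION: the adjustment is a simple ratio to the league's wOPS;
    the post only says the stat was "league adjusted".
    """
    return 100.0 * wops(obp, slg) / wops(lg_obp, lg_slg)


def est_contact(ab, k):
    """Estimated contact rate: (AB - K) / AB."""
    return (ab - k) / ab


# Hypothetical AA season line and league averages (not from the spreadsheet).
player_wops_plus = wops_plus(obp=0.360, slg=0.450, lg_obp=0.330, lg_slg=0.400)
player_contact = est_contact(ab=450, k=80)
print(round(player_wops_plus), f"{player_contact:.1%}")
```

No park adjustment is applied here, matching the post.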
Just for fun, I put together some data (for AA batters with a minimum of 400 career MLB PA):
I separated batters into four age categories: 18-21, 22, 23, and 24-25 (there was only one 18-year-old AA batter who qualified: Edgar Renteria in 1995). The numbers of AA batters with a minimum of 400 career MLB PA were as follows:
- 18-21: 73 qualified batters out of 137 total AA batters (53.3%)
- 22: 63 out of 151 (41.7%)
- 23: 57 out of 217 (26.3%)
- 24-25: 70 out of 421 (16.6%)
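The percentages above are just qualified batters divided by total AA batters in each age group. A small sketch that recomputes them from the counts in the list (the function name is only for illustration):

```python
# (batters with 400+ career MLB PA, all AA batters) per age group, from the list above
groups = {
    "18-21": (73, 137),
    "22": (63, 151),
    "23": (57, 217),
    "24-25": (70, 421),
}


def qualification_rate(qualified, total):
    """Share of AA batters in a group who reached the MLB PA minimum."""
    return qualified / total


rates = {age: qualification_rate(q, t) for age, (q, t) in groups.items()}
for age, rate in rates.items():
    print(f"{age}: {rate:.1%}")
```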
The average fWAR/100 of the qualified batters in each age group was as follows:
- 18-21: 0.91 fWAR/100
- 22: 0.98
- 23: 0.63
- 24-25: 0.66
Out of curiosity, I also wanted to see which of the MiLB stats I was interested in (K%, BB%, wOPS+, BB/K) correlated most strongly in each age group with each of the following: MLB K%, MLB BB%, wRC+, fWAR/100, and MLB BB/K. Some should be obvious, but I wanted to see how strong the relationships were. I used the correlation coefficient (R) to measure the relationship between the stats. These were the results (I'll have to check for p-values later):
18-21:
MLB K% correlated most with MiLB K% (R = 0.801)
MLB BB% correlated most with MiLB BB% (R = 0.755)
wRC+ correlated most with wOPS+ (R = 0.430)
fWAR/100 correlated most with BB/K (R = 0.317)
MLB BB/K correlated most with BB/K (R = 0.701)
22:
MLB K% correlated most with MiLB K% (R = 0.703)
MLB BB% correlated most with MiLB BB% (R = 0.674)
wRC+ correlated most with wOPS+ (R = 0.455)
fWAR/100 correlated most with BB% (R = 0.318); wOPS+ was close behind (R = 0.315)
MLB BB/K correlated most with BB/K (R = 0.630)
23:
MLB K% correlated most with MiLB K% (R = 0.565)
MLB BB% correlated most with MiLB BB% (R = 0.568)
wRC+ correlated most with wOPS+ (R = 0.340)
fWAR/100 correlated most with BB% (R = -0.217)
MLB BB/K correlated most with BB/K (R = 0.450)
24-25:
MLB K% correlated most with MiLB K% (R = 0.731)
MLB BB% correlated most with MiLB BB% (R = 0.655)
wRC+ correlated most with wOPS+ (R = 0.487)
fWAR/100 correlated most with BB% (R = 0.352)
MLB BB/K correlated most with BB/K (R = 0.661)
Total:
MLB K% correlated most with MiLB K% (R = 0.716)
MLB BB% correlated most with MiLB BB% (R = 0.671)
wRC+ correlated most with wOPS+ (R = 0.400)
fWAR/100 correlated most with BB/K (R = 0.198)
MLB BB/K correlated most with BB/K (R = 0.626)
One obvious issue is that I haven't determined whether the correlations are significant (p<0.05) or not, so take these values with a grain of salt. Tied in with the significance issue, the sample sizes were somewhat small for my liking.
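As a rough way to get at the missing p-values, here is a sketch that computes R by hand and then approximates a two-sided p-value with the Fisher z-transformation. This is a normal approximation, not an exact t-test, and it assumes each correlation was computed over exactly the qualified batters counted in each age group, which the post implies but does not state outright.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient (R) between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses the Fisher z-transformation, atanh(r) * sqrt(n - 3), treated as
    standard normal. Needs n > 3 and |r| < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# R values and group sizes reported in the post for fWAR/100.
# ASSUMPTION: n equals the number of qualified batters in each group.
reported = {"18-21": (0.317, 73), "22": (0.318, 63), "23": (-0.217, 57), "24-25": (0.352, 70)}
for age, (r, n) in reported.items():
    print(age, round(approx_p_value(r, n), 3))
```

With the spreadsheet columns loaded as lists, `pearson_r` gives R and `approx_p_value` gives a first read on significance before running a proper test.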
Nonetheless, the main emphasis of this was to just put together a spreadsheet of AA batter stats in seasons between 1995-2002 and their MLB stats. This spreadsheet took a lot of time and effort on my part, and my wrists are killing me. =P
What do you think of the spreadsheet I put together?
Comments
this, to me
implies that it’s difficult for a player to improve his plate discipline while in the Majors
I posted it on Minor League Ball, as well
Wanted to post this on both sites.
"We are all agreed that your theory is crazy. The question that divides us is whether it is crazy enough to have a chance of being correct."
- Niels Bohr
I think you mean
It’s difficult to improve plate discipline from the high minors (AA) to the Majors… he wasn’t looking at the correlation from, say, the 1st MLB season to the 5th MLB season
it's not difficult to do anything
if you’re Brett Lawrie
by benk on Feb 5, 2012 7:01 PM EST
A couple thoughts
1) I’m not very surprised by these findings. Not only do BB% and K% (plate discipline) correlate well from the minors to the majors, but they are also the best predictors of success.
2) That said, I’d be wary about reading too much into these numbers in terms of predictive value, because there’s a selection bias at play. Of the players who make it to the majors, we know that their minor league K% and BB% are important in terms of projecting performance, but the reverse is not necessarily true. We can’t look at a player in AA and take their K% and BB% and assume they will have similar plate discipline in the majors, because not all those players will actually make the majors.
3) Not good news for Eric Thames, who exhibited pretty terrible minor league plate discipline. His major league plate discipline is unlikely to improve, which means he needs to hit for a lot of power to provide value with the bat. In other words, he’s likely more a 6-8 bat at best, because the OBP is unlikely to be there.
To elaborate on my first point
Because I didn’t explain why I wasn’t surprised – other studies have come to similar results/conclusions
I wasn't surprised, either
I just did it for the heck of it.
That wasn't intended as criticism at all
I don’t think you took my comment that way, but just to be clear.
One thing I did
Was compare the players that either didn’t make the majors or didn’t have enough career PA to the ones that did (minimum 400 MLB PA). Just out of curiosity, I compared the batters on 3 variables: K%, BB%, and wOPS+. I was more interested in the differences in wOPS+ than I was in K% and BB%, as the latter two were easily predictable, but I included them anyway. These were the results I got:
K%:
MLB: 15.0%
non-MLB: 17.1%
p<0.0001
BB%:
MLB: 9.7%
non-MLB: 9.3%
p=0.04083
wOPS+:
MLB: 109
non-MLB: 103
p<0.0001
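The comment above does not say which test produced those p-values. One way to check a difference in group means without extra libraries is a permutation test, sketched below as a stand-in technique. The K% values are made up for illustration and are not from the spreadsheet.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled values, splits them back into groups of
    the original sizes, and counts how often the shuffled difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Hypothetical AA K% values for the two groups (illustration only).
mlb_k = [0.12, 0.14, 0.15, 0.13, 0.16, 0.11, 0.15, 0.14]
non_mlb_k = [0.19, 0.21, 0.18, 0.22, 0.20, 0.23, 0.19, 0.21]
p_demo = permutation_p_value(mlb_k, non_mlb_k)
print(p_demo)
```

The same function works for BB% or wOPS+ by passing in those columns for the MLB and non-MLB groups.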
terrible minor league plate discipline?
He had a 9.5 BB% and 17 K% at AAA, and an 8.9 BB% (his lowest at any level in the minors) and 21.1 K% in AA.
He will likely always strike out a good amount, but if anything, his minor league numbers suggest his walk rate should be better.
AAA is a notoriously ridiculous hitter's league
I’m not sure I’d classify Thames’ plate discipline as “terrible”, but not only was Thames’ plate discipline in AA (or any other level besides AAA) not very good, he was also 24 by the time he hit AA
I don't know where you're getting your numbers from
But I’m using Statcorner numbers:
1) His 9.5% BB rate in Triple-A was inflated by a 2% IBB rate. So his unintentional BB rate (a better measure of patience) was only 7.5% against a K rate of 22.4%, which is not good at all.
2) In 2010 in AA, a 21% K rate against an 8.4% uBB rate. Again, not very good.
So yeah, maybe terrible was a little over the top. But they are quite poor
by MjwW on Feb 6, 2012 8:24 PM EST
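The unintentional walk rate in the comment above is simply total walks minus intentional walks, divided by PA. A small sketch of that arithmetic follows. The raw counts are hypothetical, picked only so that they reproduce the 9.5% and 7.5% rates quoted, and are not Thames' actual totals.

```python
def bb_rate(bb, pa):
    """Total walk rate: BB / PA."""
    return bb / pa


def unintentional_bb_rate(bb, ibb, pa):
    """Unintentional walk rate (uBB%): (BB - IBB) / PA."""
    return (bb - ibb) / pa


# HYPOTHETICAL counts chosen to match the quoted rates (9.5% BB, 2% IBB).
pa, bb, ibb = 400, 38, 8
total_rate = bb_rate(bb, pa)
ubb_rate = unintentional_bb_rate(bb, ibb, pa)
print(f"BB% {total_rate:.1%}, uBB% {ubb_rate:.1%}")
```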
Just to clarify
I mostly wanted to post the spreadsheet for others to work with it. I just did the analysis for the fun of it.
Very interesting stuff here
What do the fWAR/100 values correspond to? I thought it would be fWAR/100 plate appearances, except that seems impossible given the values some of these players put up (e.g., Coco Crisp at 2.3).
In addition to the axiom that plate discipline translates to the majors better than contact and power (which I think you should test with ISO and contact, the results would be pretty interesting), the other thing that I think really makes the BB- and K- numbers stand alone with regard to predicting future success is the fact that those are important for pretty much every player. Both punch-and-judy and power hitters are able to display good plate discipline. Power hitters generally don’t need to display good contact rates (unless they rarely walk, but guys like Vlad Guerrero are rarer than guys like Adam Dunn, so they may be lost in the analyses) and singles hitters can succeed in the majors without power (particularly if they play up-the-middle positions).
I would think that, for individual players, displaying the ability to make contact or hit for power is likely as important as (or more important than!) raw plate discipline; each just isn’t as important to all hitters the way that BB and BB/K would be.
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
SB Nation really needs to redo their coding for automatic strikethrough
fWAR/100 games
ahhhhhh
that makes a lot more sense. Maybe you could standardize it to 150 games, though? The numbers might be a little easier to put into context.
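Since the stat is a per-game rate, moving from 100 games to 150 games is a straight rescale. A minimal sketch, with function names that are only for illustration:

```python
def war_per_games(war, games_played, per=100):
    """Career fWAR expressed per `per` games played."""
    return war / games_played * per


def rescale_war(war_per_100, games=150):
    """Convert an fWAR/100 games figure to another game basis."""
    return war_per_100 * games / 100


# Coco Crisp's 2.3 fWAR/100 games, mentioned earlier in the thread.
crisp_per_150 = rescale_war(2.3)
print(round(crisp_per_150, 2))
```

The same conversion applies to the age-group averages in the post.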
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
That's possible
I’m planning on doing something similar to this, but with pitching stats. Just want your opinion on this: Should I go with fWAR/100 IP, or some other way?
that would be a little bit more relatable, since it would be easy to convert to 200 ip
but a lot depends on whether it’s a starter or a reliever, since a starter would be 180-200 ip and a reliever would be somewhere around 70 ip.
I think the pitching stats could be really interesting, particularly if you found a way to discern between pitchers who strike out a lot of batters based on command vs. movement and velocity. My guess is that you won’t be able to find data describing whether strikeouts were swinging or looking, but there might be some sort of walk cutoff where it seems like you’re dealing with power, rather than finesse, pitchers (above 4 bb/9 or something?).
but a lot depends on whether it’s a starter or a reliever, since a starter would be 180-200 ip and a reliever would be somewhere around 70 ip.
Maybe I should keep it at around 100 IP, and maybe allow others to convert it to 70 IP or 200 IP if they so choose.
My guess is that you won’t be able to find data describing whether strikeouts were swinging or looking, but there might be some sort of walk cutoff where it seems like you’re dealing with power, rather than finesse, pitchers (above 4 bb/9 or something?).
I’m including these MLB measures: K%, BB%, K/BB, FIP, IP, fWAR, fWAR/100 IP, WHIP, and HR/9.
I calculated FIP for MiLB pitchers, as well as K% and BB% (Batters faced is a stat available on B-R, so it was easy to calculate).
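For reference, a sketch of how those MiLB pitching rates could be computed from a B-R style stat line. It uses the standard FIP form with HBP included. Whether the author included HBP is not stated, and the constant of 3.10 is only a placeholder assumption, since the real constant is set per league and season. The stat line itself is made up.

```python
def innings_to_float(ip_text):
    """Convert baseball IP notation ("150.1" = 150 1/3) to a decimal number."""
    whole, _, outs = str(ip_text).partition(".")
    return int(whole) + int(outs or 0) / 3


def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    ASSUMPTION: `constant` is a placeholder; it varies by league and season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def k_rate(k, batters_faced):
    """Strikeouts per batter faced."""
    return k / batters_faced


def bb_rate(bb, batters_faced):
    """Walks per batter faced."""
    return bb / batters_faced


# Hypothetical AA pitching line (illustration only).
ip = innings_to_float("150.0")
demo_fip = fip(hr=10, bb=40, hbp=5, k=120, ip=ip)
print(round(demo_fip, 2), f"{k_rate(120, 630):.1%}", f"{bb_rate(40, 630):.1%}")
```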
As far as conversion goes,
if you’re doing this in Excel (it seems like you are?), unless you’ve already started and haven’t separated innings by starter/reliever, you can put in an IF statement that would basically check whether a pitcher has accumulated more innings starting than relieving and sort him by class that way. Or you could also include one field with 70 innings and one with 200 innings, though the problem with that is that Fangraphs uses different baselines for their WAR calculations (reliever replacement-level is higher FIP than starter replacement-level).
Regrettably, I am guessing that the swinging-strike data, etc. can’t be found, so I think your method makes sense. I wonder if separating out by age might be a proxy for power vs. finesse pitchers, with power pitchers generally hitting AA (and the majors) younger than finesse pitchers. I don’t know if that’s the case necessarily, but I wouldn’t be surprised if it were.
I am using Excel, yes. I have the basic framework together, but have not started putting the MLB numbers in yet.
I’d like to sort out relievers from starters, but I don’t know how to do so. IP limit?
at the bottom of Fangraphs pages,
they sort out the innings of individual players for you. I don’t know how you’re doing it, but if you’re searching individual players (and, if you are, I am quite impressed!) you can find the information there.
Otherwise, I’d just go by a games started system. If the pitcher has started 35% or more (or some other cutoff) of his games, I’d classify him as a starter, otherwise I’d classify him as a reliever.
Else, as you said, you could just do an IP vs. G method . . . greater than 3 IP per game, he can probably be considered a starter, fewer he should be considered a reliever.
I don’t know how you’re doing it, but if you’re searching individual players (and, if you are, I am quite impressed!) you can find the information there.
Indeed, each player’s MLB stats were searched individually. My, was it an epic journey. =P
I’ll try the IP/G method to separate starters from relievers.
So, after looking at a data set containing all pitchers since 1980
Using IP/G, I will use 4 IP/G as the minimum for starters, 3 IP/G as the maximum for relievers, and anyone within the 3-4 IP/G range will be either/or (e.g., Derek Lowe or John Smoltz type pitchers).
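The IP/G cutoffs described above could be sketched as a small classifier like this. The thresholds come from the comment; the function name, the labels, and the sample stat lines are made up for illustration.

```python
def classify_role(ip, games, starter_min=4.0, reliever_max=3.0):
    """Classify a pitcher by career innings per game.

    >= 4 IP/G -> "starter", <= 3 IP/G -> "reliever",
    anything in between -> "either".
    """
    ip_per_game = ip / games
    if ip_per_game >= starter_min:
        return "starter"
    if ip_per_game <= reliever_max:
        return "reliever"
    return "either"


# Hypothetical career lines: (innings pitched, games).
samples = {"workhorse": (200.0, 33), "setup man": (70.0, 65), "swing type": (105.0, 30)}
roles = {name: classify_role(ip, g) for name, (ip, g) in samples.items()}
print(roles)
```

In Excel, the same logic would be a nested IF on the IP/G column.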
Makes sense
all depends on where you want your cutoff to be.
I’d probably call Lowe and Smoltz starters, but you can always pro-rate it for both. Just don’t forget that replacement-level IP for starters uses a higher FIP.
which I think you should test with ISO
I actually included ISO, but accidentally cut and pasted on top of it half way through this list. Didn’t notice until it was too late. =(
Btw, I checked the difference between MLBers (>400 career PA) and non-MLBers in certain measures
BB/K:
MLB: 0.70
non-MLB: 0.59
p<0.0001
ISO:
MLB: .162
non-MLB: .143
p<0.0001
estContact%:
MLB: 82.8%
non-MLB: 80.6%
p<0.0001
My one regret with this list is not including the fielding position each player played. I think it would help put the numbers into context even more.
This is by comparing the MiLB numbers, btw.
Interesting,
what might also be illuminating is comparing not just how those players performed in MiLB to one another, but looking at how strongly linked a player’s MiLB ISO, contact, and plate discipline are to his MLB ISO, contact, and plate discipline. It’s often said that plate discipline translates much better from the minors to the majors, but I’ve not seen the relationships compared with one another.
I did initial measures between MiLB and MLB ISO (~ halfway point)
The R, IIRC, wasn’t as high as K% and BB%.
the R**2 values are useful
but only describe how much variance the model explains. What would be equally useful would be to look at the coefficients of the model variables themselves, which would describe how closely the MLB values track the MiLB values (basically how close the slope is to 1.0), because that would tell us how close the values are to one another.
I wouldn’t worry about the R**2 values being a bit lower for ISO, unless that were the case when you looked at power hitters by themselves. I’d guess that you’d see way more variance in ISO than in other metrics, partially because even somewhat light-hitting batters can drive the ball against worse pitchers and partially because ISO should be way more related to park variables than K- and BB-rates.
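To make the slope vs. R**2 distinction concrete, here is a sketch of a one-variable least-squares fit that returns both. The MiLB and MLB rates are invented for illustration; they are built so that R**2 is essentially perfect while the slope is only 0.5, which shows the two numbers answer different questions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical MiLB K% (x) and MLB K% (y) pairs, illustration only.
milb_k = [0.10, 0.15, 0.20, 0.25]
mlb_k = [0.5 * x + 0.10 for x in milb_k]
slope, intercept, r_squared = fit_line(milb_k, mlb_k)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

A slope near 1.0 with an intercept near 0 would mean MLB rates match MiLB rates almost one for one; a high R**2 alone only says the relationship is tight.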
"Look at me! I'm Tomokazu Ohka of the Montreal Expos!"
So, I should try reliability measures on model variables?
Well, up to you
I do think that would be quite interesting. At the very least, it would give some credence to our current views and, at the most, it could provide some evidence that the way we currently perceive things is off.
