One of my new favorite toys is the Pitch Data supplied by Baseball-Reference. The problem is that there is so much of it that it is hard to tell how important each stat is. I’m sure there have been extensive studies of the data, but I haven’t seen them, apart from a pretty good article on The Hardball Times website. So I decided to do a small study of my own.

To do this, I needed some data. I transferred the 2005 through 2007 data of every pitcher who has appeared in a 2007 American League game through May into an Excel spreadsheet. (I would be remiss if I didn’t warn you up front that the data used in this study potentially suffers from the dreaded “small sample size.”)

The approach I decided to take is to first determine which of the stats a pitcher has the most control over. I did this via year-to-year correlation of each stat, filtering out any pitcher who had faced fewer than 100 batters in either year one or year two, which left me with 224 pairs of seasons. The results:
Cntc% .732
StS% .721
StI% .719
Sw/In% .700
K% .685
Strk % .668
P/PA .620
StL% .594
1st% .533
StF% .516
SO c% .484
ERA .272
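For readers who want to try the same year-to-year check outside of Excel, here is a minimal Python sketch. It assumes the correlation in question is the ordinary Pearson coefficient (the post does not say which one was used), and the function names, the `seasons` layout, and the sample rows are all made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year_r(seasons, stat, min_bf=100):
    """Correlate a stat in one season with the same pitcher's next season.

    `seasons` maps (pitcher, year) -> {"BF": batters_faced, stat: value}.
    A pair is kept only if the pitcher faced at least `min_bf` batters
    in both seasons of the pair.
    """
    first, second = [], []
    for (pitcher, year), row in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if nxt is None:
            continue
        if row["BF"] < min_bf or nxt["BF"] < min_bf:
            continue
        first.append(row[stat])
        second.append(nxt[stat])
    return pearson(first, second), len(first)

# Made-up rows for illustration only (not real Baseball-Reference data).
seasons = {
    ("A", 2005): {"BF": 400, "StS%": 18},
    ("A", 2006): {"BF": 420, "StS%": 19},
    ("A", 2007): {"BF": 150, "StS%": 17},
    ("B", 2005): {"BF": 300, "StS%": 12},
    ("B", 2006): {"BF": 90, "StS%": 20},   # under 100 BF, so both of B's pairs drop out
    ("B", 2007): {"BF": 310, "StS%": 11},
    ("C", 2006): {"BF": 500, "StS%": 10},
    ("C", 2007): {"BF": 200, "StS%": 11},
}

r, pairs = year_to_year_r(seasons, "StS%")
print(f"StS% year-to-year r = {r:.3f} over {pairs} pairs of seasons")
```

A pitcher with three qualifying seasons contributes two pairs (2005 to 2006 and 2006 to 2007), which is one plausible reading of "pairs of seasons" above.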
The definition of each of the above stats can be found on each pitcher’s B-R page. Just below the pitching stats line you’ll see Pitch Data Summary (Show or Hide). Click Show to see the data. Below the data you will see the word Glossary; click on it to see the definitions. I actually made up one of the above stats, Sw/In%. Sw/In% is the percentage of strikes swung at by a batter that are put in play (a home run is counted as a ball in play in this instance).

Notice that seven of the 12 stats listed above have a correlation of at least 0.6. Remember those stats.

Next, I needed to find out which stats correlate best with ERA. To determine this, I decided to use the overall three-year data, filtering out any pitcher who faced fewer than 300 batters, which left me with 161 pitchers. The results:
StI% .430
Sw/In% .421
Cntc% .391
SO c% .082
StL% .046
StF% -.122
P/PA -.196
StS% -.387
1st% -.414
Strk % -.427
K% -.525
The correlations obviously aren’t as high in this set. On the bright side, four of the stats I wanted to focus on based on the first chart each had a correlation with ERA of at least 0.4 in absolute value. The only outlier is 1st%, which correlates well with ERA but was not among the more repeatable stats in the first chart.

I’m a big believer in the importance of striking out batters. To this end, I decided to run a correlation of Pitch Data to K% (the percentage of batters faced who strike out). This was performed in a similar fashion to the previous study. The results:
StS% .828
P/PA .621
Strk % .183
StF% .174
1st% .150
StL% -.023
SO c% -.119
ERA -.525
Cntc% -.859
Sw/In% -.890
StI% -.930
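One way to combine the first two tables into a shortlist is sketched below. The correlations are copied from the tables above (with "Strk %" written as "Strk%"), but the two cutoffs, 0.60 for year-to-year repeatability and 0.35 for the absolute correlation with ERA, are assumptions chosen for illustration and are not stated anywhere in the post. K% is excluded on the further assumption that it is being treated as an outcome rather than as a pitch data stat.

```python
# Year-to-year correlations from the first table (ERA left out, since it is the target).
year_to_year = {
    "Cntc%": .732, "StS%": .721, "StI%": .719, "Sw/In%": .700,
    "K%": .685, "Strk%": .668, "P/PA": .620, "StL%": .594,
    "1st%": .533, "StF%": .516, "SO c%": .484,
}

# Correlations with ERA from the second table.
with_era = {
    "StI%": .430, "Sw/In%": .421, "Cntc%": .391, "SO c%": .082,
    "StL%": .046, "StF%": -.122, "P/PA": -.196, "StS%": -.387,
    "1st%": -.414, "Strk%": -.427, "K%": -.525,
}

def shortlist(min_repeat=0.60, min_era=0.35, exclude=("K%",)):
    """Stats that are both repeatable and tied to ERA.

    The default cutoffs are illustrative assumptions, not values from the post.
    The result is ordered by strength of the ERA correlation, strongest first.
    """
    keep = [
        stat for stat, r in year_to_year.items()
        if stat not in exclude
        and r >= min_repeat
        and abs(with_era.get(stat, 0.0)) >= min_era
    ]
    return sorted(keep, key=lambda s: abs(with_era[s]), reverse=True)

focus = shortlist()
print(focus)
```

With these cutoffs, P/PA drops out because its link to ERA is weak, and 1st% drops out because it is not repeatable enough from one year to the next.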
No big surprises here, but there is one thing I’d like to point out. The previous chart showed a relatively high correlation between 1st% and ERA, whereas this chart shows a low correlation between 1st% and K%. I find it fascinating that there is such a low correlation between K% and 1st%, especially if you watch a lot of baseball and hear so many announcers talk about the importance of throwing strike one. Because 1st% has a relatively low year-to-year correlation, I will not focus on its importance.

My opinion, based on these correlations, is that the pitch stats to focus on are:
StI%
Strk%
Sw/In%
Cntc%
StS%

So now that we know which stats we want to focus on, what do they mean in terms of Orioles pitchers? This chart shows the ERA levels and expectancies at the various pitch data levels using three-year cumulative data.
StI% Low High Median AVG <4.00 Between >5.00
Lo 1.55 4.67 3.41 3.24 73% 23% 3%
Mid 1.99 6.46 4.19 4.19 38% 44% 17%
Hi 3.34 6.33 4.79 4.83 9% 63% 28%
The three-year average StI% (percentage of strikes thrown that are put into play) for the pitchers in this study is 31%, with 27% and 34% being at the extremes. This chart tells us there’s a 73% chance that a pitcher with a 27% or lower StI% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 34% or higher StI% will finish with an ERA above 5.00. So how do the 2007 Orioles pitchers rate through Saturday?
Ray 25%
Bedard 25%
Parrish 26%
Williamson 27%
Burres 29%
Walker 30%
Cabrera 31%
Bradford 33%
Guthrie 33%
Williams 33%
Trachsel 38%
Baez 41%
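The StI% tier chart above could be rebuilt along the following lines. This is only a sketch: the post does not say how the Lo and Hi cutoffs were chosen, so they are passed in as parameters, the "Between" column is assumed to mean an ERA from 4.00 through 5.00, and the pitcher rows are invented for illustration.

```python
from statistics import mean, median

def tier_of(value, lo_cut, hi_cut):
    """Lo if at or below lo_cut, Hi if at or above hi_cut, otherwise Mid."""
    if value <= lo_cut:
        return "Lo"
    if value >= hi_cut:
        return "Hi"
    return "Mid"

def era_by_tier(pitchers, stat, lo_cut, hi_cut):
    """Summarize ERA within each tier of `stat`.

    Returns low, high, median, and average ERA per tier, plus the share of
    pitchers with an ERA under 4.00, from 4.00 through 5.00, and over 5.00.
    """
    tiers = {"Lo": [], "Mid": [], "Hi": []}
    for p in pitchers:
        tiers[tier_of(p[stat], lo_cut, hi_cut)].append(p["ERA"])

    summary = {}
    for name, eras in tiers.items():
        if not eras:
            continue  # skip empty tiers instead of dividing by zero
        n = len(eras)
        summary[name] = {
            "low": min(eras),
            "high": max(eras),
            "median": median(eras),
            "avg": mean(eras),
            "<4.00": sum(e < 4.00 for e in eras) / n,
            "between": sum(4.00 <= e <= 5.00 for e in eras) / n,
            ">5.00": sum(e > 5.00 for e in eras) / n,
        }
    return summary

# Invented rows for illustration only; StI% is given in whole percentage points.
pitchers = [
    {"StI%": 25, "ERA": 3.10},
    {"StI%": 27, "ERA": 4.20},
    {"StI%": 30, "ERA": 3.90},
    {"StI%": 31, "ERA": 4.50},
    {"StI%": 33, "ERA": 5.40},
    {"StI%": 34, "ERA": 4.80},
    {"StI%": 36, "ERA": 5.60},
]

summary = era_by_tier(pitchers, "StI%", lo_cut=27, hi_cut=34)
for name, row in summary.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

For a stat where higher is better, such as Strk% or StS%, the same function applies; only the reading of the Lo and Hi rows flips.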
Strk% Low High Median AVG <4.00 Between >5.00
Hi 1.55 5.27 3.45 3.56 62% 31% 8%
Mid 2.15 6.33 4.19 4.17 41% 43% 16%
Lo 3.29 6.46 4.77 4.77 14% 59% 28%
The league average Strk% (percentage of pitches thrown that are strikes) for the pitchers in this study is 63%, with 66% and 60% being at the extremes. This chart tells us there’s a 62% chance that a pitcher with a 66% or higher Strk% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 60% or lower Strk% will finish with an ERA above 5.00. Orioles pitchers in 2007:
Bradford 69%
Walker 67%
Guthrie 65%
Williams 65%
Bedard 64%
Ray 63%
Williamson 59%
Burres 59%
Cabrera 58%
Trachsel 57%
Parrish 56%
Baez 54%
Ouch. This may be the single biggest stat that bothers me about Daniel Cabrera. From what I’ve seen from eyeballing things, once a pitcher establishes himself as a sub-60% Strk% pitcher, he typically stays there. The only pitcher I’ve seen who has defied this is Randy Johnson. Of the 161 pitchers in the study, Bradford had the second highest Strk% (71%), while Cabrera had the fifth worst (58%).
Sw/In% Low High Median AVG <4.00 Between >5.00
Lo 1.55 5.18 3.45 3.32 73% 23% 3%
Mid 2.62 6.46 4.25 4.24 38% 44% 17%
Hi 3.39 6.33 4.78 4.80 9% 63% 28%
The three-year average Sw/In% for the pitchers in this study is 42%, with 37% and 47% being at the extremes. This chart tells us there’s a 73% chance that a pitcher with a 37% or lower Sw/In% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 47% or higher Sw/In% will finish with an ERA above 5.00.
Parrish 34%
Ray 35%
Bedard 35%
Williamson 36%
Walker 39%
Burres 42%
Cabrera 44%
Guthrie 46%
Bradford 46%
Williams 48%
Baez 53%
Trachsel 57%
Trachsel had the third worst rate in the study. This chart and the StI% chart are a couple of good examples of why I’m such a big fan of Erik Bedard.
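Since Sw/In% is a made-up stat, here is a small sketch of how it could be computed from raw pitch counts, following the definition given earlier: balls in play divided by all strikes the batter swung at. It assumes that strikes swung at are swinging misses, fouls, and balls in play combined (so every foul counts as a swing), and the counts in the example are hypothetical.

```python
def swing_in_play_pct(swinging_misses, fouls, in_play):
    """Share of swung-at strikes that end up in play (home runs count as in play).

    Assumes swung-at strikes = swinging misses + fouls + balls in play.
    """
    swings = swinging_misses + fouls + in_play
    if swings == 0:
        raise ValueError("no swings recorded")
    return in_play / swings

# Hypothetical counts, not taken from any real pitcher.
example = swing_in_play_pct(swinging_misses=150, fouls=250, in_play=300)
print(f"Sw/In% = {example:.0%}")
```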
Cntc% Low High Median AVG <4.00 Between >5.00
Lo 1.55 4.87 3.37 3.26 78% 22% 0%
Mid 1.99 6.46 4.31 4.28 36% 46% 18%
Hi 2.99 6.33 4.60 4.61 19% 56% 25%
The league average Cntc% (percentage of strikes thrown in which the batter makes contact) for the pitchers in this study is 80%, with 75% and 84% being at the extremes. This chart tells us there’s a 78% chance that a pitcher with a 75% or lower Cntc% will finish with an ERA below 4.00, while there’s a 25% chance that a pitcher with an 84% or higher Cntc% will finish with an ERA above 5.00.
Williamson 64%
Parrish 69%
Walker 74%
Ray 75%
Bedard 75%
Burres 78%
Cabrera 79%
Williams 80%
Baez 83%
Guthrie 83%
Bradford 85%
Trachsel 88%
One thing I like about this staff is that when they throw strikes they’re hard to hit. Trachsel tied for the second highest rate in the study.
StS% Low High Median AVG <4.00 Between >5.00
Hi 1.55 5.28 3.40 3.42 71% 26% 3%
Mid 1.99 6.46 4.32 4.27 37% 45% 18%
Lo 2.99 6.33 4.61 4.60 20% 56% 24%
The league average StS% (percentage of strikes thrown in which the batter swings and misses) for the pitchers in this study is 15%, with 18% and 12% being at the extremes. This chart tells us there’s a 71% chance that a pitcher with an 18% or higher StS% will finish with an ERA below 4.00, while there’s a 24% chance that a pitcher with a 12% or lower StS% will finish with an ERA above 5.00. I’ll admit that this is my favorite pitch data stat.
Williamson 27%
Parrish 23%
Walker 20%
Bedard 18%
Ray 18%
Burres 15%
Cabrera 15%
Williams 14%
Wright 14%
Baez 14%
Guthrie 12%
Bradford 11%
Trachsel 8%
Trachsel tied for the lowest rate in the study. Trachsel has obviously proven that a pitcher can still succeed while not doing well in pitch data stats. The problem is that his margin for error is so much smaller than it is for other pitchers.