Cookin’ With Gas

Statistical analysis of the Baltimore Orioles on an almost weekly basis.

Posted by cookinwithgas on June 10, 2007

One of my newest favorite toys is the Pitch Data supplied by Baseball-Reference.  The problem is that there is so much of it that it is hard to tell how much each stat matters.  I’m sure there have been extensive studies of the data, but I haven’t seen them, apart from a pretty good article on The Hardball Times website.  So I decided to do a small study of my own.

To do this, I needed some data.  I transferred the 2005 through 2007 data of every pitcher who has appeared in a 2007 American League game through May into an Excel spreadsheet.  The approach I decided to take is to first determine which of the stats a pitcher has the most control over.  (I would be remiss if I didn’t warn you up front that the data used in this study potentially suffers from the dreaded “small sample size.”)  I did this via year-to-year correlation of each stat, filtering out any pitcher who had faced fewer than 100 batters in either year one or year two, which left me with 224 pairs of seasons.  The results:

Cntc%             .732

StS%               .721

StI%                .719

Sw/In%           .700

K%                  .685

Strk %             .668

P/PA                .620

StL%               .594

1st%                .533

StF%               .516

SO c%            .484

ERA                 .272
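For anyone who wants to reproduce the year-to-year step outside of Excel, here is a minimal Python sketch.  The row layout (dicts with "pitcher", "year", "bf" and one key per stat) is an assumption made for illustration, not the actual spreadsheet layout.  Pearson correlation is also an assumption; it is what Excel’s CORREL function returns.

```python
from math import sqrt

MIN_BF = 100  # minimum batters faced in BOTH seasons of a pair


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_pairs(rows, stat, min_bf=MIN_BF):
    """Pair each pitcher-season with the same pitcher's next season.

    `rows` is a hypothetical list of dicts with keys "pitcher", "year",
    "bf" (batters faced) and one key per stat, e.g. "StS%".
    A pair is kept only if both seasons reach `min_bf` batters faced.
    """
    by_key = {(r["pitcher"], r["year"]): r for r in rows}
    pairs = []
    for (pitcher, year), first in by_key.items():
        second = by_key.get((pitcher, year + 1))
        if second and first["bf"] >= min_bf and second["bf"] >= min_bf:
            pairs.append((first[stat], second[stat]))
    return pairs


def year_to_year(rows, stat, min_bf=MIN_BF):
    """Correlation between year-one and year-two values of `stat`."""
    xs, ys = zip(*season_pairs(rows, stat, min_bf))
    return pearson(xs, ys)
```

A pitcher with qualifying 2005, 2006 and 2007 seasons contributes two pairs (2005-2006 and 2006-2007), which is presumably how three years of data turn into 224 pairs.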

The definition of each of the above stats can be found on each pitcher’s B-R page.  Just below the pitching stats line you’ll see Pitch Data Summary (Show or Hide).  Click on Show to see the data; below the data, click on the word Glossary to see the definitions.  I actually made up one of the above stats: Sw/In%, the percentage of strikes swung at by a batter that are put in play (a home run is counted as a ball in play in this instance).  Notice that seven of the 12 stats listed above have a correlation of at least 0.6; remember those stats.

Next, I needed to find out which stats correlate best with ERA.  To determine this, I decided to use the overall three-year data, filtering out any pitcher who faced fewer than 300 batters, which left me with 161 pitchers.  The results:

StI%        .430

Sw/In%   .421

Cntc%     .391

SO c%    .082

StL%       .046

StF%       -.122

P/PA        -.196

StS%       -.387

1st%        -.414

Strk %     -.427

K%          -.525

The correlations obviously aren’t as high in this set.  On the bright side, four of the stats I wanted to focus on based on the first chart each had a correlation of at least 0.4 in absolute value.  The only outlier is 1st%, which also cleared 0.4 here despite its weaker year-to-year correlation.

I’m a big believer in the importance of striking out batters.  To this end, I decided to run a correlation of Pitch Data to K% (the percentage of batters faced who strike out).  This was performed in a similar fashion to the previous study.  The results:

StS%       .828

P/PA        .621

Strk %     .183

StF%       .174

1st%        .150

StL%       -.023

SO c%    -.119

ERA         -.525

Cntc%     -.859

Sw/In%   -.890

StI%        -.930 

No big surprises here.  One thing I’d like to point out: the previous chart showed a relatively high correlation between 1st% and ERA, whereas this chart shows a low correlation between 1st% and K%.  I find that fascinating, especially if you watch a lot of baseball and hear so many announcers talk about the importance of throwing strike one.  Because 1st% has a relatively low year-to-year correlation, I will not focus on its importance.

My opinion, based on these correlations, is that the pitch stats to focus on are:

StI%

Strk%

Sw/In%

Cntc%

StS%

So now that we know which stats we want to focus on, what do they mean in terms of Orioles pitchers?  This chart shows the ERA levels and expectancies at the various pitch data levels using three-year cumulative data.

StI%            Low         High        Median          AVG         <4.00       Between   >5.00

Lo                1.55         4.67         3.41                3.24         73%         23%            3%

Mid              1.99         6.46         4.19                4.19         38%         44%            17%

Hi                3.34         6.33         4.79                4.83         9%           63%            28%

The three-year average StI% (percentage of strikes thrown that are put into play) for the pitchers in this study is 31%, with 27% and 34% marking the Lo and Hi cutoffs.  This chart tells us there’s a 73% chance that a pitcher with a 27% or lower StI% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 34% or higher StI% will finish with an ERA above 5.00.

So how do the 2007 Orioles pitchers rate through Saturday?

Ray              25%

Bedard          25%

Parrish          26%

Williamson    27%                   

Burres          29%

Walker          30%

Cabrera         31%

Bradford        33%

Guthrie         33%

Williams       33%                   

Trachsel        38%

Baez            41%
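The Lo/Mid/Hi chart above can be rebuilt along these lines.  This is only a sketch: the input format (a list of stat-value and ERA tuples) is hypothetical, and the cutoffs are assumed to be inclusive, based on the “27% or lower” and “34% or higher” wording.

```python
from statistics import mean, median


def tier(value, lo_cut, hi_cut):
    """Label a stat value "Lo", "Mid" or "Hi" using inclusive cutoffs."""
    if value <= lo_cut:
        return "Lo"
    if value >= hi_cut:
        return "Hi"
    return "Mid"


def era_by_tier(pitchers, lo_cut, hi_cut):
    """Summarize ERA within each tier of a pitch data stat.

    `pitchers` is a hypothetical list of (stat_value, era) tuples.
    Returns a dict keyed by tier name; empty tiers are left out.
    """
    groups = {"Lo": [], "Mid": [], "Hi": []}
    for value, era in pitchers:
        groups[tier(value, lo_cut, hi_cut)].append(era)

    summary = {}
    for name, eras in groups.items():
        if not eras:
            continue
        n = len(eras)
        under = sum(e < 4.00 for e in eras)  # ERA below 4.00
        over = sum(e > 5.00 for e in eras)   # ERA above 5.00
        summary[name] = {
            "low": min(eras),
            "high": max(eras),
            "median": median(eras),
            "avg": mean(eras),
            "<4.00": under / n,
            "between": (n - under - over) / n,
            ">5.00": over / n,
        }
    return summary
```

For StI% the call would be something like `era_by_tier(pitchers, 27, 34)`.  The same function works for stats where high is good (Strk%, StS%); only the reading of the tiers flips.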

Strk%              Low        High        Median          AVG         <4.00       Between   >5.00

Hi                    1.55        5.27         3.45                3.56         62%         31%            8%

Mid                  2.15        6.33         4.19                4.17         41%         43%            16%

Lo                    3.29        6.46         4.77                4.77         14%         59%            28%

The league average Strk% (percentage of pitches thrown that are strikes) for the pitchers in this study is 63%, with 66% and 60% marking the Hi and Lo cutoffs.  This chart tells us there’s a 62% chance that a pitcher with a 66% or higher Strk% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 60% or lower Strk% will finish with an ERA above 5.00.

Orioles pitchers in 2007:

Bradford        69%

Walker          67%                     

Guthrie          65%

Williams        65%

Bedard          64%

Ray               63%                     

Williamson    59%

Burres           59%

Cabrera        58%

Trachsel       57%

Parrish          56%

Baez             54%

Ouch.  This may be the single biggest stat that bothers me about Daniel Cabrera.  From what I’ve seen from eyeballing things, once a pitcher establishes himself as a sub-60% Strk% pitcher he typically stays there.  The only pitcher I’ve seen who has defied this is Randy Johnson.  Of the 161 pitchers in the study, Bradford had the second highest Strk% (71%), while Cabrera had the fifth worst (58%).

Sw/In%            Low     High         Median          AVG         <4.00       Between   >5.00

Lo                    1.55     5.18         3.45                3.32         73%         23%            3%

Mid                  2.62     6.46         4.25                4.24         38%         44%            17%

Hi                     3.39     6.33         4.78                4.80         9%           63%            28%

The three-year average Sw/In% for the pitchers in this study is 42%, with 37% and 47% marking the Lo and Hi cutoffs.  This chart tells us there’s a 73% chance that a pitcher with a 37% or lower Sw/In% will finish with an ERA below 4.00, while there’s a 28% chance that a pitcher with a 47% or higher Sw/In% will finish with an ERA above 5.00.

Parrish          34%

Ray               35%

Bedard          35%

Williamson    36%                     

Walker          39%

Burres           42%

Cabrera        44%

Guthrie          46%

Bradford        46%                     

Williams        48%

Baez             53%

Trachsel       57%

Trachsel had the third worst rate in the study.  This chart and the StI% chart are a couple of good examples of why I’m such a big fan of Erik Bedard. 
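Since Sw/In% is the one stat here that isn’t on the B-R page, here is a small sketch of how it could be computed from raw counts.  The argument names are made up for illustration, and the denominator assumes that “strikes swung at” means swinging strikes plus fouls plus balls put in play.

```python
def sw_in_pct(swinging, foul, in_play):
    """Share of swung-at strikes that are put in play, as a fraction.

    Hypothetical inputs: counts of swinging strikes, foul strikes and
    balls put in play (home runs count as balls in play).
    """
    swings = swinging + foul + in_play
    if swings == 0:
        raise ValueError("no swings recorded")
    return in_play / swings
```

Multiply the result by 100 to get the percentage form used in the lists above.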

Cntc%            Low     High         Median          AVG         <4.00       Between   >5.00

Lo                    1.55     4.87         3.37                3.26         78%         22%            0%

Mid                  1.99     6.46         4.31                4.28         36%         46%            18%

Hi                     2.99     6.33         4.60                4.61         19%         56%            25%

The league average Cntc% (percentage of strikes thrown in which the batter makes contact) for the pitchers in this study is 80%, with 75% and 84% marking the Lo and Hi cutoffs.  This chart tells us there’s a 78% chance that a pitcher with a 75% or lower Cntc% will finish with an ERA below 4.00, while there’s a 25% chance that a pitcher with an 84% or higher Cntc% will finish with an ERA above 5.00.

  

Williamson    64%

Parrish          69%

Walker          74%

Ray               75%

Bedard          75%                     

Burres           78%

Cabrera        79%

Williams        80%

Baez             83%

Guthrie          83%                     

Bradford        85%

Trachsel       88%

One thing I like about this staff is that when they throw strikes they’re hard to hit.  Trachsel tied for the second highest rate in the study. 

StS%            Low        High        Median          AVG         <4.00       Between   >5.00

Hi                  1.55        5.28         3.40                3.42         71%         26%            3%

Mid               1.99        6.46         4.32                4.27         37%         45%            18%

Lo                 2.99        6.33         4.61                4.60         20%         56%            24%

The league average StS% (percentage of strikes thrown in which the batter swings and misses) for the pitchers in this study is 15%, with 18% and 12% marking the Hi and Lo cutoffs.  This chart tells us there’s a 71% chance that a pitcher with an 18% or higher StS% will finish with an ERA below 4.00, while there’s a 24% chance that a pitcher with a 12% or lower StS% will finish with an ERA above 5.00.  I’ll admit that this is my favorite pitch data stat.

Williamson    27%

Parrish          23%

Walker          20%

Bedard          18%

Ray               18%                     

Burres           15%

Cabrera        15%

Williams        14%

Wright           14%

Baez             14%                     

Guthrie          12%

Bradford        11%

Trachsel       8%

Trachsel tied for the lowest rate in the study.  Trachsel has obviously proven that a pitcher can still succeed while not doing well in pitch data stats.  The problem is that his margin for error is so much smaller than it is for other pitchers.
