Cookin’ With Gas

Statistical analysis of the Baltimore Orioles on an almost weekly basis.

ERA vs Theoretical ERAs

Posted by cookinwithgas on November 3, 2006

Okay, it has been a while since my last post on here, but things have been kind of crazy for me. 

First, let me say congratulations to the St. Louis Cardinals and their fine fans.  While I think it is safe to say that they probably weren’t the best team in baseball this year, they were the best team when it mattered most.

 

Here’s my latest article on the Orioles Hangout.

 

Yet another outstanding article from the fine folks at THT can be found here.  And yes, it is on my favorite subject.  I can hardly wait for the THT Annual.  Be sure to order your copy today.

 

One thing that has taken a lot of my time is that I had to rebuild my stats database after my hard drive crashed on me.  I now have the pitching stats for every pitcher from 2002 through 2006.  Included in these stats is batted ball data.  Having this nice database of information gives me the ability to perform a lot of studies over the winter.  I’m open to any ideas that you may have.  Just remember that I’m not a mathematician, so I may not be as thorough or as clean with my analysis as the fine folks at sites such as BP and THT.

 

One of the first studies I wanted to do was one on the consistency of specific stats such as ERA, various theoretical ERAs (think DIPS and FIP ERA), K-Rate, and so on.  I can now do that.  To do it I first had to narrow down all the instances in which a pitcher appeared in back-to-back seasons.  Rodrigo Lopez, for instance, ended up with four pairs of seasons (2002-2003 / 2003-2004 / 2004-2005 / 2005-2006).  I lined up each of the Year 1 and Year 2 ERAs, and then performed a simple correlation test in Excel.
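
For readers who would rather script this than use a spreadsheet, the pairing and correlation steps might look something like the sketch below.  This is not the actual spreadsheet: the function names and the sample numbers are made up for illustration, and it assumes the Excel test was a plain Pearson correlation (which is what Excel's CORREL function computes).

```python
from math import sqrt

def consecutive_pairs(seasons):
    """Build (Year 1 value, Year 2 value) pairs for back-to-back seasons.

    seasons: dict mapping (pitcher, year) -> stat value (e.g. ERA).
    """
    pairs = []
    for (pitcher, year), value in seasons.items():
        next_value = seasons.get((pitcher, year + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

# Made-up ERAs for two hypothetical pitchers, purely for illustration.
seasons = {
    ("pitcher_a", 2002): 3.60, ("pitcher_a", 2003): 5.80,
    ("pitcher_a", 2004): 3.60, ("pitcher_a", 2005): 4.90,
    ("pitcher_a", 2006): 5.90,
    ("pitcher_b", 2003): 4.20, ("pitcher_b", 2005): 4.00,  # no back-to-back years
}
pairs = consecutive_pairs(seasons)
print(len(pairs))       # pitcher_a contributes four pairs, pitcher_b none
print(pearson(pairs))
```

A pitcher with five straight seasons contributes four pairs, just like the Rodrigo Lopez example, while a pitcher who skipped a year contributes none.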

I wanted to compare the correlation of ERA to those of various theoretical ERAs.  I used three versions each of four fairly common theoretical ERAs – DIPS, Component ERA (ERC), FIP, and XERA.  ERC was invented by Bill James, XERA by Ron Shandler and the folks at Baseball HQ.  This site gives the basic formula for each.  FIP ERA is sort of the cousin to DIPS. 
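
Most of these formulas are too long to reproduce here, but FIP is compact enough to show.  Below is a minimal sketch of the commonly cited form, (13*HR + 3*BB - 2*K) / IP plus a constant.  The post does not say which variant or constant was used, so the default of 3.2 and the function name are assumptions, and some versions also add hit batters to the walk term.

```python
def fip(hr, bb, k, ip, constant=3.2):
    """Commonly cited FIP form; the constant is an assumed league adjustment.

    ip is innings pitched as a decimal (200 1/3 innings -> 200.333...),
    not the box-score notation 200.1.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical season line: 20 HR, 50 BB, 150 K in 200 IP.
print(round(fip(hr=20, bb=50, k=150, ip=200.0), 2))
```

Only home runs, walks, and strikeouts go in, which is why FIP is a close relative of DIPS: balls in play are left out entirely.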

 

I actually used three different versions of each theoretical ERA.  The first version (ERC1, XERA1, FIP1, DIPS1) is based on the standard non-adjusted stat.  I also used DIPS 3.0 instead of either of the original two versions created by Voros McCracken (so my numbers will be different from those posted on ESPN).  For version 2 of each (ERC2, XERA2, FIP2, DIPS2) I adjusted the hits and innings pitched totals using batted ball data.  For version 3 of each (ERC3, XERA3, FIP3, DIPS3) I made the same adjustments as version 2, but took it a step further by normalizing the number of infield flyballs and line drives for each pitcher to the overall MLB average for each stat for the 2002-2006 seasons.  The next trick was to filter out pitchers based on innings pitched totals as I went along.
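
The innings filter is simple enough to sketch.  The key point is that a pair of seasons only survives if the pitcher reached the minimum in both years.  The dictionary keys and sample rows below are hypothetical, and innings are assumed to be stored as decimals.

```python
def filter_by_min_ip(pairs, min_ip):
    """Keep only season pairs where the pitcher reached min_ip in BOTH years.

    pairs: list of dicts with hypothetical keys "ip1" and "ip2"
    (innings pitched in Year 1 and Year 2) plus the stats being compared.
    """
    return [p for p in pairs if p["ip1"] >= min_ip and p["ip2"] >= min_ip]

# Made-up rows purely for illustration.
sample = [
    {"ip1": 12.0, "ip2": 140.0, "era1": 6.75, "era2": 4.10},
    {"ip1": 60.0, "ip2": 48.0, "era1": 3.90, "era2": 4.50},
    {"ip1": 190.0, "ip2": 205.0, "era1": 3.45, "era2": 3.80},
]

for threshold in (0, 25, 75):
    kept = filter_by_min_ip(sample, threshold)
    print(threshold, len(kept))
```

Raising the threshold shrinks the sample quickly, which is what happens with the real data as the cutoff goes from no minimum to 25 IP to 75 IP.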

 

This list shows the correlation for each stat for all pitchers.  The totals for the 1,848 pitchers were 156,276 IP in Year 1, and 152,186 IP in Year 2:

 

ERA       .068
ERC1      .075
ERC2      .084
ERC3      .204
DIPS1     .199
DIPS2     .185
DIPS3     .265
XERA1     .118
XERA2     .120
XERA3     .271
FIP1      .139
FIP2      .202
FIP3      .251

 

The highest correlation for any of these was only .271, which isn’t very high.  It is telling that ERA had the lowest correlation of all, and was well below everything other than ERC1 and ERC2.  Note that XERA3 had the highest correlation.

 

This list shows the correlation of each stat for all pitchers with at least 25 IP in each season.  The totals for the 1,250 pitchers were 137,732 IP in Year 1, and 135,054 IP in Year 2:

 

ERA       .263
ERC1      .301
ERC2      .482
ERC3      .578
DIPS1     .549
DIPS2     .562
DIPS3     .593
XERA1     .321
XERA2     .503
XERA3     .623
FIP1      .402
FIP2      .576
FIP3      .587

 

The correlation for ERA improved quite a bit this time, even though at .263 it was still lower than XERA3 managed in the first list (.271).  Once again, XERA3 had the highest correlation – with a very respectable .623.

 

One last list – this one showing all pitchers with at least 75 IP in each season.  The totals for the 557 pitchers were 90,998 IP in Year 1, and 89,583 IP in Year 2:

 

ERA       .381
ERC1      .430
ERC2      .595
ERC3      .680
DIPS1     .665
DIPS2     .681
DIPS3     .688
XERA1     .452
XERA2     .601
XERA3     .706
FIP1      .553
FIP2      .687
FIP3      .688

 

Once again we see improvement for ERA – even though it was still lower than every single theoretical ERA.  XERA3 was also the king once again, even though ERC3, DIPS1, DIPS2, DIPS3, FIP2, and FIP3 also did quite well, and weren’t far behind XERA3.

 

So now we have an idea of the consistency of each theoretical ERA.  In the next installment I plan to evaluate the success rate of each at predicting whether Year 2’s ERA will go up or down.  We’ll also look at whether the difference can be used to predict Year 2’s ERA.

 

After I had begun typing this article I stumbled upon this posting on another site.  This guy essentially argues that the theoretical stats are too busy and that we should just focus on K:BB ratio (or Command Rate, or whatever else you might want to call it) as it is more consistent.  So I decided to do a quick correlation test.  Using pitchers with at least 75 IP in each season, K:BB had a correlation of .598.  Pretty good, but eight of the twelve theoretical stats were higher.  Having said that, I think he makes a pretty good argument, and I plan to study it further.

 

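One final set of correlations to mention.  This list shows the Year 1 to Year 2 correlations of various rate stats.  The first column is for pitchers with at least 25 IP in each season, and the second is for pitchers with at least 75 IP in each season.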

 

Stat          25 IP    75 IP
H/9           .392     .471
HR/9          .207     .356
HB/9          .330     .408
BB/9          .521     .653
K/9           .720     .768
GB%           .756     .821
FB%           .722     .734
IFFB%         .187     .241
LD%           .088     .066
HR/OFFB%      .083     .199
BIP%          .735     .779

 

Most of the above is not a surprise, but there is one big surprise – at least to me.  I was floored that the Year 1 to Year 2 correlation for H/9 was higher than the Year 1 to Year 2 correlation for HR/9 – and it wasn’t really even close. 

 

I was concerned that my methodology may have been wrong, but the data available in the 2006 THT Annual was consistent with my numbers, so I’m pretty confident they are right.

 

I’ve seen it written many times that pitchers have a lot of control over whether a batter hits a home run, but not nearly as much control over whether a batted ball becomes a hit.  The above tells me we’re either giving pitchers too much or too little credit (depending on your point of view).  One thought that crossed my mind was that home runs were skewing the hits data, since every home run also counts as a hit, so I ran correlations for (H-HR)/9.  I came up with a correlation of .447 – not much less than H/9, but still higher than HR/9.
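
To spell out the arithmetic, each of these rate stats is just the raw count scaled to nine innings, and (H-HR)/9 takes the home runs out of the hit total before scaling.  A small sketch follows; the function name and the season line are made up, and innings are assumed to be decimals rather than box-score notation.

```python
def per_nine(count, ip):
    """Scale a raw count (hits, home runs, walks...) to a per-nine-innings rate."""
    return 9.0 * count / ip

# Hypothetical season line: 180 hits, 20 of them home runs, in 200 IP.
hits, home_runs, ip = 180, 20, 200.0

h9 = per_nine(hits, ip)                          # H/9
hr9 = per_nine(home_runs, ip)                    # HR/9
non_hr_h9 = per_nine(hits - home_runs, ip)       # (H-HR)/9

print(round(h9, 2), round(hr9, 2), round(non_hr_h9, 2))
```

Because home runs are part of H/9, (H-HR)/9 always equals H/9 minus HR/9 for the same pitcher, so comparing the correlations of the two shows how much of the H/9 consistency comes from hits that stay in the park.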

 

Interesting.
