Cookin’ With Gas

Statistical analysis of the Baltimore Orioles on an almost weekly basis.

Archive for the ‘Batted Ball’ Category

ERA vs Theoretical ERAs

Posted by cookinwithgas on November 3, 2006

Okay, it has been a while since my last post on here, but things have been kind of crazy for me. 

First, let me say congratulations to the St. Louis Cardinals and their fine fans.  While I think it is safe to say that they probably weren’t the best team in baseball this year, they were the best team when it mattered most.

 

Here’s my latest article on the Orioles Hangout.

 

Yet another outstanding article from the fine folks at THT can be found here.  And yes, it is on my favorite subject.  I can hardly wait for the THT Annual.  Be sure to order your copy today.

 

One thing that has taken a lot of my time is that I had to rebuild my stats database after my hard drive crashed on me.  I now have the pitching stats for every pitcher from 2002 through 2006.  Included in these stats is batted ball data.  Having this nice database of information gives me the ability to perform a lot of studies over the winter.  I’m open to any ideas that you may have.  Just remember that I’m not a mathematician, so I may not be as thorough or as clean with my analysis as the fine folks at sites such as BP and THT.

 

One of the first studies I wanted to do was one on the consistency of specific stats such as ERA, various theoretical ERAs (think DIPS and FIP ERA), K-Rate, and so on.  I can now do that.  To do it I first had to identify every instance in which a pitcher appeared in back-to-back seasons.  Rodrigo Lopez, for instance, ended up with four pairs of seasons (2002-2003 / 2003-2004 / 2004-2005 / 2005-2006).  I lined up each of the Year 1 and Year 2 ERAs, and then ran a simple correlation test in Excel.
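For anyone who would rather do the pairing step outside of Excel, here is a rough Python sketch.  It assumes the Excel test was a plain Pearson correlation (which is what the CORREL function computes).  The pitcher names and ERA values in `seasons` are invented for illustration, and `consecutive_pairs` and `pearson` are hypothetical helper names, not anything from my actual spreadsheet.

```python
from math import sqrt

# Hypothetical data: (pitcher, season) -> ERA. All values are invented.
seasons = {
    ("Pitcher A", 2002): 3.57,
    ("Pitcher A", 2003): 5.82,
    ("Pitcher A", 2004): 3.59,
    ("Pitcher B", 2003): 4.10,
    ("Pitcher B", 2004): 4.45,
    ("Pitcher C", 2002): 2.95,
    ("Pitcher C", 2004): 3.30,  # no 2003 season, so this pitcher yields no pair
}


def consecutive_pairs(stats):
    """Return (year1_value, year2_value) for every back-to-back season."""
    pairs = []
    for (name, year), value in sorted(stats.items()):
        following = stats.get((name, year + 1))
        if following is not None:
            pairs.append((value, following))
    return pairs


def pearson(pairs):
    """Pearson correlation between the Year 1 and Year 2 values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


pairs = consecutive_pairs(seasons)
print(len(pairs), "season pairs")
print("Year 1 to Year 2 correlation:", round(pearson(pairs), 3))
```

A pitcher with three straight seasons contributes two pairs, which is why Lopez shows up four times in the real data.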

I wanted to compare the correlation of ERA to those of various theoretical ERAs.  I used three versions each of four fairly common theoretical ERAs – DIPS, Component ERA (ERC), FIP, and XERA.  ERC was invented by Bill James, XERA by Ron Shandler and the folks at Baseball HQ.  This site gives the basic formula for each.  FIP ERA is sort of the cousin to DIPS. 

 

I actually use three different versions of each theoretical ERA.  The first version (ERC1, XERA1, FIP1, DIPS1) is based on the standard non-adjusted stat.  I also use DIPS 3.0 instead of either of the original two versions created by Voros McCracken (so my numbers will differ from those posted on ESPN).  For version 2 of each (ERC2, XERA2, FIP2, DIPS2) I adjusted the hits and innings pitched totals using batted ball data.  For version 3 of each (ERC3, XERA3, FIP3, DIPS3) I made the same adjustments as version 2, but took it a step further by normalizing the number of infield flyballs and line drives for each pitcher to the overall MLB average for each stat for the 2002-2006 seasons.  The next trick was to filter out pitchers based on innings pitched totals as I went along.
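The innings filter itself is simple: a season pair only counts if the pitcher reached the cutoff in both years.  A minimal sketch, assuming each pair is stored as a record with Year 1 and Year 2 innings (the field names `ip1` and `ip2`, the helper `filter_pairs`, and all of the numbers are made up for illustration):

```python
# Hypothetical season-pair records; field names and values are invented.
pairs = [
    {"pitcher": "Pitcher A", "ip1": 180.0, "ip2": 200.0},
    {"pitcher": "Pitcher B", "ip1": 60.0, "ip2": 20.0},
    {"pitcher": "Pitcher C", "ip1": 30.0, "ip2": 80.0},
    {"pitcher": "Pitcher D", "ip1": 10.0, "ip2": 90.0},
]


def filter_pairs(pairs, min_ip):
    """Keep only pairs where the pitcher reached min_ip in BOTH seasons."""
    return [p for p in pairs if p["ip1"] >= min_ip and p["ip2"] >= min_ip]


# The three cutoffs used in this post: everyone, 25 IP, and 75 IP.
for cutoff in (0, 25, 75):
    kept = filter_pairs(pairs, cutoff)
    print(f"min {cutoff:>2} IP: {len(kept)} pairs")
```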

 

This list shows the correlation for each stat for all pitchers.  The totals for the 1,848 pitchers were 156,276 IP in Year 1, and 152,186 IP in Year 2:

 

ERA      .068
ERC1     .075
ERC2     .084
ERC3     .204
DIPS1    .199
DIPS2    .185
DIPS3    .265
XERA1    .118
XERA2    .120
XERA3    .271
FIP1     .139
FIP2     .202
FIP3     .251

 

The highest correlation for any of these was only .271, which isn’t very high.  It is telling that ERA was so low when compared to anything other than ERC1 or ERC2.  Note that XERA3 had the highest correlation.

 

This list shows the correlation of each stat for all pitchers with at least 25 IP in each season.  The totals for the 1,250 pitchers were 137,732 IP in Year 1, and 135,054 IP in Year 2:

 

ERA      .263
ERC1     .301
ERC2     .482
ERC3     .578
DIPS1    .549
DIPS2    .562
DIPS3    .593
XERA1    .321
XERA2    .503
XERA3    .623
FIP1     .402
FIP2     .576
FIP3     .587

 

The correlation for ERA became a little better this time, even though it was still less than it was for XERA3 in the first chart.  Once again, XERA3 had the highest correlation – with a very respectable .623. 

 

One last list – this one showing all pitchers with at least 75 IP in each season.  The totals for the 557 pitchers were 90,998 IP in Year 1, and 89,583 IP in Year 2:

 

ERA      .381
ERC1     .430
ERC2     .595
ERC3     .680
DIPS1    .665
DIPS2    .681
DIPS3    .688
XERA1    .452
XERA2    .601
XERA3    .706
FIP1     .553
FIP2     .687
FIP3     .688

 

Once again we see improvement for ERA – even though it was still lower than every single theoretical ERA.  XERA3 was also the king once again, even though ERC3, DIPS1, DIPS2, DIPS3, FIP2, and FIP3 also did quite well, and weren’t far behind XERA3.

 

So now we have an idea of the consistency of each theoretical ERA.  In the next installment I plan to evaluate the success rate of each at predicting whether Year 2’s ERA will go up or down.  We’ll also look at whether the difference can be used to predict Year 2’s ERA.

 

After I had begun typing this article I stumbled upon this posting on another site.  This guy essentially argues that the theoretical stats are too busy and that we should just focus on K:BB ratio (or Command Rate, or whatever else you might want to call it) because it is more consistent.  So I decided to do a quick correlation test.  Using pitchers with at least 75 IP in each season, K:BB had a correlation of .598.  Pretty good, but eight of the theoretical stats were higher.  Having said that, I think he makes a pretty good argument, and I plan to study it further.
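The "eight were higher" count is easy to double-check against the 75 IP list above.  This short sketch just re-enters those correlations (the variable names are mine) and counts the theoretical ERAs that beat K:BB:

```python
# Year 1 to Year 2 correlations from the 75 IP list in this post.
corr_75ip = {
    "ERA": 0.381,
    "ERC1": 0.430, "ERC2": 0.595, "ERC3": 0.680,
    "DIPS1": 0.665, "DIPS2": 0.681, "DIPS3": 0.688,
    "XERA1": 0.452, "XERA2": 0.601, "XERA3": 0.706,
    "FIP1": 0.553, "FIP2": 0.687, "FIP3": 0.688,
}
kbb_corr = 0.598  # K:BB correlation at 75 IP

# Plain ERA is excluded because it is not a theoretical stat.
higher = sorted(
    name for name, r in corr_75ip.items()
    if name != "ERA" and r > kbb_corr
)
print(len(higher), "theoretical stats beat K:BB:", ", ".join(higher))
```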

 

One final set of correlations to mention.  This list shows the correlations of various rate stats.  The columns are 25 IP, and 75 IP.

 

Stat        25 IP    75 IP
H/9         .392     .471
HR/9        .207     .356
HB/9        .330     .408
BB/9        .521     .653
K/9         .720     .768
GB%         .756     .821
FB%         .722     .734
IFFB%       .187     .241
LD%         .088     .066
HR/OFFB%    .083     .199
BIP%        .735     .779

 

Most of the above is not a surprise, but there is one big surprise – at least to me.  I was floored that the Year 1 to Year 2 correlation for H/9 was higher than the Year 1 to Year 2 correlation for HR/9 – and it wasn’t really even close. 

 

I was concerned that my methodology may have been wrong, but the data available in the 2006 THT Annual was consistent with my numbers, so I’m pretty confident they are right.

 

I’ve seen it written many times that pitchers have a lot of control over whether a batter hits a home run, but not nearly as much control over whether a batted ball becomes a hit.  The above tells me we’re either giving pitchers too much or too little credit (depending on your point of view).  One thought that crossed my mind is that home run data was skewing the data, so I ran correlations for (H-HR)/9.  I came up with a correlation of .447 – not much less than H/9, but still higher than HR/9.
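For clarity, here is how the per-nine rates relate to each other.  This is only a sketch with an invented season line; `per_nine` is a hypothetical helper, and the innings figure is a true decimal (200.0), not the box-score .1/.2 notation for thirds of an inning.

```python
def per_nine(count, innings):
    """Scale a counting stat to a rate per nine innings pitched."""
    return 9.0 * count / innings


# Hypothetical season line (invented numbers).
hits, home_runs, innings = 190, 25, 200.0

h9 = per_nine(hits, innings)                     # H/9
hr9 = per_nine(home_runs, innings)               # HR/9
non_hr_h9 = per_nine(hits - home_runs, innings)  # (H-HR)/9

print(f"H/9 = {h9:.2f}, HR/9 = {hr9:.2f}, (H-HR)/9 = {non_hr_h9:.2f}")
```

Because (H-HR)/9 is just H/9 with the home runs stripped out, it isolates the hits that stayed in the park.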

 

Interesting.

Posted in Batted Ball | 1 Comment »

Batted Ball Data

Posted by cookinwithgas on July 28, 2006

There have been some concerns raised about the Orioles’ 6.02 ERA (as of July 27) for the month of July.  While I can understand these concerns, I tend to look at things a little differently.   

The problem with ERA is that there are so many factors that affect it, especially over short periods of time.  That’s one reason I prefer to look at other stats, particularly theoretical stats, such as one that I call True Performance ERA (yes, it is a borrowed name). TPE uses Component ERA, except my version adjusts hits and home runs based on batted ball data, and I make an adjustment on walks and strikeouts.  The idea is that this formula will take out flukes such as poor fielding or “luck” impacting the number of hits allowed, as well as some other things that impact ERA.

Here is how the Orioles pitchers fare by month:

TPE – 5.01 / 5.09 / 4.52 / 5.10
ERA – 5.54 / 5.54 / 4.49 / 6.02

TPE indicates that things haven’t been quite as bad this month overall.  In fact, take out the “contributions” of Russ Ortiz, and the July TPE drops to 4.72.  Also, even with Ortiz, the staff’s BB% (11.1 > 10.1 > 9.2 > 9.1) and K% (14.2 > 14.4 > 16.9 > 16.4 [but 17.3 without Ortiz]) represent continued improvement.

I’m sure some are wondering what other factors have impacted actual ERA.  These things have played a role:

HR/OFFB.  It is pretty much accepted that pitchers typically don’t have much control over the percentage of flyballs that become home runs. Typically, you can expect to see a rate of about 11%.  The Orioles’ rate by month is 14.1 > 12.6 > 13.4 > 16.5.  Yes, this is the second straight month it has increased, but a rate of over 16% is extremely high.  Lower that number to something closer to normal, and they would have allowed fewer HRs this month (11% = 22, compared to an actual total of 33).  Fewer HRs would likely have led to a lower ERA.
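The 22-versus-33 figure can be worked backward from the numbers in that paragraph.  In this sketch the outfield flyball count is not something I quoted; it is implied by 33 home runs at a 16.5% rate, so treat it as an approximation (the variable names are just for illustration):

```python
actual_hr = 33        # July home runs allowed
actual_rate = 0.165   # July HR/OFFB
typical_rate = 0.11   # roughly normal HR/OFFB

# Implied number of outfield flyballs allowed in July (back-calculated).
outfield_flyballs = actual_hr / actual_rate

# Home runs expected on those same flyballs at a normal rate.
expected_hr = typical_rate * outfield_flyballs

print(f"Implied OFFB: {outfield_flyballs:.0f}")
print(f"Expected HR at 11%: {expected_hr:.0f} (actual: {actual_hr})")
```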

LOB%.  The percentage of baserunners left on base.  The typical league average is about 71% (both leagues are at exactly 71% in 2006).  The Orioles by month – 68.4 > 68.9 > 72.5 > 67.0.  In other words, 33% of all runners who reach base score – as opposed to a league average of 29%.

H%/BABIP.  I track a stat I call H%, which is essentially the same as BABIP-A.  H% represents the percentage of balls put in play that become hits.  The Orioles by month: .298 > .296 > .297 > .327.  This tells me that their combination of “luck” (variance), defense, and yes, pitching just hasn’t been all that great this month.
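For readers unfamiliar with the stat, here is a sketch of the commonly used BABIP formula, (H - HR) / (AB - K - HR + SF).  This is an assumption on the details: my H% may be computed slightly differently, and the season totals below are invented purely to show the arithmetic.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Common BABIP definition: non-HR hits divided by balls put in play."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical totals allowed by a pitching staff (invented numbers).
rate = babip(hits=150, home_runs=20, at_bats=600, strikeouts=100, sac_flies=5)
print(f"H% / BABIP: {rate:.3f}")
```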

I’m sure some will look at the above and see excuses. I don’t intend for them to be looked at as such. Yes, the pitcher still plays a role in each of these.

The point of this post is that the rise in ERA isn’t nearly as bad as it appears to be at first glance. 

MiLB Batted Ball Data 

I’ve been asked a lot about minor league batted ball data.  One recent question had to do with a pitcher’s ability to control whether a fly ball becomes a home run.  The person who asked made the assumption that this applies to both major and minor league pitchers.

I have seen arguments that some pitchers have an ability to limit home runs on outfield fly balls, but I haven’t seen the evidence that “proves” it (this evidence may exist, I just haven’t seen it).  For instance, a couple of people have pointed to Erik Bedard as one of these pitchers – pointing to his rates of 8.2 and 7.5% the previous two seasons.  The problem here is that his rate is up to 11.7% this year (even though it has been down to 9.1 and 8.3 in June and July, respectively).

I will say that my gut tells me that eventually someone will be able to show that pitchers do have more control over this than what is currently believed. 

The problem with the MLB/MiLB batted ball data discussion is that we are comparing Major League rates and expectancies to Minor League data.  For instance, we know that 11% of all OFFBs become home runs, we know that 18.3% of all line drives become doubles (based on 2002 through 2005 data), but we don’t know how often these things occur within the various minor leagues.  What we need is for someone to do that for each level and for each league.

The person who asked me the question was asking primarily about Astros farmhand Jason Hirsh – whom he thought had an ability to “miss bats.”  The thing to do (in my mind) is to compare his stats to the stats for his team and league (PCL).  (Hirsh/Team/League – team and league through Friday):

GB%: 39 / 47 / 46
FB%: 47 / 37 / 37
IFFB%: 28 / 20 / 20 (% of flyballs that become popups)
LD%: 13 / 16 / 17

MLB stats tell us that IFFB% and LD% are not easily controlled by pitchers.  If that is true about minor league stats, then Hirsh may be in trouble – in that he can expect to start allowing more line drives and fewer popups.  FWIW, I’ll give an educated guess that minor league pitchers (especially the good ones) have much more control over these than do ML pitchers.  This would explain what would be his absurdly low (by MLB standards) HR/OFFB rate (3.8% compared to a PCL rate of 10.6%).

There are two other stats to compare:

BB%: 9.9 / 9.0 / 9.0
K%: 21.4 / 16.4 / 18.2

You have to love his K%, but should be a little worried about his BB%. 
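To make the comparison easier to read, this sketch computes how far Hirsh sits from his team and from the PCL on each stat, in percentage points.  The numbers are the ones listed above; the layout and variable names are just one way to do it.

```python
# (Hirsh, team, league) for each stat, as listed in this post.
rows = {
    "GB%": (39, 47, 46),
    "FB%": (47, 37, 37),
    "IFFB%": (28, 20, 20),
    "LD%": (13, 16, 17),
    "BB%": (9.9, 9.0, 9.0),
    "K%": (21.4, 16.4, 18.2),
}

# Gap between Hirsh and the league, rounded to one decimal place.
league_gap = {stat: round(h - lg, 1) for stat, (h, _team, lg) in rows.items()}

print(f"{'Stat':<6} {'vs team':>8} {'vs league':>10}")
for stat, (hirsh, team, league) in rows.items():
    print(f"{stat:<6} {hirsh - team:>+8.1f} {hirsh - league:>+10.1f}")
```

The biggest gaps are in FB% and IFFB%, which are the stats MLB data suggests pitchers have the least control over.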

By the way, the MiLB data comes from what is now one of my favorite stat sites.  (See the link to the right.) 

Clutchiness 

Thanks to the good folks at THT I have found yet another great blog – Clutchiness (see the link to the right).  I’m not even going to try to explain how this new stat works.  For one, I haven’t read enough about it to give a good explanation.  Besides, he does a great job of explaining it.  Do yourself a favor, take the time to read this – it is definitely good stuff.

By the way, let’s hope the Angels, Astros, and whatever other teams might be interested in Miguel Tejada don’t check out the link on the Orioles.  And to think that the primary reason given for going after Tejada a few years ago by a prominent OH poster was his abilities in the clutch.  Oh well, maybe that’s a fluke.

My Most Recent OH Article…

can be found here.

Posted in Batted Ball | 13 Comments »