Early in the 2017 MLB season, DJ LeMahieu is leading the league in batting average, hitting .348 according to Baseball Reference’s MLB Leaders. We all know by now that advanced metrics, OBP, etc. paint a much better picture of a batter’s production. However, today’s post is not an attack on traditional numbers; rather, it is an attack on a false assumption that some people, even the media elite and some elementary math textbooks, make about any batting statistic:
As you can see in the chart below, as of press time LeMahieu is hitting .348, meaning he has gotten a hit in 34.8% of his at bats. This accurately describes how LeMahieu HAS performed this season, but it tells us nothing about his future performance.
When DJ steps to the plate today carrying a .348 average, he does NOT have a 34.8% chance of getting a hit. It’s an easy trap to fall into, and although it’s close, it’s not technically the “hot hand fallacy”. I have yet to find the formal mathematical term for this fallacy, so until someone tells me otherwise I am calling it the “Predicting Future on Previous Outcome Fallacy” (better names gladly welcomed). When DJ, or anyone else for that matter, steps to the plate, his previous success at getting hits does not mathematically impact his odds of getting a hit at that instant. If you look back at his at bats from previous games in a vacuum, then yes, you KNOW that 34.8% of those trips to the plate ended in a hit.
Imagine a round of roulette:
Regardless of what you play, let’s say red, you KNOW going into the spin that there is a set probability of 18/38 that you will have a successful spin (an American wheel has 18 red pockets, 18 black pockets, and two green pockets, 0 and 00). That probability is fixed and does not depend on previous spins. An at bat is different: because of the human element and the variance in talent, there is no fixed probability we can know in advance. All we can say is that, in the context of batting average, every single at bat (AB) technically has two outcomes: 1 for 1 or 0 for 1.
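The roulette half of the comparison can be checked directly. The sketch below assumes an American double-zero wheel (38 pockets, 18 of them red) and treats pockets 0 to 17 as "red" purely for simulation purposes. It computes the exact probability of red and then simulates spins to show that the chance of red right after a red is essentially the same as the overall chance.

```python
import random
from fractions import Fraction

# Assumption: American double-zero wheel, 1-36 (18 red, 18 black) plus green 0 and 00.
POCKETS = 38
RED_POCKETS = 18

p_red = Fraction(RED_POCKETS, POCKETS)
print(p_red, float(p_red))  # 9/19, about 0.474

def spin(rng: random.Random) -> bool:
    """Return True if a single spin lands on red (pockets 0-17 stand in for red)."""
    return rng.randrange(POCKETS) < RED_POCKETS

rng = random.Random(2017)  # fixed seed so the run is repeatable
spins = [spin(rng) for _ in range(200_000)]

# Overall share of reds vs. share of reds on spins that immediately follow a red.
overall = sum(spins) / len(spins)
after_red = [cur for prev, cur in zip(spins, spins[1:]) if prev]
conditional = sum(after_red) / len(after_red)
print(f"overall: {overall:.4f}, after a red: {conditional:.4f}")
```

Both numbers land near 0.474 because each spin is independent and the wheel’s probability never changes. That fixed, known probability is exactly what the post argues we do not have for a hitter in a given at bat.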
The issue is one of using historical data to infer future probability. Yes, this information could point you toward the right answer, and someone hitting .300 is likely to have a more successful outcome in any one AB than someone hitting .250, but strictly speaking you can NOT say he has a 30% chance of getting a hit. When the player steps to the plate, as far as batting average is concerned, there is a binary set of outcomes: hit or no hit. He steps to the plate with a clean slate; that instant is not constrained by the successes of previous outcomes.
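One way to see why a .300 average is not "a 30% chance right now" is a toy model. The sketch below is a hypothetical illustration, not a claim about any real player: it assumes the chance of a hit in each at bat depends on the matchup and is drawn uniformly between .200 and .400, so the long-run average comes out near .300. It then compares the batting average a hitter carries into each at bat with the true chance in that at bat.

```python
import random

# Hypothetical model (an assumption for illustration only): each AB's true
# chance of a hit depends on the matchup, drawn uniformly from .200 to .400.
rng = random.Random(42)  # fixed seed so the run is repeatable

def simulate_season(at_bats: int):
    """Return the season BA and a log of (BA entering the AB, true chance in that AB)."""
    hits = 0
    log = []
    for ab in range(at_bats):
        entering_ba = hits / ab if ab else 0.0
        p_this_ab = rng.uniform(0.200, 0.400)
        log.append((entering_ba, p_this_ab))
        if rng.random() < p_this_ab:
            hits += 1
    return hits / at_bats, log

season_ba, log = simulate_season(600)
print(f"season BA: {season_ba:.3f}")
for entering_ba, p in log[-5:]:
    print(f"entering BA {entering_ba:.3f} vs. true chance this AB {p:.3f}")
```

In this model the season average summarizes many different per-AB chances, so the number a hitter carries to the plate generally does not match his chance in that particular at bat.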
Think of it this way:
Let’s say that through the first half of the season (250 ABs) Anthony Rizzo is hitting a stellar .300, yet he finishes the full season at .280 (500 ABs). On his 251st AB, stepping to the plate with a .300 BA, we would (falsely) assume he has a 30% chance of getting a hit, even though we know his “2nd half BA” was actually .260 (or 26%): 75 hits in his first 250 ABs and 140 hits for the season leaves 65 hits in his last 250 ABs.
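The Rizzo split can be worked out from the two totals alone. The sketch below uses the post’s hypothetical numbers (250 ABs at .300, then 500 ABs at .280 for the season) and backs out the second half; the helper names `batting_average` and `as_ba` are illustrative, not from any library.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Batting average is hits divided by at bats."""
    return hits / at_bats

def as_ba(value: float) -> str:
    """Format like a box score, e.g. 0.3 -> '.300'."""
    return f"{value:.3f}".lstrip("0")

# The post's hypothetical Rizzo numbers.
first_half_ab = 250
first_half_hits = round(0.300 * first_half_ab)    # 75 hits

season_ab = 500
season_hits = round(0.280 * season_ab)            # 140 hits

# Whatever is left after the first half belongs to the second half.
second_half_ab = season_ab - first_half_ab        # 250 AB
second_half_hits = season_hits - first_half_hits  # 65 hits

print("BA entering AB #251:", as_ba(batting_average(first_half_hits, first_half_ab)))    # .300
print("Second-half BA:     ", as_ba(batting_average(second_half_hits, second_half_ab)))  # .260
```

The .300 he carries into AB #251 is accurate history, but it is not the rate at which he actually hit from that point on.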
The long and short of it is that we don’t know what someone’s actual odds of getting a hit (BA), getting on base (OBP), etc. are IN THAT MOMENT they step to the plate. Even if you whittle historical data down to the most specific split (vs. LH, at night, etc.), that event doesn’t KNOW what a player’s previous events looked like. So we can’t say a player has a (batting average in %) chance of getting a hit in this at bat.