There is a lot you can do right or wrong when placing a sports bet. This piece gives you an overview of four basic principles to follow and the mistakes to avoid.
- DO – Learn what actually matters
What actually wins, say, football games? Note that “actually” is key here. If you have been around the game long enough, you have heard clichés like “*defense wins championships”, “turnovers are the only statistic coaches care about” or “when the RB runs for over 100 yards, you win”. The last one is my all-time favorite, a classic correlation/causation mistake (it should be: when you win, you tend to run for a lot of yards, because you’re likely milking the clock). Beyond the fact that none of these are really true, the more important point is the question itself. In other words, never assume an idiom is true; instead, search for objective facts that support the idea.
*According to Football Outsiders: “Work by Chase Stuart, Neil Paine, and Brian Burke suggests a split between offense and defense of roughly 58-42, without considering special teams. Our research suggests that special teams contributes about 13 percent to total performance; if you measure the remaining 87 percent with a 58-42 ratio, you get roughly 4:3:1.”
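The 4:3:1 figure in the footnote can be checked with a few lines of arithmetic. This is only a sketch of that calculation; the variable names are mine, and the inputs are the percentages quoted above.

```python
# Shares taken from the footnote: special teams ~13% of total performance,
# with the remaining 87% split 58-42 between offense and defense.
special_teams = 0.13
remaining = 1.0 - special_teams
offense = remaining * 0.58
defense = remaining * 0.42

# Express each share relative to special teams to get the ratio.
ratio = tuple(round(share / special_teams, 1)
              for share in (offense, defense, special_teams))
print(ratio)  # (3.9, 2.8, 1.0), i.e. roughly 4:3:1
```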
Here’s an example. Before I started using formal models, I thought long and hard about what actually won or lost football games, and considered the following variables when predicting winners:
This is an actual snapshot of all the things, still tracked by hand, that I was considering on a week-to-week basis in hopes of predicting the correct winner: variables such as where an OC or DC previously coached, how “hot/cold” a QB has been, etc. I was considering almost 32 different variables, which took a ton of time to gather, especially without an automated process. However, after the season finished, I ran a simple regression comparing how those variables impacted ACTUAL wins and losses, and only 3-4 of them had a statistically significant impact.
The moral of the story is to NOT assume that the things you’ve been told actually matter for predicting football, but instead to test them. Now, I know the vast majority of this book’s readership is not looking for an advanced statistics lesson, but I will provide a very basic, very rudimentary process you can use to test any theories you have (again, I would not recommend this for advanced bettors, only for those starting out).
Let’s say you think that Next Gen Stats’ “QB Accuracy +/-” impacts the result of football games. You can use a simple function in Excel called “CORREL”, which quickly correlates two variables. Granted, gathering the data, whether by scraping/crawling or old-school copying and pasting (covered later in the book), takes a lot of time (not to mention the number of samples you need for the result to be statistically meaningful), but the actual process of testing a baseline “does it matter” is quite simple to complete in Excel. It would look something like this:
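For readers who would rather script it than use a spreadsheet, here is a minimal Python sketch of the same statistic that CORREL computes (the Pearson correlation coefficient). The function name `correl` is my own, and the numbers are hypothetical, invented purely for illustration; they are not real Next Gen Stats data.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column never varies")
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL data, one entry per game: the stat you are testing and the
# final point margin of the same game.
qb_accuracy_plus_minus = [2.1, -1.4, 0.3, 3.5, -2.2, 1.0, -0.6, 2.8]
point_margin = [7, -3, -10, 14, -6, 3, 4, 10]

r = correl(qb_accuracy_plus_minus, point_margin)
print(f"r = {r:.2f}")
```

A value near +1 or -1 means the two columns move together strongly, and a value near 0 means they barely move together at all. With only a handful of games, as here, any value can easily be a fluke, which is why the sample-size caveat above matters.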
In other words, you’re investigating what actually impacts the outcome of games. It’s likely not what you’ve been taught, and the only things you should really trust are facts backed up by truly reputable sources and/or your own research.
Don’t get bogged down in the “noise”; focus solely on variables that have historically been shown to impact the outcome.
- DO – Ensure the variable you’re considering is objective, trackable and predictive
The second aspect is a tad more complicated, with each item deserving its own explanation:
Is the variable objective:
If you are not familiar with the terms “objective” and “subjective”: simply put, an objective item is one that can be measured. It’s tangible, like how many cars pass through an intersection on a given day. Subjective items include things like “how courteous is the driver going through the intersection”. That is, you have no concrete way of recording the magnitude of the variable. So when you are using data to understand what’s likely to happen, a subjective variable is worthless. Relevant examples include athlete motivation, the actual ability to overcome adversity (not simply raw “come-from-behind” victories), etc. Focus on hard, tangible variables.
Don’t work with variables that can NOT be measured.
Is the variable trackable (i.e. accounted for)
There are plenty of variables out there that can fairly be considered as “impacting the outcome” of a game, yet are not measured. For example, one of the longest-running “aspects sharps try to measure” is coach impact. Although FPS has a very specific way of judging this (which we’ll cover later in the book), it’s very tough to empirically judge the actual impact a coach makes on a particular game. The point is that once you have a theory about an item that may dictate the outcome, you need to be able to track and record it. If you can’t, unfortunately, you won’t be able to do much with it.
NOTE: There are plenty of automatic/semi-automatic ways to capture this info if you fear the amount of work it would take to copy and paste loads of data. Processes called scraping and crawling, and even simple Excel functions, work great for the novice PC user.
More relevant to this topic is likely figuring out how to measure it. One more example:
Let’s say you want to bet on the sack prop for a particular game. And let’s say you already know that sacks are not very predictive, so you instead want to look at pressure rates by pass rusher in previous games. Until companies like Sports Info Solutions came around, no one was recording this data point; you literally needed a team of video scouts to watch and chart games to get it. Hence, if you thought historical pressure rates might correlate with sack totals, you were likely right. However, if you planned to leverage this info pre-2016, you were likely SOL, as the data point was not being tracked at the time.
Don’t work with variables that can be measured, yet are not (unless you have the time and resources to invest heavily in them).
Is the variable predictive
This is the most important, most under-utilized and most difficult aspect of step one. Frankly, it’s where good bettors are separated from great bettors. As mentioned earlier in the story of how Winvest was formed, we discovered that almost 90% of the variables we were considering were not only useless but a huge waste of time, so a variable’s predictive power is absolutely key.
Let’s say you have a hunch that you can predict total points a team will score in a game, and you believe that turnovers, interceptions in particular, will play a crucial part in this prediction. In other words, you think you know a team will score more/less points than the set line, and feel given the circumstances, it will come down to the QB that turns the ball over less (in the air). That’s a perfectly fine assertion, but what are you going to use to help predict actual interceptions that will be thrown between QB A and QB B? Let’s say you go with each QB’s interception rate from the previous games in the season.
The variable in question is in fact objective and easily tracked, but is it predictive? That is, does how often a QB has thrown interceptions thus far in the season actually tell you how many interceptions he will throw in this ONE game (one sample)? Knowing the answer is crucial, because at the end of the day you want to know what past occurrences of a variable tell you about future occurrences of that same variable. Many times, when I had what I thought was an awesome system that would break the sportsbooks, I found out that what I thought mattered might have mattered, but its historical values did not carry enough weight to infer future happenings.
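One rough way to put the “is it predictive?” question to the test is to split each player’s games into an earlier and a later sample and check whether the earlier numbers line up with the later ones. The sketch below does that with a Pearson correlation. The function name, the QB labels and every number are hypothetical, made up only to show the mechanics; this is not a claim about real interception data, and five players is far too few for a real conclusion.

```python
from math import sqrt

def split_half_correlation(games_by_player):
    """Correlate each player's early-sample average with his later-sample average."""
    early, late = [], []
    for games in games_by_player.values():
        if len(games) < 2:
            raise ValueError("each player needs at least 2 games")
        mid = len(games) // 2
        early.append(sum(games[:mid]) / mid)                  # known before the fact
        late.append(sum(games[mid:]) / (len(games) - mid))    # what happened afterwards
    mean_e = sum(early) / len(early)
    mean_l = sum(late) / len(late)
    cov = sum((e - mean_e) * (l - mean_l) for e, l in zip(early, late))
    spread = sqrt(sum((e - mean_e) ** 2 for e in early)
                  * sum((l - mean_l) ** 2 for l in late))
    if spread == 0:
        raise ValueError("correlation is undefined when a sample never varies")
    return cov / spread

# HYPOTHETICAL interceptions thrown per game, in chronological order.
interceptions = {
    "QB A": [1, 0, 2, 1, 0, 1, 0, 1],
    "QB B": [0, 0, 1, 0, 1, 2, 0, 1],
    "QB C": [2, 1, 1, 2, 1, 1, 2, 1],
    "QB D": [0, 1, 0, 0, 1, 0, 1, 1],
    "QB E": [1, 1, 0, 1, 1, 0, 2, 0],
}

r = split_half_correlation(interceptions)
print(f"early-to-late correlation: {r:.2f}")
```

If the early sample barely correlates with the later one, the variable may be objective and trackable yet still tell you little before the fact.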
To understand predictive value a bit better, refer to my favorite explanation, “Why your math teacher was wrong”:
Early in the 2017 MLB season DJ LeMahieu is leading the league in batting average, hitting .348 according to Baseball Reference MLB Leaders. We all know by now advanced metrics, OBP, etc. paint a much better picture of batter production. However, today’s post is not an attack on traditional numbers, rather an attack on a false assumption some have with any batting statistic, even the media elite and some elementary math textbooks:
As you can see in the chart below, as of press-time LeMahieu is hitting .348, or he has gotten a hit 34.8% of the time. This is factually correct on how LeMahieu HAS performed this season, but tells us nothing about his future performance.
When DJ steps to the plate today, carrying a .348 average, he does NOT have a 34.8% chance of getting a hit. It’s an easy trap to fall into, and although it’s close, it’s not technically the “hot hand fallacy”. I have yet to find the formal mathematical term for this fallacy, so until someone tells me otherwise I am calling it the “Predicting Future on Previous Outcome Fallacy” (better names gladly welcomed). When DJ, or anyone else for that matter, steps to the plate, his previous success at getting a hit does not mathematically set his odds of getting a hit at this instant. If you look back at his at bats from previous games in a vacuum, then yes, you KNOW he got a hit in 34.8% of them.
Imagine a round of roulette:
Regardless of what you play, let’s say you play red, you KNOW going into the event there is a set probability of 18/38 that you will have a successful spin (on an American wheel, 0 and 00 are green). That probability is fixed and not based on previous events. An at bat is different: because of the variance of the human element and talent, there is no such fixed probability, and every single at bat (AB) technically has just two outcomes (in the context of batting average): 1 or 0 out of 1.
The issue is one of using historical data to infer future probability. Yes, this information could give you the right answer, and someone hitting .300 is likely to have a more successful outcome than someone hitting .250 in one particular AB, but officially, you can NOT say he has a 30% chance of getting a hit. When the player steps to the plate, with regard to batting average, there is a set, binary pair of outcomes: hit or no hit. And he steps to the plate with a clean slate; that instant is not constrained by the successes of previous outcomes.
Think of it this way:
Let’s say through the first half of the season (let’s say 250 ABs) Anthony Rizzo is hitting a stellar .300, yet he finishes the season at .280 (500 ABs). This means that on his 251st AB, stepping to the plate with a .300 BA, we have the (false) assumption that he has a 30% chance of getting a hit, even though we now know his “2nd half BA” was actually .260 (or 26%).
The long and short of it is that we don’t know what someone’s actual odds of getting a hit (BA), getting on base (OBP), etc. are IN THAT MOMENT they step to the plate. Even if you whittle the historical data down to the most specific split (vs. LH, at night, etc.), the event doesn’t KNOW what the player’s previous events looked like. So, we can’t say a player has a (batting average in %) chance of getting a hit this AB.
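The arithmetic behind that example: a .300 average over the first 250 ABs and a .280 average over all 500 ABs imply a .260 average over the last 250. A quick sketch, with variable names of my own choosing:

```python
first_half_abs, first_half_avg = 250, 0.300
season_abs, season_avg = 500, 0.280

first_half_hits = round(first_half_abs * first_half_avg)   # 75 hits
season_hits = round(season_abs * season_avg)               # 140 hits
second_half_hits = season_hits - first_half_hits           # 65 hits
second_half_abs = season_abs - first_half_abs              # 250 ABs
second_half_avg = second_half_hits / second_half_abs

print(f"{second_half_avg:.3f}")  # 0.260
```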
Don’t fall into the trap of correlated data that’s not actually predictive BEFORE the fact.
- Don’t ignore the fact the sportsbook may have already priced in your angle/variable.
I once heard a friend say, “I’m taking the Giants at -2.5 in the first half vs. the 49ers, given the 1:00 PM EST start time”. The angle he was referring to is what we call the “10 AM PST Game”. This happens quite frequently in the NFL given its standard 1:00 PM / 4:25 PM EST Sunday start times: a visiting west coast team is forced to play a game much earlier than its opponent, at least in terms of circadian rhythms. If you’re not familiar, think about it for a second: football gameday involves a lot of prep and pregame, with players arriving 3-6 hours early (for warmup, medical treatment, etc.). This means that the east coast team (the Giants in this case) gets to “sleep in” 3 more hours than the opponent, relatively speaking, and has a significant advantage, especially early in the game, vs. its opponent (the 49ers in this example). All of this is true, and frankly, pretty astute: imagine how productive you’d be at work if your company flew you three time zones east and you had to start at the same local time as a colleague who lives there.
There is one major issue with my friend’s contention here: it’s already priced in. What I mean by this is that sportsbooks, and much of the market, already know this. If, in a hypothetical scenario, the 49ers were to play the Giants at a neutral site with neutral time zones (as crazy as that sounds), that same first half line would likely have been Giants -1.5. In other words, the edge behind that particular play is already widely known and accounted for in the price by those setting the market (and by the market players). There really is no advantage here. Again, I’m not saying he is wrong, or even that he will lose, but his reasoning behind the wager is flawed given that it’s already baked in.
Don’t assume you’re the only person that “knows” the thing you’re tracking. It’s not “what you know”, it’s “what you know better than the sportsbook/market”.
- Don’t buy picks
Here’s a secret of the sports betting industry: those who are actually good at making predictions don’t need someone to buy picks from them. They just play the picks themselves. Think about that for a second. If that guy touting 70% is actually picking at that rate, why doesn’t he just bet those picks? Wouldn’t he be making a killing? And better yet, if touts are as good as they say they are, they’d probably NOT want other people knowing their picks, possibly moving the market towards their selections and away from their advantage.
The sad fact of the matter is, those that have to sell picks are simply mitigating their own risk across “buyers of picks”, and maintaining their ROI regardless of success.
Don’t get me wrong, there are a lot of good people out there with valuable INFORMATION and insight, but those selling straight picks clearly don’t trust themselves enough, so you shouldn’t either.