MLB Update: An Overview on our Betting Model

Dan Rubin
March 17, 2021

The most consistent feedback we received from our readers was a request for more details on how our betting models work. While we do not believe complexity is required to have a great betting model, contextualizing data is critical. In baseball, the context of an event is hugely important. Imagine this sequence of events:

With two outs and the bases empty, 1) Mike Trout hits a homerun. 2) Anthony Rendon follows him with a walk. 3) Justin Upton, batting fifth, strikes out, stranding Rendon on first to end the inning. The Angels score one run and leave one runner on base.

If Rendon were batting before Trout and the same three events occurred (Rendon walk, Trout HR, Upton K), the Angels would have scored an additional run. Would this make Trout's homerun any more impressive? We would argue no.

In our MLB model, we prefer to assess performance using context-neutral statistics. Baseball analyst Mike Gimbel was one of the first to analyze baseball performance using Run Production Average (RPA™). His work dates to the mid-1990s, but the concepts still hold considerable merit.

The beauty of modeling MLB is that most of the game can be broken down into a series of interactions between a pitcher and hitter in which there are only a finite number of outcomes: strikeout, groundout, walk, double, homerun, etc. From a pitcher's perspective, these outcomes range from great (strikeout) to terrible (homerun). After we assign a value in terms of expected run contribution to each outcome, we can begin to assess player performance.
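As a rough illustration of valuing outcomes in runs, here is a minimal sketch. The run values in `RUN_VALUES` are hypothetical placeholders chosen for readability, not the numbers our model uses, and `runs_per_pa` is a name invented for this example. The point is that the Trout and Rendon sequence from earlier is worth the same regardless of batting order.

```python
# Hypothetical run values per plate-appearance outcome (illustrative only,
# not the values used in the model described in this post).
RUN_VALUES = {
    "strikeout": -0.27,
    "groundout": -0.25,
    "walk": 0.30,
    "single": 0.45,
    "double": 0.75,
    "homerun": 1.40,
}


def runs_per_pa(outcomes):
    """Average context-neutral run contribution over a list of outcomes."""
    if not outcomes:
        raise ValueError("need at least one plate appearance")
    return sum(RUN_VALUES[o] for o in outcomes) / len(outcomes)


# The same three events in either order receive the same credit.
trout_first = ["homerun", "walk", "strikeout"]
rendon_first = ["walk", "homerun", "strikeout"]
print(runs_per_pa(trout_first), runs_per_pa(rendon_first))
```

Because each outcome is scored on its own, the hitter gets the same credit whether or not a teammate happened to be on base.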

To assess performance more accurately, we need to adjust for a multitude of factors, such as the opponent a player faces and the stadium he is playing in. Hitting a homerun off of Gerrit Cole with the wind blowing in at Yankee Stadium is more impressive than hitting a homerun off of a Double-A call-up at Coors Field. We make these adjustments using a mixed-effects regression model. Put simply, a mixed-effects regression model allows us to break down an event such as a homerun into a multitude of contributing variables. We solve for how much weight to put on each of these variables by minimizing the error in our model predictions.
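For readers who want to see the shape of such a model, one generic way to write a mixed-effects regression for the run value of a single plate appearance is sketched below. This is a textbook form shown under our own simplifying assumptions, not the exact specification we fit. The choice of batter, pitcher, and park as random effects and the covariate vector x_i (for example, wind) are illustrative.

```latex
% y_i: run value of plate appearance i (illustrative notation)
y_i = \mu
    + b_{\mathrm{batter}(i)}
    + p_{\mathrm{pitcher}(i)}
    + s_{\mathrm{park}(i)}
    + \boldsymbol{\beta}^{\top}\mathbf{x}_i
    + \varepsilon_i,
\qquad
b_j \sim \mathcal{N}(0, \sigma_b^2),\;
p_k \sim \mathcal{N}(0, \sigma_p^2),\;
s_m \sim \mathcal{N}(0, \sigma_s^2),\;
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

In this form, the estimated batter effect is what remains of a hitter's production after the pitcher, the park, and the other covariates have been accounted for.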

Once we feel that we have reasonably assessed historical player performance using this model, we forecast future performance using, among other factors, player aging curves: players tend to improve early in their careers and then decline as they age. The value of each player is then continually updated with new data as the season progresses.

Now that we have a prediction of player performance measured by run contributions, forecasting how many runs we expect each team to score is not much more difficult than adding up the run contributions for a given lineup facing a given pitcher or pitchers. To compare against the markets, we then translate expected runs into a win probability, and from there into a betting price. This involves sampling from an estimated distribution of the number of runs scored by each team, which closely resembles a negative binomial distribution.
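The sampling step can be sketched with nothing but the Python standard library. This is a minimal illustration, not our production code. It assumes the two teams' run totals are independent, uses a hypothetical dispersion value of 4.0, and splits ties evenly as a stand-in for extra innings. The function names are invented for this example.

```python
import math
import random


def sample_neg_binomial(rng, mean, dispersion):
    """Draw one run total from a negative binomial (gamma-Poisson mixture).

    The implied variance is mean + mean**2 / dispersion.
    """
    # Gamma draw with shape=dispersion and scale=mean/dispersion has the given mean.
    lam = rng.gammavariate(dispersion, mean / dispersion)
    # Knuth's Poisson sampler; adequate for the small means seen in baseball.
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def win_probability(mean_a, mean_b, dispersion=4.0, n_sims=20000, seed=0):
    """Estimate P(team A wins) by simulating both teams' run totals."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_sims):
        runs_a = sample_neg_binomial(rng, mean_a, dispersion)
        runs_b = sample_neg_binomial(rng, mean_b, dispersion)
        if runs_a > runs_b:
            wins += 1.0
        elif runs_a == runs_b:
            wins += 0.5  # assumption: tied games are a coin flip
    return wins / n_sims


print(win_probability(5.0, 4.0))
```

A team expected to score 5.0 runs against an opponent expected to score 4.0 comes out as a favorite, but far from a lock, which is what the wide spread of run totals in baseball implies.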

Lastly, and importantly, understanding how great a model is requires an understanding of its limitations. Models are only so useful and cannot possibly capture every element that affects win probability. While we are confident in our models, we always regress our win probability to the probability implied by market odds before we make any assessment of expected value and decide whether or not to bet on a particular game. Once we regress to the market and identify the bets with the most value, we send them to you and spend the remaining time up until the first pitch line shopping across several books to make sure we're getting the best price.
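To make the regression step concrete, here is a minimal sketch using American odds. The 50/50 blend in `model_weight` is a hypothetical setting, and removing the bookmaker's margin by proportional normalization is one common convention that we assume here for illustration. Neither is a statement of the exact method we use.

```python
def implied_prob(american_odds):
    """Probability implied by American odds, bookmaker margin included."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig_probs(odds_a, odds_b):
    """Remove the margin by rescaling both sides to sum to 1 (an assumption)."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    total = pa + pb
    return pa / total, pb / total


def regress_to_market(model_prob, market_prob, model_weight=0.5):
    """Blend the model's win probability with the market's (weight is hypothetical)."""
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must be between 0 and 1")
    return model_weight * model_prob + (1.0 - model_weight) * market_prob


def expected_value(win_prob, american_odds, stake=100.0):
    """Expected profit on a stake at the offered American odds."""
    if american_odds > 0:
        payout = stake * american_odds / 100
    else:
        payout = stake * 100 / -american_odds
    return win_prob * payout - (1.0 - win_prob) * stake


market_a, _ = no_vig_probs(-110, -110)      # both sides priced at -110
blended = regress_to_market(0.60, market_a)  # model says 60%
print(blended, expected_value(blended, -110))
```

In this example the model's 60% is pulled back to 55% before any value is measured, so the bet shows a positive but much smaller edge than the raw model output would suggest.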

We try our best to publish as many of the bets we place as possible ahead of time, and the bets we managed to share with our readers represented nearly 80% of our total MLB trading volume in 2020. Using this model, we published 122 bet recommendations to our readers last year, wagering $939,815 on these games to win $13,973. With a starting bankroll of $300,000, this represented a portfolio ROI of 4.7% and a trading ROI of 1.5%. While we would have liked better results, given the shortened season and added uncertainty from COVID-19, we believe coming out with any profits was an achievement in itself, and we are excited for what is in store for 2021.
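The two ROI figures differ only in the denominator. The short calculation below reproduces them from the numbers above, reading portfolio ROI as profit over starting bankroll and trading ROI as profit over total amount wagered. The variable names are ours.

```python
profit = 13_973      # dollars won on published bets
wagered = 939_815    # total dollars wagered on published bets
bankroll = 300_000   # starting bankroll

portfolio_roi = profit / bankroll  # return on the starting bankroll
trading_roi = profit / wagered     # return per dollar wagered
turnover = wagered / bankroll      # how many times the bankroll was put at risk

print(f"portfolio ROI: {portfolio_roi:.1%}")
print(f"trading ROI:   {trading_roi:.1%}")
print(f"turnover:      {turnover:.2f}x")
```

The bankroll was wagered a little more than three times over during the season, which is why a 1.5% return per dollar wagered adds up to a 4.7% return on the bankroll.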

Stay tuned next week when we will share what improvements we have made to our models for the upcoming 2021 MLB season.
