How It Works

Methodology

FullCountProps is built on a simple idea: simulate every plate appearance of every game thousands of times, using real data, and let the math tell you which prop bets are mispriced. Here's exactly how we do it — no black boxes.

2,500 Simulations / Game
33 Model Features
6M+ Training PAs
8 Outcome Classes

01

What Is Monte Carlo Simulation?

Imagine you want to know how many strikeouts a pitcher will record tonight. You could look at his season average and guess — but that gives you a single number with no sense of the range.

Monte Carlo simulation takes a different approach. Instead of one answer, you simulate the full game thousands of times, introducing realistic randomness at every plate appearance. After 2,500 runs, you have a complete distribution — not just "he'll probably get 6 Ks" but "there's a 61% chance he gets 7 or more."

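The name comes from the famous casino in Monaco. Like a casino, the method relies on the law of large numbers: run enough trials and the observed frequencies converge on the probabilities implied by the model's inputs. The results are only as good as those inputs, which is why the matchup model matters so much.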
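A minimal sketch of the idea, not the production engine. It assumes, purely for illustration, a pitcher who faces exactly 25 batters and strikes each one out independently with probability 0.27; those numbers and the function names are hypothetical.

```python
import random

def simulate_strikeouts(k_prob: float, batters_faced: int, rng: random.Random) -> int:
    """Simulate one start: each batter faced is an independent strikeout trial."""
    return sum(rng.random() < k_prob for _ in range(batters_faced))

def prob_at_least(threshold: int, n_sims: int = 2500, seed: int = 7) -> float:
    """Estimate P(strikeouts >= threshold) as a frequency over n_sims simulated starts."""
    rng = random.Random(seed)
    # Hypothetical inputs: 27% strikeout rate, 25 batters faced.
    totals = [simulate_strikeouts(0.27, 25, rng) for _ in range(n_sims)]
    return sum(k >= threshold for k in totals) / n_sims

p = prob_at_least(7)
print(f"P(7+ Ks) = {p:.1%}")
```

The same list of simulated totals answers every threshold question (5+, 6+, 8+ Ks), which is the practical advantage of having a distribution rather than a single average.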

02

The Two-Layer Architecture

FullCountProps is built in two layers that work together:

Layer 1: MATCHUP MODEL (LightGBM)
Input: pitcher x batter x 33 context features
Output: probability of each of 8 PA outcomes
Layer 2: MONTE CARLO ENGINE
Runs 2,500 full-game simulations with real game state
Output: probability distribution for every player stat

The matchup model answers: "What happens when this pitcher faces this batter in this context?" The simulation engine chains those answers together across an entire game — tracking innings, outs, baserunners, pitch count, and fatigue — to produce full stat distributions.

03

The Matchup Model

At the heart of everything is a LightGBM gradient-boosted tree classifier trained on over 6 million plate appearances from 5 seasons of Statcast data (2021–2025).

For every upcoming plate appearance, the model takes 33 features and outputs the probability of 8 possible outcomes:

Strikeout       ~22.5%
Walk            ~8.4%
Single          ~14.1%
Double          ~4.6%
Triple          ~0.4%
Home Run        ~3.1%
Hit By Pitch    ~1.0%
Out (other)     ~45.9%

Percentages shown are 2024 league averages. The model adjusts these for every specific pitcher-batter matchup.
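As a rough illustration of what an 8-class output looks like in code, the sketch below stores the league-average rates quoted above and draws one outcome from them. The dictionary keys and the `sample_outcome` helper are hypothetical names; a real prediction would replace these baseline numbers with matchup-specific ones.

```python
import random

# 2024 league-average PA outcome rates quoted above.
LEAGUE_AVG = {
    "strikeout": 0.225, "walk": 0.084, "single": 0.141, "double": 0.046,
    "triple": 0.004, "home_run": 0.031, "hit_by_pitch": 0.010, "out_other": 0.459,
}

# The 8 classes are exhaustive, so the rates must sum to 1.
assert abs(sum(LEAGUE_AVG.values()) - 1.0) < 1e-9

def sample_outcome(probs: dict[str, float], rng: random.Random) -> str:
    """Draw a single PA outcome according to the given probabilities."""
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]

rng = random.Random(0)
draws = [sample_outcome(LEAGUE_AVG, rng) for _ in range(10_000)]
k_share = draws.count("strikeout") / len(draws)
print(f"strikeout share over 10,000 draws: {k_share:.1%}")
```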

The 33 Features

Our feature set is organized into five categories:

  • Pitcher Statcast (10): K rate, BB rate, HR rate, whiff%, called strike + whiff%, zone%, swinging-strike%, avg fastball velocity, chase rate, in-zone contact%
  • Batter Statcast (10): K rate, BB rate, HR rate, xBA, xSLG, barrel%, hard-hit%, chase rate, whiff%, contact%
  • Matchup Context (5): platoon advantage, home/away, park HR factor, park K factor, park hits factor
  • Game-Day Context (5): umpire K-rate tendency, catcher framing score, pitcher recent 14-day K rate, batter recent 14-day BA, sportsbook game total line
  • Weather (3): game-time temperature, wind speed, wind direction (blowing out vs. in)

04

The Simulation

Each of the 2,500 simulations plays out a complete baseball game, plate appearance by plate appearance:

  1. Load the confirmed starting lineup and batting order from the MLB Stats API
  2. For each plate appearance, the matchup model predicts the probability of all 8 outcomes for this specific pitcher vs. batter
  3. Apply context adjustments: park factors, umpire tendency, catcher framing, weather, platoon
  4. Randomly sample one outcome from the adjusted probability distribution
  5. Update game state — advance runners, record outs, score runs, track individual player stats
  6. When the pitcher reaches his simulated pitch count limit, hand off to the bullpen composite
  7. Repeat through 9 innings (or extras if tied)

After 2,500 full games, each player has a frequency distribution of outcomes. "Corbin Burnes recorded 7+ Ks in 1,530 of 2,500 sims" gives us P(Over 6.5 Ks) = 61.2%.
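The loop above can be sketched in a heavily simplified form. This version tracks only outs, pitch count, and strikeouts for one starter; it ignores baserunners, scoring, and the bullpen hand-off. The per-PA probabilities, the constant pitches-per-PA, and the pitch limit are all assumed values for illustration, and the real engine would request a fresh 8-way distribution for every batter.

```python
import random
from collections import Counter

# Hypothetical per-PA probabilities for one starter against a whole lineup.
PA_PROBS = {"strikeout": 0.27, "walk": 0.07, "hit": 0.21, "out_other": 0.45}
PITCHES_PER_PA = 4   # assumed constant, for illustration only
PITCH_LIMIT = 95     # assumed pitch-count limit

def simulate_start(rng: random.Random) -> int:
    """Play one start PA by PA and return the starter's strikeout total."""
    outs = pitches = strikeouts = 0
    outcomes, weights = zip(*PA_PROBS.items())
    # The starter keeps pitching until 27 outs or the pitch limit is reached.
    while outs < 27 and pitches < PITCH_LIMIT:
        result = rng.choices(outcomes, weights=weights, k=1)[0]
        pitches += PITCHES_PER_PA
        if result == "strikeout":
            strikeouts += 1
            outs += 1
        elif result == "out_other":
            outs += 1
    return strikeouts

def strikeout_distribution(n_sims: int = 2500, seed: int = 42) -> Counter:
    """Run n_sims starts and count how often each strikeout total occurs."""
    rng = random.Random(seed)
    return Counter(simulate_start(rng) for _ in range(n_sims))

dist = strikeout_distribution()
p_over = sum(n for ks, n in dist.items() if ks >= 7) / sum(dist.values())
print(f"P(Over 6.5 Ks) = {p_over:.1%}")
```

The final two lines mirror the Burnes arithmetic: count the simulations with 7 or more strikeouts and divide by the number of simulations.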

05

Context Adjustments

Raw matchup probabilities are adjusted by five real-world factors before each PA outcome is sampled:

Park Factors

All 30 MLB stadiums have different HR, K, and hit rates. Coors Field inflates HR probability by ~30%; Petco Park suppresses it by ~15%. We apply 6 park-specific adjustments per venue.
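One plausible way to apply such a factor, sketched under the assumption that adjustments are multiplicative and followed by renormalization (the source does not specify the exact mechanics). `apply_factor` is a hypothetical helper, and the baseline rates are the league averages quoted earlier.

```python
def apply_factor(probs: dict[str, float], outcome: str, factor: float) -> dict[str, float]:
    """Scale one outcome's probability, then renormalize so the total is 1."""
    scaled = {k: (v * factor if k == outcome else v) for k, v in probs.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}

base = {
    "strikeout": 0.225, "walk": 0.084, "single": 0.141, "double": 0.046,
    "triple": 0.004, "home_run": 0.031, "hit_by_pitch": 0.010, "out_other": 0.459,
}

# A ~30% home run boost, as described for Coors Field.
coors = apply_factor(base, "home_run", 1.30)
print(f"HR probability: {base['home_run']:.3f} -> {coors['home_run']:.3f}")
```

Because the distribution is renormalized, the final home run probability ends up slightly below a full 30% increase, and every other outcome shrinks by a small amount to make room.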

Umpire Tendencies

Each home plate umpire has a documented strike zone tendency. Some run 10-20% higher K rates than average. We pull umpire assignments from MLB's pregame data and adjust K and BB probabilities accordingly.

Catcher Framing

Elite pitch framers like J.T. Realmuto can add ~15 extra called strikes per 100 borderline pitches. We use Baseball Prospectus CSAA (Called Strikes Above Average) data to adjust K and BB rates.

Weather

Temperature affects batted ball carry (~0.3% HR change per degree above 72°F). Wind direction matters even more: 15 mph blowing out at Wrigley adds up to 8% HR probability. We fetch real-time weather 75 minutes before first pitch.

Platoon Splits

Left-handed batters vs. right-handed pitchers (and vice versa) perform measurably differently. For players with 200+ career PA against the relevant hand, the model uses its direct estimate. For smaller samples, we blend with positional-level platoon priors.
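The blending rule for small samples is not spelled out here, so the following is only one possible sketch: a linear weight that reaches 1.0 at the 200 PA threshold mentioned above. The function name, the linear form, and the example rates are assumptions.

```python
def blended_platoon_rate(observed: float, prior: float, career_pa: int,
                         full_weight_pa: int = 200) -> float:
    """Blend a player's observed split with a positional prior.

    The weight on the observed rate grows linearly with sample size and
    reaches 1.0 at full_weight_pa (assumed linear shrinkage).
    """
    w = min(career_pa / full_weight_pa, 1.0)
    return w * observed + (1 - w) * prior

# Hypothetical batter: .300 observed vs. left-handers, .240 positional prior.
for pa in (0, 100, 200, 500):
    print(pa, round(blended_platoon_rate(0.300, 0.240, pa), 3))
```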

06

From Simulation to Prop Edge

Once we have the simulated probability distribution, we compare it to the sportsbook's odds to find mispriced props:

Our simulated P(Over 6.5 Ks):           60.7%
Sportsbook no-vig implied probability:  54.3%
Edge:                                   +6.4%

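A positive edge means our model believes the outcome happens more often than the sportsbook's price implies. The edge is the difference between the two probabilities, measured in percentage points. We surface all props with edges of 3 percentage points or more; anything below that is within the noise.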

Removing the Vig

Sportsbooks build in a profit margin (the "vig" or "juice") on every line. Typical MLB prop vig is 4–5%. Before comparing our probability to the market, we mathematically strip out the vig to get the book's true implied probability. This ensures we're measuring real edge, not just beating the vig.
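A short sketch of one common way to remove the vig: convert both sides of the line to implied probabilities and rescale them to sum to 1. The proportional rescaling and the example prices (Over -130, Under +110) are assumptions chosen for illustration; other de-vig methods exist and the source does not say which one is used.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the raw implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def no_vig_probs(over_odds: int, under_odds: int) -> tuple[float, float]:
    """Rescale both sides proportionally so they sum to exactly 1."""
    over, under = implied_prob(over_odds), implied_prob(under_odds)
    total = over + under  # exceeds 1.0 by the book's margin
    return over / total, under / total

# Hypothetical prices for an Over/Under 6.5 strikeouts prop.
fair_over, fair_under = no_vig_probs(-130, +110)
edge = 0.607 - fair_over  # simulated probability minus no-vig probability
print(f"no-vig over: {fair_over:.1%}, edge: {edge:+.1%}")
# prints: no-vig over: 54.3%, edge: +6.4%
```

With these prices the raw implied probabilities sum to about 104.1%, so the margin falls inside the typical 4–5% range.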

Kelly Criterion Sizing

For each edge, we calculate a bet size using the Kelly Criterion, a formula that maximizes expected long-run bankroll growth when the win probability is known exactly. Because our probabilities are estimates, full Kelly would overbet, so we use quarter-Kelly (25% of the theoretical optimum) to account for model uncertainty, with a hard cap of 5% of bankroll on any single bet.
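The sizing rule can be sketched as follows, using the standard Kelly formula f = (b·p − q) / b, where b is the net payout per unit staked, p the win probability, and q = 1 − p. The function name and the example inputs are hypothetical; the 25% fraction and 5% cap come from the text above.

```python
def kelly_stake(p_win: float, american_odds: int,
                fraction: float = 0.25, cap: float = 0.05) -> float:
    """Return the fraction of bankroll to stake under capped fractional Kelly."""
    # Net payout per unit staked (b in the Kelly formula).
    b = 100 / -american_odds if american_odds < 0 else american_odds / 100
    full_kelly = (b * p_win - (1 - p_win)) / b
    # Never bet on a negative edge, then scale down and apply the hard cap.
    return min(max(full_kelly, 0.0) * fraction, cap)

# Hypothetical bet: 60.7% simulated probability at a price of -130.
stake = kelly_stake(0.607, -130)
print(f"stake: {stake:.1%} of bankroll")
```

For this example full Kelly is a little under 10% of bankroll, so quarter-Kelly lands near 2.4% and the 5% cap does not bind.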

07

SHAP Explanations (Glass-Box Transparency)

This is what makes FullCountProps different from every other prop analytics service. For every single projection, you can see exactly what drove the number:

Corbin Burnes O6.5 Ks — edge: +9.2%
base_log5_k = 0.263
| park_k_factor = +1.05 (+1.4pp)
| umpire_k_factor = +1.08 (+2.2pp)
| catcher_framing = +1.2 SD (+3.0pp)
| weather = no adjustment
| platoon = no advantage
data_confidence = 0.84 (pitcher 420 BF, batter avg 340 PA)

If we say "bet the over on strikeouts" and you see the umpire factor is driving half the edge, you can decide for yourself whether you trust that signal. If the data feed pulled the wrong umpire, you can catch the mistake before placing a bet.

This kind of transparency doesn't exist at BallparkPal, THE BAT X, or any other prop service we're aware of. We think it should be the standard.

08

Model Validation

We backtest every model version using strict out-of-sample walk-forward testing. The model is trained on seasons T-3 through T-1 and tested on season T, with no future information leaking into training features.
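The split logic can be illustrated with a small generator. `walk_forward_splits` is a hypothetical name, and the season list simply reuses the 2021–2025 range mentioned earlier.

```python
from typing import Iterator

def walk_forward_splits(seasons: list[int],
                        train_window: int = 3) -> Iterator[tuple[list[int], int]]:
    """Yield (training seasons, test season) pairs: train on T-3..T-1, test on T."""
    ordered = sorted(seasons)
    for i in range(train_window, len(ordered)):
        yield ordered[i - train_window:i], ordered[i]

splits = list(walk_forward_splits([2021, 2022, 2023, 2024, 2025]))
for train, test in splits:
    print(f"train on {train}, test on {test}")
# train on [2021, 2022, 2023], test on 2024
# train on [2022, 2023, 2024], test on 2025
```

Every training season strictly precedes its test season, which is the property that keeps future information out of the training set.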

Calibration

A well-calibrated model "hits what it says." When we predict 60% probability, the outcome should happen about 60% of the time. Our current expected calibration error (ECE) is 3.1%, meaning that, averaged across probability bins and weighted by bin size, our predicted probabilities differ from the observed hit rates by about 3 percentage points. That's considered good for sports prediction.

Predicted Range   Actual Hit Rate   Sample Size
50–55%            52.3%             1,841
55–60%            57.8%             2,203
60–65%            62.1%             1,976
65–70%            67.4%             1,124
70%+              71.9%             608

Calibration data from 12,847 graded props in the 2024 backtest; the five bins shown account for 7,752 of them.
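For reference, ECE is usually computed by binning predictions, comparing each bin's average prediction with its observed hit rate, and weighting the gaps by bin size. The sketch below uses equal-width bins; the bin count and the toy data are assumptions, not the backtest's actual configuration.

```python
def expected_calibration_error(preds: list[float], outcomes: list[int],
                               n_bins: int = 10) -> float:
    """Size-weighted average gap between predicted probability and hit rate."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[idx].append((p, y))
    n = len(preds)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_pred = sum(p for p, _ in members) / len(members)
        hit_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(avg_pred - hit_rate)
    return ece

# Toy data: ten props predicted at 60%, of which nine hit.
toy_ece = expected_calibration_error([0.6] * 10, [1] * 9 + [0])
print(f"ECE = {toy_ece:.2f}")
```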

09

Honest Limitations

We believe in transparency about what the model does not do well:

  • Bullpen transitions are simplified — when a starter exits, we use the team's aggregate bullpen stats rather than modeling individual relievers.
  • Pinch hitting and defensive substitutions are not modeled — late-game lineup changes can affect PA volume for projected starters.
  • Early-season data (April) is noisy: the model leans heavily on prior-season rates until roughly 400 PA accumulate.
  • Weather data is fetched ~75 minutes before first pitch and may not reflect conditions for late-starting or rain-delayed games.
  • Stolen bases and errors are not modeled — these affect ~2-3% of base-running situations.

10

Data Sources

Baseball Savant / Statcast: Pitch-level tracking data for model training and player features
MLB Stats API: Schedules, lineups, rosters, and box scores
The Odds API: Live prop lines from major sportsbooks
OpenWeatherMap: Game-time weather conditions for outdoor ballparks
Baseball Prospectus (CSAA): Catcher framing metrics, updated weekly
Umpire Scorecards: Historical umpire strike zone tendencies

Disclaimer

FullCountProps is an analytical tool for baseball enthusiasts and researchers. Nothing on this site constitutes financial or gambling advice. Past model performance does not guarantee future results. Sports betting involves risk. Please bet responsibly and in accordance with the laws of your jurisdiction.

FullCountProps is open source. Verify everything yourself.

View on GitHub