Stage 1 — Feature engineering
Every match comes in with two players, a surface, a venue, a tournament tier, and a set of bookmaker odds. From these inputs, TIPERO derives 22 to 36 features per match:
- Surface-specific Elo ratings for both players.
- Last-5 form on the same surface, with quality-of-opponent weighting.
- Head-to-head record, recency-weighted, surface-filtered.
- Sackmann historical stats (per-tour, per-surface win rates).
- Ranking points and tier-of-opponent bonuses.
- Implied probability + de-vigged market signals from opening and closing odds.
- Interactions: gap × logit_market, surface × form, tour × ace-rate.
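As an illustration of the market features in the list above, here is a minimal sketch of de-vigging a two-way price and building the gap × logit_market interaction. The function names and the proportional normalisation method are assumptions made for this example; TIPERO's actual de-vig method is not stated here.

```python
import math


def devig(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Strip the bookmaker margin from two-way decimal odds.

    Assumption: proportional normalisation (divide each raw implied
    probability by their sum). Other de-vig methods exist.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b  # exceeds 1.0 by the bookmaker's margin
    return raw_a / total, raw_b / total


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def market_features(odds_a: float, odds_b: float, score_gap: float) -> dict[str, float]:
    """Hypothetical feature dict for player A from one set of odds."""
    p_a, _ = devig(odds_a, odds_b)
    logit_market = logit(p_a)
    return {
        "implied_prob": p_a,
        "logit_market": logit_market,
        "gap_x_logit_market": score_gap * logit_market,
    }
```

The same function would be called once with opening odds and once with closing odds to get both market signals.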
Stage 2 — Score combination
A weighted combination of those features produces a per-player score. The score gap between the two players is the model's rawest signal of who it favours. Larger gaps map to higher-confidence picks, with diminishing returns past 8 score points.
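One way to picture this stage is the sketch below. The feature names, the weights, and the shape of the diminishing-returns curve (linear up to 8 points, logarithmic beyond) are all assumptions chosen for illustration, not TIPERO's actual values.

```python
import math


def player_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a player's features; features without a weight are ignored."""
    return sum(weights[name] * value for name, value in features.items() if name in weights)


def effective_gap(gap: float, knee: float = 8.0) -> float:
    """Gap magnitude with diminishing returns past `knee` score points.

    Assumption: linear below the knee, log growth above it. The curve is
    continuous at the knee and keeps increasing, only more slowly.
    """
    size = abs(gap)
    if size <= knee:
        return size
    return knee + math.log1p(size - knee)


# Hypothetical weights and feature values, for illustration only.
weights = {"surface_elo": 0.01, "form_last5": 2.0}
score_a = player_score({"surface_elo": 1900.0, "form_last5": 0.8}, weights)
score_b = player_score({"surface_elo": 1750.0, "form_last5": 0.6}, weights)
gap = score_a - score_b
```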
Stage 3 — Probability calibration
Raw scores need to be calibrated into actual win probabilities. TIPERO uses a V4.1 logistic regression that takes (score_gap, logit(implied_prob), interaction) and outputs a calibrated probability per player. This is the model's final estimate, of the form "we think p_win = 0.58".
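The calibration step can be sketched as a plain logistic function over the three inputs. The coefficients below are placeholders, and treating the interaction term as score_gap × logit(implied_prob) is an assumption based on the interaction listed in Stage 1; the fitted V4.1 values are not given here.

```python
import math

# Placeholder coefficients, NOT the fitted V4.1 values.
COEFS = {"intercept": 0.0, "score_gap": 0.05, "logit_market": 0.9, "interaction": 0.01}


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def calibrated_prob(score_gap: float, implied_prob: float, coefs: dict[str, float] = COEFS) -> float:
    """Logistic regression over (score_gap, logit(implied_prob), interaction)."""
    logit_market = logit(implied_prob)
    z = (
        coefs["intercept"]
        + coefs["score_gap"] * score_gap
        + coefs["logit_market"] * logit_market
        + coefs["interaction"] * score_gap * logit_market
    )
    return sigmoid(z)
```

With these placeholder numbers, a positive score gap pushes the estimate above what the market alone implies, which is the behaviour the stage describes.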
Stage 4 — XGBoost gradient-boosted classifier
A second layer is the XGBoost classifier (currently v2 with 36 features) trained on every graded match in the prediction log. It serves as a complementary signal — when the XGB confidence is below 0.50, the pick is gated out even if EV is positive. This filters out marginal cases where the model's own confidence is low.
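The gating rule described above reduces to a two-condition check. This sketch assumes EV is computed as p × odds − 1 on decimal odds and that "positive EV" means strictly greater than zero; the function names are hypothetical.

```python
def expected_value(p_win: float, decimal_odds: float) -> float:
    """Expected profit per unit staked at decimal odds."""
    return p_win * decimal_odds - 1.0


def passes_gate(p_win: float, decimal_odds: float, xgb_confidence: float,
                min_confidence: float = 0.50) -> bool:
    """Keep a pick only if EV is positive AND the XGB confidence is at least 0.50."""
    return expected_value(p_win, decimal_odds) > 0.0 and xgb_confidence >= min_confidence
```

For example, a pick with p_win = 0.58 at odds 1.90 has positive EV, but it would still be dropped if the classifier's confidence came in at 0.49.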
Stage 5 — Self-learning v3 (shadow mode)
The newest layer is a leak-free GBM head retrained weekly on the prediction log. In walk-forward OOS validation it produced a +9.5pp ROI improvement on the EV ≥ 0 slice over the existing market-prob model. It is currently in shadow mode (logged but not deciding); once 7+ days of live CLV evidence confirm the OOS result, it flips to deciding.
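Walk-forward validation means every test window is scored by a model trained only on earlier matches, which is what keeps the evaluation leak-free. A minimal sketch of the splitting logic follows; the weekly step mirrors the weekly retrain, while the function name and signature are assumptions for this example.

```python
from datetime import date, timedelta
from typing import Iterator


def walk_forward_splits(match_dates: list[date], first_test_day: date,
                        step_days: int = 7) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs over consecutive test windows.

    Training rows are strictly earlier than the window start, so no
    information from the test window can leak into the fit.
    """
    last_day = max(match_dates)
    start = first_test_day
    while start <= last_day:
        end = start + timedelta(days=step_days)
        train = [i for i, d in enumerate(match_dates) if d < start]
        test = [i for i, d in enumerate(match_dates) if start <= d < end]
        if train and test:
            yield train, test
        start = end
```

Each yielded pair would be used to fit on `train`, predict on `test`, and accumulate out-of-sample ROI across windows.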
Stage 6 — Bet selection + tiering
Picks are assigned to tiers by odds band:
- CORE (1.01-1.70) — short-priced favourites, gated by XGB ≥ 0.60.
- VALUE (1.70-2.30) — main slot, broad coverage.
- LONG (2.30+) — underdogs with strong model signal.
- ULTRA (3.50-5.00) — highest-EV long shots, capped at 0.30u.
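The tiering above can be sketched as a lookup on decimal odds. Several details are assumptions here: band boundaries are treated as lower-inclusive, ULTRA takes precedence over LONG inside 3.50-5.00 (the two bands overlap as listed), and the "strong model signal" and "highest-EV" conditions are not modelled.

```python
from typing import Optional


def assign_tier(decimal_odds: float, xgb_confidence: float) -> Optional[str]:
    """Map a pick to a tier by odds band, or None if it does not qualify."""
    if decimal_odds < 1.01:
        return None
    if decimal_odds < 1.70:
        # CORE is additionally gated by XGB >= 0.60.
        return "CORE" if xgb_confidence >= 0.60 else None
    if decimal_odds < 2.30:
        return "VALUE"
    if 3.50 <= decimal_odds <= 5.00:
        # Assumption: ULTRA wins over LONG where the bands overlap.
        return "ULTRA"
    return "LONG"
```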
Stage 7 — Stake sizing
A Kelly fraction of 0.15 is applied per pick using the calibrated probability and listed odds, with stakes clamped between a 0.25u minimum and a 1.5u maximum. The ULTRA tier is additionally hard-capped at 0.30u to prevent a single bet from damaging the bankroll.
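A rough sketch of the sizing rule follows. The Kelly formula itself is standard; the 100u bankroll used to convert a bankroll fraction into units is a made-up assumption, as is returning a zero stake when the Kelly fraction is not positive.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly share of bankroll: (p * odds - 1) / (odds - 1)."""
    return (p_win * decimal_odds - 1.0) / (decimal_odds - 1.0)


def stake_units(p_win: float, decimal_odds: float, tier: str,
                bankroll_units: float = 100.0,  # hypothetical bankroll size
                kelly_multiplier: float = 0.15,
                min_stake: float = 0.25, max_stake: float = 1.5,
                ultra_cap: float = 0.30) -> float:
    """Fractional-Kelly stake in units, clamped, with the ULTRA hard cap."""
    full_kelly = kelly_fraction(p_win, decimal_odds)
    if full_kelly <= 0.0:
        return 0.0  # assumption: no edge means no bet
    stake = kelly_multiplier * full_kelly * bankroll_units
    stake = min(max(stake, min_stake), max_stake)
    if tier == "ULTRA":
        stake = min(stake, ultra_cap)
    return round(stake, 2)
```

Under these assumptions, p_win = 0.55 at odds 1.90 gives a full-Kelly fraction of 0.05 and a stake of about 0.75u, while any ULTRA pick ends up at 0.30u or less regardless of edge.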
Stage 8 — Daily grading and self-correction
Every pick is graded the next day from final scores. The calibration tables and models (ace rates, market-prob model weights, XGBoost trees, V3 GBM) are then all retrained on the updated log, so the system improves itself overnight without human intervention.
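Grading itself is simple arithmetic on each settled pick. The sketch below uses a hypothetical `GradedPick` record and covers only wins and losses; voids, retirements, and the retraining step are left out.

```python
from dataclasses import dataclass


@dataclass
class GradedPick:
    stake: float         # units staked
    decimal_odds: float  # listed odds at bet time
    won: bool            # result from the final score


def profit_units(pick: GradedPick) -> float:
    """Net profit in units: stake * (odds - 1) on a win, minus the stake on a loss."""
    if pick.won:
        return pick.stake * (pick.decimal_odds - 1.0)
    return -pick.stake


def roi(picks: list[GradedPick]) -> float:
    """Total profit divided by total staked; 0.0 when nothing was staked."""
    staked = sum(p.stake for p in picks)
    if staked == 0.0:
        return 0.0
    return sum(profit_units(p) for p in picks) / staked
```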
Bottom line
An AI tennis predictor isn't one model — it's a stack of features, calibration, ML classifiers and selection rules that together produce a tracked edge. TIPERO ships that whole stack as a subscription so you don't have to build it.