The problem
Sports prediction APIs fail in predictable ways. Model quality degrades silently — accuracy drops before any infrastructure alert fires, and by the time users notice, the damage to trust is done. The harder failure mode: a model that performs well in a notebook becomes unreliable under live traffic when feature distributions shift mid-season or inference latency creeps above the window where predictions are still useful.
SabiScore needed to solve both problems: deliver accurate predictions during live match windows, and detect model degradation before users experience it.
What was built
An end-to-end ML platform with three distinct layers:
Inference layer: FastAPI serving ensemble gradient-boosted predictions (XGBoost, LightGBM, CatBoost) with Redis Pub/Sub for sub-50ms event fan-out to concurrent sessions. Stale-while-revalidate cache strategy keeps p99 under 150ms even during retraining windows — 73% of requests never touch the model.
Observability layer: Prometheus tracks Brier score, PSI (Population Stability Index), prediction latency at p50/p95/p99, and cache hit rate per model version. An alert fires when the Brier score rises more than 0.03 above the established baseline (lower is better). This is a model performance alert, not a system health alert, so engineers are notified before users notice a quality change.
Dashboard: Next.js 15 interface surfacing predictions with legible confidence signals and a model health view. The product communicates trust before it explains capability.
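The Brier score alert in the observability layer comes down to a small piece of arithmetic, sketched here in plain Python. This is an illustration only: it assumes binary 0/1 outcomes, and the names brier_score and should_alert are hypothetical. In the platform itself the comparison would presumably live in a Prometheus alerting rule rather than in application code.

```python
from typing import Sequence

# Threshold taken from the text: alert when the Brier score is more than
# 0.03 worse (higher) than the established baseline.
BRIER_DEGRADATION_THRESHOLD = 0.03


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or len(outcomes) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def should_alert(
    current: float,
    baseline: float,
    threshold: float = BRIER_DEGRADATION_THRESHOLD,
) -> bool:
    """Lower Brier is better, so degradation means current exceeds baseline."""
    return (current - baseline) > threshold
```

A perfectly confident, correct forecaster scores 0.0, and one that always predicts 0.5 scores 0.25, which gives a sense of scale for the 0.03 threshold.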
Key decision: FastAPI + Redis Pub/Sub over synchronous REST with polling
Chosen: FastAPI + Redis Pub/Sub for inference serving
Over: Synchronous REST endpoints with database polling
Because: Sub-50ms event fan-out at sustained concurrent load is not achievable with synchronous polling. When 200 sessions are watching live match predictions simultaneously, polling creates a thundering herd against the database on every score update. With Redis Pub/Sub, a single publish reaches every subscribed session. Redis Pub/Sub does not itself retain messages for disconnected subscribers, so a dead-letter recovery path ensures that a temporarily disconnected session still receives the updates it missed.
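The fan-out shape can be illustrated without a Redis server. The sketch below is an in-memory stand-in built on asyncio queues, not Redis client code: InMemoryFanOut, demo, and the message format are hypothetical names chosen for illustration. It shows only the property the decision rests on, that one publish reaches every subscribed session without any of them polling.

```python
import asyncio
from typing import Dict, List


class InMemoryFanOut:
    """Stand-in for a Pub/Sub channel: one publish reaches every subscriber."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, asyncio.Queue] = {}

    def subscribe(self, session_id: str) -> asyncio.Queue:
        # Each session gets its own queue, so slow readers do not block others.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[session_id] = queue
        return queue

    def publish(self, message: str) -> int:
        # A single call delivers the message to all current subscribers.
        for queue in self._subscribers.values():
            queue.put_nowait(message)
        return len(self._subscribers)


async def demo(session_count: int = 200) -> List[str]:
    channel = InMemoryFanOut()
    queues = [channel.subscribe(f"session-{i}") for i in range(session_count)]
    channel.publish("match-42:home_win=0.61")
    # Every session receives the update from its own queue.
    return [await queue.get() for queue in queues]
```

Running asyncio.run(demo(200)) returns the same message 200 times, one per session, from a single publish call.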
Key decision: Time-based train/test splits
Chosen: The 80th-percentile match date as the train/test boundary (earlier matches train, later matches test)
Over: Random shuffle cross-validation
Because: Sports outcomes are non-stationary. Team form, injury status, and competition stage change over time. Random shuffling allows future match statistics to leak into training data — the model learns from information it would not have had in production. A model trained with random shuffle will report better CV accuracy and underperform in production. Time-based splits produce a model that generalises forward, which is the only direction that matters.
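A time-based split of this kind can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumptions: each match is a dict with a "date" field, and time_based_split is a hypothetical helper, not the project's actual function.

```python
from datetime import date
from typing import Dict, List, Tuple

Match = Dict[str, object]


def time_based_split(
    matches: List[Match], train_fraction: float = 0.8
) -> Tuple[List[Match], List[Match]]:
    """Split chronologically: earlier matches train, later matches test."""
    if not matches:
        return [], []
    ordered = sorted(matches, key=lambda m: m["date"])
    # The boundary is the date found at the train_fraction position
    # of the chronologically ordered matches.
    cutoff_index = min(int(len(ordered) * train_fraction), len(ordered) - 1)
    cutoff_date = ordered[cutoff_index]["date"]
    # Split on the date rather than the index, so matches played on the
    # same day never end up on both sides of the boundary.
    train = [m for m in ordered if m["date"] < cutoff_date]
    test = [m for m in ordered if m["date"] >= cutoff_date]
    return train, test
```

Because the split is by date, every training match precedes every test match, which is the property random shuffling breaks.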
Key decision: PSI-triggered unscheduled retraining
Chosen: PSI threshold monitoring with unscheduled retraining runs
Over: Fixed weekly retraining cadence only
Because: Tournament windows produce feature distributions that look nothing like regular-season distributions. A weekly-only cadence would tolerate weeks of degraded predictions during exactly the periods when user demand is highest. PSI drift detection triggers an unscheduled run when the incoming feature distribution crosses the threshold. False positives cost compute time; false negatives cost prediction quality at peak visibility. Given that asymmetry, the threshold is set conservatively, accepting some unnecessary retrains in exchange for fewer missed drifts.
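For a single feature, the PSI check might look like the sketch below. It uses the standard PSI formula, the sum over bins of (actual - expected) * ln(actual / expected). The 0.2 threshold is an assumption (a common rule of thumb), since the write-up does not state the value used, and bin_fractions, psi, and needs_retrain are hypothetical names.

```python
import math
from typing import List, Sequence

# Assumption: 0.2 is a widely used rule of thumb, not a value from the source.
PSI_RETRAIN_THRESHOLD = 0.2


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin; edges are the interior cut points."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(edges) + 1)
    for value in values:
        # The bin index is the number of cut points at or below the value.
        counts[sum(1 for edge in edges if value >= edge)] += 1
    return [count / len(values) for count in counts]


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline and a live sample."""
    total = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        # Clamp empty bins so the logarithm stays defined.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def needs_retrain(psi_value: float, threshold: float = PSI_RETRAIN_THRESHOLD) -> bool:
    return psi_value > threshold
```

Identical distributions give a PSI of exactly 0.0. Lowering the threshold makes the trigger more sensitive, trading extra retraining compute for fewer missed drifts.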
Constraint
Ensemble inference must complete at p99 < 150ms at peak concurrent load, with no model warmup on cold start. The Redis cache layer must handle model version transitions cleanly — stale predictions from an old model version must not survive a retrain.
The cache key strategy handles this: keys are scoped to match_id + model_version. On retrain, only keys containing the old model_version expire. Predictions for unchanged matches remain cached. The retrain does not flush the entire cache.
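The version-scoped key scheme can be sketched with an in-memory dictionary standing in for Redis. The key layout ("pred:{model_version}:{match_id}"), the PredictionCache class, and its method names are assumptions made for illustration; the source only states that keys combine match_id and model_version.

```python
from typing import Dict, Optional


def cache_key(match_id: str, model_version: str) -> str:
    # Assumed layout: the version comes first so one prefix selects a version.
    return f"pred:{model_version}:{match_id}"


class PredictionCache:
    """In-memory stand-in for the Redis cache layer."""

    def __init__(self) -> None:
        self._store: Dict[str, float] = {}

    def put(self, match_id: str, model_version: str, prediction: float) -> None:
        self._store[cache_key(match_id, model_version)] = prediction

    def get(self, match_id: str, model_version: str) -> Optional[float]:
        return self._store.get(cache_key(match_id, model_version))

    def expire_version(self, old_version: str) -> int:
        """Drop only the keys written under old_version; keep everything else."""
        prefix = f"pred:{old_version}:"
        stale_keys = [key for key in self._store if key.startswith(prefix)]
        for key in stale_keys:
            del self._store[key]
        return len(stale_keys)
```

Since reads always include the current model_version in the key, a request after a retrain cannot hit an entry written by the previous version, even before that entry has been expired.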
Results
- 99.9%+ uptime across a 90-day Prometheus proof window
- 45% MTTD (mean time to detect) improvement over the reactive alerting baseline (Prometheus-measured)
- ~30% inference latency reduction via Redis caching and query optimisation
- 73% cache hit rate (requests served without touching the model)
- Sub-150ms p99 latency under peak concurrent load
Lessons
The most valuable production pattern was the fallback path. When Redis is unavailable or the feature store is slow, the system degrades deliberately: it serves a lightweight single-model prediction from a pre-computed baseline, with a visible confidence reduction in the response payload. Users see a lower-confidence score rather than an error.
Reliability is usually decided in the fallback design, not in the training notebook.
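The deliberate-degradation path described above might be structured like this sketch. Everything named here is hypothetical: PredictionResponse, predict_with_fallback, and the 0.5 confidence factor are assumptions for illustration, and the two predictor callables stand in for the ensemble and the pre-computed single-model baseline.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: how much confidence is scaled down in degraded mode.
FALLBACK_CONFIDENCE_FACTOR = 0.5

# A predictor takes a match_id and returns (probability, confidence).
Predictor = Callable[[str], Tuple[float, float]]


@dataclass
class PredictionResponse:
    match_id: str
    probability: float
    confidence: float
    degraded: bool  # visible in the payload so clients can show reduced trust


def predict_with_fallback(
    match_id: str, ensemble_predict: Predictor, baseline_predict: Predictor
) -> PredictionResponse:
    try:
        probability, confidence = ensemble_predict(match_id)
        return PredictionResponse(match_id, probability, confidence, degraded=False)
    except (ConnectionError, TimeoutError):
        # Cache or feature store unavailable: serve the lightweight baseline
        # with reduced confidence instead of returning an error.
        probability, confidence = baseline_predict(match_id)
        return PredictionResponse(
            match_id,
            probability,
            confidence * FALLBACK_CONFIDENCE_FACTOR,
            degraded=True,
        )
```

Catching only connection and timeout errors keeps genuine bugs in the ensemble path visible rather than silently masking them behind the fallback.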
Status
Live at sabiscore.scardubu.dev. Automated retraining runs through CI/CD weekly, with additional PSI-triggered runs. The 90-day Prometheus window is maintained.