Practical caching patterns to cut ML latency and infrastructure cost using Redis.
In most ML products, users ask the same questions again and again: the same fixtures, the same inputs, often within seconds of each other. Recomputing the prediction every time is slow for the user, expensive in compute, and unnecessary when the answer hasn't changed.
At SabiScore, Redis caching gave us a 73% cache hit rate and pushed p95 latency below 100ms.
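A hit rate like that is easy to keep an eye on, because Redis counts hits and misses itself. A quick sketch using redis-py's async client (the `redis` handle is assumed here, as in the snippets below):

```python
# Redis tracks keyspace hits/misses instance-wide (assumes an async
# redis-py client named `redis` is already connected).
info = await redis.info("stats")
hits = info["keyspace_hits"]
misses = info["keyspace_misses"]
total = hits + misses
print(f"cache hit rate: {hits / total:.1%}" if total else "no lookups yet")
```

Note these counters cover every key on the instance, so a dedicated Redis instance for prediction caching gives you a cleaner number.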
The simplest win is caching the final prediction. Use it when predictions don't change often (minutes to hours):
key = f"pred:{home}:{away}:{kickoff_date}"
cached = await redis.get(key)
if cached:
return PredictionResponse.parse_raw(cached)
prediction = await compute_prediction(match)
await redis.setex(key, 3600, prediction.json()) # 1h TTL
return prediction
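One practical detail: two requests for the same fixture should produce byte-identical keys, or you'll miss the cache on cosmetic differences. A small sketch; `prediction_key` and `_norm` are hypothetical helpers, not part of the snippet above:

```python
def _norm(s: str) -> str:
    # Hypothetical helper: "Arsenal " and "arsenal" should map to the same key.
    return s.strip().lower().replace(" ", "-")

def prediction_key(home: str, away: str, kickoff_date: str) -> str:
    return f"pred:{_norm(home)}:{_norm(away)}:{kickoff_date}"
```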
Tips:

- Version your keys (pred:v2:...) so a new model release never serves predictions cached by the old one.
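In code, that's one constant away; `MODEL_VERSION` here is an assumed deploy-time constant, bumped with every model release:

```python
MODEL_VERSION = "v2"  # assumed constant: bump on each model deploy

key = f"pred:{MODEL_VERSION}:{home}:{away}:{kickoff_date}"
```

Entries written under the old version never get read again and simply age out through their TTL; no explicit flush needed.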
Not all predictions are equal, so their TTLs shouldn't be either:

```python
def ttl_for_match(match) -> int:
    # Popular leagues get requested constantly; keep them cached longer.
    if match.league in TOP_LEAGUES:
        return 6 * 3600  # 6 hours
    return 3600  # 1 hour

await redis.setex(key, ttl_for_match(match), prediction.json())
```
This keeps hot keys around longer while letting cold keys expire quickly.
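A common companion to adaptive TTLs (not shown above) is jitter: if a whole matchday's keys are written in one burst, identical TTLs make them all expire in the same burst too. A minimal sketch with a hypothetical `jittered` helper:

```python
import random

def jittered(ttl: int, spread: float = 0.1) -> int:
    # Hypothetical helper: spread expirations over +/-10% so hot keys
    # don't all miss at the same second.
    return int(ttl * random.uniform(1 - spread, 1 + spread))

await redis.setex(key, jittered(ttl_for_match(match)), prediction.json())
```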
Sometimes feature engineering is more expensive than the model call. In that case, cache the computed features instead of (or alongside) the final prediction:
```python
feature_key = f"features:{match_id}"
raw = await redis.get(feature_key)
if raw:
    features = json.loads(raw)
else:
    features = await build_features_from_db(match_id)
    await redis.setex(feature_key, 900, json.dumps(features))  # 15 min TTL

probs = model.predict_proba([features])[0]
```
Use this when:

- building the feature vector means expensive DB queries or external API calls, and
- the same features feed many prediction requests (for many matches at once, see the bulk-fetch sketch below).
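When one request needs features for many matches, fetching keys one by one wastes round-trips; Redis's MGET gets them all at once. A sketch under the same assumptions as above (the `redis` client, `build_features_from_db`, and the features:{match_id} key scheme):

```python
import json

async def get_features_bulk(match_ids: list[str]) -> dict[str, dict]:
    # Assumes the async `redis` client and build_features_from_db from above.
    keys = [f"features:{mid}" for mid in match_ids]
    cached = await redis.mget(keys)  # one round-trip for all keys

    features = {}
    for mid, raw in zip(match_ids, cached):
        if raw is not None:
            features[mid] = json.loads(raw)
        else:
            built = await build_features_from_db(mid)
            await redis.setex(f"features:{mid}", 900, json.dumps(built))
            features[mid] = built
    return features
```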
If a resource truly doesn't exist, don't hammer your DB on every lookup; cache the miss itself:
```python
NOT_FOUND = "__NOT_FOUND__"

value = await redis.get(key)
if value == NOT_FOUND:
    raise NotFound()
if value is not None:
    return parse(value)

db_value = await load_from_db(id)
if db_value is None:
    await redis.setex(key, 60, NOT_FOUND)  # cache the 404 for 60s
    raise NotFound()

await redis.setex(key, 3600, serialize(db_value))
return db_value
```
Great for noisy clients repeatedly requesting missing resources.
Finally, configure maxmemory and an eviction policy like allkeys-lru so Redis fails fast instead of swapping.
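Both settings can live in redis.conf or be applied at runtime; a sketch via redis-py's CONFIG SET (the 512mb limit is just an example value):

```python
await redis.config_set("maxmemory", "512mb")  # example limit, size to your box
await redis.config_set("maxmemory-policy", "allkeys-lru")
```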
For full end-to-end context on how caching fits into an ML system, read:

And if you want help designing a caching layer tailored to your ML API traffic, reach out and mention this article.