Redis Caching Patterns for ML APIs

Practical caching patterns to cut ML latency and infrastructure cost using Redis.


Why Cache ML Predictions at All?

In most ML products, users ask the same questions again and again: the same match, the same pair of teams, the same kickoff date.

Recomputing the prediction every time is slow for the user, expensive in compute, and wasteful, since the output rarely changes between identical requests.
At SabiScore, Redis caching gave us a 73% cache hit rate and pushed p95 latency below 100ms.
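Hit rate is worth tracking continuously, not just quoting once. Redis reports cumulative keyspace_hits and keyspace_misses counters in INFO stats (with redis-py, redis.info("stats")); a small helper turns them into a ratio. This is a sketch — the function name is ours, the counters are real Redis fields:

```python
def cache_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Fraction of reads served from cache, from Redis INFO stats counters."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0
```

Alert when this drops suddenly: it usually means a key-format change or a deploy that invalidated everything at once.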


Pattern 1: Simple TTL Cache

Use when predictions don’t change often (minutes to hours).

key = f"pred:{home}:{away}:{kickoff_date}"

cached = await redis.get(key)
if cached:
    return PredictionResponse.parse_raw(cached)  # pydantic v2: model_validate_json

prediction = await compute_prediction(match)
await redis.setex(key, 3600, prediction.json())  # 1h TTL; pydantic v2: model_dump_json()
return prediction
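End to end, the pattern looks like this. A runnable sketch: FakeRedis is a dict-backed stand-in for the real async client (it accepts a TTL but ignores it), and the match dict and compute function are hypothetical:

```python
import asyncio
import json

class FakeRedis:
    """Dict-backed stand-in for an async Redis client; TTLs accepted but ignored."""
    def __init__(self):
        self._store = {}
    async def get(self, key):
        return self._store.get(key)
    async def setex(self, key, ttl, value):
        self._store[key] = value

async def get_prediction(redis, match, compute_prediction):
    """Cache-aside: return a cached prediction, or compute and store it."""
    key = f"pred:{match['home']}:{match['away']}:{match['kickoff_date']}"
    cached = await redis.get(key)
    if cached is not None:
        return json.loads(cached)                       # hit: skip the model
    prediction = await compute_prediction(match)        # miss: run the model
    await redis.setex(key, 3600, json.dumps(prediction))  # 1h TTL
    return prediction
```

The second identical request never touches the model — that is the whole pattern.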

Tips:

- Namespace keys (the pred: prefix) so they are easy to find and invalidate.
- Include the model version in the key so a new deploy doesn't keep serving stale predictions.
- Derive the TTL from how often the underlying data actually changes, not from a round number.


Pattern 2: Popular vs Long-Tail Keys

Not all predictions are equal: top-league matches are requested constantly, while long-tail fixtures may be hit once or twice.

def ttl_for_match(match) -> int:
    if match.league in TOP_LEAGUES:  # TOP_LEAGUES: your set of high-traffic leagues
        return 6 * 3600  # 6 hours for popular matches
    return 3600          # 1 hour for the long tail

await redis.setex(key, ttl_for_match(match), prediction.json())

This keeps hot keys around longer while letting cold keys expire quickly.
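The rule is trivially unit-testable once TOP_LEAGUES is pinned down. A self-contained version (the league names are hypothetical placeholders, and we pass the league string directly rather than a match object):

```python
# Hypothetical set of high-traffic leagues; tune this for your own traffic.
TOP_LEAGUES = {"premier-league", "la-liga", "serie-a"}

def ttl_for_league(league: str) -> int:
    """Longer TTL for hot leagues, short TTL for the long tail."""
    return 6 * 3600 if league in TOP_LEAGUES else 3600
```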


Pattern 3: Feature Caching

Sometimes feature engineering is more expensive than the model call.

feature_key = f"features:{match_id}"

raw = await redis.get(feature_key)
if raw:
    features = json.loads(raw)
else:
    features = await build_features_from_db(match_id)
    # features must be JSON-serializable; convert numpy arrays with .tolist()
    await redis.setex(feature_key, 900, json.dumps(features))  # 15 min TTL

probs = model.predict_proba([features])[0]

Use this when:

- building features takes several DB or API round trips;
- more than one model or endpoint consumes the same features;
- a few minutes of staleness (the 15-minute TTL above) is acceptable.


Pattern 4: Negative Caching

If a resource truly doesn’t exist, don’t hammer your DB.

NOT_FOUND = "__NOT_FOUND__"

value = await redis.get(key)  # assumes a client with decode_responses=True
if value == NOT_FOUND:
    raise NotFound()

if value is not None:
    return parse(value)

db_value = await load_from_db(resource_id)
if db_value is None:
    await redis.setex(key, 60, NOT_FOUND)  # cache the 404 for 60s only
    raise NotFound()

await redis.setex(key, 3600, serialize(db_value))
return db_value

Great for noisy clients repeatedly requesting missing resources.


Operational Guardrails

- Set a memory limit (maxmemory) and an eviction policy; allkeys-lru is a sensible default for a pure cache.
- Watch the hit rate (keyspace_hits vs keyspace_misses in INFO stats) and alert when it drops.
- Fail open: a Redis outage should look like a cache miss, not a request failure.
- Keep TTLs short enough that a bad cached value cannot linger for long.

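Failing open is worth codifying rather than leaving to each call site. A sketch, assuming an async client; the exception type to catch depends on your client (with redis-py, redis.exceptions.RedisError), so a broad OSError stands in here, and DownRedis is a hypothetical stub simulating an outage:

```python
import asyncio

class DownRedis:
    """Stand-in for a client whose Redis server is unreachable."""
    async def get(self, key):
        raise OSError("connection refused")
    async def setex(self, key, ttl, value):
        raise OSError("connection refused")

async def safe_get(redis, key):
    """Treat any cache error as a miss; log it in real code instead of swallowing."""
    try:
        return await redis.get(key)
    except OSError:  # with redis-py, catch redis.exceptions.RedisError
        return None

async def safe_setex(redis, key, ttl, value):
    """Best-effort write: if it fails, the next request just recomputes."""
    try:
        await redis.setex(key, ttl, value)
    except OSError:
        pass
```

With these wrappers, a dead cache degrades your latency, not your availability.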

Where to Go Next

For full end-to-end context on how caching fits into an ML system, read:

And if you want help designing a caching layer tailored to your ML API traffic, reach out and mention this article.