Design a Distributed Cache (Redis/Memcached at Scale)

This problem covers consistent hashing, replication, eviction, and the thundering herd problem. It is one of the most frequently reported system design rounds in the LeakCode database, ranked #15 by appearance volume across 5 top companies.

Companies That Ask "Design Distributed Cache"

Based on LeakCode's aggregated interview reports, "Design Distributed Cache" or a close variant has been reported in system design rounds at several top companies.

The actual phrasing varies across companies. Meta tends to frame this as a product question, while Amazon and Google often state it abstractly. The underlying components and trade-offs are the same regardless of phrasing, which is why preparing the canonical version covers all variants.

Functional Requirements

Start every system design round by enumerating functional requirements out loud. Interviewers want to see that you scope before architecting. For "Design Distributed Cache", the core APIs candidates typically converge on are:

  • get(key)
  • set(key, value, ttl)
  • del(key)

Confirm with the interviewer which APIs are in scope before going deeper. If they want you to focus on one (often read path or one specific feature), ask them to pick. Spending 5 minutes here saves 20 minutes of redirection later.
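To make the three calls concrete, here is a minimal single-node sketch of what they might look like. It is an illustration under stated assumptions, not a reference implementation: the class name SingleNodeCache, the injectable clock, the choice of seconds for ttl, and lazy expiry on read are all hypothetical, and del is spelled delete because del is a reserved word in Python.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SingleNodeCache:
    """Toy in-memory store exposing get/set/delete with a per-key TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # Assumption: ttl is in seconds, and None means "never expires".
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Lazy expiry: the stale entry is dropped when it is next read.
            del self._data[key]
            return None
        return value

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None
```

A distributed design wraps this per-node behavior with routing and replication; agreeing on the per-node contract first keeps the later discussion grounded.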

Scale Estimates

Back-of-envelope numbers anchor the design. State the scale assumptions explicitly so the interviewer knows what regime you are designing for. Different scales lead to different architectures, and skipping this step is a common reason candidates over-engineer or under-engineer the solution.

  • Nodes: 1000+ in cluster
  • Ops: 10M ops/sec aggregate
  • Data: 10TB+

Always convert qualitative requirements ("low latency", "high availability") into quantitative targets ("p99 read < 100ms", "99.99% uptime = 52 min downtime/year"). Interviewers grade heavily on whether you reason from numbers or from vibes.
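The arithmetic behind these figures is worth doing out loud. The sketch below treats the scale numbers above as lower bounds and assumes an even spread across nodes and decimal units (1 TB = 1,000 GB); the function names are hypothetical.

```python
NODES = 1_000                   # "1000+ in cluster", taken as a lower bound
TOTAL_OPS_PER_SEC = 10_000_000  # "10M ops/sec aggregate"
TOTAL_DATA_TB = 10              # "10TB+", taken as a lower bound


def per_node_load(nodes: int, total_ops: int, total_data_tb: float):
    """Return (ops/sec per node, GB per node), assuming a perfectly even spread."""
    ops_per_node = total_ops / nodes
    data_gb_per_node = total_data_tb * 1_000 / nodes
    return ops_per_node, data_gb_per_node


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability target into allowed downtime per year."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (100 - availability_percent) / 100


if __name__ == "__main__":
    ops, gb = per_node_load(NODES, TOTAL_OPS_PER_SEC, TOTAL_DATA_TB)
    print(f"{ops:,.0f} ops/sec and {gb:.0f} GB per node")
    print(f"99.99% uptime allows {downtime_minutes_per_year(99.99):.2f} min/year")
```

At these assumptions each node serves about 10,000 ops/sec and holds about 10 GB before replication, and hot keys will skew real traffic away from the even-spread figure.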

Key Components

At staff and senior levels, interviewers care more about how you connect components than the components themselves. The canonical "Design Distributed Cache" architecture includes:

  • Consistent hash ring
  • Replica per partition
  • LRU/LFU eviction
  • Hot-key detection
  • Gossip for membership

Draw the diagram top-down: client, load balancer, API gateway, service tier, cache, primary datastore, async processing, then storage layer. Label every arrow with the protocol (HTTP, gRPC, WebSocket) and the data shape. Interviewers will probe arrows more than boxes.
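Of the components listed in this section, LRU eviction is the one you are most likely to be asked to sketch in code. The version below is a minimal illustration built on Python's collections.OrderedDict; the class name LRUCache and the fixed entry-count capacity are assumptions, and a production cache would more likely bound memory by bytes and may use an approximate LRU.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Insertion order doubles as recency order: oldest first, newest last.
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def set(self, key: str, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __len__(self) -> int:
        return len(self._items)
```

Both operations are O(1). LFU needs frequency counters on top of this, which is the usual follow-up if the interviewer asks why you would pick one policy over the other.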

Trade-offs Interviewers Probe

What separates a hire from a strong hire is the trade-off discussion. The interviewer will pick one component and ask "why did you choose X over Y?" Be ready to articulate the alternative and its costs.

  • Client-side hashing: no proxy, simple
  • Proxy-based (Twemproxy, mcrouter): easier migration, extra hop
  • Server-side cluster (Redis Cluster): built-in, slot-based, limited rebalance

State the recommended approach AND name the conditions under which the alternative would be better. "I'd use X here because we have requirement R, but if R were different (say scale was 10x or consistency model was strong), I'd choose Y instead." That framing signals senior-level thinking.
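With client-side hashing, the client itself maps each key to a node. A minimal sketch of that lookup, using a consistent hash ring with virtual nodes, is shown here. The class name HashRing, the default of 100 virtual nodes per server, and the use of MD5 as the placement hash are all assumptions for illustration; real client libraries differ in their details.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # MD5 is used only for its even spread, not for security (an assumption).
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring; each node owns several virtual points."""

    def __init__(self, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual point clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property to call out is that removing a node only remaps the keys that node owned; every other key keeps its owner. With plain modulo hashing, most keys would move whenever the node count changes.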

Common Follow-Up Questions

After the high-level design, interviewers typically pick one of these threads and go deep. Have a 5-minute plan for each. Practice by drawing each follow-up on its own page with its own diagram so you can pivot smoothly when asked.

  • How to handle cache stampede (thundering herd)?
  • Multi-region cache invalidation
  • Write-through vs write-back
  • Cache warming after restart
  • Hot-key mitigation
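For the cache stampede follow-up, one common mitigation is request coalescing: when many callers miss on the same key at once, only one of them loads from the backing store and the rest wait for its result. The sketch below shows the idea within a single process using threads. The class name SingleFlightLoader and the load callback are hypothetical, TTLs are omitted, and the per-key lock table is never cleaned up, which a real implementation would need to address.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightLoader:
    """Lets only one caller per key hit the backing store on a miss."""

    def __init__(self, load: Callable[[str], Any]) -> None:
        self._load = load                 # hypothetical slow backing-store read
        self._cache: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()    # protects the lock table itself

    def get(self, key: str) -> Any:
        if key in self._cache:            # fast path: cache hit
            return self._cache[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another caller may have filled the cache while we waited.
            if key in self._cache:
                return self._cache[key]
            value = self._load(key)
            self._cache[key] = value
            return value
```

Across many cache clients the same idea needs a distributed lock or lease, and it is often combined with TTL jitter or early refresh so that popular keys do not all expire at the same moment.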

At senior+ levels, expect 2-3 follow-ups in a 45-minute round. Junior candidates usually only get one. Calibrate depth based on how much time you have left. If 15 minutes remain, go deep on one follow-up; if 5 minutes remain, sketch the approach without implementing.
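If the follow-up is write-through versus write-back, a small sketch helps you state the difference precisely. In the code below a plain dict stands in for the primary datastore, and the class names and the explicit flush method are assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Set


class WriteThroughCache:
    """Every write goes to the backing store before the call returns."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store
        self._cache: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value  # synchronous store write: durable but slower
        self._cache[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._cache.get(key)


class WriteBackCache:
    """Writes land in the cache first and reach the store on flush."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store
        self._cache: Dict[str, Any] = {}
        self._dirty: Set[str] = set()  # keys written but not yet persisted

    def set(self, key: str, value: Any) -> None:
        self._cache[key] = value  # fast path: no store write yet
        self._dirty.add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._cache.get(key)

    def flush(self) -> int:
        # Persist all dirty keys; returns how many were written.
        flushed = len(self._dirty)
        for key in self._dirty:
            self._store[key] = self._cache[key]
        self._dirty.clear()
        return flushed
```

The trade-off to name: write-through pays the store's latency on every write but never loses acknowledged data, while write-back is faster and batches writes but loses whatever is still dirty if the cache node dies before a flush.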

What LeakCode Reports Show About This Question

LeakCode has aggregated thousands of interview reports across 7 sources (1Point3Acres, Blind, Glassdoor, Reddit, GeeksforGeeks, the 1p3a OJ catalog, and direct submissions). Looking at the subset that mention "design distributed cache" or its variants:

The question shows up most often in onsite system design rounds for senior and staff level loops. Phone-screen-stage system design rounds tend to use simpler variants (rate limiter, TinyURL, parking lot). Director and Principal interviews often ask this question but expect the candidate to also discuss organizational and operational concerns (oncall, deployment topology, multi-region failover) on top of the technical design.

The success signal in reports is not whether the candidate produced a "correct" design (there isn't one) but whether they navigated trade-offs explicitly, asked the interviewer for clarification when blocked, and demonstrated familiarity with at least one production system at this scale. Reports from candidates who got offers consistently mention they spent 5+ minutes on requirements clarification before drawing anything.

How to Practice "Design Distributed Cache"

Reading about the design is not the same as being able to draw and defend it under pressure. The fastest path to fluency is mock interviews where someone interrupts you with follow-ups while you whiteboard. If you cannot find a partner, record yourself doing a 45-minute timed solo run, then watch the recording and grade your own reasoning chain.

Focus your prep on the 2-3 follow-ups you would be weakest at, not the high-level design (which you already understand). For "Design Distributed Cache" the highest-leverage follow-up to drill is usually the deep-dive question that requires explaining a specific algorithm or distributed protocol, since those are where most candidates get stuck.

After you can draw the design from memory in 10 minutes, time yourself responding to each follow-up in 5 minutes. The drill is not "can I solve this" but "can I respond at interview tempo." Interviews are bandwidth-constrained; the bottleneck is your ability to compress a 30-minute design into 5 minutes of speech without losing trade-off depth.

Browse System Design Reports from LeakCode

LeakCode aggregates real system design reports tagged by company, level, and round. Filter to find candidates who described their "Design Distributed Cache" round at the exact company you are interviewing at.

Related System Design Problems