Top System Design Interview Answers — Reliability, Scalability, Maintainability (DDIA Ch.1)
System Design

In-depth, interview-ready patterns with trade‑offs, failure modes, and metrics. Written for whiteboard discussions and production reality.

How should I use this post?

We begin with correctness under failure, move through scaling patterns, and finish with operability & evolution. Each section opens with a crisp Thesis, then a pragmatic design, trade‑offs, failure modes, and metrics—so you can speak fluently at the board and reason about production realities.

How do you achieve exactly‑once processing for payments and orders?

Transition: Reliability starts with correctness. Rather than chasing mythical “exactly‑once”, we compose guarantees we can actually implement.

Thesis: True exactly‑once is infeasible in distributed systems; we achieve it with at‑least‑once delivery + idempotency and/or a transactional outbox with deduplication.

Design

  • Generate an idempotency_key per request (client or gateway).
  • Begin a DB transaction: upsert a PENDING payment row with a unique constraint on the idempotency key.
  • In the same transaction, append to an outbox table (topic + payload + event_id).
  • A background consumer (or CDC) reads outbox events and calls the external gateway.
  • Consumer is idempotent: maintains a processed set (table) keyed by event_id.
  • Reconciliation job queries the gateway by request_id to resolve ambiguous states.
-- Producer (RDBMS)
BEGIN;
WITH ins AS (
  INSERT INTO payments(request_id, user_id, amount, status)
  VALUES($rid,$uid,$amt,'PENDING')
  ON CONFLICT (request_id) DO NOTHING
  RETURNING request_id
)
-- Write the outbox row only when the payment row was actually inserted,
-- so duplicate client requests do not enqueue duplicate charge events.
INSERT INTO outbox(event_id, topic, payload)
SELECT gen_event_id(), 'ChargeRequested',
       jsonb_build_object('request_id',$rid,'user_id',$uid,'amount',$amt)
FROM ins;
COMMIT;
// Consumer (pseudo)
for event in stream('ChargeRequested'):
  if processed.exists(event.id): continue  // dedupe redelivered events
  // gateway.charge is keyed by request_id, so a crash-and-retry here cannot double-charge
  res = gateway.charge(event.payload.request_id, event.payload.user_id, event.payload.amount)
  // ideally the upsert and the processed-set write share one transaction
  upsert(payments, request_id=event.payload.request_id, status=res.status, txn_id=res.txn_id)
  processed.add(event.id)

Failure Modes & Fixes

  • Gateway charged, DB write failed → reconciliation queries gateway by request_id, upserts final status.
  • Consumer crash mid‑charge → safe to retry; idempotency key prevents double charge.
  • Duplicate client requests → unique constraint on request_id collapses duplicates.

Metrics

  • Outbox lag, duplicate rate, retry counts, reconciliation backlog, success rate, P95/P99 end‑to‑end latency.
Sound bite: “We implement exactly‑once effects by combining at‑least‑once delivery with idempotent handlers, enforced by unique keys and an outbox/CDC pipeline.”

How do you choose a shard key, and when should you change it?

Transition: With correctness guarded, we scale state by partitioning it intelligently.

Thesis: Choose a shard key that aligns with dominant access patterns and spreads load; change it when hotspots or scatter‑gather dominate.

Design & Migration

  • Start: hash(user_id) for uniform load in user‑centric apps; co‑locate related entities by the same key.
  • Scans/time‑series: range by date with “today” bucketed/salted to avoid hotspots.
  • Change key when skew is high or cross‑shard queries dominate.
  • Migration: introduce a router supporting v1/v2 → dual‑write → idempotent backfill → flip reads gradually → retire v1.
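The router step above can be sketched as a thin layer that knows both layouts: during migration every write goes to both, while reads flip via a flag once the backfill is verified. A minimal single-process sketch (class and field names are illustrative, not a library API):

```python
import hashlib

def _stable_hash(key: str) -> int:
    # stable across processes (Python's built-in hash() is seeded per run)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ShardRouter:
    def __init__(self, n_v1: int, n_v2: int, read_from_v2: bool = False):
        self.n_v1, self.n_v2 = n_v1, n_v2
        self.read_from_v2 = read_from_v2   # flipped gradually once backfill verifies

    def write_targets(self, user_id: str):
        # dual-write: every write hits both layouts until v1 is retired
        return [("v1", _stable_hash(user_id) % self.n_v1),
                ("v2", _stable_hash(user_id) % self.n_v2)]

    def read_target(self, user_id: str):
        shard = _stable_hash(user_id) % (self.n_v2 if self.read_from_v2 else self.n_v1)
        return ("v2" if self.read_from_v2 else "v1", shard)
```

The stable hash matters: the built-in `hash()` changes between processes, which would silently reshuffle shards on restart.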

Metrics

  • Shard skew (top‑1% QPS vs median), cross‑shard query ratio, rebalancing time, hottest‑key QPS.
Sound bite: “Hash for uniformity, range for scans. Co‑locate things read together. When hotspots or scatter‑gather appear, migrate via router + dual‑write + backfill.”

How would you scale reads from 10k RPS to 1M RPS?

Transition: Partitioning is one axis; caching and read replicas multiply capacity while preserving correctness signals.

Thesis: Layer edge caching, request coalescing, lag‑aware replicas, and materialized views with read‑your‑writes via session stickiness or versioning.

Playbook

  • CDN/edge: cache JSON/HTML with ETag or signed URLs; vary by auth if needed.
  • App cache: soft TTL + background refresh; single‑flight to stop stampedes.
  • DB replicas: lag‑aware routing; post‑write flows stick to leader for N seconds.
  • Precompute hot aggregates via materialized views; maintain by triggers/CDC/jobs.
  • Salt extremely hot keys; consistent hashing for cache cluster.
// Stampede guard (sketch)
value = cache.get(key)
if (!value) {
  if (lock.try_acquire(key, 2000 /*ms*/)) {
    try {
      value = recompute()
      cache.set(key, value, /*soft_ttl*/ 60)
    } finally {
      lock.release(key)  // always release, even if recompute throws
    }
  } else {
    value = cache.get_stale_or_placeholder(key)  // serve stale rather than pile on
  }
}

Metrics

  • CDN hit rate, coalesced request ratio, cache fill latency, replica lag distribution, P95/P99.
Sound bite: “Push out with CDN, collapse duplicates, read from lag‑aware replicas, and precompute hot views—stick to leader for read‑your‑writes.”

How do you prevent cascading failures with backpressure and graceful degradation?

Transition: At high load, stability is about bounding everything and degrading gracefully.

Thesis: Combine timeouts, bounded queues, circuit breakers, rate limits, and load shedding with fallbacks.

Design

  • Jittered exponential backoff; budget retries to stay within client SLA.
  • Open breaker on sustained errors/latency; fast‑fail with cached/approx results.
  • Bounded worker pools and queues; prioritize critical traffic.
  • Adaptive concurrency per dependency; queue depth‑based backpressure.
// Retry with backoff + jitter (pseudo)
for (i = 0; i < max; i++) {
  try { return call() }
  catch (e) {
    if (i == max - 1) throw e  // retry budget exhausted: surface the error
    sleep(base * (2 ** i) + rand(0, jitter))  // jitter de-correlates retry storms
  }
}
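The breaker from the bullets can be as small as a failure counter plus an open timestamp. A minimal count-based sketch, with `now` injectable so it can be exercised without sleeping (thresholds are illustrative):

```python
import time

class CircuitBreaker:
    """Count-based breaker; pass `now` explicitly to test without real time."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None            # None means the breaker is closed

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # half-open: let a probe through once the reset timeout elapses
        return (now - self.opened_at) >= self.reset_timeout

    def record_success(self):
        self.failures, self.opened_at = 0, None

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now         # open: fast-fail callers, serve fallbacks
```

While open, callers skip the dependency and return cached or approximate results instead of queuing behind a dead service.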

Metrics

  • Retry rate, breaker open time, queue depth, tail latency, dropped/served‑stale counts.

How do you design a multi‑region active‑active system and pick consistency?

Transition: Geography adds partitions and latency; choose consistency per operation.

Thesis: For read latency and availability, use multi‑leader/leaderless with conflict resolution. For strict invariants, enforce a single writer per entity or a globally serializable region.

Patterns

  • Per‑entity owner region for payments/unique constraints; reads local, writes coordinated.
  • CRDTs or app‑merge for collaborative docs, counts, likes.
  • Failover drills; write admission control during recovery.
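For the CRDT bullet, a grow-only counter (G-Counter) is the canonical starting point: each region increments only its own slot, and merge takes the element-wise max, so replicas converge regardless of delivery order or duplication. A minimal sketch:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per region; merge = element-wise max."""
    def __init__(self, region: str):
        self.region = region
        self.counts = {}                  # region -> that region's local increments

    def increment(self, n: int = 1):
        self.counts[self.region] = self.counts.get(self.region, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        # commutative, associative, idempotent: safe under any replication order
        for region, n in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), n)
```

Likes and view counts fit this shape directly; decrements need a PN-Counter (two G-Counters), and anything with invariants still needs the single-writer pattern above.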

Metrics

  • RPO/RTO, cross‑region RTT budget, conflict rate, reconciliation backlog.
Sound bite: “We read local for latency, but serialize writes where invariants matter—either per‑entity single leader or a global serializable region.”

How do you run zero‑downtime schema migrations?

Transition: Systems live and change; evolve without breaking readers or writes.

Thesis: Apply expand → migrate → contract with dual‑read/write and idempotent backfills.

Steps

  1. Expand: add new columns/tables/indexes (nullable).
  2. Dual‑write: populate new path on all new writes.
  3. Backfill in batches (WHERE new_col IS NULL), throttle to keep replica lag healthy.
  4. Dual‑read until confidence; then flip reads.
  5. Contract: remove old code/columns when safe.
ALTER TABLE orders ADD COLUMN total_cents BIGINT;
-- Batched backfill (sketch):
UPDATE orders o
SET total_cents = s.sum_cents
FROM (
  SELECT order_id, SUM(qty*price_cents) AS sum_cents
  FROM order_items
  WHERE order_id BETWEEN $lo AND $hi
  GROUP BY order_id
) s
WHERE o.id = s.order_id AND o.total_cents IS NULL;

Metrics

  • Null‑rate over time, backfill throughput, replica lag, error deltas pre/post flip.

What does good day‑one observability look like?

Transition: You can’t run what you can’t see—observability underpins reliability.

Thesis: Ship traces, metrics, and structured logs with correlation IDs; define SLOs and runbooks.

Baseline

  • Tracing via OpenTelemetry; propagate trace_id/span_id and safe baggage.
  • Metrics: golden signals (latency, traffic, errors, saturation) per endpoint and tenant.
  • Structured JSON logs with request_id, hashed user/tenant, status, latency.
  • SLOs tied to UX; alerts on burn rate of error budgets; synthetic checks.
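A structured log line with correlation IDs and hashed user identifiers might look like this (field names are illustrative, not a standard schema):

```python
import hashlib, json, time

def log_request(request_id, trace_id, user_id, status, latency_ms):
    record = {
        "ts": round(time.time(), 3),
        "request_id": request_id,
        "trace_id": trace_id,             # lets you jump from this line to the trace
        "user_hash": hashlib.sha256(user_id.encode()).hexdigest()[:16],  # hashed, never raw PII
        "status": status,
        "latency_ms": latency_ms,
    }
    return json.dumps(record)             # one JSON object per line, emitted to stdout/collector
```

Hashing (or tokenizing) the user ID keeps log lines joinable for debugging without storing raw PII.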

Anti‑Patterns

  • PII in logs, high‑cardinality label explosions, alert fatigue, dashboards without runbooks.

Should product catalog and orders live in the same database?

Transition: Data modeling choices affect reliability and scale.

Answer: Usually no. Catalog (variable attributes) → document store + search index. Orders (financial correctness) → relational with constraints and materialized totals. Sync via CDC to denormalized read models.

Risks if Combined

  • OLTP and search/analytics contention; long scans harming write latency; rigidity for catalog evolution.

When should you join vs denormalize/materialize?

Transition: Performance and integrity trade‑offs meet access patterns.

Heuristic: Normalize when writes are hot and constraints matter; denormalize/materialize when reads dominate and aggregates are stable, accepting bounded staleness.

Safe Denormalization

  • Materialized views (DB‑native or job‑maintained).
  • Derived columns (e.g., order_total_cents) with triggers/CDC jobs.
  • Version derivations to avoid circular dependencies.

Pitfalls

  • Silent drift, multi‑source truth, cross‑service coupling.

When do you pick a graph database over SQL?

Transition: Sometimes, relationships are the query.

Choose Graph if: Variable‑length traversals, path constraints, role hierarchies, FOAF/recommendations—and you can anchor traversals on selective nodes.

Otherwise

  • For fixed‑depth joins or analytics, relational with recursive CTEs or precomputed adjacency lists is often faster and simpler operationally.
-- Recursive CTE (FoF ≤ 2)
WITH RECURSIVE fof(id, depth) AS (
  SELECT friend_id, 1 FROM friends WHERE user_id = $1
  UNION ALL
  SELECT f.friend_id, fof.depth + 1
  FROM friends f JOIN fof ON f.user_id = fof.id
  WHERE fof.depth < 2
)
SELECT DISTINCT id FROM fof;

How do you design the right index for a specific query?

Transition: Scalability depends on making the planner’s life easy.

Example Query

SELECT id, created_at, total_cents
FROM orders
WHERE user_id = ?
  AND created_at BETWEEN ? AND ?
ORDER BY created_at DESC
LIMIT 50;

Rule

Equality → Range → Order (and make it covering).

CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC)
INCLUDE (total_cents);

Watchouts

  • Low‑cardinality leading columns, too many indexes (write amplification), stale statistics.

How do you implement fair multi‑tenant rate limiting?

Transition: Protect the platform and ensure fairness.

Thesis: Use hierarchical token buckets—global → tenant → endpoint—with bursts and sustained rates, plus Retry-After hints.

Design

  • Global ceiling to protect shared resources.
  • Per‑tenant bucket sized by plan; per‑endpoint bucket for expensive operations.
  • Distributed enforcement with sharded counters or sliding‑window approximations.
// Token bucket (multi-tenant sketch)
allow(tenant, path, cost=1):
  now = monotonic()
  refill_all(now)
  // admit only if every level passes: global ceiling, tenant plan, endpoint budget
  levels = [bucket[GLOBAL], bucket[tenant], bucket[tenant, path]]
  if any(b.tokens < cost for b in levels):
     return DENY with Retry-After
  for b in levels: b.tokens -= cost   // deduct only after all levels admit
  return ALLOW
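A runnable single-process version of the hierarchy: admit a request only when the global, tenant, and endpoint buckets all have tokens, and deduct only after all three pass so a denied request never partially drains a level. Rates and bursts are illustrative; distributed enforcement would shard these counters:

```python
import time

class Bucket:
    """Token bucket: refills at `rate` tokens/sec, capped at `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def refill(self, now: float):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

def allow(buckets, cost: float = 1.0, now=None) -> bool:
    now = time.monotonic() if now is None else now
    for b in buckets:
        b.refill(now)
    if any(b.tokens < cost for b in buckets):
        return False                  # DENY: respond 429 with a Retry-After hint
    for b in buckets:                 # deduct only after every level admits
        b.tokens -= cost
    return True
```

Pass the buckets ordered global → tenant → endpoint; the most constrained level is whichever runs dry first.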

What’s your disaster recovery plan if the primary region is down?

Transition: Reliability at org scale demands practiced recovery.

Thesis: Define RPO/RTO, replicate data cross‑region, practice failovers, and control write admission during recovery.

Plan

  • Data: async cross‑region replicas (binlog/WAL shipping), multi‑AZ.
  • Compute: infra‑as‑code templates; warm standby capacity.
  • Failover: DNS/control‑plane switchover; shard ownership reassignment; block conflicting writes until caught up.
  • Replay: durable logs for queues/outboxes; postmortems and improvements.

Metrics

  • RPO achieved (bytes/seconds), RTO, recovery test frequency, config drift rate.

How do you protect user data while staying observable?

Transition: Maintainability includes responsible operations.

Thesis: Enforce least privilege and encryption; keep logs useful but sanitized; rotate secrets; test restores.

Controls

  • Per‑service identities; scoped IAM; short‑lived tokens (OIDC).
  • TLS in transit; envelope encryption at rest; key rotation.
  • Structured logs with redacted/tokenized PII; preserve trace_id/request_id.
  • Retention policies; GDPR/CCPA delete/export; encrypted, access‑logged backups with restore drills.

Pitfalls

  • PII in traces, shared root creds, over‑permissive roles, unencrypted backups.

What user‑visible consistency guarantees will you provide?

Transition: Beyond database semantics, we promise session guarantees that users feel.

Answer

  • Read‑your‑writes, monotonic reads, monotonic writes, and consistent prefix.
  • Implement via sticky sessions to leader/region post‑write or session tokens carrying last‑seen version/timestamp; route reads accordingly.
  • Use ETag/If-None-Match; include version in cache keys.
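Version-token routing can be sketched as: the client presents its last-seen version, and the router prefers any caught-up replica, falling back to the leader (the dict shape and names are illustrative):

```python
def route_read(last_seen_version: int, leader: str, replicas: list) -> str:
    """Prefer a caught-up replica; fall back to the leader for read-your-writes."""
    for r in replicas:
        if r["applied_version"] >= last_seen_version:
            return r["name"]          # replica has applied the client's write: safe
    return leader                     # nobody caught up yet: read from the leader
```

The same check gives monotonic reads for free if the session token only ever advances.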

Metrics

  • Leader‑read ratio post‑write, replica‑lag distribution, stale‑read rate.

Which isolation level prevents which anomalies, and when is SERIALIZABLE required?

Transition: Correctness also means preventing cross‑request anomalies.

Quick map

  • Read Committed: avoids dirty reads; allows non‑repeatable reads & phantoms.
  • Repeatable Read / Snapshot: prevents non‑repeatable reads; can allow write skew.
  • Serializable: prevents all anomalies (2PL or SSI).

Example — Write Skew

-- Doctor-on-call: each txn checks count=1 then sets off-duty.
-- Under snapshot isolation, both commit → zero on duty.
-- Fix: SERIALIZABLE or SELECT ... FOR UPDATE with predicate locks.

When to use SERIALIZABLE

  • Financial transfers, inventory guarantees, scheduling constraints, cross-row invariants.

Metrics

  • Serialization failure (retry) rate, lock waits, long-running transactions.

When do you use Two‑Phase Commit vs a Saga?

Transition: Cross‑service changes require coordination or compensation.

2PC (distributed ACID)

  • Coordinator prepares then commits; strong consistency but blocking risk on failures.
  • Use in tight trust boundaries with low latency and strict SLAs.

Saga

  • Sequence of local transactions with compensating actions.
  • Orchestrated (central controller) or choreographed (events/outbox).
// Orchestrated saga (pseudo)
reserveInventory(); chargePayment(); confirmOrder();
compensate on failure: refundPayment(); releaseInventory();

Rule of thumb

  • Prefer Saga when availability trumps global atomicity; design idempotent steps and compensations.
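The orchestrated pattern reduces to: run local steps in order, and on failure run the completed steps' compensations in reverse. A minimal sketch (the step and compensation callables are illustrative stand-ins for service calls):

```python
def run_saga(steps):
    """steps: ordered (action, compensation) pairs of callables."""
    done = []                          # compensations for steps that completed
    try:
        for action, compensation in steps:
            action()                   # one local transaction in one service
            done.append(compensation)
    except Exception:
        for compensation in reversed(done):
            compensation()             # compensations must be idempotent
        return False
    return True
```

Note that only completed steps get compensated: if `chargePayment` itself fails, there is nothing to refund, only the earlier reservation to release.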

How do you handle leader election and prevent split‑brain?

Transition: High availability hinges on safe failover.

Answer

  • Use consensus (Raft/Paxos) to elect one leader.
  • Protect shared resources with fencing tokens (monotonic lease terms) so old leaders can’t write after losing leadership.
  • Readiness vs liveness health checks; conservative timers to avoid flapping.
// Fencing example with DB
UPDATE shard SET leader_id=$me, term=term+1
WHERE id=$id AND term=$observed_term;
-- 0 rows updated means another leader already advanced the term: stop writing immediately.

How do you generate IDs at scale (UUID v7/ULID/Snowflake)?

Transition: IDs affect ordering, locality, and hot shards.

Options

  • UUID v4: random, no order; index‑unfriendly.
  • UUID v7/ULID: time‑ordered; better locality and pagination.
  • Snowflake: 64‑bit (time + region + worker + seq); monotonic per worker, good sharding properties.
// Snowflake bits: 41 time | 5 region | 5 worker | 12 sequence

Guidance

  • Prefer v7/ULID or Snowflake for ordered inserts and stable keyset pagination; ensure clock safety & sequence rollover handling.
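A minimal generator for the 41/5/5/12 layout above, including backwards-clock and sequence-rollover handling (`EPOCH_MS` is an assumed custom epoch, not a standard value):

```python
import time

EPOCH_MS = 1_600_000_000_000   # assumed custom epoch; pick one and never change it

class Snowflake:
    """41 time | 5 region | 5 worker | 12 sequence, matching the layout above."""
    def __init__(self, region: int, worker: int):
        self.region, self.worker = region & 0x1F, worker & 0x1F
        self.last_ms, self.seq = -1, 0

    def next_id(self, now_ms=None):
        if now_ms is None:
            now_ms = int(time.time() * 1000) - EPOCH_MS
        if now_ms < self.last_ms:
            now_ms = self.last_ms          # clock went backwards: never reuse older time
        if now_ms == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF
            if self.seq == 0:              # 12-bit sequence rolled over (>4096 ids/ms)
                now_ms += 1                # borrow the next millisecond
        else:
            self.seq = 0
        self.last_ms = now_ms
        return (now_ms << 22) | (self.region << 17) | (self.worker << 12) | self.seq
```

IDs are monotonic per worker; global ordering across workers is only as good as the clocks.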

How do you reason about time, clocks, and ordering?

Transition: Correct ordering without trusting wall‑clock.

Answer

  • Wall clocks skew (NTP); don’t assume strict order.
  • Use Lamport timestamps for causality; vector clocks to detect concurrency and resolve with app rules.
  • Persist server‑assigned times; present user times with correction windows.
// Lamport: each event increments counter; on receive: counter = max(local, remote) + 1
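The Lamport rule in the comment above, as a tiny class:

```python
class LamportClock:
    """Local events tick; a receive jumps past the sender's timestamp."""
    def __init__(self):
        self.counter = 0

    def tick(self) -> int:            # local event, or stamping an outgoing message
        self.counter += 1
        return self.counter

    def receive(self, remote: int) -> int:
        self.counter = max(self.counter, remote) + 1
        return self.counter
```

This yields an order consistent with causality (if A caused B, A's timestamp is smaller); equal or incomparable timestamps mean the events were concurrent, which is where vector clocks add detail.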

How do you keep caches correct—invalidate or update?

Transition: Scale is easy; cache correctness is hard.

Patterns

  • Cache‑aside: app populates; invalidate on write; race risk without versioning.
  • Read‑through/Write‑through: cache mediates; simpler, higher write latency.
  • Write‑behind: async to DB; needs durable queue and replay.

Keep it correct

  • Use versioned keys (e.g., obj:{id}:{version}) so stale values don’t overwrite fresh.
  • Soft TTL + background refresh; request coalescing under a lock.
// Safe invalidate (versioned)
db.update(...); cache.set("obj:42:v3", data); cache.del("obj:42:v2")

Which pagination strategy should you use: OFFSET/LIMIT, keyset, or tokens?

Transition: Pagination impacts performance and user consistency.

Guidance

  • OFFSET/LIMIT: simple but slow/unstable on big offsets.
  • Keyset (seek): stable & fast—use the last seen sort key.
  • Continuation tokens: encode cursor + filters for complex sorts or sharded scans.
-- Keyset example
SELECT * FROM orders
WHERE user_id = $1 AND created_at < $cursor
ORDER BY created_at DESC
LIMIT 50;
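Continuation tokens are typically just the cursor (plus filters) serialized into an opaque string. A minimal sketch; production tokens should also be HMAC-signed so clients cannot tamper with the cursor:

```python
import base64, json

def encode_token(cursor: dict) -> str:
    # sort_keys makes the token deterministic for identical cursors
    return base64.urlsafe_b64encode(json.dumps(cursor, sort_keys=True).encode()).decode()

def decode_token(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode()))
```

Keeping the filters inside the token lets the server reject a page-2 request whose filters no longer match page 1.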

What queue semantics do you guarantee (ordering, at‑least‑once, DLQ)?

Transition: Async pipelines must define delivery and order per key.

Answer

  • Guarantee ordering per partition key (e.g., userId) by routing all events to the same partition.
  • Accept at‑least‑once and design idempotent consumers.
  • Use DLQ with bounded retries and replay tooling.
// Idempotent consumer sketch
if (!seen(event.id)) { apply(event); mark_seen(event.id) }

How do you defend against hot keys and noisy neighbors?

Transition: Even perfect sharding suffers under skew.

Playbook

  • Detect via per‑key QPS histograms; apply per‑key rate limits and request coalescing.
  • Hot‑key sharding: salt the key into N buckets for cache/DB; merge on read.
  • Priority queues and tenant isolation (separate threadpools/quotas).
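Hot-key salting from the playbook: writes pick one of N buckets at random, and reads fan out across all buckets and merge. A counter-flavored sketch, with an in-memory dict standing in for the cache or DB:

```python
import random

N_BUCKETS = 8                          # illustrative salt fan-out

def salted_key(key: str) -> str:
    return f"{key}:{random.randrange(N_BUCKETS)}"   # each write lands on a random bucket

def incr(key: str, store: dict):
    k = salted_key(key)
    store[k] = store.get(k, 0) + 1

def read_all(key: str, store: dict) -> int:
    # reads fan out across every bucket and merge (sum, for counters)
    return sum(store.get(f"{key}:{i}", 0) for i in range(N_BUCKETS))
```

The trade-off is explicit: writes scale by N, reads pay an N-way fan-out, so salt only the keys your QPS histograms flag as hot.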

When should you use blue/green, rolling, or canary deployments?

Transition: Maintainability requires safe change delivery.

Answer

  • Blue/Green: instant switch + easy rollback; needs double capacity.
  • Rolling: gradual replacement; watch partial incompatibility and state transitions.
  • Canary: small % of traffic + automated SLO gates; pair with feature flags to decouple deploy vs release.

How do you manage data lifecycle: retention, tiering, privacy deletes?

Transition: Reliability includes data safety over time—and cost control.

Answer

  • Retention: hot (OLTP) → warm (object store) → cold/archive; TTL indexes for ephemeral data.
  • Privacy deletes: hard delete + tombstones for logs/streams; crypto‑erasure for immutable backups (destroy keys).
  • Lineage: track derived datasets to propagate deletes correctly.

How do you approach load testing and capacity planning?

Transition: We close the loop by proving the design under load.

Plan

  • Define SLOs (P95/P99, error budget). Model arrival (spiky vs steady); include background jobs.
  • Use production‑like data; ramp to 1×, 2×, and burst (e.g., 10×) with think‑time.
  • Apply Little’s Law to reason about concurrency: L = λ × W.
  • Measure saturation signals (CPU, heap, queue depth), shed load, verify graceful degradation.
// Capacity sanity (example). Little's Law (L = λ × W) takes *average* latency;
// substituting P95 here overestimates in-flight work, which errs on the safe side.
peak_qps = 50_000; p95_latency = 0.050  // 50ms
concurrency ~= peak_qps * p95_latency   // ≈ 2,500 concurrent in-flight ops

Interview flow tip: Map any new scenario to these themes: session guarantees → isolation → cross‑service transactions → leadership & IDs → cache correctness → pagination → async semantics → hot key defense → safe deploys → lifecycle → load test.

How do you present a coherent interview narrative?

Lead with correctness (idempotency, outbox, invariants). Then scale the hot paths with cache, replicas, sharding, and backpressure. Finally, prove it’s operable and safe to evolve: observability, migrations, DR, security. Close by tying back to data modeling (normalize vs denormalize, graph vs SQL) and indexing choices that make the plan real.

One‑liner: “First make it right, then make it fast, then make it safe to change.”

Credits: Original, interview‑focused synthesis inspired by concepts from Designing Data‑Intensive Applications (Chapter 1). All interpretations and examples are my own; any errors are mine.

Author: Surendra Rayapati  •  Next: Pair with my Chapter‑2 deep dive (Data Models & Query Languages) for modeling and query trade‑offs.
