10 System Design Tips (Senior-Level)

April 8, 2026
Yutong Jin


🧠 System Design Key Principles

1. Start with Functional Requirements

Clearly define what the system needs to do:

  • Core APIs (read/write flows)
  • User interactions
  • Edge cases

💡 Tip: Do NOT spend too long here; 2–3 minutes is enough in interviews.


2. Identify the Core Bottleneck Early

Before designing, ask:

  • What is the expected QPS?
  • Where will the system break first?

Common bottlenecks:

  • Feed systems → fanout explosion
  • Chat systems → message ordering
  • Payment systems → consistency

💡 Senior mindset:

Design starts from bottlenecks, not components.
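
To make the QPS question concrete, here is a rough back-of-envelope sketch. The traffic numbers (10 million daily active users, 20 requests per user per day, a 3x peak factor) and the function name `estimate_qps` are illustrative assumptions, not figures from any real system.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(daily_active_users, requests_per_user_per_day, peak_factor=3.0):
    """Return (average_qps, peak_qps) from rough daily traffic numbers."""
    average = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
    # Traffic is rarely flat, so peak is modeled as a multiple of the average.
    return average, average * peak_factor


# Assumed inputs: 10M DAU, 20 requests per user per day.
avg_qps, peak_qps = estimate_qps(10_000_000, 20)
print(f"average ~{avg_qps:,.0f} QPS, peak ~{peak_qps:,.0f} QPS")
```

Even a crude estimate like this tells you whether a single database node is plausible or whether sharding and caching need to be part of the first draft.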


3. Classify the System Type

Go beyond “read-heavy vs write-heavy”:

Type                 | Example           | Key Requirement
Latency-sensitive    | Feed / Search     | Low latency
Throughput-heavy     | Logging / Metrics | High ingestion
Consistency-critical | Payment / Auction | Strong consistency

💡 This classification drives all design decisions.


4. CAP Trade-offs = Product Decisions

CAP is not just theory: when a network partition happens, choosing between consistency and availability is a user-experience tradeoff.

  • Availability-focused (AP):

    • Feed systems (TikTok, Instagram)
    • Metrics systems
    • Google Drive / Dropbox (eventual consistency)
  • Consistency-focused (CP):

    • Payment systems
    • Auction systems

💡 Example:

Slightly stale feed is acceptable → choose Availability
Double charge is unacceptable → choose Consistency


5. Design for Failure by Default

Assume every component can fail.

Key techniques:

  • Retry + exponential backoff
  • Idempotency (critical for payments)
  • Dead letter queue (DLQ)
  • Fallback strategies

💡 Principle:

Design for partial failure, not full system failure.
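
The first two techniques work best together: retries are only safe when the operation is idempotent. Below is a minimal sketch, assuming an in-memory `PaymentService` and a `TransientError` exception type (both hypothetical names invented for this example). The retry helper uses exponential backoff with full jitter, and the service deduplicates by idempotency key.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure (or send to a DLQ)
            # Delay cap doubles each attempt; random jitter avoids retry storms.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


class PaymentService:
    """Toy in-memory service that deduplicates charges by idempotency key."""

    def __init__(self):
        self._receipts = {}
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._receipts:
            # Same key seen before: return the stored result, charge nothing.
            return self._receipts[idempotency_key]
        self.charges_made += 1
        receipt = {"key": idempotency_key, "amount_cents": amount_cents}
        self._receipts[idempotency_key] = receipt
        return receipt


service = PaymentService()
calls = {"count": 0}


def flaky_charge():
    calls["count"] += 1
    receipt = service.charge("order-42", 1999)
    if calls["count"] < 3:
        # The charge succeeded, but the response was "lost" on the way back.
        raise TransientError("response lost")
    return receipt


# sleep is stubbed out so the example runs instantly.
receipt = retry_with_backoff(flaky_charge, sleep=lambda seconds: None)
print(calls["count"], service.charges_made)  # 3 attempts, 1 actual charge
```

Without the idempotency key, the same three attempts would have charged the customer three times.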


6. Sync vs Async is a Core Decision

Sync               | Async
Low latency        | High reliability
Strong consistency | Eventual consistency
User blocking      | Decoupled

Examples:

  • Post creation → async fanout
  • Payment → sync confirmation + async settlement

💡 Tradeoff: Async improves scalability but adds complexity.


7. Fault Tolerance via Message Queues

Use systems like Kafka to:

  • Buffer traffic spikes
  • Enable retries
  • Decouple services

Benefits:

  • Prevent data loss
  • Smooth traffic bursts
  • Allow downstream recovery

💡 Important: Kafka is not just a queue; it's a durability and replay system.
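
To illustrate why a log with offsets allows retries and downstream recovery, here is a toy in-memory model. It is not the Kafka client API; `Log` and `Consumer` are hypothetical classes that only mimic the idea of an append-only log plus a committed offset per consumer.

```python
class Log:
    """Append-only list of records, addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own committed offset; the log itself is never mutated."""

    def __init__(self, log):
        self.log = log
        self.committed = 0

    def poll(self, handler):
        for record in self.log.read(self.committed):
            handler(record)      # if this raises, the offset is not advanced
            self.committed += 1  # commit only after successful processing


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

processed = []
downstream = {"up": False}


def handler(record):
    if record == "b" and not downstream["up"]:
        raise RuntimeError("downstream unavailable")
    processed.append(record)


consumer = Consumer(log)
try:
    consumer.poll(handler)   # processes "a", then fails on "b"
except RuntimeError:
    pass

downstream["up"] = True
consumer.poll(handler)       # resumes at "b": nothing was lost

# A brand-new consumer can replay the full history from offset 0.
replayed = []
Consumer(log).poll(replayed.append)
```

Because records stay in the log after being read, a failed consumer resumes from its last committed offset, and a new consumer can rebuild state from the beginning.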


8. Read vs Write Optimization

Write-heavy systems:

  • Must not lose data
  • Use:
    • Kafka (durability)
    • Batch writes
    • Idempotent operations

Read-heavy systems:

  • Optimize for latency
  • Use:
    • Caching (Redis)
    • Read replicas
    • Precomputation

9. Cache Strategy (Performance vs Consistency)

A cache improves performance but can serve stale data.

Common patterns:

  • Cache-aside (most common)
  • Write-through
  • Write-back

Key decisions:

  • TTL vs explicit invalidation
  • Handling stale data
  • Cache fallback to DB

💡 Principle:

Cache is not just optimization; it is a consistency tradeoff.
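
A minimal cache-aside sketch makes the tradeoff visible. It assumes an in-process dictionary standing in for Redis and a `load_from_db` callback standing in for the database; the class name `CacheAside` and the injectable `clock` are choices made for this example only.

```python
import time


class CacheAside:
    """Read path: check the cache first, fall back to the DB on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                       # hit (possibly stale)
        value = self._load(key)                   # miss or expired: go to DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Explicit invalidation, to be called by the write path."""
        self._entries.pop(key, None)


db = {"user:1": "Ada"}
db_reads = []


def load(key):
    db_reads.append(key)
    return db[key]


now = [0.0]  # fake clock so the example does not need to sleep
cache = CacheAside(load, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("user:1")    # miss, reads the DB
db["user:1"] = "Grace"         # DB changes, cache is not told
stale = cache.get("user:1")    # hit, still returns the old value
now[0] = 31.0                  # TTL expires
fresh = cache.get("user:1")    # miss, reads the new value
```

The window between the DB write and the TTL expiry is exactly the inconsistency the cache introduces; calling `invalidate` on writes shrinks that window at the cost of extra coupling.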


10. Read/Write Separation

Separate services:

  • Write service → handles mutations
  • Read service → optimized for queries

Techniques:

  • Read replicas
  • CQRS (Command Query Responsibility Segregation)

Benefits:

  • Better scalability
  • Independent optimization
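
As a rough illustration of the split, the sketch below keeps a write model (an event list) and a separately maintained read model (a per-author index). `WriteService`, `ReadService`, and the event shape are hypothetical; in a real system the `sync` step would typically be driven by a queue or change stream rather than a direct call.

```python
from collections import defaultdict


class WriteService:
    """Handles mutations by appending events; knows nothing about queries."""

    def __init__(self):
        self.events = []

    def create_post(self, author_id, post_id, text):
        event = {"type": "post_created", "author_id": author_id,
                 "post_id": post_id, "text": text}
        self.events.append(event)
        return event


class ReadService:
    """Maintains a denormalized view shaped for one query: posts by author."""

    def __init__(self):
        self._posts_by_author = defaultdict(list)
        self._applied = 0  # number of events already folded into the view

    def sync(self, events):
        for event in events[self._applied:]:
            if event["type"] == "post_created":
                self._posts_by_author[event["author_id"]].append(event["post_id"])
        self._applied = len(events)

    def posts_by(self, author_id):
        return list(self._posts_by_author.get(author_id, []))


writes = WriteService()
reads = ReadService()

writes.create_post("alice", "p1", "hello")
writes.create_post("alice", "p2", "world")

before_sync = reads.posts_by("alice")  # view has not caught up yet
reads.sync(writes.events)
after_sync = reads.posts_by("alice")
```

Note that the read side is eventually consistent with the write side: `before_sync` is empty even though both posts were already accepted.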

🔥 Advanced Principles (Senior-Level Thinking)

11. Data Model Drives the System

Schema design determines scalability.

Examples:

  • Feed: (user_id, timestamp)
  • Chat: (chat_id, message_id)
  • DynamoDB: partition key (PK) + sort key (SK) + global secondary index (GSI)

💡 Principle:

Bad schema cannot be fixed by scaling.
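
To show why the key choice matters, here is a small sketch of the feed schema `(user_id, timestamp)` modeled as a partition key plus a sort key. `FeedTable` is a hypothetical in-memory stand-in, not a real database client; the point is that "latest N posts for a user" becomes a cheap range read within one partition.

```python
import bisect
from collections import defaultdict


class FeedTable:
    """Partition key = user_id, sort key = timestamp."""

    def __init__(self):
        # user_id -> list of (timestamp, post_id), kept sorted by timestamp
        self._partitions = defaultdict(list)

    def put(self, user_id, timestamp, post_id):
        bisect.insort(self._partitions[user_id], (timestamp, post_id))

    def latest(self, user_id, limit=10, before=None):
        """Newest-first page of posts, optionally older than `before`."""
        items = self._partitions.get(user_id, [])
        if before is None:
            end = len(items)
        else:
            # (before,) sorts ahead of any (before, post_id), so this finds
            # the first item whose timestamp is >= before.
            end = bisect.bisect_left(items, (before,))
        page = items[max(0, end - limit):end]
        return [post_id for _, post_id in reversed(page)]


table = FeedTable()
table.put("u1", 100, "p1")
table.put("u1", 300, "p3")
table.put("u1", 200, "p2")
table.put("u2", 150, "q1")

newest = table.latest("u1", limit=2)                 # ["p3", "p2"]
next_page = table.latest("u1", limit=2, before=200)  # ["p1"]
```

If the table were keyed by `post_id` alone, the same query would need a scan or a secondary index, which is the kind of problem that adding more machines does not fix.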


12. Avoid Hotspots

Common issues:

  • Hot keys (celebrity users)
  • Hot partitions (Kafka)

Solutions:

  • Sharding
  • Consistent hashing
  • Randomization (for example, adding a random suffix to a hot key)
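
One way to apply randomization to a hot key is key salting: writes go to one of several suffixed sub-keys, and reads aggregate across all of them. The sketch below is illustrative only; `NUM_SHARDS`, `SALT_BUCKETS`, and the helper names are assumptions, and a `Counter` stands in for a sharded store.

```python
import hashlib
import random
from collections import Counter

NUM_SHARDS = 8      # assumed number of storage shards
SALT_BUCKETS = 16   # assumed number of sub-keys per hot key


def shard_for(key):
    """Stable hash of the key mapped onto a shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def salted_key(key, rng):
    """Write path: pick a random sub-key so writes spread across shards."""
    return f"{key}#{rng.randrange(SALT_BUCKETS)}"


def all_salted_keys(key):
    """Read path: every sub-key must be read and aggregated."""
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]


store = Counter()         # sub-key -> like count
rng = random.Random(0)    # seeded so the example is repeatable

for _ in range(10_000):   # 10,000 likes on one celebrity post
    store[salted_key("celebrity:1", rng)] += 1

total_likes = sum(store[k] for k in all_salted_keys("celebrity:1"))
shards_used = {shard_for(k) for k in store}
print(total_likes, len(shards_used))
```

The cost is on the read side: one logical read now fans out to `SALT_BUCKETS` lookups, so salting is usually reserved for keys known to be hot.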

13. Always Design Fallback Paths

Never rely on a single component.

Examples:

  • Cache miss → DB fallback
  • ML ranking timeout → heuristic ranking
  • Service failure → degraded experience

💡 Principle:

A degraded system is better than a broken system.
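
The ML ranking example can be sketched with a timeout and a cheap fallback. This is a simplified illustration: `rank_with_fallback`, `heuristic_rank`, the item fields, and the 100 ms budget are all assumptions made for the example, and the "ML ranker" is just a function that may be slow.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def heuristic_rank(items):
    """Cheap fallback: newest first."""
    return sorted(items, key=lambda item: item["created_at"], reverse=True)


def rank_with_fallback(items, ml_rank, timeout_seconds=0.1):
    """Return (ranked_items, source), where source is 'ml' or 'heuristic'."""
    future = _executor.submit(ml_rank, items)
    try:
        return future.result(timeout=timeout_seconds), "ml"
    except FutureTimeout:
        # cancel() only stops calls that have not started yet; a call that is
        # already running keeps going in the background and is ignored.
        future.cancel()
        return heuristic_rank(items), "heuristic"
    except Exception:
        # The ranker itself failed: degrade instead of failing the request.
        return heuristic_rank(items), "heuristic"


items = [
    {"id": "a", "created_at": 1, "score": 0.9},
    {"id": "b", "created_at": 3, "score": 0.1},
    {"id": "c", "created_at": 2, "score": 0.5},
]


def fast_ml_rank(items):
    return sorted(items, key=lambda item: item["score"], reverse=True)


def slow_ml_rank(items):
    time.sleep(0.5)  # simulates a ranker that blows its latency budget
    return fast_ml_rank(items)


ranked, source = rank_with_fallback(items, fast_ml_rank)
degraded, degraded_source = rank_with_fallback(items, slow_ml_rank)
```

Returning the source alongside the result makes it easy to count how often the fallback path is taken, which is itself a useful health signal.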


14. Measure and Iterate

Add observability:

  • Metrics (QPS, latency)
  • Logging
  • Alerting

Use:

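  • Gradual rollout (canary releases, A/B testing)
  • Dark traffic (replaying production requests against a new version without serving its responses)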

💡 Real-world: Production systems evolve over time; they are not designed once.
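
A common way to implement a gradual rollout is deterministic bucketing: hash each user into one of 100 buckets and enable the feature for buckets below the current percentage. The sketch below assumes string user IDs and a hypothetical feature name `new_ranker`; the function names are invented for this example.

```python
import hashlib


def rollout_bucket(user_id, feature):
    """Map (feature, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id, feature, percent):
    """True if this user is inside the first `percent` buckets."""
    return rollout_bucket(user_id, feature) < percent


users = [f"user-{i}" for i in range(10_000)]
at_10 = {u for u in users if in_rollout(u, "new_ranker", 10)}
at_50 = {u for u in users if in_rollout(u, "new_ranker", 50)}

# Everyone enabled at 10% stays enabled at 50%, so raising the percentage
# never flips a user back to the old behavior.
print(len(at_10), len(at_50), at_10 <= at_50)
```

Including the feature name in the hash means different features get independent user samples, so the same users are not always the first to receive every change.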


🧩 Summary

System design is not about drawing boxes; it is about making tradeoffs:

  • Latency vs Consistency
  • Throughput vs Cost
  • Simplicity vs Scalability

💡 Final takeaway:

Good engineers design systems that work.
Great engineers design systems that still work when things fail.


© 2026 Yutong Jin