System Design Interview Questions

Scaling is the process of increasing a system's capacity to handle growing request volumes. There are two basic approaches:

Vertical Scaling (Scale Up): Adding more resources (faster CPUs, more RAM, faster SSDs) to a single existing server.
Pros: Simple to implement; no data replication or application changes required.
Cons: Hard hardware ceiling; the single server remains a single point of failure; cost rises steeply (super-linearly) at high-end specifications.
Horizontal Scaling (Scale Out): Adding more servers to a pool and distributing load across them.
Pros: Capacity can grow well beyond the limits of any single machine by adding nodes; redundancy improves availability.
Cons: Requires load balancing; the application tier should be stateless, with session state moved to a shared store; introduces data consistency challenges across distributed databases.
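A minimal sketch of a stateless application tier. The class names (`SessionStore`, `AppServer`, `RoundRobinBalancer`) are hypothetical, and a plain dict stands in for a shared session store such as Redis. Because no server keeps per-user data between requests, the balancer can send consecutive requests from the same user to different servers without losing the session.

```python
import itertools


class SessionStore:
    """Hypothetical stand-in for a shared session store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """Stateless server: all per-user state is read from and written to the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        session = dict(self.store.get(session_id))
        cart = list(session.get("cart", []))
        cart.append(item)
        session["cart"] = cart
        self.store.put(session_id, session)
        return self.name, cart


class RoundRobinBalancer:
    """Sends each request to the next server in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, session_id, item):
        return next(self._cycle).handle(session_id, item)


store = SessionStore()
lb = RoundRobinBalancer([AppServer("app-1", store), AppServer("app-2", store)])
print(lb.route("user-42", "book"))  # ('app-1', ['book'])
print(lb.route("user-42", "pen"))   # ('app-2', ['book', 'pen'])
```

Round-robin is only one balancing policy; least-connections and consistent hashing are common alternatives.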

Key Points

Scale Up vs Scale Out, Hardware Ceiling, Stateless application tier, Cost vs Capacity

Common Follow-ups

What is statelessness in horizontal scaling, and where is session state stored?

Requirements:
- Generate a unique, short alias for a long URL.
- Redirect a short URL to the original URL.
- High availability and low-latency reads.

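Core Components:
1. Key Encoding: Apply Base62 encoding (a-z, A-Z, 0-9) to a unique numeric ID. This is an encoding rather than a hash, so distinct IDs never collide. A 7-character Base62 key yields 62^7 ≈ 3.5 trillion unique URLs. Note that a 64-bit ID from a distributed generator such as Snowflake encodes to up to 11 Base62 characters, so 7-character keys require a smaller ID space, such as a range-allocated counter.
2. Database: Use a key-value store such as Redis to cache popular mappings, with PostgreSQL or Cassandra as persistent storage. Schema: (id, short_key, long_url, created_at).
3. API: POST /shorten with long_url → returns short_key. GET /{short_key} → HTTP 301 redirect to long_url. Browsers cache 301 responses, so use 302 instead if every click must reach the server (e.g., for analytics).
4. Scaling: Use a CDN for geo-distributed redirects. If keys are random rather than derived from IDs, pre-generate unique keys in batches so writers never collide.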
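The key-encoding step can be sketched as follows, assuming non-negative integer IDs and an alphabet ordered a-z, A-Z, 0-9 as listed above (any fixed ordering works as long as encoder and decoder agree). The function names are illustrative. The printed lines check the 7-character capacity and show that a full 64-bit ID needs more room.

```python
import string

# Alphabet order is an assumption; any fixed permutation of the 62 symbols works.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(key: str) -> int:
    """Invert encode_base62: map a short key back to its numeric ID."""
    n = 0
    for ch in key:
        n = n * BASE + INDEX[ch]
    return n


print(62 ** 7)                        # 3521614606208, about 3.5 trillion keys
print(len(encode_base62(62**7 - 1)))  # 7: largest ID that fits in 7 characters
print(len(encode_base62(2**63 - 1)))  # 11: a full 64-bit ID
```

Because decoding is exact, the service can store rows by numeric ID and recover that ID from the short key on every lookup.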

Read/Write Ratio: ~100:1 reads to writes, so optimize for fast lookups.
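With reads dominating, the redirect path is typically cache-aside: check the cache, fall back to the database on a miss, then populate the cache. A minimal sketch under stated assumptions: `LRUCache` and `UrlShortener` are hypothetical names, the in-process LRU cache stands in for Redis (real Redis adds TTLs and its own eviction policies), the dict `db` stands in for the persistent store, and `resolve` returns an (HTTP status, target) pair instead of a real HTTP response.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class UrlShortener:
    """Cache-aside read path for GET /{short_key}."""

    def __init__(self, cache_capacity=1000):
        self.cache = LRUCache(cache_capacity)
        self.db = {}  # short_key -> long_url
        self.cache_hits = 0
        self.db_reads = 0

    def save(self, short_key, long_url):
        self.db[short_key] = long_url

    def resolve(self, short_key):
        long_url = self.cache.get(short_key)
        if long_url is not None:
            self.cache_hits += 1
            return 301, long_url
        self.db_reads += 1
        long_url = self.db.get(short_key)
        if long_url is None:
            return 404, None
        self.cache.put(short_key, long_url)  # later reads skip the database
        return 301, long_url


svc = UrlShortener(cache_capacity=2)
svc.save("ba", "https://example.com/a/very/long/path")
svc.resolve("ba")  # miss: read from the database, then cached
svc.resolve("ba")  # hit: served from the cache
print(svc.cache_hits, svc.db_reads)  # 1 1
```

The LRU policy keeps frequently requested keys resident, which matches the goal of caching popular mappings.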

Key Points

Base62 encoding, Write-once read-many, Cache-heavy (Redis), CDN for hot keys, 301 redirect

Common Follow-ups

How do you handle custom short URLs? How do you prevent one user from guessing another's URLs?

Requirements:
- Send and receive messages in real time.
- Support one-on-one and group chats.
- Deliver messages reliably and in order.

Core Components:
1. WebSocket Connection: Maintain a persistent connection (WebSocket over TCP) between client and server for real-time, bidirectional communication.
2. Chat Service: A stateless service that routes messages. Each stored message is assigned a monotonically increasing sequence ID per conversation, which defines ordering.
3. Message Store: A distributed database such as Cassandra (wide-column, optimized for fast writes) or a time-series database. Schema: (conversation_id, message_id, sender_id, content, timestamp).
4. Presence Service: Redis pub/sub or a heartbeat mechanism to track online/offline status.
5. Group Chat Fan-out: For small groups (fewer than ~100 members), fan out on write by copying each message into every member's inbox. For large groups, fan out on read (pull model): members fetch new messages from the shared conversation log when they log in or open the conversation.
6. Delivery Semantics: At-least-once delivery, with clients deduplicating redelivered messages by message ID.
7. End-to-End Encryption: Each message is encrypted on the sender's device and decrypted on the recipient's; the server stores only ciphertext.
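The ordering, fan-out, and delivery rules above can be sketched in memory. This is an illustration under explicit assumptions: `ChatService`, `Client`, and `FANOUT_WRITE_LIMIT` are hypothetical names; dicts stand in for the message store and per-user inboxes (in a real deployment these live in external stores, which keeps the service itself stateless); the `seq` field is added alongside the schema's `message_id` and the timestamp is omitted; transport, presence, and encryption are left out; and message IDs are assumed to be globally unique and client-generated so retries can be recognized.

```python
from collections import defaultdict
from dataclasses import dataclass

# Threshold taken from the text: fan out on write for groups under ~100 members.
FANOUT_WRITE_LIMIT = 100


@dataclass(frozen=True)
class Message:
    conversation_id: str
    seq: int          # per-conversation sequence ID, defines ordering
    message_id: str   # client-generated unique ID, used for deduplication
    sender_id: str
    content: str


class ChatService:
    def __init__(self):
        self.members = {}               # conversation_id -> list of user IDs
        self.log = defaultdict(list)    # conversation_id -> messages (message store)
        self.inbox = defaultdict(list)  # user_id -> messages (fan-out on write)
        self._seq = defaultdict(int)    # conversation_id -> last assigned seq
        self._accepted = {}             # message_id -> Message (idempotent retries)

    def send(self, conversation_id, sender_id, message_id, content):
        # A retried send with the same message_id returns the stored message
        # instead of assigning a new sequence ID.
        if message_id in self._accepted:
            return self._accepted[message_id]
        self._seq[conversation_id] += 1
        msg = Message(conversation_id, self._seq[conversation_id],
                      message_id, sender_id, content)
        self.log[conversation_id].append(msg)
        self._accepted[message_id] = msg
        members = self.members[conversation_id]
        if len(members) < FANOUT_WRITE_LIMIT:
            # Small group: push a copy into every other member's inbox.
            for user in members:
                if user != sender_id:
                    self.inbox[user].append(msg)
        # Large group: no push; members pull from the log via fetch_since().
        return msg

    def fetch_since(self, conversation_id, after_seq):
        """Pull model: return messages with seq greater than after_seq."""
        return [m for m in self.log[conversation_id] if m.seq > after_seq]


class Client:
    """Receiver side of at-least-once delivery: drop duplicates by message_id."""

    def __init__(self):
        self.seen = set()
        self.timeline = []

    def receive(self, msg):
        if msg.message_id in self.seen:
            return False  # redelivered duplicate, ignored
        self.seen.add(msg.message_id)
        self.timeline.append(msg)
        # Order by sequence ID, not by arrival time.
        self.timeline.sort(key=lambda m: (m.conversation_id, m.seq))
        return True
```

In a real message store, `fetch_since` would be a range query on the sequence ID within one conversation's partition rather than a linear scan.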

Key Points

WebSocket, Fan-out strategy, Inbox/Outbox pattern, Sequence IDs for ordering, E2E encryption

Common Follow-ups

How would you handle multi-device sync? How do you deliver offline messages when a user comes back online?