System Design Concepts
1. How to transfer data at large scale?
- Non-blocking I/O
- Buffering and Batching
- Network Protocols
- Message Formats
- Load Balancing
- Partitioning
- Consistent Hashing
2. How to aggregate data efficiently?
- Push vs Pull
- Deduplication
- Checkpointing
- Data enrichment
- Embedded database
- State management
- Fallback
3. How to store data reliably?
- Reverse Proxy
- Coordination service
- Health checking
- Peer and service discovery
- Replication
- Quorum
- Availability zone
4. How to retrieve data quickly?
- Aggregate on write
- Eventual consistency
- Denormalization
- Data rollup
- Hot and cold storage
- Polyglot persistence
- Distributed cache
5. How to define system requirements?
- Functional requirements - Define the behavior of the system: what the system is supposed to do. Example: the system must allow applications to exchange messages.
- Nonfunctional requirements - Define the qualities of the system: how the system is supposed to be. Example: scalable, highly available, and fast.
How to go about defining functional requirements? Start with the customer and work backward: identify who is going to use the system and how. For well-known systems like YouTube, Twitter, or Facebook this is easy. For systems such as a rate limiter or a content delivery network it can be quite challenging, so it is better to start with the customers (clients, users) and work backward from how they use the system. For example, YouTube's customers are content creators and viewers: content creators upload videos and create posts, whereas viewers search for videos, watch videos, and comment. Similarly, for a rate limiter the customers are web services, and the system throttles their requests.
Nonfunctional requirements - High availability
- Availability - Defines the system uptime: the percentage of time the system has been working and available. Example: at 99% availability, the system is unavailable for about 3.65 days a year.
- Success ratio of requests - The percentage of requests that succeed. Example: at 99%, 1 request out of 100 fails.
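The arithmetic behind these two measures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and a year is assumed to have 365 days.

```python
def downtime_days_per_year(availability_percent: float) -> float:
    """Days per year the system may be unavailable at the given availability."""
    # Assumption: a 365-day year, matching the 3.65-day figure for 99%.
    return (1 - availability_percent / 100) * 365


def success_ratio_percent(total_requests: int, failed_requests: int) -> float:
    """Percentage of requests that succeeded."""
    return (total_requests - failed_requests) / total_requests * 100


if __name__ == "__main__":
    for percent in (99.0, 99.9, 99.99):
        days = downtime_days_per_year(percent)
        print(f"{percent}% availability: {days:.3f} days ({days * 24:.2f} hours) per year")
    print(f"1 failure in 100 requests: {success_ratio_percent(100, 1):.1f}% success")
```

Each additional "nine" cuts the allowed downtime by a factor of ten: 99.9% leaves under 9 hours a year, and 99.99% leaves under an hour.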
What is a highly available system? It is not about a number. It is about architecture and process.
Design principles behind high availability:
- Build redundancy to eliminate single points of failure. Example: regions, availability zones, fallback, data replication, high availability pair.
- Switch from one server to another without losing data. Example: DNS, load balancing, reverse proxy, API gateway, peer discovery, service discovery.
- Protect the system from atypical client behavior. Example: load shedding, rate limiting, shuffle sharding, cell-based architecture.
- Protect the system from failures and performance degradation of its dependencies. Example: timeouts, circuit breaker, bulkhead, retries, idempotency.
- Detect failures as they occur. Example: monitoring.
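One of the dependency-protection techniques named above, the circuit breaker, can be sketched as follows. This is a minimal single-threaded illustration, not a production implementation: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of piling more load on a struggling dependency.
                raise CircuitOpenError("call rejected: circuit is open")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as a signal to use a fallback. In practice this is usually combined with timeouts and bounded retries.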
Process behind high availability:
- Change management - All code and configuration changes are reviewed and approved.
- QA - Regularly exercise tests to validate that newly introduced changes meet functional and nonfunctional requirements.
- Deployment - Deploy changes to the production environment frequently, quickly, and safely, with automated rollback.
- Capacity planning - Monitor system utilization and add resources to meet growing demand.
- Disaster recovery - Recover the system quickly in the event of a disaster; regularly test the failover to disaster recovery.
- Root cause analysis - Establish the root cause of the failure and identify preventive measures.
- Operational readiness review - Evaluate the system’s operational state and identify gaps in operations. Define actions to remediate risks.
- Game day - Simulate a failure or event and test the system and team responses.
- Team culture - Good team culture promotes process discipline.
Nonfunctional requirements - Fault tolerance
- Fault tolerance - The property that enables a system to continue operating properly in the event of one or more faults within some of its components.
- Fault tolerance vs high availability - A fault-tolerant system has the goal of zero downtime, whereas in a highly available system downtime is possible and the system tries to minimize it. Fault tolerance can be achieved with the same design principles and processes as high availability, but it requires more redundancy.
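Why more redundancy moves a system toward zero downtime can be shown with a small calculation. The sketch below assumes that replicas fail independently and that the system is up as long as at least one replica is up; real failures are often correlated, so treat the results as an upper bound. The function name is hypothetical.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a group that is up while at least one replica is up."""
    # Assumption: failures are independent, so the probability that all
    # replicas are down at once is the product of the individual probabilities.
    all_down = (1 - replica_availability) ** replicas
    return 1 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s) at 99% each: {combined_availability(0.99, n):.6f}")
```

Under these assumptions, two replicas at 99% each give about 99.99%, and three give about 99.9999%.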
Nonfunctional requirements - Resilience
- Resilience - Systems that can provide and maintain an acceptable level of service in the face of faults are called resilient systems.
- To ensure resilience, we need faults to happen in the system periodically so that its resilience is tested. This practice is called chaos engineering. Example: killing instances of a server.
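The "kill an instance" experiment can be sketched as a toy in-memory simulation. Everything here is hypothetical (the `Instance` and `Service` classes, the instance names, the simple fallback loop); a real chaos experiment would terminate actual processes or machines and observe production metrics.

```python
import random


class Instance:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Service:
    def __init__(self, instances):
        self.instances = instances

    def handle(self, request):
        # Try each instance in turn, falling back to the next one on failure.
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instance available")


def kill_random_instance(service, rng):
    """Inject a fault: mark one randomly chosen live instance as dead."""
    live = [i for i in service.instances if i.alive]
    victim = rng.choice(live)  # raises IndexError if nothing is left alive
    victim.alive = False
    return victim


if __name__ == "__main__":
    rng = random.Random()
    service = Service([Instance(f"server-{n}") for n in range(3)])
    victim = kill_random_instance(service, rng)
    print(f"killed {victim.name}")
    print(service.handle("ping"))  # should still succeed via another instance
```

The experiment passes if requests keep succeeding after the fault is injected; if they do not, the gap has been found before a real outage exposes it.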