**Designing Data-Intensive Applications** (Martin Kleppmann, 2017) is a book that addresses the key concepts needed to design and operate large-scale data systems.
It covers databases, distributed systems, data modeling, and data processing architectures, and teaches you how to handle large amounts of data reliably and efficiently.
📌 Key content of the book (detailed overview)

1️⃣ Reliable Data System Design
Basic concepts of data systems: transactions, consistency, availability, fault tolerance
Differences between Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems
Key properties of databases: ACID, the CAP theorem, the BASE model (a short transaction sketch follows this list)
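As a rough illustration of ACID atomicity (not taken from the book), here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the transfer scenario are made-up examples:

```python
import sqlite3

# Minimal illustration of atomicity: either the whole transfer commits, or none of it does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'"
        ).fetchone()
        if balance < 0:  # business-rule check fails mid-transaction
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# The failed transfer was rolled back atomically: alice still has 100, bob still has 0.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```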

2️⃣ Storage and data modeling
Relational databases (RDBMS) vs. NoSQL (document stores, key-value stores, graph databases, etc.)
Data serialization formats such as JSON, XML, Avro, Protocol Buffers, and Thrift
Normalization and denormalization (data modeling techniques)
How to use joins and indexes efficiently (see the hash-index sketch after this list)
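The book introduces storage engines and indexing with the idea of an append-only log plus an in-memory hash index that maps each key to the byte offset of its latest record (the approach behind Bitcask-style engines). Here is a toy Python sketch of that idea; the class name and file path are illustrative:

```python
import json

class HashIndexedLog:
    """Toy key-value store: an append-only log file plus an in-memory hash index
    mapping each key to the byte offset of its most recent record."""

    def __init__(self, path):
        self.path = path
        self.index = {}              # key -> byte offset of latest value
        open(path, "ab").close()     # ensure the log file exists

    def set(self, key, value):
        record = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.tell()        # record starts at the current end of the log
            f.write(record)
        self.index[key] = offset     # newer writes simply shadow older ones

    def get(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.index[key])  # jump straight to the latest record
            return json.loads(f.readline())["value"]

db = HashIndexedLog("data.log")
db.set("user:1", {"name": "Alice"})
db.set("user:1", {"name": "Alice", "city": "Seoul"})
print(db.get("user:1"))  # the latest value wins
```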

3️⃣ Storage and Replication of Distributed Data
Data replication approaches: leader-follower (single-leader) replication, multi-leader replication, synchronous vs. asynchronous replication (see the sketch after this list)
Leader election and consensus algorithms (Raft, Paxos, etc.)
The CAP theorem: consistency vs. availability vs. partition tolerance
Data consistency models (eventual consistency, strong consistency, causal consistency)
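A minimal sketch of single-leader replication with one synchronous and one asynchronous follower (illustrative class names, not from the book). It shows the trade-off the book describes: synchronous followers are up to date before a write is acknowledged, while asynchronous followers may lag and can lose acknowledged writes if the leader fails:

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []              # replicated sequence of writes

    def apply(self, entry):
        self.log.append(entry)

class Leader:
    """Single-leader replication: all writes go to the leader, which forwards
    them to its followers."""

    def __init__(self, sync_followers, async_followers):
        self.log = []
        self.sync_followers = sync_followers
        self.async_followers = async_followers

    def write(self, entry):
        self.log.append(entry)
        for f in self.sync_followers:   # must succeed before the write is acknowledged
            f.apply(entry)
        for f in self.async_followers:  # in a real system this happens in the background,
            f.apply(entry)              # so these replicas can lag behind (replication lag)
        return "ok"

leader = Leader([Follower("sync-1")], [Follower("async-1")])
leader.write({"key": "user:1", "value": "Alice"})
```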

4️⃣ Distributed transactions and consistency
Atomicity, Consistency, Isolation, Durability (ACID) transactions
Two-phase commit (2PC) (see the coordinator sketch after this list)
Distributed transactions and failure-handling techniques
Replacing distributed transactions with asynchronous message queues
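As a rough sketch of two-phase commit (the coordinator asks every participant to prepare, then commits only on a unanimous "yes"), here is a toy Python example. The Participant class and two_phase_commit function are illustrative names, and the sketch deliberately omits the hard parts the book focuses on: coordinator crashes, in-doubt participants, and durable logging.

```python
class Participant:
    """One resource manager in a two-phase commit. prepare() must stage the change
    and promise it can commit; only then does the coordinator decide."""

    def __init__(self, name, will_fail=False):
        self.name = name
        self.will_fail = will_fail
        self.state = "idle"

    def prepare(self, txn):
        if self.will_fail:
            self.state = "aborted"
            return False             # votes "no"
        self.state = "prepared"
        return True                  # votes "yes"

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"

def two_phase_commit(participants, txn):
    # Phase 1: ask every participant to prepare (vote).
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: commit only if every vote was "yes", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

print(two_phase_commit([Participant("orders"), Participant("payments")], txn="t1"))
print(two_phase_commit([Participant("orders"), Participant("payments", will_fail=True)], txn="t2"))
```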

5️⃣ Processing and streaming data on distributed systems
Batch vs. stream processing (a MapReduce-style word-count sketch follows this list)
Data processing frameworks such as Apache Hadoop, Spark, Kafka, and Flink
Optimizing data indexing, retrieval, and caching
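As a rough illustration of the batch-processing model behind Hadoop-style MapReduce (pure Python, not a real framework API), here is a word-count job split into a map phase and a reduce phase:

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in one input record.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reducer: sum the counts for each key after the shuffle has grouped them.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
all_pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(all_pairs))  # {'the': 3, 'quick': 1, 'brown': 1, ...}
```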

6️⃣ Scalability and Disaster Recovery
Data sharding and partitioning (see the hash-partitioning sketch after this list)
Load balancing and performance tuning
System Failure Recovery and Data Backup Strategy
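A minimal sketch of hash partitioning (illustrative function names): hash the key and take it modulo the number of partitions so that keys are spread evenly across shards. Plain mod-N is naive, since adding a node would move almost every key; the book discusses fixed partition counts and rebalancing strategies to avoid that.

```python
import hashlib

def partition_for(key, num_partitions):
    """Hash partitioning: map a key to a shard by hashing it and taking the
    result modulo the number of partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

NUM_PARTITIONS = 4
for user_id in ["user:1", "user:2", "user:3", "user:42"]:
    print(user_id, "-> shard", partition_for(user_id, NUM_PARTITIONS))
```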
