**Designing Data-Intensive Applications** (2017, author Martin Kleppmann) is a book about the key concepts needed to design and operate large-scale data systems.
It covers databases, distributed systems, data modeling, and data processing architectures, and teaches you how to handle large amounts of data reliably and efficiently.
📌 Key content of the book (detailed overview)

1️⃣ Reliable Data System Design
Basic concepts of data systems: transactions, consistency, availability, fault tolerance
Differences between Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems
Key database properties and models (ACID, the CAP theorem, BASE)

2️⃣ Storage and data modeling
Relational databases (RDBMS) vs. NoSQL (document stores, key-value stores, graph databases, etc.)
Data serialization formats such as JSON, XML, Avro, Protocol Buffers, and Thrift
Normalization and denormalization (data modeling techniques)
How to use joins and indexes efficiently
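As a rough illustration of the indexing topic, here is a minimal Python sketch of a hash index over an append-only log, similar in spirit to the log-structured storage engines the book discusses. The class name `AppendOnlyKV` and the use of an in-memory list as the "log" are assumptions made for illustration; a real engine would write segments to disk and compact them in the background.

```python
class AppendOnlyKV:
    """Hypothetical key-value store: every write is appended to a log,
    and an in-memory hash index maps each key to its latest log position."""

    def __init__(self):
        self.log = []    # stands in for an append-only file on disk
        self.index = {}  # key -> position of the most recent record for that key

    def put(self, key, value):
        # Never overwrite in place: append, then point the index at the new record.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        # One hash lookup finds the record; no scan of the log is needed.
        pos = self.index.get(key)
        if pos is None:
            return None
        return self.log[pos][1]

    def compact(self):
        # Keep only the latest record for each key and rebuild the index.
        latest = [(k, self.log[pos][1]) for k, pos in self.index.items()]
        self.log = latest
        self.index = {k: i for i, (k, _) in enumerate(latest)}
```

Because `put` appends rather than overwriting, reads go through the index to find the latest value, and `compact` discards superseded records.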

3️⃣ Storage and Replication of Distributed Data
Data replication methods: leader-follower replication, multi-leader replication, synchronous vs. asynchronous replication
Leader election and consensus algorithms (Raft, Paxos, etc.)
The CAP theorem: consistency vs. availability vs. tolerance of network splits (partition tolerance)
Data consistency models (eventual consistency, strong consistency, event tracking consistency)
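To make replication lag and eventual consistency concrete, the following is a minimal sketch of single-leader (leader-follower) asynchronous replication. `Replica`, `Leader`, and the explicit `replicate()` step are hypothetical names; a real system ships a replication log over the network continuously and must handle follower failures and leader failover.

```python
class Replica:
    """A hypothetical replica holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Leader(Replica):
    """Hypothetical single leader: accepts all writes and queues them
    for asynchronous delivery to its followers."""

    def __init__(self, followers):
        super().__init__("leader")
        self.followers = followers
        self.pending = []  # replication log entries not yet shipped

    def write(self, key, value):
        self.data[key] = value             # applied on the leader immediately
        self.pending.append((key, value))  # followers receive it later

    def replicate(self):
        # Ship every pending change to every follower, in log order.
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()
```

Between `write` and `replicate`, a read from a follower returns stale data (for a new key, nothing at all); once the pending changes are delivered, all replicas agree, which is the essence of eventual consistency.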

4️⃣ Distributed transactions and consistency
ACID transactions: Atomicity, Consistency, Isolation, Durability
Two-phase commit (2PC)
Failure handling in distributed transactions
Replacing distributed transactions with asynchronous message queues
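The two phases of 2PC can be sketched as follows. This is a deliberately simplified, in-process Python model under stated assumptions: `Participant`, `can_commit`, and `two_phase_commit` are hypothetical names, and there is no coordinator log, no timeouts, and no crash recovery. Those missing pieces are exactly where real 2PC becomes difficult: a participant that has voted yes must wait if the coordinator fails.

```python
class Participant:
    """Hypothetical participant (e.g. one database node) in a 2PC round."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can definitely be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if every participant voted yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote in phase 1 forces every participant to abort, which is what makes the outcome atomic across nodes.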

5️⃣ Processing and streaming data on distributed systems
Batch vs. Stream Processing
Data processing frameworks and messaging systems such as Apache Hadoop, Spark, Kafka, and Flink
Optimizing data indexing, retrieval, and caching
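The difference between batch and stream processing can be illustrated with a word count. The sketch below is a hypothetical plain-Python example, not the API of Hadoop, Spark, or Flink: the batch version reads a bounded input and produces one result, while the streaming version keeps running state and updates it as each event arrives.

```python
from collections import Counter


def batch_word_count(lines):
    """Batch: process the whole, bounded input, then produce one result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class StreamingWordCount:
    """Stream: keep running state and update it per incoming event."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, line):
        self.counts.update(line.split())
        return self.counts  # the current result after this event
```

Given the same input, both end up with the same counts; the streaming version simply makes intermediate results available while events are still arriving.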

6️⃣ Scalability and Disaster Recovery
Data partitioning (sharding)
Load balancing and performance tuning
System failure recovery and data backup strategies
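A common partitioning scheme assigns each key to a partition by hashing it. The sketch below assumes a simple `hash(key) mod N` scheme with MD5 as a stable hash (Python's built-in `hash()` is randomized per process for strings, so it is unsuitable here); `partition_for` and `keys_moved` are hypothetical helper names. `keys_moved` shows why mod-N hashing makes rebalancing expensive.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: stable hash of the key, modulo N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def keys_moved(keys, old_n: int, new_n: int) -> int:
    """Count keys whose partition changes when N goes from old_n to new_n."""
    return sum(partition_for(k, old_n) != partition_for(k, new_n) for k in keys)
```

Going from 4 to 5 partitions moves roughly 80% of keys in expectation (a key stays put only when its hash falls in 0 to 3 modulo 20), which is why systems often use a fixed number of partitions or other rebalancing strategies instead.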

📌 What you can get from reading this book
✅ Understand the core concepts of databases and distributed systems that handle large amounts of data
✅ Learn how to design reliable data systems
✅ Learn modern concepts such as NoSQL, stream processing, data replication, transactions, and distributed processing
✅ Understand how real-world big data architectures and cloud systems are structured
📌 Recommended for
✔ Backend developers (interested in designing databases, distributed systems, and APIs)
✔ Data engineers (large-scale data processing and data store design)
✔ Software architects (designing high-performance distributed systems)
✔ Big data/cloud engineers
📌 Conclusion:
"Designing Data-Intensive Applications" is a must-read that lays out the core concepts of designing large-scale data systems.
I recommend it to anyone who wants to learn modern data technologies systematically: SQL, NoSQL, distributed systems, data consistency, transactions, streaming data processing, and more!
