**Designing Data-Intensive Applications** (Martin Kleppmann, 2017) is a book that addresses the key concepts needed to design and operate large-scale data systems.
It covers databases, distributed systems, data modeling, and data processing architectures, and teaches you how to handle large amounts of data reliably and efficiently.
📌 Key content of the book (detailed overview)

1️⃣ Reliable Data System Design
Basic concepts of data systems: transactions, consistency, availability, fault tolerance
Differences between Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems
Key properties of databases: ACID, the CAP theorem, the BASE model (a short transaction sketch follows this list)
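As a rough illustration of ACID atomicity (not taken from the book), here is a minimal sketch using Python's built-in sqlite3 module; the accounts table and the transfer scenario are made-up examples:

```python
import sqlite3

# Minimal illustration of atomicity: either the whole transfer commits, or none of it does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'"
        ).fetchone()
        if balance < 0:  # business-rule check fails mid-transaction
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# The failed transfer was rolled back atomically: alice still has 100, bob still has 0.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```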

2️⃣ Storage and data modeling
Relational databases (RDBMS) vs. NoSQL (document stores, key-value stores, graph databases, etc.)
Data serialization formats such as JSON, XML, Avro, Protocol Buffers, and Thrift
Normalization and denormalization (data modeling techniques)
How to use joins and indexes efficiently (see the hash-index sketch after this list)
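The book introduces storage engines and indexing with the idea of an append-only log plus an in-memory hash index that maps each key to the byte offset of its latest record (the approach behind Bitcask-style engines). Here is a toy Python sketch of that idea; the class name and file path are illustrative:

```python
import json

class HashIndexedLog:
    """Toy key-value store: an append-only log file plus an in-memory hash index
    mapping each key to the byte offset of its most recent record."""

    def __init__(self, path):
        self.path = path
        self.index = {}              # key -> byte offset of latest value
        open(path, "ab").close()     # ensure the log file exists

    def set(self, key, value):
        record = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.tell()        # record starts at the current end of the log
            f.write(record)
        self.index[key] = offset     # newer writes simply shadow older ones

    def get(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.index[key])  # jump straight to the latest record
            return json.loads(f.readline())["value"]

db = HashIndexedLog("data.log")
db.set("user:1", {"name": "Alice"})
db.set("user:1", {"name": "Alice", "city": "Seoul"})
print(db.get("user:1"))  # the latest value wins
```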

3️⃣ Storage and Replication of Distributed Data
Data replication approaches: leader-follower (single-leader) replication, multi-leader replication, synchronous vs. asynchronous replication (see the sketch after this list)
Leader election and consensus algorithms (Raft, Paxos, etc.)
The CAP theorem: consistency vs. availability vs. partition tolerance
Data consistency models (eventual consistency, strong consistency, causal consistency)
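A minimal sketch of single-leader replication with one synchronous and one asynchronous follower (illustrative class names, not from the book). It shows the trade-off the book describes: synchronous followers are up to date before a write is acknowledged, while asynchronous followers may lag and can lose acknowledged writes if the leader fails:

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []              # replicated sequence of writes

    def apply(self, entry):
        self.log.append(entry)

class Leader:
    """Single-leader replication: all writes go to the leader, which forwards
    them to its followers."""

    def __init__(self, sync_followers, async_followers):
        self.log = []
        self.sync_followers = sync_followers
        self.async_followers = async_followers

    def write(self, entry):
        self.log.append(entry)
        for f in self.sync_followers:   # must succeed before the write is acknowledged
            f.apply(entry)
        for f in self.async_followers:  # in a real system this happens in the background,
            f.apply(entry)              # so these replicas can lag behind (replication lag)
        return "ok"

leader = Leader([Follower("sync-1")], [Follower("async-1")])
leader.write({"key": "user:1", "value": "Alice"})
```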

4️⃣ Distributed transactions and consistency
Atomicity, Consistency, Isolation, Durability (ACID) transactions
Two-phase commit (2PC) (see the coordinator sketch after this list)
Distributed transactions and failure-handling techniques
Replacing distributed transactions with asynchronous message queues
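As a rough sketch of two-phase commit (the coordinator asks every participant to prepare, then commits only on a unanimous "yes"), here is a toy Python example. The Participant class and two_phase_commit function are illustrative names, and the sketch deliberately omits the hard parts the book focuses on: coordinator crashes, in-doubt participants, and durable logging.

```python
class Participant:
    """One resource manager in a two-phase commit. prepare() must stage the change
    and promise it can commit; only then does the coordinator decide."""

    def __init__(self, name, will_fail=False):
        self.name = name
        self.will_fail = will_fail
        self.state = "idle"

    def prepare(self, txn):
        if self.will_fail:
            self.state = "aborted"
            return False             # votes "no"
        self.state = "prepared"
        return True                  # votes "yes"

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"

def two_phase_commit(participants, txn):
    # Phase 1: ask every participant to prepare (vote).
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: commit only if every vote was "yes", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

print(two_phase_commit([Participant("orders"), Participant("payments")], txn="t1"))
print(two_phase_commit([Participant("orders"), Participant("payments", will_fail=True)], txn="t2"))
```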

5️⃣ Processing and streaming data on distributed systems
Batch vs. stream processing (a MapReduce-style word-count sketch follows this list)
Data processing frameworks such as Apache Hadoop, Spark, Kafka, and Flink
Optimizing data indexing, retrieval, and caching
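As a rough illustration of the batch-processing model behind Hadoop-style MapReduce (pure Python, not a real framework API), here is a word-count job split into a map phase and a reduce phase:

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in one input record.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reducer: sum the counts for each key after the shuffle has grouped them.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
all_pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(all_pairs))  # {'the': 3, 'quick': 1, 'brown': 1, ...}
```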

6️⃣ Scalability and Disaster Recovery
Data sharding and partitioning (see the hash-partitioning sketch after this list)
Load balancing and performance tuning
System Failure Recovery and Data Backup Strategy
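A minimal sketch of hash partitioning (illustrative function names): hash the key and take it modulo the number of partitions so that keys are spread evenly across shards. Plain mod-N is naive, since adding a node would move almost every key; the book discusses fixed partition counts and rebalancing strategies to avoid that.

```python
import hashlib

def partition_for(key, num_partitions):
    """Hash partitioning: map a key to a shard by hashing it and taking the
    result modulo the number of partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

NUM_PARTITIONS = 4
for user_id in ["user:1", "user:2", "user:3", "user:42"]:
    print(user_id, "-> shard", partition_for(user_id, NUM_PARTITIONS))
```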
