The course covers the architecture and design of modern big data systems from data modeling and data management perspectives. Topics include centralized vs. distributed data systems, NoSQL (in particular wide-column database systems and Cassandra), storage strategies, denormalization modeling, stream processing, data warehouses, and more.
The goal of the course is to provide both the theoretical and the practical, hands-on knowledge required for designing and developing internet-scale data applications.
The class meets once a week for a 3-hour lecture.
There will be 2-3 homework assignments, done in pairs, some of which will involve programming - 45% of the final grade.
Final exam - 55% of the final grade.
* tentative due to Iron Swords war
# | Date | Topics | Material | Notes |
---|---|---|---|---|
1 | 05.11.2024 | Introduction | Hello, World!; Introduction to Big Data; Introduction to Relational DB | |
2 | 12.11.2024 | Relational DB | SQL; Relational Data Integrity; MySQL CLI | |
3 | 19.11.2024 | Relational data modeling | Relational modeling; MySQL workbench | HW#1 distributed |
4 | 26.11.2024 | Distributed DB, CAP theorem, NoSQL | | |
5 | 03.12.2024 | Dynamo | | HW#1 due |
6 | 10.12.2024 | Bigtable | | |
7 | 17.12.2024 | Cassandra - Intro | | |
8 | 24.12.2024 | Cassandra - Advanced | | |
9 | 31.12.2024 | Cassandra - Hands on | | HW#2 distributed |
10 | 07.01.2025 | Data modeling in NoSQL | | |
11 | 14.01.2025 | Data modeling in NoSQL - Advanced | | HW#2 due; HW#3 distributed |
12 | 21.01.2025 | Data warehouse (BigQuery) | | |
13 | 28.01.2025 | TBD (Spanner / Kafka / ...) | | HW#3 due |
- | 12.02.2025 | Final Test | | |