The course covers the architecture and design of modern big data systems from data modeling and data management perspectives. Topics include centralized vs. distributed data systems, NoSQL (in particular wide-column database systems and Cassandra), storage strategies, denormalized data modeling, stream processing, data warehousing, and more.
The goal of the course is to provide both the theoretical and the practical, hands-on knowledge required for designing and developing internet-scale data applications.
The class meets once a week for a 3-hour lecture.
There will be 2-3 homework assignments (in pairs, some of which will involve programming) - 45% of the final grade.
Final exam - 55% of the final grade.
# | Date | Topics | Material | Notes |
---|---|---|---|---|
1 | 25.10.2022 | Introduction | Hello, World!; Introduction to Big Data; Introduction to Relational DB | |
- | 01.11.2022 | No class this week (Election Day) | | |
2 | 08.11.2022 | Relational DB | SQL; Relational Data Integrity; MySQL CLI | |
3 | 15.11.2022 | Relational data modeling | Relational modeling; MySQL Workbench | HW#1 distributed |
4 | 22.11.2022 | Distributed DB, CAP theorem, NoSQL | Introduction to Distributed DB; CAP theorem; NoSQL | |
5 | 29.11.2022 | Dynamo | Dynamo; Dynamo (Extra) | HW#1 due |
6 | 06.12.2022 | Bigtable | Bigtable | |
7 | 13.12.2022 | Cassandra - Intro | Cassandra - Intro | |
8 | 20.12.2022 | Cassandra - Advanced | Cassandra - CQL; Cassandra - Advanced | |
9 | 27.12.2022 | Cassandra - Hands on | Astra DB; Cassandra - Java Driver | HW#2 distributed |
10 | 03.01.2023 | Data modeling in NoSQL | Denormalization; Data Modeling in NoSQL - Intro | |
11 | 10.01.2023 | Data modeling in NoSQL - Advanced | Data Modeling in NoSQL - Advanced; Data Modeling in NoSQL - Examples | HW#2 due; HW#3 distributed |
12 | 17.01.2023 | Data warehouse (BigQuery) | BigQuery (Google) | |
- | 24.01.2023 | HW#3 due | | |
- | 14.02.2023 | Final exam | | |