About
Because of the massive volume of data involved, many modern services run in a distributed fashion on MapReduce, Hadoop, Flume, Spark, and similar Massively Parallel Computation (MPC) frameworks. But how can one exploit the capabilities of MPC frameworks to design efficient algorithms? This course addresses that question in the context of several basic combinatorial optimization problems, including maximum matching, PageRank, minimum vertex cover, maximal independent set, connected components, and clustering.
The in-class discussion will focus mainly on the theoretical aspects of solving tasks in MPC, occasionally touching on practical relevance. The goal of the course is to equip students with the tools and techniques that led to state-of-the-art results for these problems.
Grading
Grading details will be posted on Canvas when the course begins. Tentatively, the grade will be based on the following:
- In-class participation in discussions.
- A few homeworks.
- A presentation or a report. Details will be discussed in class.
- There will be no exam or midterm.
Course attendance
It is not mandatory to attend lectures, although attendance is highly recommended.
Prerequisites
ECS 122A and 122B, or equivalents, are required for a proper understanding of the material. An understanding of basic graph algorithms and basic probability, e.g., random variables and expectation, is highly recommended.
Lecture topics
The following papers were fully or partially presented in class and/or reports:
- Sep 27, 29:
  Sorting in MPC
  Sorting, Searching, and Simulation in the MapReduce Framework
- Oct 4, 6:
  Maximal matching in MPC -- superlinear memory regime
  Filtering: a method for solving graph problems in MapReduce
- Oct 11, 13:
  O(1)-approximate maximum matching in MPC -- linear memory regime
  Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover;
  Round Compression for Parallel Matching Algorithms
- Oct 18:
  Connected components in MPC
  External lecture notes 1;
  External lecture notes 2, Section 3.3
- Nov 17:
  PageRank computation in MPC
  Walking Randomly, Massively, and Efficiently;
  YouTube video
- Student presentations:
  - Oct 20: Fully homomorphic encryption over the integers
  - Oct 25: Parallel graph connectivity in log diameter rounds
  - Oct 27: Connected Components at Scale via Local Contractions
  - Nov 1:
  - Nov 3: Near-Optimal Massively Parallel Graph Connectivity
  - Nov 8: Improved Deterministic (Δ+1) Coloring in Low-Space MPC
  - Nov 10:
  - Nov 15:
    - Massively Parallel Computation via Remote Memory Access
    - Constant-Round Near-Optimal Spanners in Congested Clique
  - Nov 22: Analysis of multi-modal data merging algorithms in case of timeseries data
  - Nov 29: MST in O(1) Rounds of Congested Clique
- Student reports/recordings only:
  - Distributed Weighted Min-Cut in Nearly-Optimal Time
  - Dynamic Graph Algorithms with Batch Updates in the Massively Parallel Computation Model
  - Massively Parallel Computation of Matching and MIS in Sparse Graphs
  - Massively Parallel Algorithms for Minimum Cut
  - A task-level adaptive MapReduce framework for real-time streaming data in healthcare applications
  - Monarch: Google's Planet-Scale, In-Memory Time Series Database
  - Apache Hama: An Emerging Bulk Synchronous Parallel Computing Framework for Big Data Applications
  - iCARE: A framework for big data-based banking customer analytics
  - A periodicity-based parallel time series prediction algorithm in cloud computing environments