Fall quarter 2022, 289A

About

Due to the massive volume of data, many modern services are executed in a distributed fashion using MapReduce, Hadoop, Flume, Spark, and similar frameworks, collectively known as Massively Parallel Computation (MPC) frameworks. But how can one exploit the capabilities of MPC frameworks to design efficient algorithms? This course focuses on answering this question in the context of some basic combinatorial optimization problems, including maximum matching, PageRank, minimum vertex cover, maximal independent set, connected components, and clustering algorithms.

The in-class discussion will mainly focus on theoretical aspects of solving tasks in MPC, occasionally touching on practical relevance. The goal of this course is to equip students with the tools and techniques that have led to state-of-the-art results for the aforementioned problems.

Grading

Grading details will be posted on Canvas when the course begins. Tentatively, the grade will be based on the following:

In-class participation in discussions.
A few homeworks.
A presentation or a report. Details will be discussed in class.
There will be neither an exam nor a midterm.

Course attendance

It is not mandatory to attend lectures, although attendance is highly recommended.

Prerequisites

ECS 122A and 122B, or equivalents, are required for a proper understanding of the material. Familiarity with basic graph algorithms and basic probability (e.g., random variables and expectation) is highly recommended.

Lecture topics

The following papers were fully or partially presented in class and/or reports:

- Sep 27, 29: Sorting in MPC
  - Sorting, Searching, and Simulation in the MapReduce Framework
- Oct 4, 6: Maximal matching in MPC -- superlinear memory regime
  - Filtering: a method for solving graph problems in MapReduce
- Oct 11, 13: O(1)-approximate maximum matching in MPC -- linear memory regime
  - Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover
  - Round Compression for Parallel Matching Algorithms
- Oct 18: Connected components in MPC
  - External lecture notes 1
  - External lecture notes 2, Section 3.3
- Nov 17: PageRank computation in MPC
  - Walking Randomly, Massively, and Efficiently
  - YouTube video
Student presentations:
- Oct 20: Fully homomorphic encryption over the integers
- Oct 25: Parallel graph connectivity in log diameter rounds
- Oct 27: Connected Components at Scale via Local Contractions
- Nov 1:
  - Massively Parallel Algorithms for Small Subgraph Counting
  - ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Nov 3: Near-Optimal Massively Parallel Graph Connectivity
- Nov 8: Improved Deterministic (Δ+1) Coloring in Low-Space MPC
- Nov 10:
  - The Complexity of Symmetry Breaking in Massive Graphs
  - Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions
- Nov 15:
  - Massively Parallel Computation via Remote Memory Access
  - Constant-Round Near-Optimal Spanners in Congested Clique
- Nov 22: Analysis of multi-modal data merging algorithms in case of timeseries data
- Nov 29: MST in O(1) Rounds of Congested Clique
Student reports/recordings only: