CS294-252: Architectures and Systems for Warehouse-Scale Computers

Name: CS294-252: Architectures and Systems for Warehouse-Scale Computers, Fall 2023, UC Berkeley
Author: Sagar Karandikar

Location: Tuesdays, 2-4pm in 320 Soda

Course Overview: Warehouse-Scale Computers (WSCs) host hyperscale cloud services relied on by billions of daily users. While classical WSCs were built as homogeneous collections of servers and networking hardware, modern hardware scaling trends have led to the introduction of specialized hardware in datacenter environments (e.g., ML accelerators and ML “supercomputer pods”, SmartNICs, and GPUs). Many proposals also call for further specialization to address challenges such as datacenter tax overheads and killer-microsecond overheads.

This graduate-level course will explore both the opportunities for deeper co-design of hardware and software to meet WSC efficiency and performance goals and the challenges of hardware specialization for the cloud systems software stack.

Prerequisites: Students must have previously taken at least one of the following graduate-level architecture/systems/VLSI courses:

CS252: Graduate Computer Architecture
CS262A: Advanced Topics in Computer Systems
CS268: Graduate Computer Networks
EECS251: Digital Design and Integrated Circuits

Calendar

August 29: Intro to Warehouse-Scale Computers
Reading 1: L. Barroso, et al. The Datacenter as a Computer, Third Edition.

September 5: Datacenter-Wide Trends
Reading 1: S. Kanev, et al. Profiling a Warehouse-Scale Computer.
Reading 2: A. Sriraman, et al. Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale.
Reading 3: J. Dean, et al. The tail at scale. +
L. Barroso, et al. Attack of the Killer Microseconds.

September 12: WSC Networking
Reading 1: L. Poutievski, et al. Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking.
Reading 2: D. Firestone, et al. Azure Accelerated Networking: SmartNICs in the Public Cloud.
Reading 3: S. Ibanez, et al. The nanoPU: A Nanosecond Network Stack for Datacenters.

September 19: Accelerators in WSCs, Pt. 1
Reading 1: I. Magaki, et al. ASIC Clouds: Specializing the Datacenter.
Reading 2: N. Jouppi, et al. TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
Reading 3: A. Putnam, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.

September 26: Accelerators in WSCs, Pt. 2
Reading 1: N. Lazarev, et al. Dagger: efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs.
Reading 2: S. Karandikar, et al. A Hardware Accelerator for Protocol Buffers.
Reading 3: M. D. Hill, et al. Accelerator-Level Parallelism. +
R. Murty. Powering Amazon EC2: Deep dive on the AWS Nitro System.

October 3: Memory and Disaggregation, Pt. 1
Reading 1: A. Lagar-Cavilla, et al. Software-Defined Far Memory in Warehouse-Scale Computers.
Reading 2: J. Weiner, et al. TMO: transparent memory offloading in datacenters.
Reading 3: K. Zhao, et al. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters.

October 10: Modeling and Evaluation + Sustainability (+ Project Proposal Presentations)
Reading 1: S. Karandikar, et al. FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
Reading 2: D. Cock, et al. Enzian: an open, general, CPU/FPGA platform for systems software research.
Reading 3: B. Acun, et al. Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.

October 17: Memory and Disaggregation, Pt. 2
Reading 1: P. Duraisamy, et al. Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale.
Reading 2: H. Al Maruf, et al. TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory.
Reading 3: H. Li, et al. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms.

October 24: Accelerators in WSCs, Pt. 3 + Server Design
Reading 1: P. Ranganathan, et al. Warehouse-scale video acceleration: co-design and deployment in the wild.
Reading 2: A. Sriraman, et al. SoftSKU: optimizing server architectures for microservice diversity @scale.
Reading 3: G. Ayers, et al. Memory Hierarchy for Web Search.

October 31: RPC + Silent Data Corruption Pt. 1 + Data Analytics Pt. 1
Reading 1: K. Seemakhupt, et al. A Cloud-Scale Characterization of Remote Procedure Calls.
Reading 2: H. D. Dixit, et al. Silent Data Corruptions at Scale.
Reading 3: A. Gonzalez, et al. Profiling Hyperscale Big Data Processing.

November 7: Silent Data Corruption Pt. 2 + Data Analytics Pt. 2 + Fault Tolerance
Reading 1: P. H. Hochschild, et al. Cores that don’t count.
Reading 2: L. Wu, et al. Q100: The Architecture and Design of a Database Processing Unit.
Reading 3: Y. Zhou, et al. Carbink: Fault-Tolerant Far Memory.

November 14: Operating Systems
Reading 1: J. T. Humphries, et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
Reading 2: J. T. Humphries, et al. A case against (most) context switches.
Reading 3: A. Belay, et al. IX: A Protected Dataplane Operating System for High Throughput and Low Latency.

November 21: Cluster-level SW + Benchmarking
Reading 1: A. Verma, et al. Large-scale cluster management at Google with Borg.
Reading 2: Y. Gan, et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems.
Reading 3: M. Ferdman, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware.

November 28: Feedback-Directed Optimization + Security
Reading 1: G. Ayers, et al. AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers.
Reading 2: Y. Zhang, et al. OCOLOS: Online COde Layout OptimizationS.
Reading 3: C. Delimitrou, et al. Bolt: I Know What You Did Last Summer… In The Cloud.

December 5: N/A (RRR Week)

December 12: Final Project Presentations (Finals Week)

Weekly Schedule

Lecture/Discussion: Tuesdays 2-4pm in 320 Soda
Weekly Reading Reviews: Due Mondays @ noon Pacific. See Ed for submission links.
Weekly Student Presenter Slides: Due Fridays @ 11:59pm. See Ed for submission details.

Assignments and Grading

The course workload will consist of the following:

25% of grade: Each week, students will be required to read and provide a review of two of the week’s papers and attend and participate in the week’s discussion.
- Can drop two weeks’ worth, no questions asked.
25% of grade: Each student will lead the discussion of two papers during the semester.
50% of grade: Students will complete a semester-long research project, in groups of 2 or 3, related to the course material.

Staff

Krste Asanović

krste@berkeley.edu

Office Hours: By appointment.

Sagar Karandikar

sagark@eecs.berkeley.edu

Office Hours: By appointment.