CS294-252: Architectures and Systems for Warehouse-Scale Computers
Fall 2023, UC Berkeley
Time and Location: Tuesdays, 2-4pm in 320 Soda
Course Overview: Warehouse-Scale Computers (WSCs) host hyperscale cloud services relied on by billions of daily users. While classical WSCs were built as homogeneous collections of servers and networking hardware, modern hardware scaling trends have led to the introduction of specialized hardware in datacenter environments (e.g., ML accelerators and ML “supercomputer pods”, SmartNICs, and GPUs). Many proposals also aim to address challenges such as datacenter tax overheads and killer microsecond overheads through further specialization.
This graduate-level course will explore both the opportunities for deeper co-design of hardware and software to meet WSC efficiency and performance goals and the challenges of hardware specialization for the cloud systems software stack.
Prerequisites: Students must have previously taken at least one of the following graduate-level architecture/systems/VLSI courses:
- CS252: Graduate Computer Architecture
- CS262A: Advanced Topics in Computer Systems
- CS268: Graduate Computer Networks
- EECS251: Digital Design and Integrated Circuits
Calendar
- August 29
- Intro to Warehouse-Scale Computers
- Reading 1
- L. Barroso, et al. The Datacenter as a Computer, Third Edition.
- September 5
- Datacenter-Wide Trends
- Reading 1
- S. Kanev, et al. Profiling a Warehouse-Scale Computer.
- Reading 2
- A. Sriraman, et al. Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale.
- Reading 3
- J. Dean, et al. The tail at scale. +
L. Barroso, et al. Attack of the Killer Microseconds.
- September 12
- WSC Networking
- Reading 1
- L. Poutievski, et al. Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking.
- Reading 2
- D. Firestone, et al. Azure Accelerated Networking: SmartNICs in the Public Cloud.
- Reading 3
- S. Ibanez, et al. The nanoPU: A Nanosecond Network Stack for Datacenters.
- September 19
- Accelerators in WSCs, Pt. 1
- Reading 1
- I. Magaki, et al. ASIC Clouds: Specializing the Datacenter.
- Reading 2
- N. Jouppi, et al. TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
- Reading 3
- A. Putnam, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.
- September 26
- Accelerators in WSCs, Pt. 2
- Reading 1
- N. Lazarev, et al. Dagger: efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs.
- Reading 2
- S. Karandikar, et al. A Hardware Accelerator for Protocol Buffers.
- Reading 3
- M. D. Hill, et al. Accelerator-Level Parallelism. +
R. Murty. Powering Amazon EC2: Deep dive on the AWS Nitro System.
- October 3
- Memory and Disaggregation, Pt. 1
- Reading 1
- A. Lagar-Cavilla, et al. Software-Defined Far Memory in Warehouse-Scale Computers.
- Reading 2
- J. Weiner, et al. TMO: transparent memory offloading in datacenters.
- Reading 3
- K. Zhao, et al. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters.
- October 10
- Modeling and Evaluation + Sustainability (+ Project Proposal Presentations)
- Reading 1
- S. Karandikar, et al. FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
- Reading 2
- D. Cock, et al. Enzian: an open, general, CPU/FPGA platform for systems software research.
- Reading 3
- B. Acun, et al. Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.
- October 17
- Memory and Disaggregation, Pt. 2
- Reading 1
- P. Duraisamy, et al. Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale.
- Reading 2
- H. Al Maruf, et al. TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory.
- Reading 3
- H. Li, et al. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms.
- October 24
- Accelerators in WSCs, Pt. 3 + Server Design
- Reading 1
- P. Ranganathan, et al. Warehouse-scale video acceleration: co-design and deployment in the wild.
- Reading 2
- A. Sriraman, et al. SoftSKU: optimizing server architectures for microservice diversity @scale.
- Reading 3
- G. Ayers, et al. Memory Hierarchy for Web Search.
- October 31
- RPC + Silent Data Corruption Pt. 1 + Data Analytics Pt. 1
- Reading 1
- K. Seemakhupt, et al. A Cloud-Scale Characterization of Remote Procedure Calls.
- Reading 2
- H. D. Dixit, et al. Silent Data Corruptions at Scale.
- Reading 3
- A. Gonzalez, et al. Profiling Hyperscale Big Data Processing.
- November 7
- Silent Data Corruption Pt. 2 + Data Analytics Pt. 2 + Fault Tolerance
- Reading 1
- P. H. Hochschild, et al. Cores that don’t count.
- Reading 2
- L. Wu, et al. Q100: The Architecture and Design of a Database Processing Unit.
- Reading 3
- Y. Zhou, et al. Carbink: Fault-Tolerant Far Memory.
- November 14
- Operating Systems
- Reading 1
- J. T. Humphries, et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
- Reading 2
- J. T. Humphries, et al. A case against (most) context switches.
- Reading 3
- A. Belay, et al. IX: A Protected Dataplane Operating System for High Throughput and Low Latency.
- November 21
- Cluster-level SW + Benchmarking
- Reading 1
- A. Verma, et al. Large-scale cluster management at Google with Borg.
- Reading 2
- Y. Gan, et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems.
- Reading 3
- M. Ferdman, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware.
- November 28
- Feedback-Directed Optimization + Security
- Reading 1
- G. Ayers, et al. AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers.
- Reading 2
- Y. Zhang, et al. OCOLOS: Online COde Layout OptimizationS.
- Reading 3
- C. Delimitrou, et al. Bolt: I Know What You Did Last Summer… In The Cloud.
- December 5
- N/A (RRR Week)
- December 12
- Final Project Presentations (Finals Week)
Weekly Schedule
- Lecture/Discussion: Tuesdays 2-4pm in 320 Soda
- Weekly Reading Reviews: Due Mondays @ noon Pacific. See Ed for submission links.
- Weekly Student Presenter Slides: Due Fridays @ 11:59pm. See Ed for submission details.
Assignments and Grading
The course workload will consist of the following:
- 25% of grade: Each week, students will be required to read and review two of the week’s papers, and to attend and participate in the week’s discussion.
- Students can drop two weeks’ worth, no questions asked.
- 25% of grade: Each student will lead the discussion of two papers during the semester.
- 50% of grade: Students will complete a semester-long research project, in groups of 2 or 3, related to the course material.