CS294-252: Architectures and Systems for Warehouse-Scale Computers

Fall 2023, UC Berkeley

Location: Tuesdays, 2-4pm in 320 Soda

Course Overview: Warehouse-Scale Computers (WSCs) host hyperscale cloud services relied on by billions of daily users. While classical WSCs were built as homogeneous collections of servers and networking hardware, modern hardware scaling trends have resulted in the introduction of specialized hardware in datacenter environments (e.g., ML accelerators and ML “supercomputer pods”, SmartNICs, and GPUs). Many proposals have also been made to solve challenges like datacenter tax overheads and killer microsecond overheads with further specialization.

This graduate-level course will explore both the opportunities for deeper co-design of hardware and software to meet WSC efficiency and performance goals and the challenges of hardware specialization for the cloud systems software stack.

Prerequisites: Students must have previously taken at least one of the following graduate-level architecture/systems/VLSI courses:

  • CS252: Graduate Computer Architecture
  • CS262A: Advanced Topics in Computer Systems
  • CS268: Graduate Computer Networks
  • EECS251: Digital Design and Integrated Circuits


August 29
Intro to Warehouse-Scale Computers
Reading 1
L. Barroso et al. The Datacenter as a Computer, Third Edition.

September 5
Datacenter-Wide Trends
Reading 1
S. Kanev et al. Profiling a Warehouse-Scale Computer.
Reading 2
A. Sriraman et al. Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale.
Reading 3
J. Dean et al. The tail at scale. +
L. Barroso et al. Attack of the Killer Microseconds.

September 26
Accelerators in WSCs, Pt. 2
Reading 1
N. Lazarev et al. Dagger: efficient and fast RPCs in cloud microservices with near-memory reconfigurable NICs.
Reading 2
S. Karandikar et al. A Hardware Accelerator for Protocol Buffers.
Reading 3
M. D. Hill et al. Accelerator-Level Parallelism. +
R. Murty. Powering Amazon EC2: Deep dive on the AWS Nitro System.

October 3
Memory and Disaggregation, Pt. 1
Reading 1
K. Zhao et al. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters.
Reading 2
J. Weiner et al. TMO: transparent memory offloading in datacenters.
Reading 3
A. Lagar-Cavilla et al. Software-Defined Far Memory in Warehouse-Scale Computers.

October 10
Project Proposal Presentations + Modeling and Evaluation
Reading 1
S. Karandikar et al. FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
Reading 2
D. Cock et al. Enzian: an open, general, CPU/FPGA platform for systems software research.

October 17
Memory and Disaggregation, Pt. 2
Reading 1
P. Duraisamy et al. Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale.
Reading 2
H. Al Maruf et al. TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory.
Reading 3
H. Li et al. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms.

October 24
Accelerators in WSCs, Pt. 3 + Server Design
Reading 1
P. Ranganathan et al. Warehouse-scale video acceleration: co-design and deployment in the wild.
Reading 2
A. Sriraman et al. SoftSKU: optimizing server architectures for microservice diversity @scale.
Reading 3
G. Ayers et al. Memory Hierarchy for Web Search.

October 31
Fleet-Profiling and Workloads
Reading 1
K. Seemakhupt et al. A Cloud-Scale Characterization of Remote Procedure Calls.
Reading 2
A. Gonzalez et al. Profiling Hyperscale Big Data Processing.
Reading 3
M. Ferdman et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware.

November 7
Silent Data Corruption, Faults, and Fault Tolerance
Reading 1
P. H. Hochschild et al. Cores that don’t count.
Reading 2
H. D. Dixit et al. Silent Data Corruptions at Scale.
Reading 3
Y. Zhou et al. Carbink: Fault-Tolerant Far Memory.

November 14
Operating Systems
Reading 1
J. T. Humphries et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
Reading 2
J. T. Humphries et al. A case against (most) context switches.
Reading 3
A. Belay et al. IX: A Protected Dataplane Operating System for High Throughput and Low Latency.

November 28
Feedback-Directed Optimization, Security
Reading 1
G. Ayers et al. AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers.
Reading 2
Y. Zhang et al. OCOLOS: Online COde Layout OptimizationS.
Reading 3
C. Delimitrou et al. Bolt: I Know What You Did Last Summer… In The Cloud.

December 5
N/A (RRR Week)

December 12
Final Project Presentations (Finals Week)

Weekly Schedule

  • Lecture/Discussion: Tuesdays 2-4pm in 320 Soda
  • Weekly Reading Reviews: Due Mondays @ noon Pacific. See Ed for submission links.
  • Weekly Student Presenter Slides: Due Fridays @ 11:59pm. See Ed for submission details.

Assignments and Grading

The course workload will consist of the following:

  • 25% of grade: Each week, students will be required to read and provide a review of two of the week’s papers and attend and participate in the week’s discussion.
    • Students may drop two weeks’ worth, no questions asked.
  • 25% of grade: Each student will lead the discussion of two papers during the semester.
  • 50% of grade: Students will complete a semester-long research project, in groups of 2 or 3, related to the course material.


Krste Asanović


Office Hours

By appointment.

Sagar Karandikar


Office Hours

By appointment.