Wei Zhao

I'm a performance engineer at NVIDIA. I currently focus on optimizing LLM inference performance on GPUs in the vLLM project.

I earned my master's degree in Computer Science from Stanford University. Before that, I received my bachelor's degree in Computer Science from the University of Toronto, where I was advised by Prof. Gennady Pekhimenko.

My interest lies in building systems that address computational challenges in science and engineering. In the past, I’ve worked on LLM inference, GPU sharing, and stream processing systems.

Email / Google Scholar / LinkedIn / GitHub

Research

Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads
Wei Zhao, Anand Jayarajan, Gennady Pekhimenko
ASPLOS 2025 (Distinguished Artifact Award)
paper / code

Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su, Wei Zhao, Xin Li, Muralidhar Andoorveedu, Chenhao Jiang, Zhanda Zhu, Kevin Song, Christina Giannoula, Gennady Pekhimenko
MLSys 2025 (Outstanding Paper Honorable Mention)
paper / code

TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization
Anand Jayarajan, Wei Zhao, Yudi Sun, Gennady Pekhimenko
ASPLOS 2023 (Distinguished Artifact Award)
paper / code

Website template from Jon Barron.
Last updated: June 6, 2026