
Name: Youhui Bai
Homepage: https://youhuibai.github.io/
Research interests: distributed parallel computing for large models, algorithm-system co-design, AI systems
Youhui Bai is a specially appointed associate researcher and master's supervisor. He received his Ph.D. in Computer Science and Technology from the University of Science and Technology of China (USTC) in 2021. Centered on the algorithm-system-hardware co-design paradigm, he has built a computing acceleration stack covering the full life cycle of large models. This work has produced dozens of high-quality papers, including a first-author paper at SOSP, a top venue in computer systems (the second SOSP paper with USTC as the first affiliation), as well as publications at IEEE TPDS, IEEE HPCA, and AAAI. He has filed dozens of invention patents and one core trade secret, and his technical solutions have been fully deployed in Huawei's Ascend product line, generating significant commercial value. His research has been recognized with the ACM SIGCSE Outstanding Doctoral Dissertation Award, the Outstanding Youth Paper Award of the 2024 World Artificial Intelligence Conference, and the 2021 Anhui Province Outstanding Graduate Award, among others. From July 2021 to June 2025 he worked at Huawei, where, as a project manager, he repeatedly took part in company-level breakthrough projects tackling the challenges of accelerating large-model training and inference on domestic Ascend chips. For this work he received the Spark Award (First Prize), the "Zhishui" Breakthrough Special Award, the Central Research Institute Innovation Pioneer Award (First Prize), the Computing Product Line High-Quality Delivery Award, and a President's Commendation, and as a core member he shared the company's Gold Team Award.
Awards:
2024 - Outstanding Youth Paper Award, World Artificial Intelligence Conference (10 papers selected worldwide; the only paper from an AI systems conference)
2023 - ACM SIGCSE Outstanding Doctoral Dissertation Award
2024 - Huawei Problem-Solving Spark Award, First Prize (2 awards)
2024 - Huawei Central Research Institute Innovation Pioneer Award, First Prize (2 awards)
2023 - Huawei Gold Team Award
Representative publications:
* denotes the corresponding author
[1]. Youhui Bai, Cheng Li, Quan Zhou, Jun Yi, Ping Gong, Feng Yan, Ruichuan Chen, Yinlong Xu, Gradient Compression Supercharged High-Performance Data Parallel DNN Training, SOSP 2021.
[2]. Youhui Bai, Cheng Li, Zhiqi Lin, Yufei Wu, Youshan Miao, Yunxin Liu, Yinlong Xu, Efficient Data Loader for Fast Sampling-based GNN Training on Large Graphs, TPDS 2021.
[3]. Hao Wu, Shiyi Wang, Youhui Bai*, Cheng Li, Quan Zhou, Jun Yi, Feng Yan, Ruichuan Chen, Yinlong Xu, A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training, TPDS 2023.
[4]. Zewen Jin, Shen Fu, Chengjie Tang, Youhui Bai*, Shengnan Wang, Jiaan Zhu, Chizheng Fang, Ping Gong, Cheng Li, SMIDT: High-Performance Inference Framework for MoE Models with Dynamic Top-K Routing, AAAI 2026.
[5]. Jiawei Yi, Ping Gong, Youhui Bai*, Jiaqi Ruan, Shengnan Wang, Pengcheng Wang, Haibo Wang, Weiguang Wang, Xia Zhu, Feng Wu, Cheng Li, CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design, arXiv 2025.
[6]. Shengnan Wang, Youhui Bai, Lin Zhang, Pingyi Zhou, Shixiong Zhao, Gong Zhang, Sen Wang, Renhai Chen, Hua Xu, Hongwei Sun, XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference, arXiv 2024.
[7]. Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li, Youhui Bai, Feng Yan, Yinlong Xu, MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism, HPCA 2023.
[8]. Zewen Jin, Shengnan Wang, Jiaan Zhu, Hongrui Zhan, Youhui Bai, Lin Zhang, Zhenyu Ming, Cheng Li, BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference, AAAI 2025.
[9]. Ping Gong, Jiawei Yi, Shengnan Wang, Juncheng Zhang, Zewen Jin, Ouxiang Zhou, Ruibo Liu, Guanbin Xu, Youhui Bai, Bowen Ye, Kun Yuan, Tong Yang, Gong Zhang, Renhai Chen, Feng Wu, Cheng Li, HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference, Findings of ACL 2025.
[10]. Peng Liang, Yu Tang, Xiaoda Zhang, Youhui Bai, Teng Su, Zhiquan Lai, Linbo Qiao, Dongsheng Li, A Survey on Auto-Parallelism of Large-Scale Deep Learning Training, TPDS 2023.
