✨ About Me
Hi! I am Yaxuan Li, an undergraduate student at ShanghaiTech University (since Fall 2023), majoring in Computer Science and Technology. My research interests lie in natural language processing (NLP), with a particular focus on alignment and post-training for large language models (LLMs).
📰 News
- 2026.04: 🎉 Rethinking OPD was posted on arXiv and ranked #2 on Hugging Face Daily Papers [Code]
- 2026.03: 🎉 DeepPrune was accepted to Findings of ACL 2026 [Code]
📝 Publications
(* denotes equal/core contribution, † denotes project lead)
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Yaxuan Li*, Yuxin Zuo*†, Bingxiang He*†, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding
  Preprint [GitHub]
- DeepPrune: A Deep Reinforcement Learning Framework for Pruning Large Language Models
  Shangqing Tu*, Yaxuan Li*, Yushi Bai, Lei Hou, Juanzi Li
  Findings of ACL 2026 [GitHub]
📖 Education
- Undergraduate student in Computer Science and Technology, ShanghaiTech University, Shanghai, China, Fall 2023 – Present
🧪 Research Experience
- Research Intern at THUNLP, Tsinghua University, Oct. 2025 – Present
  I work closely with Bingxiang He under the supervision of Prof. Zhiyuan Liu. We systematically investigated the phenomenology and underlying mechanisms of both successful and unsuccessful on-policy distillation (OPD) in LLMs. Based on these findings, we proposed two practical recipes for improving OPD performance and further showed that dense reward signals are not a free lunch for OPD (see the objective sketch after this list). This work led to our paper Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe.
- Research Intern at THU-KEG, Tsinghua University, Jun. 2025 – Oct. 2025
  I worked closely with Shangqing Tu under the supervision of Prof. Juanzi Li. We developed DeepPrune, an efficient parallel reasoning framework for large language models that reduces redundancy in parallel Chain-of-Thought decoding. To achieve this, we designed a judge model to predict answer equivalence from partial reasoning traces and integrated it with a dynamic pruning algorithm that eliminates redundant paths while preserving answer diversity (see the pruning-loop sketch after this list). Experiments on challenging reasoning benchmarks showed that DeepPrune substantially improved inference efficiency while maintaining competitive accuracy. This work led to our paper DeepPrune: A Deep Reinforcement Learning Framework for Pruning Large Language Models, which was accepted to Findings of ACL 2026.
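For readers unfamiliar with OPD, here is a minimal sketch of the per-token reverse-KL objective commonly used for on-policy distillation: the student samples a response and the teacher scores every response token, which is the kind of dense per-token signal discussed above. The function name, tensor shapes, and masking convention are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal OPD sketch: the student generates a response, then both models
# score that same sequence, and we minimize the per-token reverse KL
# KL(student || teacher). Names and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def opd_reverse_kl_loss(student_logits, teacher_logits, response_mask):
    """Per-token reverse KL over a student-sampled sequence.

    student_logits, teacher_logits: (batch, seq_len, vocab), both computed
        on the student-generated tokens.
    response_mask: (batch, seq_len), 1 on response tokens, 0 on the prompt.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL: expectation under the student of (log p_s - log p_t),
    # summed over the vocabulary at each position.
    per_token_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)
    # Every response token contributes its own term: a dense reward signal.
    return (per_token_kl * response_mask).sum() / response_mask.sum()
```

Because each position carries its own KL term, the supervision is dense in exactly the sense the paper's "not a free lunch" finding is about.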
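And here is a hedged sketch of the dynamic pruning idea behind DeepPrune as described above: a judge model scores whether two partial reasoning traces are likely to converge to the same answer, and redundant traces are stopped early so compute flows to distinct answer paths. `prune_parallel_traces` and `judge_equivalent` are hypothetical names for illustration; the released code may differ.

```python
# Illustrative dynamic pruning round over in-flight parallel traces.
# `judge_equivalent` stands in for the judge model: it returns the
# predicted probability that two partial traces reach the same answer.
from typing import Callable, List

def prune_parallel_traces(
    traces: List[str],
    judge_equivalent: Callable[[str, str], float],
    threshold: float = 0.9,
) -> List[int]:
    """Greedily keep one representative per predicted-equivalent cluster.

    Returns indices of traces that survive this pruning round; the rest
    can be stopped early, preserving answer diversity among survivors.
    """
    survivors: List[int] = []
    for i, trace in enumerate(traces):
        # Compare against each kept representative; prune on first match.
        redundant = any(
            judge_equivalent(traces[k], trace) >= threshold for k in survivors
        )
        if not redundant:
            survivors.append(i)  # a new, likely-distinct answer path
    return survivors
```

Calling this periodically during decoding would reallocate the budget spent on predicted-duplicate paths to the remaining distinct ones.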
🏆 Honors and Awards
- Outstanding Student, ShanghaiTech University, Academic Year 2023–2024