🤖 Current Research Project
Motivation. We hypothesize that current reinforcement learning with verifiable rewards (RLVR) is constrained by limited exploration.
Ongoing. We are developing a new fine-tuning algorithm that augments RLVR with a supervised fine-tuning (SFT) phase to encourage exploration.
🎖 Awards
- 2025 Eleanor Quinlan Graduate Teaching Award, Department of Computer Science and Engineering, The Ohio State University