🤖 Current Research Project

Motivation. We hypothesize that current reinforcement learning with verifiable rewards (RLVR) is constrained by limited exploration.

Ongoing. We are developing a new fine-tuning algorithm that augments RLVR with a supervised fine-tuning (SFT) phase to encourage exploration.