Leqi Liu
AI/ML Researcher | Assistant Professor at UT Austin | Postdoc at Princeton PLI | PhD, Machine Learning Department, CMU. Research goal: Building controllable machine intelligence that serves humanity safely. leqiliu.github.io
- We're hiring a fully-funded Ph.D. student in Use-Inspired AI @ UT Austin starting Fall 2026! Join us to work on impactful AI/ML research addressing real-world challenges. Learn more & apply: tinyurl.com/use-inspired....
- New method to crack hard reasoning problems with LLMs! No expert traces. No test-time hacks. Just self-explanation + RL-style training. Result? Accuracy on MATH level-5 jumped from 2% → 23%.
- Most RL post-training methods only work when the model has some chance of getting answers right. But what if it gets almost everything wrong? NO correct trajectory sampled -> NO learning signal -> the model stays the same, or even unlearns under the KL constraint. This happens often in hard reasoning tasks.
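A toy illustration (not the paper's code) of why the signal vanishes: with a group-mean baseline and a reward of 0 for every sampled answer, the policy-gradient update is exactly zero. Importance ratios, clipping, and KL terms are omitted here.

```python
import torch

# Toy policy: logits over 4 sampled candidate answers for one prompt.
logits = torch.zeros(4, requires_grad=True)

def policy_gradient_loss(rewards):
    """REINFORCE-style loss with a group-mean baseline (GRPO-like spirit)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    advantages = rewards - rewards.mean()   # baseline = mean reward of the group
    return -(advantages * log_probs).sum()

# Case 1: every sampled answer is wrong -> all rewards are 0.
policy_gradient_loss(torch.zeros(4)).backward()
print(logits.grad)   # tensor of zeros: no learning signal at all

logits.grad = None
# Case 2: one correct answer -> nonzero advantages -> a real gradient.
policy_gradient_loss(torch.tensor([1.0, 0.0, 0.0, 0.0])).backward()
print(logits.grad)   # now the update actually moves the policy
```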
- Our solution: ask the model to explain the correct answer, even when it couldn't solve the problem. These self-explanations are: ✅ in-distribution ✅ richer than failed CoTs ✅ better guidance than expert-written CoTs. We train on them. We call it ExPO.
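A minimal sketch of how such self-explanation data could be assembled; the prompt wording and the `generate` helper are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical helpers; `generate` stands in for any decoding call to the
# current policy model, so the explanations stay in-distribution for it.

def build_explanation_prompt(question: str, answer: str) -> str:
    # Condition on the *known* correct answer and ask for an explanation,
    # instead of asking the model to try the problem again.
    return (
        f"Problem: {question}\n"
        f"The correct final answer is {answer}.\n"
        "Explain, step by step, why this answer is correct."
    )

def make_self_explanation_examples(dataset, generate):
    """Turn (question, answer) pairs into supervised training examples."""
    examples = []
    for question, answer in dataset:
        explanation = generate(build_explanation_prompt(question, answer))
        # Train the model to produce explanation + answer from the question
        # alone, as in ordinary supervised fine-tuning on these traces.
        examples.append({
            "prompt": f"Problem: {question}\n",
            "target": f"{explanation}\nFinal answer: {answer}",
        })
    return examples
```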
- Final message: LLMs can improve from failure, if you ask the right question. “Explain the answer” > “Try again”. Paper: arxiv.org/abs/2507.02834 Joint work with @ruiyang-zhou.bsky.social and Shuozhe Li.
- What if you could understand and control an LLM by studying its *smaller* sibling? Our new paper introduces the Linear Representation Transferability Hypothesis. We find that the internal representations of different-sized models can be translated into one another using a simple linear (affine) map.
- Here's the core idea: We hypothesize that models trained on similar data learn a **universal set of basis features**. Each model's internal representation space is just a unique, model-specific projection of this shared space. This means representations learned across models are transferable!
- We tested this by learning an affine map between Gemma-2B and Gemma-9B. The result? Steering vectors (directions that elicit specific behaviors) from the 2B model successfully guided the 9B model's outputs. For example, a "dog-saying" steering vector from 2B made 9B talk more about dogs!
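Roughly how such a map can be fit, as a hedged sketch: least-squares on paired hidden states from the two models (the dimensions below are made up; real Gemma hidden sizes differ). Since a steering vector is a *difference* of activations, it transfers through the linear part of the map alone.

```python
import torch

def fit_affine_map(H_small, H_large):
    """Least-squares fit of h_large ≈ h_small @ W + b from paired activations."""
    ones = torch.ones(H_small.shape[0], 1)
    X = torch.cat([H_small, ones], dim=1)            # append a bias column
    sol = torch.linalg.lstsq(X, H_large).solution    # shape (d_small + 1, d_large)
    return sol[:-1], sol[-1]                         # W, b

# Paired hidden states collected on the same prompts (random stand-ins here).
H_small = torch.randn(1024, 256)    # e.g. smaller model's residual stream
H_large = torch.randn(1024, 512)    # e.g. larger model's residual stream
W, b = fit_affine_map(H_small, H_large)

# The offset b cancels for activation differences, so a steering direction
# found in the small model's space maps through W alone.
steer_small = torch.randn(256)
steer_large = steer_small @ W       # add this to the large model's activations
```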
- This has huge practical implications! It opens the door to using small, efficient models as sandboxes to probe, understand, and even steer their much larger counterparts. Paper: arxiv.org/abs/2506.00653 Joint work with Femi Bello, @anubrata.bsky.social, Fanzhi Zeng, @fcyin.bsky.social
- Ever wondered why chosen and rejected log-probs move up and down in sync during DPO (and most *POs: IPO, SimPO, CPO, R-DPO, DPOP, RRHF, SLiC-HF) training? Why do chosen logps decrease, and why do rejected logps sometimes increase? Our answer: Gradient Entanglement! arxiv.org/abs/2410.13828
- 1/4 We demystify the reason behind the synchronized change in chosen and rejected logps: the **Gradient Entanglement** effect! For any margin-based loss (esp. the *PO objectives above), the change in the chosen log-probability depends on the rejected response's gradient, and vice versa.
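One schematic way to see the coupling (a first-order sketch of a single SGD step, not quoted verbatim from the paper): for a margin loss of the form $\mathcal{L} = -\log\sigma\big(m_\theta(y_w) - m_\theta(y_l)\big)$, with learning rate $\eta$ and a positive coefficient $c_\theta$ collecting the loss and margin derivatives,

$$
\Delta \log\pi_\theta(y_w) \approx \eta\, c_\theta \Big( \|\nabla_\theta \log\pi_\theta(y_w)\|^2 - \nabla_\theta \log\pi_\theta(y_w) \cdot \nabla_\theta \log\pi_\theta(y_l) \Big),
$$

$$
\Delta \log\pi_\theta(y_l) \approx \eta\, c_\theta \Big( \nabla_\theta \log\pi_\theta(y_w) \cdot \nabla_\theta \log\pi_\theta(y_l) - \|\nabla_\theta \log\pi_\theta(y_l)\|^2 \Big).
$$

Each change carries the *other* response's gradient through the inner product, so a large inner product can push the chosen log-prob down or the rejected one up.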
- 2/4 The Gradient Entanglement effect becomes particularly concerning when the inner product between the chosen and rejected gradients is large, which often happens when the two responses are similar!
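A toy numerical example (not from the paper): two nearly parallel "response features" share one parameter vector, so the gradient inner product is large. One step on the margin loss drives the chosen log-prob down because the rejected gradient leaks in through that inner product.

```python
import torch

# Chosen/rejected log-probs modelled as linear functions of one shared
# parameter vector; the two feature vectors are nearly parallel, mimicking
# two very similar responses (large gradient inner product).
theta = torch.zeros(2, requires_grad=True)
x_chosen = torch.tensor([1.0, 0.5])
x_rejected = torch.tensor([1.0, 1.0])

def logps(t):
    return x_chosen @ t, x_rejected @ t

logp_c, logp_r = logps(theta)
loss = -torch.nn.functional.logsigmoid(logp_c - logp_r)   # margin-based loss
loss.backward()

with torch.no_grad():
    theta -= 0.1 * theta.grad          # one SGD step

new_c, new_r = logps(theta)
print(float(new_c - logp_c))   # negative: the *chosen* log-prob went down
print(float(new_r - logp_r))   # the rejected log-prob moves with the same update
```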
- 4/4 Joint work with Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang. Paper: arxiv.org/abs/2410.13828 Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024
- How to **efficiently** build personalized language models **without** textual information about user preferences? Our Personalized-RLHF work: a light-weight user model, personalized versions of all *PO alignment algorithms, and strong performance on the largest personalized preference dataset. arxiv.org/abs/2402.05133
- 1/4 Personalized-RLHF (P-RLHF) uses a **light-weight** user model to learn user embeddings, which serve as a soft prompt for generating personalized responses. The user model is 10-100x smaller than the LoRA adapters used for fine-tuning the language model.
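A minimal sketch of the soft-prompt idea; the embedding size, number of prompt tokens, and the extra slot for unseen users are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class UserSoftPrompt(nn.Module):
    """One embedding per user, reshaped into a few soft-prompt vectors that
    are prepended to the language model's input embeddings."""

    def __init__(self, num_users, num_prompt_tokens=4, hidden_size=2048):
        super().__init__()
        self.num_prompt_tokens = num_prompt_tokens
        self.hidden_size = hidden_size
        # +1 row reserved for unseen / generic users (an assumption here).
        self.user_embeddings = nn.Embedding(
            num_users + 1, num_prompt_tokens * hidden_size)

    def forward(self, user_ids, input_embeds):
        # user_ids: (batch,); input_embeds: (batch, seq_len, hidden_size)
        soft_prompt = self.user_embeddings(user_ids).view(
            -1, self.num_prompt_tokens, self.hidden_size)
        # The (frozen or LoRA-tuned) LM then consumes the longer sequence.
        return torch.cat([soft_prompt, input_embeds], dim=1)

user_model = UserSoftPrompt(num_users=1000)
dummy_inputs = torch.randn(2, 16, 2048)            # stand-in token embeddings
print(user_model(torch.tensor([3, 7]), dummy_inputs).shape)  # (2, 20, 2048)
```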
- 2/4 For any base preference optimization (*PO) algorithm, P-RLHF can create its corresponding personalized version P-*PO, allowing for **flexible** choice of alignment algorithms.
- 4/4 Joint work with Xinyu Li, @ruiyang-zhou.bsky.social, @zacharylipton.bsky.social. Paper: arxiv.org/abs/2402.05133, Code: github.com/HumainLab/Personalized_RLHF Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024