Leqi Liu
AI/ML Researcher | Assistant Professor at UT Austin | Postdoc at Princeton PLI | PhD, Machine Learning Department, CMU. Research goal: Building controllable machine intelligence that serves humanity safely. leqiliu.github.io
- We're hiring a fully-funded Ph.D. student in Use-Inspired AI @ UT Austin starting Fall 2026! Join us to work on impactful AI/ML research addressing real-world challenges. Learn more & apply: tinyurl.com/use-inspired....
- New method to crack hard reasoning problems with LLMs! No expert traces. No test-time hacks. Just self-explanation + RL-style training. Result? Accuracy on MATH level-5 jumped from 2% → 23%.
- Most RL post-training methods only work when the model has some chance of getting answers right. But what if it gets almost everything wrong? NO correct trajectory sampled -> NO learning signal -> the model stays the same, or even unlearns under the KL constraint. This happens often in hard reasoning tasks.
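A toy illustration (not the paper's code) of why the signal vanishes: with a group-mean baseline and a reward of 0 for every sampled answer, the policy-gradient update is exactly zero. Importance ratios, clipping, and KL terms are omitted here.

```python
import torch

# Toy policy: logits over 4 sampled candidate answers for one prompt.
logits = torch.zeros(4, requires_grad=True)

def policy_gradient_loss(rewards):
    """REINFORCE-style loss with a group-mean baseline (GRPO-like spirit)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    advantages = rewards - rewards.mean()   # baseline = mean reward of the group
    return -(advantages * log_probs).sum()

# Case 1: every sampled answer is wrong -> all rewards are 0.
policy_gradient_loss(torch.zeros(4)).backward()
print(logits.grad)   # tensor of zeros: no learning signal at all

logits.grad = None
# Case 2: one correct answer -> nonzero advantages -> a real gradient.
policy_gradient_loss(torch.tensor([1.0, 0.0, 0.0, 0.0])).backward()
print(logits.grad)   # now the update actually moves the policy
```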
- Our solution: ask the model to explain the correct answer, even when it couldn't solve the problem. These self-explanations are: ✅ in-distribution ✅ richer than failed CoTs ✅ better guidance than expert-written CoTs. We train on them. We call it ExPO.
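A minimal sketch of how such self-explanation data could be assembled; the prompt wording and the `generate` helper are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical helpers; `generate` stands in for any decoding call to the
# current policy model, so the explanations stay in-distribution for it.

def build_explanation_prompt(question: str, answer: str) -> str:
    # Condition on the *known* correct answer and ask for an explanation,
    # instead of asking the model to try the problem again.
    return (
        f"Problem: {question}\n"
        f"The correct final answer is {answer}.\n"
        "Explain, step by step, why this answer is correct."
    )

def make_self_explanation_examples(dataset, generate):
    """Turn (question, answer) pairs into supervised training examples."""
    examples = []
    for question, answer in dataset:
        explanation = generate(build_explanation_prompt(question, answer))
        # Train the model to produce explanation + answer from the question
        # alone, as in ordinary supervised fine-tuning on these traces.
        examples.append({
            "prompt": f"Problem: {question}\n",
            "target": f"{explanation}\nFinal answer: {answer}",
        })
    return examples
```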
- Final message: LLMs can improve from failure, if you ask the right question. “Explain the answer” > “Try again”. Paper: arxiv.org/abs/2507.02834 Joint work with @ruiyang-zhou.bsky.social and Shuozhe Li.
- What if you could understand and control an LLM by studying its *smaller* sibling? Our new paper introduces the Linear Representation Transferability Hypothesis. We find that the internal representations of different-sized models can be translated into one another using a simple linear (affine) map.
- Here's the core idea: We hypothesize that models trained on similar data learn a **universal set of basis features**. Each model's internal representation space is just a unique, model-specific projection of this shared space. This means representations learned across models are transferable!
- We tested this by learning an affine map between Gemma-2B and Gemma-9B. The result? Steering vectors (directions that elicit specific behaviors) from the 2B model successfully guided the 9B model's outputs. For example, a "dog-saying" steering vector from 2B made 9B talk more about dogs!
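Roughly how such a map can be fit, as a hedged sketch: least-squares on paired hidden states from the two models (the dimensions below are made up; real Gemma hidden sizes differ). Since a steering vector is a *difference* of activations, it transfers through the linear part of the map alone.

```python
import torch

def fit_affine_map(H_small, H_large):
    """Least-squares fit of h_large ≈ h_small @ W + b from paired activations."""
    ones = torch.ones(H_small.shape[0], 1)
    X = torch.cat([H_small, ones], dim=1)            # append a bias column
    sol = torch.linalg.lstsq(X, H_large).solution    # shape (d_small + 1, d_large)
    return sol[:-1], sol[-1]                         # W, b

# Paired hidden states collected on the same prompts (random stand-ins here).
H_small = torch.randn(1024, 256)    # e.g. smaller model's residual stream
H_large = torch.randn(1024, 512)    # e.g. larger model's residual stream
W, b = fit_affine_map(H_small, H_large)

# The offset b cancels for activation differences, so a steering direction
# found in the small model's space maps through W alone.
steer_small = torch.randn(256)
steer_large = steer_small @ W       # add this to the large model's activations
```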
- This has huge practical implications! It opens the door to using small, efficient models as sandboxes to probe, understand, and even steer their much larger counterparts. Paper: arxiv.org/abs/2506.00653 Joint work with Femi Bello, @anubrata.bsky.social, Fanzhi Zeng, @fcyin.bsky.social
- Ever wondered why chosen and rejected log-probs move up and down in sync during DPO (and most *POs: IPO, SimPO, CPO, R-DPO, DPOP, RRHF, SLiC-HF) training? Why do chosen logps decrease, and why do rejected logps sometimes increase? Our answer: Gradient Entanglement! arxiv.org/abs/2410.13828
- 1/4 We demystify the reason behind the synchronized change in chosen and rejected logps: the **Gradient Entanglement** effect! For any margin-based loss (esp. the *PO objectives above), the change in the chosen log-probability depends on the rejected response's gradient, and vice versa.
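One schematic way to see the coupling (a first-order sketch of a single SGD step, not quoted verbatim from the paper): for a margin loss of the form $\mathcal{L} = -\log\sigma\big(m_\theta(y_w) - m_\theta(y_l)\big)$, with learning rate $\eta$ and a positive coefficient $c_\theta$ collecting the loss and margin derivatives,

$$
\Delta \log\pi_\theta(y_w) \approx \eta\, c_\theta \Big( \|\nabla_\theta \log\pi_\theta(y_w)\|^2 - \nabla_\theta \log\pi_\theta(y_w) \cdot \nabla_\theta \log\pi_\theta(y_l) \Big),
$$

$$
\Delta \log\pi_\theta(y_l) \approx \eta\, c_\theta \Big( \nabla_\theta \log\pi_\theta(y_w) \cdot \nabla_\theta \log\pi_\theta(y_l) - \|\nabla_\theta \log\pi_\theta(y_l)\|^2 \Big).
$$

Each change carries the *other* response's gradient through the inner product, so a large inner product can push the chosen log-prob down or the rejected one up.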
- 2/4 The Gradient Entanglement effect becomes particularly concerning when the inner product between the chosen and rejected gradients is large, which often happens when the two responses are similar!
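A toy numerical example (not from the paper): two nearly parallel "response features" share one parameter vector, so the gradient inner product is large. One step on the margin loss drives the chosen log-prob down because the rejected gradient leaks in through that inner product.

```python
import torch

# Chosen/rejected log-probs modelled as linear functions of one shared
# parameter vector; the two feature vectors are nearly parallel, mimicking
# two very similar responses (large gradient inner product).
theta = torch.zeros(2, requires_grad=True)
x_chosen = torch.tensor([1.0, 0.5])
x_rejected = torch.tensor([1.0, 1.0])

def logps(t):
    return x_chosen @ t, x_rejected @ t

logp_c, logp_r = logps(theta)
loss = -torch.nn.functional.logsigmoid(logp_c - logp_r)   # margin-based loss
loss.backward()

with torch.no_grad():
    theta -= 0.1 * theta.grad          # one SGD step

new_c, new_r = logps(theta)
print(float(new_c - logp_c))   # negative: the *chosen* log-prob went down
print(float(new_r - logp_r))   # the rejected log-prob moves with the same update
```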
- 4/4 Joint work with Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang. Paper: arxiv.org/abs/2410.13828 Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024
- How to **efficiently** build personalized language models **without** textual information about user preferences? Our Personalized-RLHF work: a light-weight user model, personalized versions of all *PO alignment algorithms, and strong performance on the largest personalized preference dataset. arxiv.org/abs/2402.05133
- 1/4 Personalized-RLHF (P-RLHF) uses a **light-weight** user model to learn user embeddings, which serve as a soft prompt for generating personalized responses. The user model is 10-100x smaller than the LoRA adapters used for fine-tuning the language model.
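A minimal sketch of the soft-prompt idea; the embedding size, number of prompt tokens, and the extra slot for unseen users are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class UserSoftPrompt(nn.Module):
    """One embedding per user, reshaped into a few soft-prompt vectors that
    are prepended to the language model's input embeddings."""

    def __init__(self, num_users, num_prompt_tokens=4, hidden_size=2048):
        super().__init__()
        self.num_prompt_tokens = num_prompt_tokens
        self.hidden_size = hidden_size
        # +1 row reserved for unseen / generic users (an assumption here).
        self.user_embeddings = nn.Embedding(
            num_users + 1, num_prompt_tokens * hidden_size)

    def forward(self, user_ids, input_embeds):
        # user_ids: (batch,); input_embeds: (batch, seq_len, hidden_size)
        soft_prompt = self.user_embeddings(user_ids).view(
            -1, self.num_prompt_tokens, self.hidden_size)
        # The (frozen or LoRA-tuned) LM then consumes the longer sequence.
        return torch.cat([soft_prompt, input_embeds], dim=1)

user_model = UserSoftPrompt(num_users=1000)
dummy_inputs = torch.randn(2, 16, 2048)            # stand-in token embeddings
print(user_model(torch.tensor([3, 7]), dummy_inputs).shape)  # (2, 20, 2048)
```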
- 2/4 For any base preference optimization (*PO) algorithm, P-RLHF can create its corresponding personalized version P-*PO, allowing for **flexible** choice of alignment algorithms.
- 4/4 Joint work with Xinyu Li, @ruiyang-zhou.bsky.social, @zacharylipton.bsky.social. Paper: arxiv.org/abs/2402.05133, Code: github.com/HumainLab/Personalized_RLHF Check out our work at the NeurIPS AFM workshop, Exhibit Hall A, 12/14, 4:30 - 5:30 pm #NeurIPS2024