Publications
In reverse chronological order | * denotes equal contribution
2025
- arXiv | Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory | Submitted, under review, 2025
- arXiv | Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching | Submitted, under review, 2025
- ICML 2025 | Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach | In Proceedings of the 42nd International Conference on Machine Learning, 2025
- ICLR 2025 | Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment | In The Thirteenth International Conference on Learning Representations, 2025
- ICLR 2025 | Preserving Diversity in Supervised Fine-Tuning of Large Language Models | In The Thirteenth International Conference on Learning Representations, 2025
2024
- NeurIPS 2024 FITML Best Paper Runner-up | Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | In NeurIPS 2024, Fine-Tuning in Modern Machine Learning: Principles and Scalability, 2024
2023
- Ph.D. Thesis | Understanding Adversarially Robust Generalization: A Learning Theory Perspective | The Chinese University of Hong Kong, Shenzhen, 2023
2022
- NeurIPS 2022 MLS | Smoothed-SGDmax: A Stability-Inspired Algorithm to Improve Adversarial Generalization | In NeurIPS 2022, ML Safety, 2022