Peiyang Liu

Peiyang Liu (刘培阳)

Ph.D. Candidate
Peking University · School of Software and Microelectronics · National Engineering Research Center for Software Engineering
160 Citations · 23 Publications · 8 CCF-A · 5 CCF-B
About Me

Hi! I'm Peiyang Liu, a first-year Ph.D. student at the School of Software and Microelectronics, Peking University, advised by Prof. Wei Ye at the Knowledge Computing Lab, National Engineering Research Center for Software Engineering. My research interests are Retrieval-Augmented Generation (RAG), LLM Reasoning & Post-Training, and Information Retrieval.

Research
Retrieval-Augmented Generation
Building robust and trustworthy RAG systems, from context compression and counterfactual debiasing to multimodal visual attribution and data theft detection.
Context Compression · Robust RAG · Multimodal RAG · RAG Security
LLM Reasoning & Post-Training
Improving how LLMs learn and reason after pretraining: synthesizing reasoning paths from search trajectories, diagnosing SFT failure modes, designing efficient reward models, and calibrating code generation.
Reasoning · Reward Model · SFT Analysis · Code Generation · Long Context
Information Retrieval & Text Embedding
Large-scale dense retrieval with embedding optimization: label enhancement and smoothing, knowledge distillation, data quality control, and cross-modal retrieval.
Dense Retrieval · Text Embedding · Label Quality · Knowledge Distillation · Video Search
Publications

* denotes equal contribution · blue highlight denotes myself

2026
Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories
Peiyang Liu, Zhirui Chen, Xi Wang, Di Liang, Youru Li, Zhi Cai, Wei Ye
ACL 2026 · CCF-A · Oral · The 64th Annual Meeting of the Association for Computational Linguistics
NeuroSym-Cal: Bridging the Reasoning-Execution Gap in Code Generation via Hierarchical Calibration
Peiyang Liu, Yining Wang, Youru Li, Long Li, Zhi Cai, Wei Ye
ACL 2026 · Findings · Findings of the Association for Computational Linguistics: ACL 2026
StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference
Zhirui Chen, Peiyang Liu, Ling Shao
ACL 2026 · Findings · Findings of the Association for Computational Linguistics: ACL 2026
Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models
Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim
ACL 2026 · CCF-A · The 64th Annual Meeting of the Association for Computational Linguistics
Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty
Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim
ACL 2026 · Findings · Findings of the Association for Computational Linguistics: ACL 2026
Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
Zekai Lin, Chao Xue, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Lei Jiang, Yu Lu, Bob Simons, Shuang Liang, Minlong Peng
ACL 2026 · CCF-A · Oral · The 64th Annual Meeting of the Association for Computational Linguistics
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation
Peiyang Liu, Ziqiang Cui, Xi Wang, Di Liang, Wei Ye
SIGIR 2026 · CCF-A · Oral · Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval
Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation
Peiyang Liu, Qiang Yan, Ziqiang Cui, Di Liang, Xi Wang, Wei Ye
SIGIR 2026 · CCF-A · Oral · Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval
ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback
Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, Jing Shao
ACL 2026 · Findings · Findings of the Association for Computational Linguistics: ACL 2026
SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation
Long Li, Zhijian Zhou, Jiangxuan Long, Peiyang Liu, Weidi Xu, Zhe Wang, Shirui Pan, Chao Qu
ACL 2026 · Findings · Findings of the Association for Computational Linguistics: ACL 2026
Less Is More: Elevating RAG via Performance-Driven Context Compression
Ziqiang Cui, Yunpeng Weng, Xing Tang, Peiyang Liu, Shiwei Li, Bowei He, Jiamin Chen, Yansen Zhang, Xiuqiang He, Rui Zhang, Chen Ma
ICML 2026 · CCF-A · The 43rd International Conference on Machine Learning
2025
Queries Are Not Alone: Clustering Text Embeddings for Video Search
Peiyang Liu, Xi Wang, Ziqiang Cui, Wei Ye
SIGIR 2025 · CCF-A · Oral · Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 874-883
Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft
Peiyang Liu, Ziqiang Cui, Di Liang, Wei Ye
Preprint arXiv:2510.07728, 2025
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
Xiaoyu Liu, Di Liang, Hongyu Shan, Peiyang Liu, Yonghao Liu, Muling Wu, Yuntao Li, Xianjie Wu, Li Miao, Jiangrong Shen, et al.
EMNLP 2025 · CCF-B · Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 672-685
Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
Ziqiang Cui, Yunpeng Weng, Xing Tang, Xiaokun Zhang, Shiwei Li, Peiyang Liu, Bowei He, Dugang Liu, Weihong Luo, Xiuqiang He, et al.
NeurIPS 2025 · CCF-A · The 39th Annual Conference on Neural Information Processing Systems
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
Ziqiang Cui, Yunpeng Weng, Xing Tang, Peiyang Liu, Shiwei Li, Bowei He, Jiamin Chen, Yansen Zhang, Xiuqiang He, Chen Ma
Preprint arXiv:2508.19282, 2025
2024
Unsupervised Corrupt Data Detection for Text Training
Peiyang Liu
ESWA 2024 · CCF-C · Expert Systems with Applications, Volume 248, 123335
2023
Retrieval-Based Unsupervised Noisy Label Detection on Text Data
Peiyang Liu, Jinyu Yang, Lin Wang, Sen Wang, Yunlai Hao, Huihui Bai
CIKM 2023 · CCF-B · Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 4099-4104
2022
Label Smoothing for Text Mining
Peiyang Liu, Xiangyu Xi, Wei Ye, Shikun Zhang
COLING 2022 · CCF-B · Proceedings of the 29th International Conference on Computational Linguistics, pp. 2210-2219
2021
Improving Embedding-based Large-scale Retrieval via Label Enhancement
Peiyang Liu, Xi Wang, Sen Wang, Wei Ye, Xiangyu Xi, Shikun Zhang
EMNLP 2021 · Findings · Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 133-142
Distilling Knowledge from BERT into Simple Fully Connected Neural Networks for Efficient Vertical Retrieval
Peiyang Liu, Xi Wang, Lin Wang, Wei Ye, Xiangyu Xi, Shikun Zhang
CIKM 2021 · CCF-B · Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3965-3975
QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval
Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, Shikun Zhang
NAACL 2021 · CCF-B · Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3734-3739
2020
Not All Synonyms Are Created Equal: Incorporating Similarity of Synonyms to Enhance Word Embeddings
Peiyang Liu, Wei Ye, Xiangyu Xi, Tong Wang, Jinglei Zhang, Shikun Zhang
IJCNN 2020 · CCF-C · 2020 International Joint Conference on Neural Networks, pp. 1-8, IEEE
Contact