Peiyang Liu

Peiyang Liu 刘培阳

Ph.D. Candidate
Peking University · School of Software and Microelectronics · National Engineering Research Center for Software Engineering
84+ Citations · 13 Publications
About Me

Hi! I'm Peiyang Liu, a first-year Ph.D. student at the School of Software and Microelectronics, Peking University, advised by Prof. Wei Ye in The Knowledge Computing Lab, National Engineering Research Center for Software Engineering. My research interests lie in Information Retrieval, Large Language Models, Retrieval-Augmented Generation (RAG), and LLM Agents.


Research
Information Retrieval & Large Language Models
My research spans the full pipeline around information retrieval and large language models, from low-level representation learning to high-level intelligent applications. On the retrieval side, I have explored large-scale text embeddings, label enhancement and smoothing, knowledge distillation, and data quality optimization. On the LLM side, my work covers Retrieval-Augmented Generation (RAG), agentic tool use and planning, reward modeling and reinforcement-learning alignment, and post-training optimization.
Information Retrieval · Large Language Model · RAG · LLM Agent · Post-Training · Reinforcement Learning · Reward Model · Text Embedding · Knowledge Distillation
Publications

* denotes equal contribution · blue highlight denotes myself

2026
ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback
Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, Jing Shao
arXiv preprint arXiv:2601.10156, 2026
2025
Queries Are Not Alone: Clustering Text Embeddings for Video Search
Peiyang Liu, Xi Wang, Ziqiang Cui, Wei Ye
SIGIR 2025 (CCF-A) · Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 874-883
Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft
Peiyang Liu, Ziqiang Cui, Di Liang, Wei Ye
arXiv preprint arXiv:2510.07728, 2025
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
Xiaoyu Liu, Di Liang, Hongyu Shan, Peiyang Liu, Yonghao Liu, Muling Wu, Yuntao Li, Xianjie Wu, Li Miao, Jiangrong Shen, et al.
EMNLP 2025 (CCF-B) · Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 672-685
Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
Ziqiang Cui, Yunpeng Weng, Xing Tang, Xiaokun Zhang, Shiwei Li, Peiyang Liu, Bowei He, Dugang Liu, Weihong Luo, Xiuqiang He, et al.
NeurIPS 2025 (CCF-A) · The 39th Annual Conference on Neural Information Processing Systems
CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning
Ziqiang Cui, Yunpeng Weng, Xing Tang, Peiyang Liu, Shiwei Li, Bowei He, Jiamin Chen, Yansen Zhang, Xiuqiang He, Chen Ma
arXiv preprint arXiv:2508.19282, 2025
2024
Unsupervised Corrupt Data Detection for Text Training
Peiyang Liu
ESA 2024 (CCF-C) · Expert Systems with Applications, Volume 248, 123335
2023
Retrieval-Based Unsupervised Noisy Label Detection on Text Data
Peiyang Liu, Jinyu Yang, Lin Wang, Sen Wang, Yunlai Hao, Huihui Bai
CIKM 2023 (CCF-B) · Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 4099-4104
2022
Label Smoothing for Text Mining
Peiyang Liu, Xiangyu Xi, Wei Ye, Shikun Zhang
COLING 2022 (CCF-B) · Proceedings of the 29th International Conference on Computational Linguistics, pp. 2210-2219
2021
Improving Embedding-based Large-scale Retrieval via Label Enhancement
Peiyang Liu, Xi Wang, Sen Wang, Wei Ye, Xiangyu Xi, Shikun Zhang
EMNLP 2021 (CCF-B) · Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 133-142
Distilling Knowledge from BERT into Simple Fully Connected Neural Networks for Efficient Vertical Retrieval
Peiyang Liu, Xi Wang, Lin Wang, Wei Ye, Xiangyu Xi, Shikun Zhang
CIKM 2021 (CCF-B) · Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3965-3975
QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval
Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, Shikun Zhang
NAACL 2021 (CCF-B) · Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3734-3739
2020
Not All Synonyms Are Created Equal: Incorporating Similarity of Synonyms to Enhance Word Embeddings
Peiyang Liu, Wei Ye, Xiangyu Xi, Tong Wang, Jinglei Zhang, Shikun Zhang
IJCNN 2020 (CCF-C) · 2020 International Joint Conference on Neural Networks, pp. 1-8, IEEE
Contact