DynamicPO: Dynamic Preference Optimization for Recommendation
Paper • 2605.00327 • Published
The adapters are published as xingyuHuxingyu/DynamicPO and can be loaded with PEFT.
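A minimal loading sketch follows, since this card does not include one. It assumes that the adapter's config records its base checkpoint in `base_model_name_or_path` and that the base model is a causal LM; neither is confirmed here. The helper name `load_dynamicpo` is hypothetical.

```python
ADAPTER_ID = "xingyuHuxingyu/DynamicPO"  # repository named in this card


def load_dynamicpo(adapter_id: str = ADAPTER_ID):
    """Return (model, tokenizer) with the LoRA adapter attached to its base model.

    Hypothetical helper: the base checkpoint is read from the adapter config,
    because this card does not name it.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter config stores the base checkpoint it was trained on.
    config = PeftConfig.from_pretrained(adapter_id)
    base_id = config.base_model_name_or_path
    if not base_id:
        raise ValueError("Adapter config does not name a base model; pass one manually.")

    # Assumption: the base model is a causal LM.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Attach the LoRA weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_dynamicpo()` downloads both the base checkpoint and the adapter, so it needs network access and enough memory for the base model.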
This repository contains the model weights (LoRA adapters) for DynamicPO, a plug-and-play dynamic preference optimization framework for LLM-based recommender systems.
DynamicPO is designed to align Large Language Models (LLMs) with user preferences while mitigating "preference optimization collapse." This phenomenon occurs in multi-negative alignment when increasing the number of negative samples leads to performance degradation despite a decreasing training loss.
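To make "multi-negative alignment" concrete, here is a sketch of a generic softmax-style extension of the DPO loss to several negatives. It is not confirmed to be the objective DynamicPO uses. The function name, the argument names, and the default `beta` are assumptions made for illustration. Each input is a policy-to-reference log-ratio, `log pi(item | x) - log pi_ref(item | x)`.

```python
import math


def multi_negative_dpo_loss(pos_logratio, neg_logratios, beta=1.0):
    """Generic multi-negative DPO-style loss for one training example.

    loss = -log sigmoid(-log sum_j exp(beta * (neg_j - pos)))

    Illustrative only; with a single negative it reduces to the standard
    DPO loss -log sigmoid(beta * (pos - neg)).
    """
    if not neg_logratios:
        raise ValueError("at least one negative is required")

    # Scaled margin of each negative over the positive item.
    margins = [beta * (neg - pos_logratio) for neg in neg_logratios]

    # Numerically stable log-sum-exp over the negatives.
    top = max(margins)
    lse = top + math.log(sum(math.exp(m - top) for m in margins))

    # -log sigmoid(-lse) equals softplus(lse), computed stably.
    return max(lse, 0.0) + math.log1p(math.exp(-abs(lse)))
```

At initialization the policy equals the reference, so every log-ratio is zero and the loss with `n` negatives equals `log(1 + n)`. The starting loss value therefore depends on the number of negatives.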
DynamicPO comprises two adaptive mechanisms:
This work received the DASFAA 2026 Best Paper Award. If you find this work useful, please consider citing:
@inproceedings{hu2026dynamicpo,
title={DynamicPO: Dynamic Preference Optimization for Recommendation},
author={Hu, Xingyu and Zhang, Kai and Wu, Jiancan and Wang, Shuli and Wang, Chi and Chen, Wenshuai and Zhu, Yinhua and Wang, Haitao and Wang, Xingxing and Wang, Xiang},
booktitle={International Conference on Database Systems for Advanced Applications},
pages={372--387},
year={2026},
organization={Springer}
}
This implementation is built upon the TRL library.