DynamicPO: Dynamic Preference Optimization for Recommendation

This repository contains the model weights (LoRA adapters) for DynamicPO, a plug-and-play dynamic preference optimization framework for LLM-based recommender systems.

DynamicPO is designed to align Large Language Models (LLMs) with user preferences while mitigating "preference optimization collapse." In multi-negative alignment, this collapse appears when increasing the number of negative samples degrades performance even though the training loss keeps decreasing.

Key Features

DynamicPO comprises two adaptive mechanisms:

  • Dynamic Boundary Negative Selection: Identifies and prioritizes informative negatives near the model's decision boundary.
  • Dual-Margin Dynamic β Adjustment: Calibrates optimization strength per sample according to boundary ambiguity.
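
To make the two mechanisms more concrete, here is a minimal, illustrative sketch. It is not the authors' implementation: the selection rule (smallest absolute reward margin), the beta rule (a single mean margin passed through an exponential), the softmax-style multi-negative loss, and all function and parameter names are assumptions made for this example. In particular, it does not attempt to reproduce the dual-margin rule, whose details are not given here.

```python
import math


def select_boundary_negatives(pos_reward, neg_rewards, k):
    """Return indices of the k negatives closest to the positive's reward.

    Assumption: "near the decision boundary" is read as a small
    absolute margin |r_pos - r_neg| between implicit rewards.
    """
    ranked = sorted(
        range(len(neg_rewards)),
        key=lambda i: abs(pos_reward - neg_rewards[i]),
    )
    return ranked[:k]


def dynamic_beta(pos_reward, selected_neg_rewards, beta_base=0.1, scale=1.0):
    """Hypothetical per-sample beta that grows with boundary ambiguity."""
    margins = [pos_reward - r for r in selected_neg_rewards]
    mean_margin = sum(margins) / len(margins)
    # Ambiguity is 1 when the mean margin is 0 and decays as |margin| grows.
    ambiguity = math.exp(-abs(mean_margin))
    return beta_base * (1.0 + scale * ambiguity)


def multi_negative_loss(pos_reward, neg_rewards, k=2, beta_base=0.1):
    """Softmax-style multi-negative preference loss over selected negatives."""
    idx = select_boundary_negatives(pos_reward, neg_rewards, k)
    chosen = [neg_rewards[i] for i in idx]
    beta = dynamic_beta(pos_reward, chosen, beta_base)
    # z = log sum_n exp(beta * (r_neg_n - r_pos))
    z = math.log(sum(math.exp(beta * (r - pos_reward)) for r in chosen))
    # -log sigmoid(-z) == log(1 + exp(z))
    loss = math.log1p(math.exp(z))
    return loss, idx, beta


# Toy implicit rewards (e.g. policy/reference log-ratios) for one sample.
loss, idx, beta = multi_negative_loss(1.0, [-3.0, 0.8, 1.5, -0.2], k=2)
print(idx, round(beta, 4), round(loss, 4))
```

In this toy setup, the easy negative (reward -3.0) is ignored, and samples whose selected negatives sit close to the positive receive a larger beta than samples that are already well separated. Whether beta should rise or fall with ambiguity is a design choice of this sketch, not a claim about DynamicPO.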

Resources

Base Models

  • meta-llama/Llama-2-7b-chat-hf
  • meta-llama/Meta-Llama-3-8B-Instruct
  • Qwen/Qwen2.5-7B-Instruct
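
Since the weights are LoRA adapters, they need to be attached to one of the base models above before use. The following is a hedged sketch using the `transformers` and `peft` libraries. It assumes the adapter files sit at the root of the `xingyuHuxingyu/DynamicPO` repository and were trained on the Qwen base model; neither assumption is confirmed here, so check the repository layout and adjust the identifiers as needed. The helper name `load_dynamicpo` is hypothetical.

```python
# Base models listed in this model card.
BASE_MODELS = (
    "meta-llama/Llama-2-7b-chat-hf",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
)


def load_dynamicpo(
    base_model_id="Qwen/Qwen2.5-7B-Instruct",  # assumption: matching base
    adapter_id="xingyuHuxingyu/DynamicPO",     # assumption: adapter at repo root
):
    """Load a base model and attach the LoRA adapter on top of it."""
    # Heavy dependencies are imported lazily, only when the helper is called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode; the adapter stays frozen
    return tokenizer, model
```

Calling `tokenizer, model = load_dynamicpo()` downloads the base model and the adapter, so it needs network access and enough memory for a 7B-parameter model.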

Citation

This work received the DASFAA 2026 Best Paper Award. If you find it useful, please consider citing:

@inproceedings{hu2026dynamicpo,
  title={DynamicPO: Dynamic Preference Optimization for Recommendation},
  author={Hu, Xingyu and Zhang, Kai and Wu, Jiancan and Wang, Shuli and Wang, Chi and Chen, Wenshuai and Zhu, Yinhua and Wang, Haitao and Wang, Xingxing and Wang, Xiang},
  booktitle={International Conference on Database Systems for Advanced Applications},
  pages={372--387},
  year={2026},
  organization={Springer}
}

Acknowledgment

This implementation is built upon the TRL library.
