UserMirrorrer-Qwen-DPO

This is a fine-tuned user simulator model introduced in the paper "Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation".

The model simulates user behavior in recommender systems (RSs), leveraging extensive user feedback to achieve better preference alignment. It uses explicit decision-making processes as explanatory rationales, which reduces ambiguity in the simulation samples.
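Below is a minimal usage sketch with 🤗 Transformers. The prompt structure (a user's interaction history plus candidate items) is an illustrative assumption; refer to the paper and the UserMirrorer dataset for the exact template used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Joinn/UserMirrorrer-Qwen-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Hypothetical prompt: the actual input format should follow the
# UserMirrorer dataset templates.
messages = [
    {"role": "user", "content": (
        "User history: ...\n"
        "Candidate items: ...\n"
        "Act as this user: choose the item you would engage with "
        "and explain your decision."
    )},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```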

Model Details

  • Base Model: Qwen2.5-3B-Instruct
  • Model Size: 3B parameters (F32 safetensors)
  • Fine-tuning Process (a training sketch follows this list):
    1. Supervised Fine-tuning (SFT): 1 epoch.
    2. Direct Preference Optimization (DPO): 2 epochs.
  • Dataset: UserMirrorer
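
The DPO stage can be reproduced in spirit with TRL's DPOTrainer. The following is a minimal sketch, not the authors' training script: the dataset path Joinn/UserMirrorrer, the column names prompt/chosen/rejected, and all hyperparameters other than the epoch counts above are assumptions.

```python
# Minimal DPO sketch (assumes trl >= 0.12, where the tokenizer is passed
# as `processing_class`). Dataset id and columns are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Per the card, training starts from Qwen2.5-3B-Instruct; in practice the
# DPO stage would resume from the SFT checkpoint.
base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Hypothetical dataset id; DPOTrainer expects "prompt", "chosen", "rejected".
train_ds = load_dataset("Joinn/UserMirrorrer", split="train")

args = DPOConfig(
    output_dir="usermirrorer-qwen-dpo",
    num_train_epochs=2,             # matches the 2 DPO epochs above
    per_device_train_batch_size=1,  # illustrative
    gradient_accumulation_steps=8,  # illustrative
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,
)
trainer.train()
```

When no reference model is passed, DPOTrainer creates a frozen copy of the policy model to serve as the DPO reference.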

Resources

  • Paper: https://arxiv.org/abs/2508.18142

Citation

If you find this work useful in your research, please consider citing the following paper:

@misc{wei2025mirroringusersbuildingpreferencealigned,
      title={Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation}, 
      author={Tianjun Wei and Huizhong Guo and Yingpeng Du and Zhu Sun and Huang Chen and Dongxia Wang and Jie Zhang},
      year={2025},
      eprint={2508.18142},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2508.18142}, 
}