Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang
zd21
Recent Activity
Updated a model 26 days ago: zd21/qwen2.5-7b-td2
Published a model 26 days ago: zd21/qwen2.5-7b-td2
Updated a model 26 days ago: zd21/qwen2.5-7b-baseline-prm