arxiv:2503.15478
Song Jiang
songjiang
AI & ML interests
None yet
Recent Activity
upvoted a paper about 2 months ago
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
upvoted a paper 2 months ago
Large Reasoning Models Learn Better Alignment from Flawed Thinking
authored a paper 9 months ago
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Organizations
None yet