Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
Paper • 2605.12070 • Published • 16
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation