SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 5 days ago • 24
RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning Paper • 2603.09160 • Published 20 days ago • 15
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published Feb 23 • 57
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling Paper • 2601.22636 • Published Jan 30 • 22
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 118
COSMOS: Predictable and Cost-Effective Adaptation of LLMs Paper • 2505.01449 • Published Apr 30, 2025 • 3
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Paper • 2505.00358 • Published May 1, 2025 • 26