-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 98 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 36 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
Collections
Discover the best community collections!
Collections including paper arxiv:2601.09667
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 107 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 509 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 230 -
PRewrite: Prompt Rewriting with Reinforcement Learning
Paper • 2401.08189 • Published -
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper • 2509.11543 • Published • 49
-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper • 2512.24618 • Published • 151 -
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper • 2512.24873 • Published • 105 -
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Paper • 2512.23343 • Published • 29 -
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
Paper • 2512.24297 • Published • 6
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 32 -
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper • 2510.19363 • Published • 62 -
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published • 48 -
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Paper • 2511.07384 • Published • 19
-
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Paper • 2602.12036 • Published • 98 -
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper • 2512.17102 • Published • 36 -
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Paper • 2512.23705 • Published • 45 -
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper • 2512.19995 • Published • 16
-
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper • 2512.24618 • Published • 151 -
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper • 2512.24873 • Published • 105 -
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Paper • 2512.23343 • Published • 29 -
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
Paper • 2512.24297 • Published • 6
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 107 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 509 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 31
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 32 -
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper • 2510.19363 • Published • 62 -
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published • 48 -
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Paper • 2511.07384 • Published • 19
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 230 -
PRewrite: Prompt Rewriting with Reinforcement Learning
Paper • 2401.08189 • Published -
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper • 2509.11543 • Published • 49