-
SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills
Paper • 2605.24117 • Published • 21 -
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
Paper • 2605.27366 • Published • 26 -
SkillGrad: Optimizing Agent Skills Like Gradient Descent
Paper • 2605.27760 • Published • 27 -
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
Paper • 2605.28424 • Published • 31
Collections
Collections including paper arxiv:2604.17308
-
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Paper • 2603.12056 • Published • 34 -
Memento-Skills: Let Agents Design Agents
Paper • 2603.18743 • Published • 58 -
SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
Paper • 2603.15401 • Published • 20 -
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Paper • 2603.25158 • Published • 54
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 731 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89
-
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
Paper • 2604.17308 • Published • 22 -
OpenGame: Open Agentic Coding for Games
Paper • 2604.18394 • Published • 81 -
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Paper • 2604.18292 • Published • 85
-
Reasoning Shift: How Context Silently Shortens LLM Reasoning
Paper • 2604.01161 • Published • 32 -
Steerable Visual Representations
Paper • 2604.02327 • Published • 56 -
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
Paper • 2604.06427 • Published • 11 -
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
Paper • 2604.09497 • Published • 29
-
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
Paper • 2506.04180 • Published • 34 -
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
Paper • 2506.10540 • Published • 37 -
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Paper • 2506.10974 • Published • 19 -
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Paper • 2507.15245 • Published • 11
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 70 -
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 37 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 195 -
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 100