Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper
•
2512.24618
•
Published
•
142
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
Paper
•
2512.24873
•
Published
•
103
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Paper
•
2512.23343
•
Published
•
28
Scaling Open-Ended Reasoning to Predict the Future
Paper
•
2512.25070
•
Published
•
16
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
•
2512.23988
•
Published
•
16
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
Paper
•
2512.24297
•
Published
•
6
Valori: A Deterministic Memory Substrate for AI Systems
Paper
•
2512.22280
•
Published
•
4
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper
•
2512.23959
•
Published
•
109
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper
•
2512.24617
•
Published
•
61
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
Paper
•
2512.24615
•
Published
•
118
Nested Learning: The Illusion of Deep Learning Architectures
Paper
•
2512.24695
•
Published
•
41
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
Paper
•
2512.24330
•
Published
•
35
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving
Paper
•
2601.00747
•
Published
•
20
Diversity or Precision? A Deep Dive into Next Token Prediction
Paper
•
2512.22955
•
Published
•
8
Fast-weight Product Key Memory
Paper
•
2601.00671
•
Published
•
5
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
•
2512.20578
•
Published
•
83
Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning
Paper
•
2601.00830
•
Published
•
3
SimpleMem: Efficient Lifelong Memory for LLM Agents
Paper
•
2601.02553
•
Published
•
37
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
•
2601.02346
•
Published
•
26
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
Paper
•
2601.01576
•
Published
•
18
Confidence Estimation for LLMs in Multi-turn Interactions
Paper
•
2601.02179
•
Published
•
16
CPPO: Contrastive Perception for Vision Language Policy Optimization
Paper
•
2601.00501
•
Published
•
7
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents
Paper
•
2601.02314
•
Published
•
2
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
Paper
•
2601.03193
•
Published
•
46
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Paper
•
2601.02427
•
Published
•
43
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
Paper
•
2512.23412
•
Published
•
39
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
Paper
•
2601.01874
•
Published
•
19
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Paper
•
2601.01321
•
Published
•
18
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Paper
•
2601.02439
•
Published
•
16
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners
Paper
•
2601.02996
•
Published
•
5
Steerability of Instrumental-Convergence Tendencies in LLMs
Paper
•
2601.01584
•
Published
•
1
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
•
2601.02151
•
Published
•
102
Evolving Programmatic Skill Networks
Paper
•
2601.03509
•
Published
•
81
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning
Paper
•
2601.03872
•
Published
•
42
Agentic Rubrics as Contextual Verifiers for SWE Agents
Paper
•
2601.04171
•
Published
•
11
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Paper
•
2601.02075
•
Published
•
8
Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
Paper
•
2601.03315
•
Published
•
6
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
Paper
•
2601.03236
•
Published
•
3
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
•
2601.05242
•
Published
•
210
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
•
2601.05167
•
Published
•
29
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search
Paper
•
2601.04767
•
Published
•
28
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
Paper
•
2601.04575
•
Published
•
8
Paper
•
2601.05111
•
Published
•
19
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
Paper
•
2601.03425
•
Published
•
16
DocDancer: Towards Agentic Document-Grounded Information Seeking
Paper
•
2601.05163
•
Published
•
5
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling
Paper
•
2601.03111
•
Published
•
9
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
Paper
•
2601.04620
•
Published
•
3
Learning User Preferences Through Interaction for Long-Term Collaboration
Paper
•
2601.02702
•
Published
•
2
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
•
2601.05432
•
Published
•
163
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
•
2601.06002
•
Published
•
50
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Paper
•
2601.06021
•
Published
•
43
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Paper
•
2601.05808
•
Published
•
36
AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper
•
2601.04786
•
Published
•
28
Can We Predict Before Executing Machine Learning Agents?
Paper
•
2601.05930
•
Published
•
26
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift
Paper
•
2601.05882
•
Published
•
20
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Paper
•
2601.05905
•
Published
•
18
SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Paper
•
2601.04888
•
Published
•
9
Over-Searching in Search-Augmented Large Language Models
Paper
•
2601.05503
•
Published
•
6
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation
Paper
•
2601.04823
•
Published
•
6
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning
Paper
•
2601.04726
•
Published
•
6
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration
Paper
•
2601.04544
•
Published
•
6
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck
Paper
•
2601.05870
•
Published
•
3
Distilling Feedback into Memory-as-a-Tool
Paper
•
2601.05960
•
Published
•
2
BabyVision: Visual Reasoning Beyond Language
Paper
•
2601.06521
•
Published
•
190
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Paper
•
2601.05593
•
Published
•
79
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
•
2601.07226
•
Published
•
32
Dr. Zero: Self-Evolving Search Agents without Training Data
Paper
•
2601.07055
•
Published
•
20
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
Paper
•
2601.07779
•
Published
•
27
Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction
Paper
•
2601.05107
•
Published
•
23
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
Paper
•
2601.06860
•
Published
•
16
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
Paper
•
2601.07526
•
Published
•
23
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
Paper
•
2601.06803
•
Published
•
10
TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning
Paper
•
2601.04698
•
Published
•
10
How Do Large Language Models Learn Concepts During Continual Pre-Training?
Paper
•
2601.03570
•
Published
•
4
OpenTinker: Separating Concerns in Agentic Reinforcement Learning
Paper
•
2601.07376
•
Published
•
6
ShowUI-Aloha: Human-Taught GUI Agent
Paper
•
2601.07181
•
Published
•
3
Are LLM Decisions Faithful to Verbal Confidence?
Paper
•
2601.07767
•
Published
•
4
Structured Episodic Event Memory
Paper
•
2601.06411
•
Published
•
4
Artificial Entanglement in the Fine-Tuning of Large Language Models
Paper
•
2601.06788
•
Published
•
3
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
Paper
•
2601.08225
•
Published
•
50
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
Paper
•
2601.06487
•
Published
•
51
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
Paper
•
2601.07389
•
Published
•
2
MemoBrain: Executive Memory as an Agentic Brain for Reasoning
Paper
•
2601.08079
•
Published
•
37
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
Paper
•
2601.06789
•
Published
•
77
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
•
2601.07264
•
Published
•
24
Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
Paper
•
2601.08670
•
Published
•
19
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
Paper
•
2601.04582
•
Published
•
10
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper
•
2601.08468
•
Published
•
6
EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
Paper
•
2601.06786
•
Published
•
6
Controlled Self-Evolution for Algorithmic Code Optimization
Paper
•
2601.07348
•
Published
•
112
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
•
2601.09259
•
Published
•
94
EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
Paper
•
2601.09465
•
Published
•
40
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG
Paper
•
2601.09028
•
Published
•
33
ExpSeek: Self-Triggered Experience Seeking for Web Agents
Paper
•
2601.08605
•
Published
•
16
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models
Paper
•
2601.08955
•
Published
•
13
No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
Paper
•
2601.06794
•
Published
•
4
The AI Hippocampus: How Far are We From Human Memory?
Paper
•
2601.09113
•
Published
•
5
DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
Paper
•
2601.09609
•
Published
•
3
Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning
Paper
•
2601.09536
•
Published
•
5
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning
Paper
•
2601.04809
•
Published
•
3
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
•
2601.08763
•
Published
•
141
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Paper
•
2601.09667
•
Published
•
82
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
Paper
•
2601.07641
•
Published
•
45
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Paper
•
2601.10402
•
Published
•
36
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
Paper
•
2601.10712
•
Published
•
24
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
Paper
•
2601.10129
•
Published
•
11
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
Paper
•
2601.10657
•
Published
•
20
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
Paper
•
2601.06431
•
Published
•
12
PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary
Paper
•
2601.10201
•
Published
•
8
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
Paper
•
2601.10338
•
Published
•
5
Memory Bank Compression for Continual Adaptation of Large Language Models
Paper
•
2601.00756
•
Published
•
2
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
•
2601.09088
•
Published
•
59
Your Group-Relative Advantage Is Biased
Paper
•
2601.08521
•
Published
•
145
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
Paper
•
2601.11496
•
Published
•
46
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
Paper
•
2601.10355
•
Published
•
38
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
Paper
•
2601.11037
•
Published
•
17
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
Paper
•
2601.09195
•
Published
•
15
Reasoning Models Generate Societies of Thought
Paper
•
2601.10825
•
Published
•
12
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
Paper
•
2601.09636
•
Published
•
8
Language of Thought Shapes Output Diversity in Large Language Models
Paper
•
2601.11227
•
Published
•
9
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper
•
2601.08808
•
Published
•
38
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper
•
2601.11004
•
Published
•
29
Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
Paper
•
2601.11061
•
Published
•
7
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
Paper
•
2601.08441
•
Published
•
7
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
Paper
•
2601.09512
•
Published
•
4
Think3D: Thinking with Space for Spatial Reasoning
Paper
•
2601.13029
•
Published
•
45
Toward Efficient Agents: Memory, Tool learning, and Planning
Paper
•
2601.14192
•
Published
•
49
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
Paper
•
2601.13836
•
Published
•
34
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Paper
•
2601.13761
•
Published
•
15
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
Paper
•
2601.12294
•
Published
•
17
Aligning Agentic World Models via Knowledgeable Experience Learning
Paper
•
2601.13247
•
Published
•
15
Agentic-R: Learning to Retrieve for Agentic Search
Paper
•
2601.11888
•
Published
•
19
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
•
2601.14249
•
Published
•
8
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Paper
•
2601.14209
•
Published
•
5
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning
Paper
•
2601.13697
•
Published
•
3
Agentic Reasoning for Large Language Models
Paper
•
2601.12538
•
Published
•
180
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
Paper
•
2601.14171
•
Published
•
47
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
•
2601.13572
•
Published
•
23
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
Paper
•
2601.14750
•
Published
•
16
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
Paper
•
2601.14027
•
Published
•
12
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
Paper
•
2601.14152
•
Published
•
4
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems
Paper
•
2601.15059
•
Published
•
3
Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek
Paper
•
2601.15100
•
Published
•
3
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Paper
•
2601.15876
•
Published
•
88
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
•
2601.16206
•
Published
•
75
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
Paper
•
2601.15224
•
Published
•
12
Agentic Uncertainty Quantification
Paper
•
2601.15703
•
Published
•
8
Agentic Confidence Calibration
Paper
•
2601.15778
•
Published
•
5
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models
Paper
•
2601.15690
•
Published
•
4