arxiv:2512.24601

Recursive Language Models

Published on Dec 31, 2025 · Submitted by Rajkumar rawal on Jan 6

Abstract

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.

Community

Some of the key observations they report:
-- LLMs interact with their own prompts as objects.

-- In their approach, a prompt isn’t “run” directly; instead, it is stored as a variable in an external Python REPL. The language model writes code to inspect, slice, and decompose that long string, observes the execution outputs, and constructs sub-tasks in which it recursively invokes an LLM on just the relevant snippets, stitching the results together when the recursive process ends (a minimal sketch of this loop follows the list below). This lets it solve 10M+ token tasks with far less “context rot” and often lower cost than summarization or RAG, turning long-context scaling into an inference-time algorithm rather than just a bigger context window.

-- The ability to search the prompt is what enables handling long-context inputs; recursive sub-calls are what handle information-dense inputs.

-- The inference cost of RLMs remains comparable to a base model call, but it is high-variance because the model can keep making sub-calls or iterating if it cannot solve the problem initially.

-- The key insight is that long prompts should not be fed into the LLM directly, but should instead be treated as part of the environment that the LLM can search, read and interact with as needed for the task.
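
To make the mechanism concrete, here is a minimal Python sketch of that loop. It is an illustrative simplification, not the authors' implementation: `call_llm` is a placeholder for whatever model client you supply, and the `result` / `FINAL:` convention for how the model reports back is an assumption made for this example.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM client; supply your own implementation."""
    raise NotImplementedError("plug in your model client here")


def recursive_llm(query: str, long_prompt: str, max_steps: int = 8) -> str:
    """Answer `query` about `long_prompt` without feeding the full prompt to the model."""
    transcript = []
    for _ in range(max_steps):
        # The model sees only the query, metadata about the prompt, and prior
        # REPL observations -- never the raw long prompt itself.
        action = call_llm(
            f"Query: {query}\n"
            f"A variable `prompt` holds the full input ({len(long_prompt)} chars).\n"
            "Prior REPL steps:\n" + "\n".join(transcript) + "\n"
            "Reply with Python that sets `result`, or with `FINAL: <answer>`."
        )
        if action.startswith("FINAL:"):
            return action[len("FINAL:"):].strip()
        # Run the model-written code in a scope where the long prompt is just a
        # string object it can search, slice, or hand to recursive sub-calls.
        scope = {"prompt": long_prompt, "call_llm": call_llm, "recursive_llm": recursive_llm}
        try:
            exec(action, scope)  # sandbox this properly in real use
            observation = str(scope.get("result", ""))[:2000]  # cap what re-enters context
        except Exception as exc:
            observation = f"error: {exc!r}"
        transcript.append(f">>> {action}\n{observation}")
    # Step budget exhausted: answer from whatever was observed so far.
    return call_llm(f"Query: {query}\nObservations:\n" + "\n".join(transcript))
```

The property that matters is that `long_prompt` never enters the model's context wholesale; only model-chosen slices and the truncated outputs of sub-calls do, which is why inputs can grow far beyond the native context window.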

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/recursive-language-models-6610-16b3d94b

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Not bad. Almost there. If you make it a graph of symbolic concepts, though, you can create something much larger.

https://signal-zero.ai
