arxiv:2601.08670

Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

Published on Jan 13 · Submitted by Giulio Corallo on Jan 14

Abstract

Parallel Context-of-Experts Decoding enables multi-document reasoning in retrieval-augmented generation by treating retrieved documents as isolated experts and synchronizing predictions through a retrieval-aware contrastive decoding rule.

AI-generated summary

Retrieval Augmented Generation faces a trade-off: concatenating documents into a long prompt enables multi-document reasoning but creates a prefill bottleneck, while encoding document KV caches separately is fast but breaks cross-document interaction. We propose Parallel Context-of-Experts Decoding (Pced), a training-free framework that shifts evidence aggregation from the attention mechanism to the decoding step. Pced treats retrieved documents as isolated "experts", synchronizing their predictions via a novel retrieval-aware contrastive decoding rule that weighs expert logits against the model prior. This approach recovers cross-document reasoning without constructing shared attention across documents.
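
The exact aggregation formula is not given on this page, so the sketch below is only an illustration of what a retrieval-aware contrastive decoding rule over per-document expert logits could look like; the function name, the softmax weighting by retrieval score, and the `alpha` contrast strength are all assumptions, not the paper's method.

```python
import torch

def contrastive_expert_decode(expert_logits, prior_logits, retrieval_scores, alpha=1.0):
    """Combine per-document expert logits against the model's context-free prior.

    expert_logits:    [n_docs, vocab] next-token logits, one row per document expert
    prior_logits:     [vocab] next-token logits from the model with no retrieved context
    retrieval_scores: [n_docs] retriever relevance scores for the documents
    alpha:            strength of the contrast with the prior (illustrative knob)

    NOTE: this aggregation rule is an assumption for illustration; it is not
    reproduced from the paper.
    """
    # Weight each expert by its (softmaxed) retrieval score.
    weights = torch.softmax(retrieval_scores, dim=0)             # [n_docs]
    expert_logprobs = torch.log_softmax(expert_logits, dim=-1)   # [n_docs, vocab]
    pooled = (weights[:, None] * expert_logprobs).sum(dim=0)     # [vocab]

    # Contrast against the prior: boost tokens the retrieval-conditioned experts
    # support more strongly than the context-free model does.
    prior_logprobs = torch.log_softmax(prior_logits, dim=-1)
    scores = pooled + alpha * (pooled - prior_logprobs)
    return scores.argmax(dim=-1)                                 # greedy next token
```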

Community

Paper author · Paper submitter

Parallel Context-of-Experts Decoding (Pced) speeds up RAG by decoding in parallel from per-document KV-cache “experts” and selecting retrieval-supported tokens to recover cross-document reasoning.
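
To make the "per-document KV-cache experts" idea concrete, here is a minimal sketch assuming a Hugging Face causal LM: each document (plus the query) is prefilled in isolation, so no cache ever attends across documents, and one token is decoded per step from every expert in parallel. The helper names, the `gpt2` checkpoint, and the plain logit average (standing in for the retrieval-aware contrastive rule sketched above) are illustrative assumptions, not the paper's implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def prefill_expert(text):
    # Encode one document (plus the query) in isolation: its KV cache never
    # attends to the other documents, which is what keeps prefill cheap.
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, use_cache=True)
    return out.logits[0, -1], out.past_key_values

docs = ["Document A text ...", "Document B text ..."]
query = "\nQuestion: ...\nAnswer:"
experts = [prefill_expert(d + query) for d in docs]   # independent per-document caches

@torch.no_grad()
def decode_step(experts):
    # Gather each expert's next-token logits, aggregate them (plain average here,
    # as a placeholder for the contrastive rule above), then feed the chosen
    # token back into every expert's own cache.
    logits = torch.stack([l for l, _ in experts])      # [n_docs, vocab]
    next_id = logits.mean(dim=0).argmax().item()       # placeholder aggregation
    new_experts = []
    for _, past in experts:
        out = model(torch.tensor([[next_id]]), past_key_values=past, use_cache=True)
        new_experts.append((out.logits[0, -1], out.past_key_values))
    return next_id, new_experts

next_id, experts = decode_step(experts)
print(tok.decode([next_id]))
```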


