ICLR 2026 — Outstanding Paper
Q-RAG Rewires Retrieval With Reinforcement Learning — 10M-Token Contexts, No Degradation
A value-based RL approach to training text-chunk embedders achieves state-of-the-art performance on BabiLong and RULER at scales from 1 million to 10 million tokens — pushing RAG into a regime where context length is no longer the bottleneck.
The International Conference on Learning Representations named Q-RAG an Outstanding Paper on Sunday, Day 3 of ICLR 2026, in recognition of work that recasts retrieval-augmented generation as a reinforcement learning problem rather than a supervised embedding task. Where conventional RAG systems train chunk embedders on static relevance labels, Q-RAG uses value-based RL to teach the retriever to reason across multiple retrieval steps — asking, in effect, which chunk is most useful given what the model already knows.
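The idea can be made concrete with a toy sketch. The paper's actual embedder, reward design, and training loop are not described here, so the following is only an illustrative tabular Q-learning example of the general recipe the article describes: treat each retrieval step as an action, condition the value of a chunk on what has already been retrieved, and reward the retriever only when the multi-hop chain is complete. The task setup (six chunks, a hypothetical two-hop gold chain) is invented for illustration.

```python
import random

# Toy multi-hop retrieval MDP: 6 chunks; answering requires chunks 2 and 4,
# where chunk 4 is only useful once chunk 2 is in context (a 2-hop chain).
# Illustrative sketch only -- NOT Q-RAG's architecture. The gold chain,
# reward, and tabular Q-function are assumptions made for this example.
CHUNKS = list(range(6))
GOLD_HOPS = [2, 4]   # hypothetical gold chunks for the 2-hop question
MAX_STEPS = 2

def reward(retrieved):
    # Terminal reward: 1 if all gold chunks were retrieved, else 0.
    return 1.0 if set(GOLD_HOPS) <= set(retrieved) else 0.0

def train(episodes=20000, alpha=0.1, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {}  # (frozenset(retrieved so far), candidate chunk) -> value
    for _ in range(episodes):
        ctx = []
        for step in range(MAX_STEPS):
            state = frozenset(ctx)
            # Epsilon-greedy: the value of a chunk depends on the current
            # context, i.e. "which chunk is most useful given what the
            # model already knows".
            if rng.random() < eps:
                a = rng.choice(CHUNKS)
            else:
                a = max(CHUNKS, key=lambda c: Q.get((state, c), 0.0))
            ctx.append(a)
            done = step == MAX_STEPS - 1
            r = reward(ctx) if done else 0.0
            nxt = frozenset(ctx)
            target = r if done else gamma * max(
                Q.get((nxt, c), 0.0) for c in CHUNKS)
            q = Q.get((state, a), 0.0)
            Q[(state, a)] = q + alpha * (target - q)  # TD(0) update
    return Q

def greedy_rollout(Q):
    # Retrieve greedily with the learned values.
    ctx = []
    for _ in range(MAX_STEPS):
        state = frozenset(ctx)
        ctx.append(max(CHUNKS, key=lambda c: Q.get((state, c), 0.0)))
    return ctx

Q = train()
print(greedy_rollout(Q))  # the learned policy retrieves the gold chunks
```

A supervised embedder scored against static relevance labels would rank each chunk independently of the retrieval history; the point of the value-based formulation, as sketched above, is that the score of chunk 4 changes once chunk 2 is in context. In Q-RAG the tabular Q-function would be replaced by a trained chunk embedder, a detail this toy necessarily omits.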
The practical result is striking: Q-RAG achieves state-of-the-art performance on BabiLong and RULER, two benchmarks specifically designed to stress multi-hop reasoning over very long contexts, at scales ranging from 1 million to 10 million tokens. Crucially, accuracy on the hardest three-hop temporal reasoning tasks shows virtually no degradation as context length grows from 1M to 10M tokens. Existing RAG and long-context models degrade measurably well before the 1M mark; Q-RAG’s RL-trained retriever effectively sidesteps the problem by never loading irrelevant chunks in the first place.
The implications run well beyond benchmark leaderboards. Enterprise knowledge bases, legal document corpora, and scientific literature collections routinely exceed 10 million tokens in practice. Q-RAG suggests that the engineering response to “the context window is too small” need not be simply a larger context window — a smarter retriever, trained with the right objective, may be the more efficient path. Program chairs at ICLR cited the paper’s combination of theoretical clarity and strong empirical validation across multiple long-context regimes as decisive in awarding it Outstanding Paper status.