Built by Forward AI Labs

We are an AI company that provides recruitment agents. | mira.day


MQR-A1: Mira Query Rewriter — Alignment v1

MQR-A1 (Mira Query Rewriter, Alignment v1) is a GRPO-aligned query rewriting model designed for reasoning-intensive retrieval tasks. It is the alignment component of our retrieval pipeline: by rewriting queries to distill core retrieval intents and filter misleading noise, MQR-A1 enables the downstream retriever MRE-T1 to achieve state-of-the-art performance.

Combined as the MQR-A1 + MRE-T1 pipeline, our system achieves No. 1 on the BRIGHT Benchmark across all evaluated dimensions — including both short and long document tracks — outperforming sophisticated rerankers, existing alignment models, and complex agentic pipelines.

Highlights

  • BRIGHT Short Pipeline nDCG@10: 66.9 — No. 1 on the short document retrieval leaderboard
  • BRIGHT Long Pipeline nDCG@10: 56.0 — No. 1 on the long document retrieval leaderboard
  • Intent Distillation over Query Expansion: Shifts the paradigm from simple additive expansion to RL-driven discriminative feature extraction
  • Robust to Semantic Traps: Filters out redundant or misleading surface-level semantic noise in complex reasoning tasks
  • SNR Enhancement: Significantly improves the signal-to-noise ratio of retrieval signals, mitigating the "feature dilution" effect under long-text inputs

Training Methodology

MQR-A1 is trained using a three-stage approach:

1. Candidate Rewrite Mining

To prevent the model from converging on rigid, template-based shortcuts during SFT, we implemented a heterogeneous candidate rewrite mining strategy. For every query, we dynamically synthesized a diverse set of structurally varied natural-language rewrites, forcing the model to prioritize underlying retrieval intent over superficial syntactic patterns.

2. Cold Start (SFT)

Traditional cross-entropy loss is inherently misaligned with retrieval objectives, so this stage serves only as a cold start. By injecting the natural-language structural features derived from the mining stage, we equip the base model with strong discriminative feature extraction capabilities, laying a stable initialization foundation for the subsequent GRPO phase.

3. GRPO-Driven Intent Alignment

Unlike DPO, which relies on static preference data, GRPO (Group Relative Policy Optimization) enables the model to engage in interactive learning via retrieval feedback within the actual document corpus environment. This allows the model to autonomously explore and extract highly discriminative features, achieving a fundamental leap from superficial "textual matching" to deep "intent alignment."
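The defining step of GRPO is that each sampled rewrite is scored against its own group of siblings rather than by a learned value network. A minimal sketch of that group-relative advantage computation (the group size and reward values below are illustrative, not from our training runs):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each candidate's reward by the
    mean and standard deviation of its own sampling group, so no separate
    critic/value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mean) / std for r in rewards]

# Example: retrieval rewards for four candidate rewrites of one query
advs = group_relative_advantages([0.62, 0.48, 0.71, 0.55])
```

Candidates above the group mean receive positive advantages and are reinforced; those below are suppressed, which is what drives the exploration toward discriminative rewrites.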

Multi-Dimensional Reward Function:

| Component | Description |
| --- | --- |
| Primary Reward | Dense-retrieval nDCG scores |
| Constraint | Length penalty to prevent feature dilution |
| Reward Shaping | Cosine similarity with positive examples to maintain semantic grounding |
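One way the three components could combine into a scalar reward is sketched below. The weights, length cap, and linear combination are hypothetical assumptions for illustration; the model card does not disclose the actual coefficients:

```python
def rewrite_reward(ndcg_at_10, rewrite_len, pos_sim,
                   max_len=64, len_weight=0.1, sim_weight=0.2):
    """Illustrative combination of the three reward components:
      - primary reward:  dense-retrieval nDCG@10 of the rewritten query
      - constraint:      penalty for tokens beyond max_len (feature dilution)
      - reward shaping:  cosine similarity to positive examples (grounding)
    All weights are hypothetical, not the released training configuration."""
    length_penalty = len_weight * max(0, rewrite_len - max_len) / max_len
    return ndcg_at_10 - length_penalty + sim_weight * pos_sim
```

The length penalty only activates past the cap, so concise rewrites are judged purely on retrieval quality and semantic grounding.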

BRIGHT Benchmark Results (Pipeline: MQR-A1 + MRE-T1)

Short Document Retrieval (nDCG@10)

| Task | MQR-A1 + MRE-T1 |
| --- | --- |
| Biology | 86.7 |
| Earth Science | 78.5 |
| Economics | 69.7 |
| Psychology | 78.2 |
| Robotics | 58.4 |
| StackOverflow | 67.0 |
| Sustainable Living | 65.9 |
| LeetCode | 46.8 |
| Pony | 73.4 |
| AoPS | 45.2 |
| TheoremQA (Questions) | 60.6 |
| TheoremQA (Theorems) | 72.3 |
| Average | 66.9 |
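For reference, the nDCG@10 metric reported in these tables is the discounted cumulative gain of the system's top-10 ranking normalized by the ideal (relevance-sorted) ranking. A standard reference implementation:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by the DCG of the ideal
    (descending-relevance) ranking; 1.0 means a perfect ordering."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```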

Long Document Retrieval (nDCG@10)

| Task | MQR-A1 + MRE-T1 |
| --- | --- |
| Biology | 77.1 |
| Earth Science | 59.0 |
| Economics | 71.2 |
| Psychology | 73.8 |
| Robotics | 46.0 |
| StackOverflow | 35.5 |
| Sustainable Living | 70.6 |
| Pony | 14.6 |
| Average | 56.0 |

Comparison with Other Retrieval Pipelines (Short Documents)

| Pipeline | Avg nDCG@10 |
| --- | --- |
| MQR-A1 + MRE-T1 | 66.9 |
| INF-X-Retriever | 63.4 |
| RakanEmb4B | 52.4 |
| Nemo Retriever's Agentic Retrieval | 50.9 |
| DIVER-v3-GroupRank | 46.8 |
| BGE-Reasoner-0928 | 46.4 |
| Lattice Hierarchical Retrieval | 42.1 |

Comparison with Other Retrieval Pipelines (Long Documents)

| Pipeline | Avg nDCG@10 |
| --- | --- |
| MQR-A1 + MRE-T1 | 56.0 |
| INF-X-Retriever | 54.6 |

Usage

MQR-A1 rewrites user queries into intent-distilled versions optimized for dense retrieval with MRE-T1. The rewritten query preserves core retrieval signals while removing misleading surface-level noise.

Recommended Pipeline:

  1. Pass the raw query through MQR-A1 to obtain an intent-aligned rewrite
  2. Use the rewritten query with MRE-T1 for dense retrieval
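The two steps above can be sketched as a thin composition. The `rewrite` and `embed_and_search` callables stand in for MQR-A1 generation and MRE-T1 dense search respectively; both are hypothetical placeholders, since the model card does not specify an inference API:

```python
from typing import Callable, List

def retrieve(raw_query: str,
             rewrite: Callable[[str], str],
             embed_and_search: Callable[[str], List[str]]) -> List[str]:
    """Two-stage pipeline sketch: the rewriter distills the retrieval
    intent, then the retriever runs dense search over the rewritten query."""
    intent_query = rewrite(raw_query)       # step 1: MQR-A1 rewrite
    return embed_and_search(intent_query)   # step 2: MRE-T1 dense retrieval

# Usage with stand-in callables (replace with real model calls):
hits = retrieve(
    "my succulents keep dying even though I water them daily",
    rewrite=lambda q: "causes of overwatering damage in succulent plants",
    embed_and_search=lambda q: [f"doc matching: {q}"],
)
```

Keeping the two stages behind plain callables makes it easy to swap either component, e.g. to A/B the rewriter against raw queries.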

Related Models

| Model | Description | Link |
| --- | --- | --- |
| MRE-T1 | Reasoning-enhanced retriever (Mira Recruitment Embedding, Thought v1) | ForwardAILabs/MRE-T1 |

Citation

If you use MQR-A1 in your research, please cite:

@misc{mqr-a1-2026,
  title={MQR-A1: GRPO-Aligned Query Rewriter for Reasoning-Intensive Retrieval},
  author={Forward AI Labs},
  year={2026},
  url={https://huggingface.co/ForwardAILabs/MQR-A1}
}

License

Apache 2.0

