Built by Forward AI Labs | mira.day
We are an AI company that builds recruitment agents.
MQR-A1: Mira Query Rewriter — Alignment v1
MQR-A1 (Mira Query Rewriter, Alignment v1) is a GRPO-aligned query rewriting model designed for reasoning-intensive retrieval tasks. It is the alignment component of our retrieval pipeline: by rewriting queries to distill core retrieval intents and filter misleading noise, MQR-A1 enables the downstream retriever MRE-T1 to achieve state-of-the-art performance.
Combined as the MQR-A1 + MRE-T1 pipeline, our system achieves No. 1 on the BRIGHT Benchmark across all evaluated dimensions — including both short and long document tracks — outperforming sophisticated rerankers, existing alignment models, and complex agentic pipelines.
Highlights
- BRIGHT Short Pipeline nDCG@10: 66.9 — No. 1 on the short document retrieval leaderboard
- BRIGHT Long Pipeline nDCG@10: 56.0 — No. 1 on the long document retrieval leaderboard
- Intent Distillation over Query Expansion: Shifts the paradigm from simple additive expansion to RL-driven discriminative feature extraction
- Robust to Semantic Traps: Filters out redundant or misleading surface-level semantic noise in complex reasoning tasks
- SNR Enhancement: Significantly improves the signal-to-noise ratio of retrieval signals, mitigating the "feature dilution" effect under long-text inputs
Training Methodology
MQR-A1 is trained using a three-stage approach:
1. Candidate Rewrite Mining
To prevent the model from converging on rigid, template-based shortcuts during SFT, we implemented a heterogeneous candidate rewrite mining strategy. For every query, we dynamically synthesized a structurally diverse set of natural-language rewrites, forcing the model to prioritize the underlying retrieval intent over superficial syntactic patterns.
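The mining procedure is not published in detail; a minimal sketch of the idea, assuming a hypothetical set of rewrite styles and a caller-supplied `generate` function standing in for the LLM that produces each rewrite:

```python
import random

# Hypothetical rewrite styles used to diversify SFT targets; the actual
# mining prompts and generator model are not public.
STYLES = [
    "declarative restatement of the information need",
    "list of salient technical keywords",
    "hypothetical answer passage",
    "question decomposed into sub-questions",
]

def mine_candidate_rewrites(query, generate, n_per_style=2, seed=0):
    """Produce a heterogeneous candidate rewrite set for one query.

    `generate(query, style)` is a stand-in for an LLM call that returns
    a rewrite of `query` in the requested style.
    """
    rng = random.Random(seed)
    candidates = []
    for style in STYLES:
        for _ in range(n_per_style):
            candidates.append(generate(query, style))
    rng.shuffle(candidates)  # avoid ordering shortcuts during SFT

    # Drop near-duplicates (bag-of-words key) so that no single surface
    # pattern dominates the candidate set.
    seen, unique = set(), []
    for c in candidates:
        key = " ".join(sorted(set(c.lower().split())))
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```

The deduplication step is what keeps the candidate pool heterogeneous: if the generator collapses onto one template, those rewrites merge into a single surviving candidate.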
2. Cold Start (SFT)
Standard cross-entropy loss is inherently misaligned with retrieval objectives, so SFT alone cannot optimize for ranking quality. Instead, by fine-tuning on the structurally diverse rewrites from the mining stage, we equip the base model with strong discriminative feature extraction capabilities, laying a stable initialization foundation for the subsequent GRPO phase.
3. GRPO-Driven Intent Alignment
Unlike DPO, which relies on static preference data, GRPO (Group Relative Policy Optimization) enables the model to engage in interactive learning via retrieval feedback within the actual document corpus environment. This allows the model to autonomously explore and extract highly discriminative features, achieving a fundamental leap from superficial "textual matching" to deep "intent alignment."
Multi-Dimensional Reward Function:
| Component | Description |
|---|---|
| Primary Reward | Dense retrieval NDCG scores |
| Constraint | Length penalty to prevent feature dilution |
| Reward Shaping | Cosine similarity with positive examples to maintain semantic grounding |
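A minimal sketch of how the three reward components in the table could combine, assuming a simple weighted sum; the weights, the length-penalty form, and the function names here are illustrative, not the released configuration:

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """nDCG@k: DCG normalized by the ideal (sorted) ranking's DCG."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rewrite_reward(ranked_rels, rewrite_len, max_len,
                   rewrite_emb, positive_emb,
                   length_weight=0.1, shaping_weight=0.1):
    """Hypothetical multi-dimensional reward for one rewritten query."""
    primary = ndcg_at_k(ranked_rels, k=10)               # dense-retrieval nDCG
    over = max(0.0, (rewrite_len - max_len) / max_len)   # tokens past budget
    penalty = length_weight * over                       # feature-dilution guard
    shaping = shaping_weight * cosine(rewrite_emb, positive_emb)
    return primary - penalty + shaping
```

Under this sketch, a rewrite that ranks relevant documents higher earns more reward, an over-long rewrite is penalized proportionally to how far it exceeds the budget, and the cosine term keeps the rewrite semantically anchored to positive examples.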
BRIGHT Benchmark Results (Pipeline: MQR-A1 + MRE-T1)
Short Document Retrieval (nDCG@10)
| Task | MQR-A1 + MRE-T1 |
|---|---|
| Biology | 86.7 |
| Earth Science | 78.5 |
| Economics | 69.7 |
| Psychology | 78.2 |
| Robotics | 58.4 |
| StackOverflow | 67.0 |
| Sustainable Living | 65.9 |
| LeetCode | 46.8 |
| Pony | 73.4 |
| AoPS | 45.2 |
| TheoremQA (Questions) | 60.6 |
| TheoremQA (Theorems) | 72.3 |
| Average | 66.9 |
Long Document Retrieval (nDCG@10)
| Task | MQR-A1 + MRE-T1 |
|---|---|
| Biology | 77.1 |
| Earth Science | 59.0 |
| Economics | 71.2 |
| Psychology | 73.8 |
| Robotics | 46.0 |
| StackOverflow | 35.5 |
| Sustainable Living | 70.6 |
| Pony | 14.6 |
| Average | 56.0 |
Comparison with Other Retrieval Pipelines (Short Documents)
| Pipeline | Avg nDCG@10 |
|---|---|
| MQR-A1 + MRE-T1 | 66.9 |
| INF-X-Retriever | 63.4 |
| RakanEmb4B | 52.4 |
| Nemo Retriever's Agentic Retrieval | 50.9 |
| DIVER-v3-GroupRank | 46.8 |
| BGE-Reasoner-0928 | 46.4 |
| Lattice Hierarchical Retrieval | 42.1 |
Comparison with Other Retrieval Pipelines (Long Documents)
| Pipeline | Avg nDCG@10 |
|---|---|
| MQR-A1 + MRE-T1 | 56.0 |
| INF-X-Retriever | 54.6 |
Usage
MQR-A1 rewrites user queries into intent-distilled versions optimized for dense retrieval with MRE-T1. The rewritten query preserves core retrieval signals while removing misleading surface-level noise.
Recommended Pipeline:
- Pass the raw query through MQR-A1 to obtain an intent-aligned rewrite
- Use the rewritten query with MRE-T1 for dense retrieval
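The rewrite stage above can be sketched with Hugging Face transformers. The system-prompt wording and generation settings below are assumptions (the card does not specify the prompt format); only the model ID `ForwardAILabs/MQR-A1` comes from this page:

```python
def build_rewrite_prompt(query: str) -> list[dict]:
    """Chat messages asking MQR-A1 (a Qwen3-4B-Instruct fine-tune) to
    rewrite a raw query into an intent-distilled form.
    NOTE: this instruction text is a guess, not the documented prompt."""
    return [
        {"role": "system",
         "content": "Rewrite the user's query to distill its core retrieval "
                    "intent and remove misleading surface-level noise."},
        {"role": "user", "content": query},
    ]

def rewrite_query(query: str, model_id: str = "ForwardAILabs/MQR-A1") -> str:
    """Run the rewriter locally; loads a ~4B-parameter model."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps kept local
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_rewrite_prompt(query),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens (the rewritten query).
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# The returned rewrite is then embedded with MRE-T1 for dense retrieval.
```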
Related Models
| Model | Description | Link |
|---|---|---|
| MRE-T1 | Reasoning-enhanced retriever (Mira Recruitment Embedding, Thought v1) | ForwardAILabs/MRE-T1 |
Citation
If you use MQR-A1 in your research, please cite:
```bibtex
@misc{mqr-a1-2026,
  title={MQR-A1: GRPO-Aligned Query Rewriter for Reasoning-Intensive Retrieval},
  author={Forward AI Labs},
  year={2026},
  url={https://huggingface.co/ForwardAILabs/MQR-A1}
}
```
License
Apache 2.0
Model tree for ForwardAILabs/MQR-A1
Base model: Qwen/Qwen3-4B-Instruct-2507
Dataset used to train ForwardAILabs/MQR-A1
Evaluation results (self-reported)
- nDCG@10 on BRIGHT (Short, Pipeline): 66.9
- nDCG@10 on BRIGHT (Long, Pipeline): 56.0