Built by Forward AI Labs | mira.day
We are an AI company that builds recruitment agents.
MQR-A1: Mira Query Rewriter — Alignment v1
MQR-A1 (Mira Query Rewriter, Alignment v1) is a GRPO-aligned query rewriting model designed for reasoning-intensive retrieval tasks. It is the alignment component of our retrieval pipeline: by rewriting queries to distill core retrieval intents and filter misleading noise, MQR-A1 enables the downstream retriever MRE-T1 to achieve state-of-the-art performance.
Combined as the MQR-A1 + MRE-T1 pipeline, our system achieves No. 1 on the BRIGHT Benchmark across all evaluated dimensions — including both short and long document tracks — outperforming sophisticated rerankers, existing alignment models, and complex agentic pipelines.
Highlights
- BRIGHT Short Pipeline nDCG@10: 66.9 — No. 1 on the short document retrieval leaderboard
- BRIGHT Long Pipeline nDCG@10: 56.0 — No. 1 on the long document retrieval leaderboard
- Intent Distillation over Query Expansion: Shifts the paradigm from simple additive expansion to RL-driven discriminative feature extraction
- Robust to Semantic Traps: Filters out redundant or misleading surface-level semantic noise in complex reasoning tasks
- SNR Enhancement: Significantly improves the signal-to-noise ratio of retrieval signals, mitigating the "feature dilution" effect under long-text inputs
Training Methodology
MQR-A1 is trained using a three-stage approach:
1. Candidate Rewrite Mining
To prevent the model from converging on rigid, template-based shortcuts during SFT, we implemented a heterogeneous candidate rewrite mining strategy. For every query, we dynamically synthesized a structurally diverse set of natural-language rewrites, forcing the model to prioritize the underlying retrieval intent over superficial syntactic patterns.
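The mining procedure is not published in detail; a minimal sketch of the idea, assuming a hypothetical set of rewrite styles and a caller-supplied `generate` function standing in for the LLM that produces each rewrite:

```python
import random

# Hypothetical rewrite styles used to diversify SFT targets; the actual
# mining prompts and generator model are not public.
STYLES = [
    "declarative restatement of the information need",
    "list of salient technical keywords",
    "hypothetical answer passage",
    "question decomposed into sub-questions",
]

def mine_candidate_rewrites(query, generate, n_per_style=2, seed=0):
    """Produce a heterogeneous candidate rewrite set for one query.

    `generate(query, style)` is a stand-in for an LLM call that returns
    a rewrite of `query` in the requested style.
    """
    rng = random.Random(seed)
    candidates = []
    for style in STYLES:
        for _ in range(n_per_style):
            candidates.append(generate(query, style))
    rng.shuffle(candidates)  # avoid ordering shortcuts during SFT

    # Drop near-duplicates (bag-of-words key) so that no single surface
    # pattern dominates the candidate set.
    seen, unique = set(), []
    for c in candidates:
        key = " ".join(sorted(set(c.lower().split())))
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```

The deduplication step is what keeps the candidate pool heterogeneous: if the generator collapses onto one template, those rewrites merge into a single surviving candidate.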
2. Cold Start (SFT)
Standard cross-entropy loss is inherently misaligned with retrieval objectives, so SFT alone cannot optimize for ranking quality. Instead, by fine-tuning on the structurally diverse rewrites from the mining stage, we equip the base model with strong discriminative feature extraction capabilities, laying a stable initialization foundation for the subsequent GRPO phase.
3. GRPO-Driven Intent Alignment
Unlike DPO, which relies on static preference data, GRPO (Group Relative Policy Optimization) enables the model to engage in interactive learning via retrieval feedback within the actual document corpus environment. This allows the model to autonomously explore and extract highly discriminative features, achieving a fundamental leap from superficial "textual matching" to deep "intent alignment."
Multi-Dimensional Reward Function:
| Component | Description |
|---|---|
| Primary Reward | Dense retrieval NDCG scores |
| Constraint | Length penalty to prevent feature dilution |
| Reward Shaping | Cosine similarity with positive examples to maintain semantic grounding |
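A minimal sketch of how the three reward components in the table could combine, assuming a simple weighted sum; the weights, the length-penalty form, and the function names here are illustrative, not the released configuration:

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """nDCG@k: DCG normalized by the ideal (sorted) ranking's DCG."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rewrite_reward(ranked_rels, rewrite_len, max_len,
                   rewrite_emb, positive_emb,
                   length_weight=0.1, shaping_weight=0.1):
    """Hypothetical multi-dimensional reward for one rewritten query."""
    primary = ndcg_at_k(ranked_rels, k=10)               # dense-retrieval nDCG
    over = max(0.0, (rewrite_len - max_len) / max_len)   # tokens past budget
    penalty = length_weight * over                       # feature-dilution guard
    shaping = shaping_weight * cosine(rewrite_emb, positive_emb)
    return primary - penalty + shaping
```

Under this sketch, a rewrite that ranks relevant documents higher earns more reward, an over-long rewrite is penalized proportionally to how far it exceeds the budget, and the cosine term keeps the rewrite semantically anchored to positive examples.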
BRIGHT Benchmark Results (Pipeline: MQR-A1 + MRE-T1)
Short Document Retrieval (nDCG@10)
| Task | MQR-A1 + MRE-T1 |
|---|---|
| Biology | 86.7 |
| Earth Science | 78.5 |
| Economics | 69.7 |
| Psychology | 78.2 |
| Robotics | 58.4 |
| StackOverflow | 67.0 |
| Sustainable Living | 65.9 |
| LeetCode | 46.8 |
| Pony | 73.4 |
| AoPS | 45.2 |
| TheoremQA (Questions) | 60.6 |
| TheoremQA (Theorems) | 72.3 |
| Average | 66.9 |
Long Document Retrieval (nDCG@10)
| Task | MQR-A1 + MRE-T1 |
|---|---|
| Biology | 77.1 |
| Earth Science | 59.0 |
| Economics | 71.2 |
| Psychology | 73.8 |
| Robotics | 46.0 |
| StackOverflow | 35.5 |
| Sustainable Living | 70.6 |
| Pony | 14.6 |
| Average | 56.0 |
Comparison with Other Retrieval Pipelines (Short Documents)
| Pipeline | Avg nDCG@10 |
|---|---|
| MQR-A1 + MRE-T1 | 66.9 |
| INF-X-Retriever | 63.4 |
| RakanEmb4B | 52.4 |
| Nemo Retriever's Agentic Retrieval | 50.9 |
| DIVER-v3-GroupRank | 46.8 |
| BGE-Reasoner-0928 | 46.4 |
| Lattice Hierarchical Retrieval | 42.1 |
Comparison with Other Retrieval Pipelines (Long Documents)
| Pipeline | Avg nDCG@10 |
|---|---|
| MQR-A1 + MRE-T1 | 56.0 |
| INF-X-Retriever | 54.6 |
Usage
MQR-A1 rewrites user queries into intent-distilled versions optimized for dense retrieval with MRE-T1. The rewritten query preserves core retrieval signals while removing misleading surface-level noise.
Recommended Pipeline:
- Pass the raw query through MQR-A1 to obtain an intent-aligned rewrite
- Use the rewritten query with MRE-T1 for dense retrieval
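The rewrite stage above can be sketched with Hugging Face transformers. The system-prompt wording and generation settings below are assumptions (the card does not specify the prompt format); only the model ID `ForwardAILabs/MQR-A1` comes from this page:

```python
def build_rewrite_prompt(query: str) -> list[dict]:
    """Chat messages asking MQR-A1 (a Qwen3-4B-Instruct fine-tune) to
    rewrite a raw query into an intent-distilled form.
    NOTE: this instruction text is a guess, not the documented prompt."""
    return [
        {"role": "system",
         "content": "Rewrite the user's query to distill its core retrieval "
                    "intent and remove misleading surface-level noise."},
        {"role": "user", "content": query},
    ]

def rewrite_query(query: str, model_id: str = "ForwardAILabs/MQR-A1") -> str:
    """Run the rewriter locally; loads a ~4B-parameter model."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps kept local
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_rewrite_prompt(query),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens (the rewritten query).
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# The returned rewrite is then embedded with MRE-T1 for dense retrieval.
```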
Related Models
| Model | Description | Link |
|---|---|---|
| MRE-T1 | Reasoning-enhanced retriever (Mira Recruitment Embedding, Thought v1) | ForwardAILabs/MRE-T1 |
Citation
If you use MQR-A1 in your research, please cite:
```bibtex
@misc{mqr-a1-2026,
  title={MQR-A1: GRPO-Aligned Query Rewriter for Reasoning-Intensive Retrieval},
  author={Forward AI Labs},
  year={2026},
  url={https://huggingface.co/ForwardAILabs/MQR-A1}
}
```
License
Apache 2.0
Model tree for ForwardAILabs/MQR-A1
Base model: Qwen/Qwen3-4B-Instruct-2507
Dataset used to train ForwardAILabs/MQR-A1
Evaluation results (self-reported)
- nDCG@10 on BRIGHT (Short, Pipeline): 66.9
- nDCG@10 on BRIGHT (Long, Pipeline): 56.0