RLHN: Cleaned Training Datasets with False Negatives Identified & Relabeled as ground truth.
Welcome to RLHN (EMNLP 2025 Findings)
RLHN (ReLabeling Hard Negatives) uses a cascading LLM framework to identify and relabel false negatives in information retrieval (IR) training datasets.
This repository contains the training datasets curated with RLHN and the models fine-tuned on those datasets.
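As a rough illustration of what a cascading relabeling pass might look like, here is a minimal sketch. It is not the authors' implementation: the `TrainingExample` layout, the `relabel_false_negatives` helper, and both toy judges are hypothetical stand-ins, and the two-stage order (a cheaper judge screens each hard negative, a stronger judge confirms before relabeling) is an assumption about what "cascading" means here. In practice each judge would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge signature: (query, passage) -> True if the passage looks relevant.
Judge = Callable[[str, str], bool]


@dataclass
class TrainingExample:
    """Hypothetical training instance: one query with labeled passages."""
    query: str
    positives: List[str]
    negatives: List[str]


def relabel_false_negatives(
    example: TrainingExample, cheap_judge: Judge, strong_judge: Judge
) -> TrainingExample:
    """Move negatives that both judges consider relevant into the positives."""
    positives = list(example.positives)
    kept_negatives = []
    for passage in example.negatives:
        # Stage 1: the cheap judge screens every hard negative.
        # Stage 2: the strong judge runs only on passages stage 1 flagged
        # (short-circuit `and`), which keeps the expensive calls rare.
        if cheap_judge(example.query, passage) and strong_judge(example.query, passage):
            positives.append(passage)  # confirmed false negative: relabel
        else:
            kept_negatives.append(passage)  # still treated as a true negative
    return TrainingExample(example.query, positives, kept_negatives)


# Toy keyword judges standing in for LLM calls (assumptions, for illustration only).
def toy_cheap_judge(query: str, passage: str) -> bool:
    return "france" in passage.lower()


def toy_strong_judge(query: str, passage: str) -> bool:
    text = passage.lower()
    return "france" in text and "capital" in text


if __name__ == "__main__":
    example = TrainingExample(
        query="capital of France",
        positives=["Paris is the capital of France."],
        negatives=[
            "The capital of France is Paris, on the Seine.",
            "Berlin is the capital of Germany.",
            "France borders Spain.",
        ],
    )
    cleaned = relabel_false_negatives(example, toy_cheap_judge, toy_strong_judge)
    print(len(cleaned.positives), "positives,", len(cleaned.negatives), "negatives")
```

The same loop could drop a confirmed false negative instead of appending it to the positives, or discard the whole instance, which may be what the `hn-remove` and `remove` dataset names refer to; that reading is an assumption based on the names alone.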
List of Contributors:
- Nandan Thakur*
- Crystina Zhang*
- Xueguang Ma
- Jimmy Lin
Paper URL: https://aclanthology.org/2025.findings-emnlp.481/
Citation
@inproceedings{thakur-etal-2025-hard,
title = "Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with {LLM}s",
author = "Thakur, Nandan and
Zhang, Crystina and
Ma, Xueguang and
Lin, Jimmy",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.481/",
doi = "10.18653/v1/2025.findings-emnlp.481",
pages = "9064--9083",
ISBN = "979-8-89176-335-7",
abstract = "Training robust retrieval and reranker models typically relies on large-scale retrieval datasets; for example, the BGE collection contains 1.6 million query-passage pairs sourced from various data sources. However, we find that certain datasets can negatively impact model effectiveness {---} pruning 8 out of 15 datasets from the BGE collection, reduces the training set size by 2.35{\texttimes}, surprisingly increases nDCG@10 on BEIR by 1.0 point. This motivates a deeper examination of training data quality, with a particular focus on ``false negatives'', where relevant passages are incorrectly labeled as irrelevant. We utilize LLMs as a simple, cost-effective approach to \textit{identify} and \textit{relabel} false negatives in training datasets. Experimental results show that relabeling false negatives as true positives improves both E5 (base) and Qwen2.5-7B retrieval models by 0.7-1.4 points on BEIR and by 1.7-1.8 points at nDCG@10 on zero-shot AIR-Bench evaluation. Similar gains are observed for rerankers fine-tuned on the relabeled data, such as Qwen2.5-3B on BEIR. The reliability of LLMs to identify false negatives is supported by human annotation results. Our training dataset and code are publicly available."
}
Models (31 in total; 10 listed here):
- rlhn/Qwen2.5-7B-hn-remove-400K
- rlhn/Qwen2.5-7B-default-400K
- rlhn/Qwen2.5-3B-rlhn-680K-reranker
- rlhn/Qwen2.5-3B-hn-remove-680K-reranker
- rlhn/Qwen2.5-3B-rlhn-400K-reranker
- rlhn/Qwen2.5-3B-rlhn-100K-reranker
- rlhn/Qwen2.5-3B-default-680K-reranker
- rlhn/Qwen2.5-3B-default-400K-reranker
- rlhn/Qwen2.5-3B-default-250K-reranker
- rlhn/Qwen2.5-3B-default-100K-reranker
Datasets (20 in total; 10 listed here, with row counts):
- rlhn/default-680K-bge-reranker-v2-gemma (680k rows)
- rlhn/default-680K-mxbai-rerank-large-v2 (680k rows)
- rlhn/remove-100K (61k rows)
- rlhn/remove-250K (151k rows)
- rlhn/remove-400K (248k rows)
- rlhn/remove-680K (324k rows)
- rlhn/hn-remove-250K (247k rows)
- rlhn/hn-remove-100K (93.3k rows)
- rlhn/hn-remove-680K (649k rows)
- rlhn/hn-remove-400K (389k rows)