RLCR - a mehuldamani Collection

RLCR

Updated Aug 6, 2025

Collection of models and datasets for the paper "Beyond Binary Rewards: Training LMs to Reason about their Uncertainty"

mehuldamani/big-math-digits-v2-correctness
Text Generation • 8B • Updated Jun 25, 2025 • 12

mehuldamani/hotpot-v2-correctness-7b
Text Generation • 8B • Updated Jul 29, 2025

mehuldamani/orm-big-math-digits-v2-correctness
Text Classification • 7B • Updated Jul 8, 2025 • 10

mehuldamani/big-math-digits-v2-brier
8B • Updated Aug 4, 2025 • 11

mehuldamani/big-math-digits
Viewer • Updated Aug 5, 2025 • 31k • 80

mehuldamani/hotpot_qa
Viewer • Updated Aug 5, 2025 • 20.5k • 46

mehuldamani/hotpot-v2-brier-7b-no-split
Text Generation • 8B • Updated Jun 5, 2025 • 1

mehuldamani/big-math-digits-v2-brier-base-tabc
Text Generation • 8B • Updated Jun 28, 2025 • 5

mehuldamani/orm-hotpot-v2-final-correctness
Text Classification • 7B • Updated Jun 9, 2025

mehuldamani/qwen-base-verifier-sft-v1
Text Generation • 8B • Updated Jun 13, 2025 • 298