thomasjhuang/qwen2-rloo-countdown-step250

Text Generation

reinforcement-learning


qwen2-rloo-countdown-step250

Commit History

Add model card with training details

08251c0
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 250 - Fixed prompt format, temp=0.1, lr=3e-6

e4ad155
verified

thomasjhuang committed on Jun 10, 2025

RLOO checkpoint at optimizer step 250 - Fixed prompt format, temp=0.1, lr=3e-6

4b722d3
verified

thomasjhuang committed on Jun 10, 2025

initial commit

0660638
verified

thomasjhuang committed on Jun 10, 2025