qwen2-rloo-countdown-step250 / tokenizer_config.json

Commit History

RLOO checkpoint at optimizer step 250 - Fixed prompt format, temp=0.1, lr=3e-6
e4ad155
verified

thomasjhuang committed on