thomasjhuang/qwen2-rloo-countdown-step250

Text Generation

reinforcement-learning


qwen2-rloo-countdown-step250 / generation_config.json


RLOO checkpoint at optimizer step 250 - Fixed prompt format, temp=0.1, lr=3e-6

Commit 4b722d3 (verified), 7 months ago


117 Bytes

{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.52.4"
}
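This file only bounds decoding. It sets the BOS and EOS token id (both 151643) and caps output at 2048 new tokens. It records no temperature or sampling flags, so the temp=0.1 mentioned in the commit message is not stored here and would have to be supplied at generation time. Any unset field falls back to the library defaults. The sketch below is a minimal, stdlib-only illustration of how a decoding loop would honour `max_new_tokens` and `eos_token_id`. It is not the `transformers` implementation. `load_generation_settings`, `decode` and the toy `next_token` callables are hypothetical names introduced for illustration.

```python
import json

# The generation_config.json contents above, embedded so the sketch runs offline.
CONFIG_TEXT = """
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.52.4"
}
"""


def load_generation_settings(text):
    """Hypothetical helper: return the two fields that bound decoding."""
    cfg = json.loads(text)
    return cfg["eos_token_id"], cfg["max_new_tokens"]


def decode(next_token, prompt_ids, eos_token_id, max_new_tokens):
    """Hypothetical greedy loop: stop after EOS or after max_new_tokens new tokens.

    `next_token` stands in for a model: it takes the ids so far and returns
    the next token id. The EOS token is kept in the output and counts toward
    the max_new_tokens budget.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(ids)
        ids.append(tok)
        if tok == eos_token_id:
            break
    return ids[len(prompt_ids):]  # only the newly generated tokens


if __name__ == "__main__":
    eos, max_new = load_generation_settings(CONFIG_TEXT)
    demo = iter([7, 8, 9, eos, 5])  # toy "model" that emits EOS on its 4th step
    print(decode(lambda ids: next(demo), [1, 2], eos, max_new))
    # prints [7, 8, 9, 151643]
```

A model that never emits 151643 would run until it had produced 2048 new tokens. Because the file sets no sampling parameters, a low temperature such as 0.1 would need to be passed explicitly along with sampling enabled.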