erata/Qwen3-4B-dimacs_cube-grpo-v1-reasoning-dpo-8k-v3 • Text Generation • 4B • Updated Sep 11, 2025 • 5