ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4 • Reinforcement Learning • 15B • Updated Feb 13, 2025 • 2.18k • 818
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 123
mistral-community/Mixtral-8x22B-Instruct-v0.1-4bit • Text Generation • 143B • Updated Jul 1, 2024 • 27 • 11