reasoning
updated
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
•
2502.02737
•
Published
•
253
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper
•
2502.03373
•
Published
•
58
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
•
2501.12599
•
Published
•
126
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
•
2501.17161
•
Published
•
123
s1: Simple test-time scaling
Paper
•
2501.19393
•
Published
•
124
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
•
2501.04682
•
Published
•
99
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
•
2411.04282
•
Published
•
37
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper
•
2502.14768
•
Published
•
47
LIMO: Less is More for Reasoning
Paper
•
2502.03387
•
Published
•
62
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
•
2501.09686
•
Published
•
41
LIMA: Less Is More for Alignment
Paper
•
2305.11206
•
Published
•
26
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
•
2501.07301
•
Published
•
99
Let's Verify Math Questions Step by Step
Paper
•
2505.13903
•
Published
•
2