TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration Paper • 2604.14116 • Published 2 days ago • 9
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published 3 days ago • 21
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published 2 days ago • 23
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models Paper • 2601.18734 • Published Jan 26 • 5
Towards Active Synthetic Data Generation for Finetuning Language Models Paper • 2512.00884 • Published Nov 30, 2025 • 1
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities Paper • 2501.12147 • Published Jan 21, 2025 • 1
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation Paper • 2402.18191 • Published Feb 28, 2024 • 1
SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking Paper • 2406.10882 • Published Jun 16, 2024 • 2
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning Paper • 2505.07437 • Published May 12, 2025 • 1
The Best Instruction-Tuning Data are Those That Fit Paper • 2502.04194 • Published Feb 6, 2025 • 2
BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation Paper • 2502.01697 • Published Feb 3, 2025 • 1
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 51
Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning Paper • 2506.11300 • Published Jun 12, 2025 • 2
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining Paper • 2305.10429 • Published May 17, 2023 • 5
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness Paper • 2604.12373 • Published 3 days ago • 7
Accelerating Speculative Decoding with Block Diffusion Draft Trees Paper • 2604.12989 • Published 3 days ago • 5
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation Paper • 2604.09497 • Published 7 days ago • 26