Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation Paper • 2604.27263 • Published 9 days ago • 6
Targeted Neuron Modulation via Contrastive Pair Search Paper • 2605.12290 • Published 11 days ago • 13
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 83