- Why Attention Patterns Exist: A Unifying Temporal Perspective — Paper 2601.21709, published 21 days ago
- KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference — Paper 2502.04420, published Feb 6, 2025