GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling Paper • 2604.18556 • Published Apr 20 • 7
GSQ Collection GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling, https://huggingface.co/papers/2604.18556 • 9 items • Updated 6 days ago • 5
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 903
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 18 items • Updated 2 days ago • 298
MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization Paper • 2602.03537 • Published Feb 3 • 5
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 173