Collections
Collections including paper arxiv:2501.19399
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 301
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 24
- FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
  Paper • 2502.01068 • Published • 18
- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 24

- Agent Workflow Memory
  Paper • 2409.07429 • Published • 32
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
  Paper • 2409.07129 • Published • 8
- Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
  Paper • 2409.04593 • Published • 26
- Imagine yourself: Tuning-Free Personalized Image Generation
  Paper • 2409.13346 • Published • 70

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 24
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 124
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 24
- SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
  Paper • 2502.20545 • Published • 22

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 28
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 24
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 8

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 29
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 33