Collections including paper arxiv:2401.08565

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 58

- DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
  Paper • 2408.08152 • Published • 60
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 22
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 20

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 31
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 36
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44

- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 111
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 82
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
  Paper • 2402.04833 • Published • 5

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 31
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50

- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 11
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 55
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 27
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 82

- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 29
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 16
- Personality Traits in Large Language Models
  Paper • 2307.00184 • Published • 20
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
  Paper • 2310.12962 • Published • 13

- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
  Paper • 2310.19909 • Published • 21
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34