Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2408.00118

Toolkit - AI Papers

Neural Machine Translation by Jointly Learning to Align and Translate

Paper • 1409.0473 • Published Sep 1, 2014 • 7
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 108
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 25
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 46

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4, 2024 • 72
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92

foundation_models

Apple Intelligence Foundation Language Models

Paper • 2407.21075 • Published Jul 29, 2024 • 5
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117
Nemotron-4 340B Technical Report

Paper • 2406.11704 • Published Jun 17, 2024
Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 93
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 18
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

SLM e Moe structure PHD tesis: SOTA e valutazione parametri

collezione di paper utili per redazione tesi 1-2-3- capitolo da valutare cambio di rotta e gestione PHD

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 109
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2, 2025 • 51
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published Jan 2, 2025 • 44
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Paper • 2411.13552 • Published Nov 20, 2024

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28, 2024 • 38
Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 65
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15, 2024 • 40

Most interesting Papers

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 120
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117

Toolkit - AI Papers

Neural Machine Translation by Jointly Learning to Align and Translate

Paper • 1409.0473 • Published Sep 1, 2014 • 7
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 108
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 25
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 46

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 93
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 18
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 15

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

SLM e Moe structure PHD tesis: SOTA e valutazione parametri

collezione di paper utili per redazione tesi 1-2-3- capitolo da valutare cambio di rotta e gestione PHD

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 109
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2, 2025 • 51
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published Jan 2, 2025 • 44
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Paper • 2411.13552 • Published Nov 20, 2024

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4, 2024 • 72
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28, 2024 • 38
Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 65
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15, 2024 • 40

foundation_models

Apple Intelligence Foundation Language Models

Paper • 2407.21075 • Published Jul 29, 2024 • 5
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117
Nemotron-4 340B Technical Report

Paper • 2406.11704 • Published Jun 17, 2024
Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78

Most interesting Papers

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 78
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 120
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs