- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (Paper • 2406.06525 • Published • 71)
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning (Paper • 2406.06469 • Published • 29)
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models (Paper • 2406.04271 • Published • 30)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Paper • 2406.02657 • Published • 41)
Collections
Discover the best community collections!
Collections including paper arxiv:2405.17405
- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions (Paper • 2402.17485 • Published • 195)
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior (Paper • 2312.01841 • Published • 1)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model (Paper • 2311.16498 • Published • 1)
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians (Paper • 2312.02134 • Published • 2)
- MM-LLMs: Recent Advances in MultiModal Large Language Models (Paper • 2401.13601 • Published • 48)
- A Touch, Vision, and Language Dataset for Multimodal Alignment (Paper • 2402.13232 • Published • 16)
- Neural Network Diffusion (Paper • 2402.13144 • Published • 99)
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet (Paper • 2402.13251 • Published • 15)
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors (Paper • 2312.16837 • Published • 6)
- Learning the 3D Fauna of the Web (Paper • 2401.02400 • Published • 11)
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model (Paper • 2310.15110 • Published • 3)
- Zero-1-to-3: Zero-shot One Image to 3D Object (Paper • 2303.11328 • Published • 5)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer (Paper • 2405.17405 • Published • 16)
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture (Paper • 2405.18991 • Published • 12)
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation (Paper • 2405.20674 • Published • 15)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models (Paper • 2406.07472 • Published • 13)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models (Paper • 2403.01807 • Published • 9)
- TripoSR: Fast 3D Object Reconstruction from a Single Image (Paper • 2403.02151 • Published • 16)
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on (Paper • 2403.01779 • Published • 30)
- MagicClay: Sculpting Meshes With Generative Neural Fields (Paper • 2403.02460 • Published • 8)
- Seamless Human Motion Composition with Blended Positional Encodings (Paper • 2402.15509 • Published • 15)
- TripoSR: Fast 3D Object Reconstruction from a Single Image (Paper • 2403.02151 • Published • 16)
- 3D-VLA: A 3D Vision-Language-Action Generative World Model (Paper • 2403.09631 • Published • 11)
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting (Paper • 2403.09981 • Published • 8)
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens (Paper • 2401.09985 • Published • 18)
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects (Paper • 2401.09962 • Published • 9)
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution (Paper • 2401.10404 • Published • 10)
- ActAnywhere: Subject-Aware Video Background Generation (Paper • 2401.10822 • Published • 13)
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning (Paper • 2306.07967 • Published • 25)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (Paper • 2306.07954 • Published • 111)
- TryOnDiffusion: A Tale of Two UNets (Paper • 2306.08276 • Published • 74)
- Seeing the World through Your Eyes (Paper • 2306.09348 • Published • 33)