Giuseppe Mantineo
PeppePasti
AI & ML interests
NLP & Computer Vision
Organizations
None yet
AI Safety
Agents
- Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 40
- Self-Refine: Iterative Refinement with Self-Feedback
Paper • 2303.17651 • Published • 2
- Automating Thought of Search: A Journey Towards Soundness and Completeness
Paper • 2408.11326 • Published • 3
- Building Math Agents with Multi-Turn Iterative Preference Learning
Paper • 2409.02392 • Published • 16
Liquid Neural Networks
Diffusion Models
- SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Paper • 2408.14176 • Published • 62
- Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 63
- OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Paper • 2409.01199 • Published • 14
Human Pose
Text Embedding & Rankers
- Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
Paper • 2408.16672 • Published • 9
- Precise Zero-Shot Dense Retrieval without Relevance Labels
Paper • 2212.10496 • Published • 4
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent
Paper • 2304.09542 • Published • 5
- Making Text Embedders Few-Shot Learners
Paper • 2409.15700 • Published • 29
Computer Vision
- DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Paper • 2409.02095 • Published • 36
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83
- CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation
Paper • 2409.03643 • Published • 19
- UniDet3D: Multi-dataset Indoor 3D Object Detection
Paper • 2409.04234 • Published • 9
Multi-lingual Training Language Models
Interesting Stuffs
Multimodal LLMs
- Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 133
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 63
- Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper • 2408.16725 • Published • 52
- Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 87
RAG
- EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
Paper • 2408.04259 • Published • 2
- HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction
Paper • 2408.04948 • Published • 1
- Graph Retrieval-Augmented Generation: A Survey
Paper • 2408.08921 • Published • 4
- Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144
Reinforcement learning (RL)
- Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper • 2306.01693 • Published • 3
- Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper • 2408.15240 • Published • 13
- Diffusion Policy Policy Optimization
Paper • 2409.00588 • Published • 20
Audio Models
- Foundation Models for Music: A Survey
Paper • 2408.14340 • Published • 44
- WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Paper • 2408.16532 • Published • 50
- FLUX that Plays Music
Paper • 2409.00587 • Published • 33
- FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Paper • 2409.02245 • Published • 10
Tabular Data
Text-to-image
Segmentation
Time-series
NLP (no LLM related)
- Universal Language Model Fine-tuning for Text Classification
Paper • 1801.06146 • Published • 8
- Exploiting Similarities among Languages for Machine Translation
Paper • 1309.4168 • Published
- Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Paper • 2409.04431 • Published • 2
- Kolmogorov-Arnold Transformer
Paper • 2409.10594 • Published • 45
LLMs
- SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Paper • 2408.15545 • Published • 38
- Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 65
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 44
- Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 40