Giuseppe Mantineo

PeppePasti

AI & ML interests

NLP & Computer Vision

Organizations

None yet

PeppePasti's collections (19)

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28, 2024 • 38
Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 65
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 44
Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15, 2024 • 40

Improving Alignment and Robustness with Short Circuiting

Paper • 2406.04313 • Published Jun 6, 2024 • 1
Efficient Detection of Toxic Prompts in Large Language Models

Paper • 2408.11727 • Published Aug 21, 2024 • 13
Diffusion Policy Policy Optimization

Paper • 2409.00588 • Published Sep 1, 2024 • 20

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15, 2024 • 40
Self-Refine: Iterative Refinement with Self-Feedback

Paper • 2303.17651 • Published Mar 30, 2023 • 2
Automating Thought of Search: A Journey Towards Soundness and Completeness

Paper • 2408.11326 • Published Aug 21, 2024 • 3
Building Math Agents with Multi-Turn Iterative Preference Learning

Paper • 2409.02392 • Published Sep 4, 2024 • 16

Liquid Neural Networks

Liquid Time-constant Networks

Paper • 2006.04439 • Published Jun 8, 2020 • 3

Diffusion Models

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26, 2024 • 62
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 63
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2, 2024 • 14

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22, 2024 • 94

Text Embedding & Rankers

Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever

Paper • 2408.16672 • Published Aug 29, 2024 • 9
Precise Zero-Shot Dense Retrieval without Relevance Labels

Paper • 2212.10496 • Published Dec 20, 2022 • 4
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent

Paper • 2304.09542 • Published Apr 19, 2023 • 5
Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published Sep 24, 2024 • 29

Computer Vision

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3, 2024 • 36
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19
UniDet3D: Multi-dataset Indoor 3D Object Detection

Paper • 2409.04234 • Published Sep 6, 2024 • 9

Multi-lingual Training Language Models

Unsupervised Cross-lingual Representation Learning at Scale

Paper • 1911.02116 • Published Nov 5, 2019 • 3
Exploiting Similarities among Languages for Machine Translation

Paper • 1309.4168 • Published Sep 17, 2013

Interesting Stuffs

gsplat: An Open-Source Library for Gaussian Splatting

Paper • 2409.06765 • Published Sep 10, 2024 • 17
Generative Hierarchical Materials Search

Paper • 2409.06762 • Published Sep 10, 2024 • 7

Multimodal LLMs

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 63
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29, 2024 • 52
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28, 2024 • 87

EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

Paper • 2408.04259 • Published Aug 8, 2024 • 2
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

Paper • 2408.04948 • Published Aug 9, 2024 • 1
Graph Retrieval-Augmented Generation: A Survey

Paper • 2408.08921 • Published Aug 15, 2024 • 4
Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27, 2024 • 144

Reinforcement learning (RL)

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Paper • 2306.01693 • Published Jun 2, 2023 • 3
Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13
Diffusion Policy Policy Optimization

Paper • 2409.00588 • Published Sep 1, 2024 • 20

Foundation Models for Music: A Survey

Paper • 2408.14340 • Published Aug 26, 2024 • 44
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published Aug 29, 2024 • 50
FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1, 2024 • 33
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

Paper • 2409.02245 • Published Sep 3, 2024 • 10

Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published Aug 28, 2024 • 10

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Paper • 2408.15914 • Published Aug 28, 2024 • 24

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 119

VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Paper • 2408.17253 • Published Aug 30, 2024 • 39
Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20, 2024 • 46

NLP (no LLM related)

Universal Language Model Fine-tuning for Text Classification

Paper • 1801.06146 • Published Jan 18, 2018 • 8
Exploiting Similarities among Languages for Machine Translation

Paper • 1309.4168 • Published Sep 17, 2013
Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Paper • 2409.04431 • Published Sep 6, 2024 • 2
Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16, 2024 • 45
