ONNX Models for Vidupe.Net
This repository contains ONNX-exported models used by Vidupe.Net for visual similarity and perceptual comparison tasks.
Models
vidupe.net/models/clip_visual_vit_b32.onnx
CLIP visual encoder (ViT-B/32) exported to ONNX. This model encodes images into a 512-dimensional embedding space, enabling semantic image similarity comparisons.
- Source: openai/clip-vit-base-patch32
- Input: RGB image tensor [batch, 3, 224, 224], normalized
- Output: Image embeddings [batch, 512]
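For example, two images can be compared by encoding each one and taking the cosine similarity of the two embeddings. The sketch below is illustrative rather than definitive: it assumes the standard CLIP preprocessing recipe (resize to 224x224, scale to [0, 1], normalize with the CLIP mean/std), uses Pillow for image loading (not listed under Requirements), placeholder file names frame_a.jpg / frame_b.jpg, and the tensor name "input" from the usage example below.

import numpy as np
import onnxruntime as ort
from PIL import Image

# Standard CLIP normalization constants (assumed to match this export)
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(path):
    # Load an RGB image and return a normalized [1, 3, 224, 224] tensor
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0   # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD                   # per-channel normalization
    return x.transpose(2, 0, 1)[None, ...]           # HWC -> NCHW

session = ort.InferenceSession("vidupe.net/models/clip_visual_vit_b32.onnx")
emb_a = session.run(None, {"input": preprocess("frame_a.jpg")})[0]
emb_b = session.run(None, {"input": preprocess("frame_b.jpg")})[0]

# Cosine similarity: values near 1.0 indicate semantically similar images
emb_a /= np.linalg.norm(emb_a, axis=-1, keepdims=True)
emb_b /= np.linalg.norm(emb_b, axis=-1, keepdims=True)
print(float((emb_a * emb_b).sum()))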
vidupe.net/models/lpips_alexnet.onnx
LPIPS (Learned Perceptual Image Patch Similarity) model with an AlexNet backbone exported to ONNX. Computes perceptual distance between two image patches.
- Source: richzhang/PerceptualSimilarity
- Input: Two normalized RGB image tensors [batch, 3, H, W]
- Output: Perceptual distance score [batch, 1, 1, 1]
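As a concrete example, the sketch below compares two same-sized frames. It assumes the export keeps the original LPIPS convention of inputs scaled to [-1, 1], uses Pillow for loading (not listed under Requirements), placeholder file names, and the tensor names input0 / input1 from the usage example below.

import numpy as np
import onnxruntime as ort
from PIL import Image

def load_frame(path, size=(256, 256)):
    # Load an RGB frame and return a [1, 3, H, W] tensor scaled to [-1, 1]
    img = Image.open(path).convert("RGB").resize(size)
    x = np.asarray(img, dtype=np.float32) / 255.0   # [0, 1]
    x = x * 2.0 - 1.0                                # [-1, 1], the LPIPS input convention
    return x.transpose(2, 0, 1)[None, ...]

session = ort.InferenceSession("vidupe.net/models/lpips_alexnet.onnx")
distance = session.run(
    None,
    {"input0": load_frame("frame_a.jpg"), "input1": load_frame("frame_b.jpg")},
)[0]
print(distance.item())  # lower = more perceptually similar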
Usage
import onnxruntime as ort
import numpy as np

# CLIP visual encoder (dummy input; real images need CLIP preprocessing, see above)
session = ort.InferenceSession("vidupe.net/models/clip_visual_vit_b32.onnx")
image = np.random.randn(1, 3, 224, 224).astype(np.float32)
embeddings = session.run(None, {"input": image})[0]  # shape [1, 512]

# LPIPS perceptual similarity (dummy inputs; real frames are scaled to [-1, 1], see above)
session = ort.InferenceSession("vidupe.net/models/lpips_alexnet.onnx")
img0 = np.random.randn(1, 3, 64, 64).astype(np.float32)
img1 = np.random.randn(1, 3, 64, 64).astype(np.float32)
distance = session.run(None, {"input0": img0, "input1": img1})[0]  # shape [1, 1, 1, 1]
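The tensor names used above ("input", "input0", "input1") depend on how the models were exported. If a run() call fails because a name does not match, the actual input and output names can be read from the session:

import onnxruntime as ort

session = ort.InferenceSession("vidupe.net/models/clip_visual_vit_b32.onnx")
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print(out.name, out.shape, out.type)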
Requirements
onnxruntime>=1.16.0
numpy