---
license: mit
tags:
- onnx
- clip
- lpips
- image-similarity
- computer-vision
---

# ONNX Models for Vidupe.Net

This repository contains ONNX-exported models used by [Vidupe.Net](https://#) for visual similarity and perceptual comparison tasks.

## Models

### `vidupe.net/models/clip_visual_vit_b32.onnx`

CLIP visual encoder (ViT-B/32) exported to ONNX. It encodes images into a 512-dimensional embedding space, so that semantically similar images produce nearby embeddings.

- **Source:** [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)
- **Input:** RGB image tensor `[batch, 3, 224, 224]`, normalized
- **Output:** Image embeddings `[batch, 512]`

### `vidupe.net/models/lpips_alexnet.onnx`

LPIPS (Learned Perceptual Image Patch Similarity) model with an AlexNet backbone, exported to ONNX. It computes the perceptual distance between two image patches.

- **Source:** [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity)
- **Input:** Two normalized RGB image tensors `[batch, 3, H, W]`
- **Output:** Perceptual distance score `[batch, 1, 1, 1]`

## Usage

```python
import onnxruntime as ort
import numpy as np

# CLIP visual encoder
# The random tensor is a placeholder for a real, normalized 224x224 RGB image.
session = ort.InferenceSession("vidupe.net/models/clip_visual_vit_b32.onnx")
image = np.random.randn(1, 3, 224, 224).astype(np.float32)
embeddings = session.run(None, {"input": image})[0]  # shape: [1, 512]

# LPIPS perceptual similarity
# Both inputs must share the same shape; random data stands in for real patches.
session = ort.InferenceSession("vidupe.net/models/lpips_alexnet.onnx")
img0 = np.random.randn(1, 3, 64, 64).astype(np.float32)
img1 = np.random.randn(1, 3, 64, 64).astype(np.float32)
distance = session.run(None, {"input0": img0, "input1": img1})[0]  # shape: [1, 1, 1, 1]
```

## Requirements

```
onnxruntime>=1.16.0
numpy
```
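
## Preprocessing sketch

Both models expect "normalized" inputs, but this README does not spell out the normalization. The sketch below shows one plausible way to prepare real images and compare CLIP embeddings. It rests on assumptions that are not confirmed by this repository: that the CLIP export expects the per-channel mean and standard deviation used by OpenAI CLIP's preprocessing, that the LPIPS export expects values scaled to `[-1, 1]` as the upstream PerceptualSimilarity library does by default, and that images are already resized to the target size. The helper names `preprocess_clip`, `preprocess_lpips`, and `cosine_similarity` are hypothetical.

```python
import numpy as np

# Per-channel RGB statistics from OpenAI CLIP's preprocessing.
# Assumption: the ONNX export does not apply these internally.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def preprocess_clip(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert one 224x224 RGB uint8 image (H, W, 3) to a [1, 3, 224, 224] float32 tensor."""
    if rgb_uint8.shape != (224, 224, 3):
        raise ValueError("expected an already-resized 224x224 RGB image")
    scaled = rgb_uint8.astype(np.float32) / 255.0       # [0, 255] -> [0, 1]
    normalized = (scaled - CLIP_MEAN) / CLIP_STD        # broadcast over the channel axis
    chw = normalized.transpose(2, 0, 1)[np.newaxis]     # HWC -> NCHW with batch size 1
    return np.ascontiguousarray(chw)


def preprocess_lpips(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert one RGB uint8 image (H, W, 3) to a [1, 3, H, W] float32 tensor in [-1, 1]."""
    scaled = rgb_uint8.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[np.newaxis])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two [batch, dim] embedding arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)
```

The tensors returned by these helpers can replace the random arrays in the Usage example, and `cosine_similarity` can be applied to two `[batch, 512]` outputs of the CLIP encoder. If an input name is rejected, `session.get_inputs()` in ONNX Runtime lists the names the exported graph actually uses.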