CLIP_aievals: AI-Generated Image Detector

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion models, GANs, and hybrid architectures.

Overview

CLIP_aievals is designed for robust AI-vs-real detection: it pairs a CLIP Vision Transformer backbone with a lightweight classification head. It targets generalization to unseen generative sources and use in large-scale evaluation pipelines.

This repository contains the model weights (clip_vith14_argus.pt) and supporting configuration files used for inference.

Model Architecture

Backbone

CLIP ViT-H/14 vision encoder
Pretrained on LAION-2B
Frozen or partially unfrozen depending on training configuration

Classifier Head

Two-layer MLP:
- Input: CLIP image embedding (1024-d)
- Hidden layer: 512 units with GELU activation
- Output layer: 1 unit with sigmoid activation, producing the probability that the image is AI-generated
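
The head described above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's implementation: the class name `DetectorHead` is hypothetical, the position of the dropout layer is an assumption, and random tensors stand in for real CLIP embeddings.

```python
import torch
from torch import nn


class DetectorHead(nn.Module):
    """Hypothetical two-layer MLP head over a CLIP image embedding."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 512, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),  # assumption: dropout sits after the activation
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # Returns raw logits of shape (batch,); apply sigmoid to get probabilities.
        return self.net(embedding).squeeze(-1)


head = DetectorHead().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 1024))  # random stand-in for CLIP embeddings
    probs = torch.sigmoid(logits)        # P(AI-generated) for each image
```

Returning logits rather than probabilities keeps the head compatible with a numerically stable loss such as `torch.nn.BCEWithLogitsLoss` and with post-hoc temperature calibration.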

Regularization and Calibration

Dropout: 0.1
Weight decay: 1e-4
Temperature calibration performed post-hoc using validation logits
Optional threshold tuning based on evaluation metrics or unknown-source analysis
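
The model card does not say how the temperature is fitted. A common approach is to choose the single scalar T that minimizes the negative log-likelihood of `sigmoid(logit / T)` on held-out validation logits. The sketch below does this with a plain grid search in standard-library Python; the function names, the search range, and the toy data are all illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / temperature)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, labels, lo=0.05, hi=20.0, steps=400):
    """Grid-search T on a log scale (a simple stand-in for an optimizer)."""
    best_t, best_loss = 1.0, nll(logits, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(logits, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


# Toy validation set: confident logits, two of which are confidently wrong.
val_logits = [6.0, 5.0, -6.0, -5.0, 4.0, -4.0, 5.0, -5.0]
val_labels = [1, 1, 0, 0, 0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
```

Dividing by a positive T never changes the ranking of scores, so ROC-AUC is unaffected. Only the probabilities, and therefore any fixed decision threshold, are shifted.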

Training Objective

Binary cross-entropy
Oversampling and class-balancing for multi-source synthetic datasets
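
The exact balancing scheme is not documented here. One common option, sketched below, gives each sample a weight so that both classes are drawn equally often and, within a class, every source is drawn equally often. The function name and the toy source counts are hypothetical.

```python
import random
from collections import Counter


def balanced_sample_weights(labels, sources):
    """Per-sample sampling weights: equal mass per class, and equal mass
    per source within each class. The weights sum to 1."""
    group_counts = Counter(zip(labels, sources))
    sources_per_class = {}
    for label, source in group_counts:
        sources_per_class.setdefault(label, set()).add(source)
    n_classes = len(sources_per_class)

    weights = []
    for label, source in zip(labels, sources):
        n_sources = len(sources_per_class[label])
        weights.append(1.0 / (n_classes * n_sources * group_counts[(label, source)]))
    return weights


# Toy dataset: 6 real images from 2 sources, 4 fake images from 2 sources.
labels = [0] * 6 + [1] * 4
sources = ["ffhq"] * 4 + ["coco"] * 2 + ["stylegan3"] * 3 + ["glide"] * 1

weights = balanced_sample_weights(labels, sources)
rng = random.Random(0)
batch_indices = rng.choices(range(len(labels)), weights=weights, k=8)  # sample with replacement
```

In a PyTorch training loop, the same weights could be passed to `torch.utils.data.WeightedRandomSampler` instead of `random.choices`.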

Datasets

The training pipeline uses a mixture of curated datasets:

Real Data

FFHQ (70k)
COCO (160k)
ImageNet (90k+)
AFHQ v1/v2 (cats, dogs, wildlife)
DIV2K
OpenImages

Fake Data

Stable Diffusion (v1.x, v2.x)
Latent Diffusion Models
StyleGAN3
CIPS
BigGAN
GANformer
CycleGAN (horse2zebra, monet2photo)
DDPM and DDGAN
Face Synthetics
Glide
Generative Inpainting (partial and full)

Labels are binary: 0 = real, 1 = fake.

Performance Summary

Evaluated on 850k+ mixed-source images:

ROC-AUC: 0.764
PR-AUC (AI class): 0.612
Global FPR (real images): 0.0073
Accuracy: 0.693
Precision (AI): 0.853
Recall (AI): 0.086

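Performance is dataset-dependent: the model is confident on many synthetic sources but has lower recall on advanced diffusion models that produce strongly photorealistic images. At the reported operating point the detector is conservative: it rarely flags real images (FPR 0.0073) but catches only about 9% of AI-generated ones (recall 0.086).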
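
For reference, the threshold-dependent metrics listed above (accuracy, precision, recall, FPR) can be computed from predicted probabilities and binary labels as sketched below. The decision threshold used for the reported numbers is not stated, so the default of 0.5 is an assumption, and the toy scores are made up for illustration.

```python
def binary_metrics(probs, labels, threshold=0.5):
    """Confusion-matrix metrics with label 1 = fake (AI), 0 = real."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),  # of images flagged as AI, how many are AI
        "recall": ratio(tp, tp + fn),     # of AI images, how many were flagged
        "fpr": ratio(fp, fp + tn),        # of real images, how many were flagged
    }


# Toy scores: 3 fake images followed by 5 real images.
probs = [0.9, 0.2, 0.4, 0.1, 0.05, 0.3, 0.6, 0.02]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(probs, labels)
```

Lowering the threshold trades a higher false-positive rate for higher recall, which is the lever to adjust when the default operating point misses too many AI-generated images.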

Intended Use

Primary

Detect whether an image is AI-generated
Large-scale offline evaluation of generative models
Data filtering for dataset curation
Quality and authenticity control in multimedia pipelines

Secondary

Research on generative model detection
Cross-model robustness evaluation

Not Intended For

Legal or forensic verification
High-stakes decision systems
Per-pixel or localized artifact detection

Limitations

Recall is lower on highly realistic diffusion models.
The model can produce false positives on:
- Overprocessed images
- Heavily JPEG-compressed images
- Images with artistic filters
The model is not calibrated for forensic authenticity analysis.

How to Use

In Python

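from src.model import AIImageDetector  # assumes the repository's src/ package is importable
from PIL import Image
import torch

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device=device,
    dropout=0.1
)

# Load the fine-tuned weights, then switch to inference mode (disables dropout).
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

# Convert to RGB so grayscale, palette, and RGBA files are handled consistently.
img = Image.open("your_image.jpg").convert("RGB")
with torch.no_grad():
    prob = model.predict(img)  # returns probability of AI generation
print(prob)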
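
For dataset filtering, the single-image call above extends naturally to a batch. The sketch below partitions items by score against a threshold. The helper `split_by_score` is hypothetical, and a dictionary of made-up scores stands in for the real model so the example runs without weights. In practice `score_fn` could be something like `lambda path: model.predict(Image.open(path).convert("RGB"))`.

```python
from typing import Callable, Iterable


def split_by_score(items: Iterable[str], score_fn: Callable[[str], float], threshold: float = 0.5):
    """Partition items into (kept, flagged) by their AI-generation probability."""
    kept, flagged = [], []
    for item in items:
        prob = float(score_fn(item))  # float() also unwraps a 0-d tensor
        (flagged if prob >= threshold else kept).append((item, prob))
    return kept, flagged


# Made-up scores standing in for model predictions.
stub_scores = {"a.jpg": 0.03, "b.jpg": 0.91, "c.jpg": 0.48}
kept, flagged = split_by_score(stub_scores, stub_scores.get, threshold=0.5)
```

Given the low recall reported in the performance summary, a threshold below 0.5 may be worth evaluating on a validation set when the goal is to remove as many synthetic images as possible.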
