
CLIP_aievals: AI-Generated Image Detector

This model is a CLIP-based classifier fine-tuned to detect AI-generated images across a wide range of generative models. It is trained on a mixture of real datasets (FFHQ, COCO, ImageNet, AFHQ, etc.) and synthetic datasets produced by diffusion models, GANs, and hybrid architectures.

Overview

CLIP_aievals is designed for robust AI-vs-Real detection by leveraging a CLIP Vision Transformer backbone and a lightweight classification head. It is optimized for generalization across unseen generative sources and large-scale evaluation pipelines.

This repository contains the model weights (clip_vith14_argus.pt) and supporting configuration files used for inference.


Model Architecture

Backbone

  • CLIP ViT-H/14 vision encoder
  • Pretrained on LAION-2B
  • Frozen or partially unfrozen depending on training configuration

Classifier Head

  • Two-layer MLP:

    • Input: CLIP image embedding (1024-d)
    • Hidden Layer: 512 with GELU activation
    • Output Layer: single unit with sigmoid activation, producing the probability that the image is AI-generated
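
To make the head's shape concrete, here is a dependency-free sketch of its forward pass. It is an illustration only: the function names, the bias terms, and the random placeholder weights are assumptions, and the actual head is whatever `AIImageDetector` defines and the checkpoint stores.

```python
import math
import random

EMBED_DIM = 1024   # CLIP image embedding size stated in this card
HIDDEN_DIM = 512   # hidden layer width stated in this card


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_head(in_dim=EMBED_DIM, hidden=HIDDEN_DIM, seed=0):
    """Random placeholder weights (assumption); real weights come from the checkpoint."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return {
        "w1": [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-scale, scale) for _ in range(hidden)],
        "b2": 0.0,
    }


def head_forward(params, embedding):
    """Map one embedding to P(AI-generated). Dropout is skipped, as at inference."""
    hidden = [
        gelu(sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return sigmoid(logit)


def count_parameters(params):
    """Total number of weights and biases in the head."""
    n_w1 = sum(len(row) for row in params["w1"])
    return n_w1 + len(params["b1"]) + len(params["w2"]) + 1


if __name__ == "__main__":
    head = init_head()
    print(count_parameters(head))  # 1024*512 + 512 + 512 + 1 = 525313
    print(head_forward(head, [0.01] * EMBED_DIM))
```

If the layers do carry biases, the head adds roughly half a million parameters on top of the ViT-H/14 backbone, which is why it is described as lightweight.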

Regularization and Calibration

  • Dropout: 0.1
  • Weight decay: 1e-4
  • Temperature calibration performed post-hoc using validation logits
  • Optional decision-threshold tuning using evaluation metrics or unknown-source analysis
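
The two calibration steps above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes calibration means fitting a single temperature T that minimises the validation negative log-likelihood of sigmoid(logit / T), and that threshold tuning means picking a cutoff for a target false-positive rate on real images. All function names are hypothetical.

```python
import math


def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # softplus(s) = log(1 + exp(s)), computed without overflow
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += softplus - y * s
    return total / len(logits)


def fit_temperature(logits, labels, lo=0.05, hi=20.0, iters=60):
    """Golden-section search over log T for the temperature that minimises NLL."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    fc = nll(logits, labels, math.exp(c))
    fd = nll(logits, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = nll(logits, labels, math.exp(c))
        else:
            # Minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = nll(logits, labels, math.exp(d))
    return math.exp((a + b) / 2.0)


def threshold_for_fpr(real_probs, target_fpr):
    """A cutoff such that at most target_fpr of real images score above it."""
    ranked = sorted(real_probs, reverse=True)
    allowed = int(target_fpr * len(ranked))  # real images we may flag
    if allowed >= len(ranked):
        return 0.0
    # An image is flagged when prob > threshold, so only scores ranked
    # above position `allowed` can be flagged.
    return ranked[allowed]
```

A temperature above 1 softens overconfident probabilities and one below 1 sharpens them; the ranking of images, and therefore ROC-AUC, is unchanged either way.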

Training Objective

  • Binary cross-entropy
  • Oversampling and class-balancing for multi-source synthetic datasets
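
One way to realise the oversampling and class balancing is to give each image a sampling weight. The sketch below assumes a particular scheme that the card does not specify: both classes receive equal probability mass, and within a class every source receives an equal share. The function names are hypothetical.

```python
import random
from collections import Counter


def balanced_sample_weights(samples):
    """samples: list of (source_name, label) pairs, label 0 = real, 1 = fake.

    Returns one weight per sample. Weights sum to 1, each class gets an equal
    share, and within a class each source gets an equal share.
    """
    per_source = Counter(samples)  # images per (source, label)
    sources_per_class = Counter(label for _, label in per_source)
    n_classes = len(sources_per_class)
    return [
        1.0 / (n_classes * sources_per_class[label] * per_source[(source, label)])
        for source, label in samples
    ]


def draw_epoch(samples, n, seed=0):
    """Draw n sample indices with replacement, oversampling small sources."""
    rng = random.Random(seed)
    weights = balanced_sample_weights(samples)
    return rng.choices(range(len(samples)), weights=weights, k=n)
```

Under this scheme a small source such as a single CycleGAN split is drawn as often as a large one, at the cost of repeating its images more frequently within an epoch.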

Datasets

The training pipeline uses a mixture of curated datasets:

Real Data

  • FFHQ (70k)
  • COCO (160k)
  • ImageNet (90k+)
  • AFHQ v1/v2 (cats, dogs, wildlife)
  • DIV2K
  • OpenImages

Fake Data

  • Stable Diffusion (v1.x, v2.x)
  • Latent Diffusion Models
  • StyleGAN3
  • CIPS
  • BigGAN
  • GANformer
  • CycleGAN (horse2zebra, monet2photo)
  • DDPM and DDGAN
  • Face Synthetics
  • Glide
  • Generative Inpainting (partial and full)

Labels are binary: 0 = real, 1 = fake.


Performance Summary

Evaluated on 850k+ mixed-source images:

  • ROC-AUC: 0.764
  • PR-AUC (AI class): 0.612
  • Global FPR (real images): 0.0073
  • Accuracy: 0.693
  • Precision (AI): 0.853
  • Recall (AI): 0.086

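Performance is dataset-dependent. At the evaluated operating point the model is conservative: it rarely flags real images (FPR 0.0073) but also detects only a small share of AI images (recall 0.086). Confidence is high on many synthetic sources, and recall is lowest on advanced diffusion models with strong photorealism.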
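
The reported figures can be cross-checked against each other. The sketch below assumes that precision, recall, FPR, and accuracy were all measured at the same threshold on the same evaluation set, which the card does not state explicitly. Under that assumption, precision, recall, and FPR together determine the share of AI images in the set, and from it the accuracy.

```python
def implied_positive_fraction(precision, recall, fpr):
    """Fraction of AI images implied by precision, recall and FPR.

    With P AI images and N real images, TP = recall * P and FP = fpr * N.
    Solving precision = TP / (TP + FP) for P / N gives
        P / N = precision * fpr / (recall * (1 - precision)).
    """
    ratio = precision * fpr / (recall * (1.0 - precision))
    return ratio / (1.0 + ratio)


def implied_accuracy(precision, recall, fpr):
    """Accuracy implied by the three metrics and the class mix they determine."""
    pos = implied_positive_fraction(precision, recall, fpr)
    # Correct predictions: detected AI images plus real images not flagged
    return pos * recall + (1.0 - pos) * (1.0 - fpr)


if __name__ == "__main__":
    pos = implied_positive_fraction(0.853, 0.086, 0.0073)
    acc = implied_accuracy(0.853, 0.086, 0.0073)
    print(round(pos, 2))  # 0.33
    print(round(acc, 3))  # 0.693
```

The implied accuracy matches the reported 0.693, so the numbers are mutually consistent and suggest that about one third of the evaluation images are AI-generated. It also shows that the accuracy is driven mostly by correctly passed real images rather than by detected AI images.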


Intended Use

Primary

  • Detect whether an image is AI-generated
  • Large-scale offline evaluation of generative models
  • Data filtering for dataset curation
  • Quality and authenticity control in multimedia pipelines

Secondary

  • Research on generative model detection
  • Cross-model robustness evaluation

Not Intended For

  • Legal or forensic verification
  • High-stakes decision systems
  • Per-pixel or localized artifact detection

Limitations

  • Lower recall on highly realistic diffusion models.

  • The model can produce false positives on:

    • Overprocessed images
    • Heavily JPEG-compressed images
    • Images with artistic filters
  • Not calibrated for forensic authenticity analysis.


How to Use

In Python

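# AIImageDetector comes from this repository's src package
from src.model import AIImageDetector
from PIL import Image
import torch

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AIImageDetector(
    clip_model_name="ViT-H-14",
    device=device,
    dropout=0.1
)

# Load the fine-tuned weights, then switch to inference mode (disables dropout)
model.load_state_dict(torch.load("clip_vith14_argus.pt", map_location="cpu"))
model.eval()

# Convert to RGB so grayscale, palette, and RGBA files are handled uniformly
img = Image.open("your_image.jpg").convert("RGB")
with torch.no_grad():
    prob = model.predict(img)  # returns probability of AI generation
print(prob)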
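
For the dataset-curation use case, the probability has to be turned into a keep-or-flag decision. The helper below is a hypothetical sketch, not part of this repository: `predict_fn` stands for any callable that returns the AI probability for one item, for example a wrapper that opens a path and calls `model.predict`, and the default cutoff of 0.5 is an assumption.

```python
def split_by_score(items, predict_fn, threshold=0.5):
    """Partition items into (kept, flagged) using a probability cutoff.

    predict_fn is a hypothetical callable returning P(AI-generated) for one
    item. Items scoring above the threshold are flagged, the rest are kept.
    """
    kept, flagged = [], []
    for item in items:
        prob = predict_fn(item)
        (flagged if prob > threshold else kept).append((item, prob))
    return kept, flagged


if __name__ == "__main__":
    # Stand-in scores so the sketch runs without the model
    fake_scores = {"a.jpg": 0.02, "b.jpg": 0.91, "c.jpg": 0.40}
    kept, flagged = split_by_score(fake_scores, fake_scores.get)
    print([name for name, _ in kept])
    print([name for name, _ in flagged])
```

Given the recall reported above, a filter like this should be expected to let many AI-generated images through; it is better suited to removing high-confidence detections than to guaranteeing a clean set.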