Gravity Spy Glitch Classifier: ViT-B/16 and ResNet-50v2 BiT

Two deep learning models for classifying LIGO gravitational-wave detector glitch morphologies from Q-transform spectrograms, trained on Gravity Spy O3 data (Zevin et al. 2017, CQG 34 064003).

Paper: Manuscript submitted to Classical and Quantum Gravity
Code: GitHub

Model Overview

Property            | ViT-B/16                              | ResNet-50v2 BiT
Architecture        | Vision Transformer (patch 16, 224px)  | Big Transfer ResNet-50v2
Pretrained weights  | AugReg ImageNet-21k + ImageNet-1k     | ImageNet-21k + ImageNet-1k
Parameters          | ~86M                                  | ~25M
Framework           | timm                                  | timm

Both models classify 224x224 RGB Q-transform spectrograms into 23 Gravity Spy glitch classes (22 glitch morphologies + No_Glitch). The class taxonomy follows Zevin et al. (2017).

Key Finding

Architecture preference is class-dependent. ViT excels on spectrally distinctive classes (e.g., Power_Line: +0.507 F1). However, there is insufficient evidence that it improves on rare classes (< 200 training samples). Neither architecture uniformly dominates.

Training Data

  • Source: Gravity Spy O3 (H1 + L1), filtered to ml_confidence > 0.9
  • Total samples: 227,943 training / 48,844 validation / 48,845 test
  • Split: Temporal (70/15/15%), with a 60-second gap enforced between splits to prevent data leakage
  • Classes: 23 (see src/class_labels.json)
  • Preprocessing: Q-transform spectrograms resized to 224x224, normalized with ImageNet statistics
  • Rare classes: Chirp (11 train), Wandering_Line (30), Helix (33), Light_Modulation (142)
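The temporal split with a gap can be illustrated with a short NumPy sketch. It is a hypothetical reconstruction, not the project's splitting code, and it rests on three assumptions:

- each sample carries a GPS event time;
- the time-sorted samples are cut at the 70% and 85% marks;
- the gap is enforced by dropping any sample in a later split that falls within 60 s of the latest sample in the preceding splits.

The function name `temporal_split` is illustrative.

```python
import numpy as np


def temporal_split(gps_times, fractions=(0.70, 0.15, 0.15), gap_s=60.0):
    """Hypothetical chronological train/val/test split with a time gap.

    Returns three arrays of indices into `gps_times`. Samples are ordered by
    time and cut at the cumulative fractions; any sample in a later split
    lying within `gap_s` seconds of the latest earlier sample is dropped.
    """
    times = np.asarray(gps_times, dtype=float)
    order = np.argsort(times, kind="stable")
    n = len(order)
    cut1 = int(round(fractions[0] * n))
    cut2 = int(round((fractions[0] + fractions[1]) * n))
    splits = [order[:cut1], order[cut1:cut2], order[cut2:]]

    boundary = None  # latest GPS time kept in any earlier split
    for k in range(3):
        if boundary is not None:
            # Enforce the gap: keep only samples at least gap_s after the boundary.
            splits[k] = splits[k][times[splits[k]] >= boundary + gap_s]
        if len(splits[k]):
            boundary = times[splits[k]].max()
    return tuple(splits)
```

Under this rule, every validation sample is at least 60 s later than every training sample, and every test sample is at least 60 s later than every validation sample. As a result, spectrograms from the same stretch of detector data cannot straddle a split boundary.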

Performance (O3 Test Set)

Primary metric: macro-F1 (averaged equally across all 23 classes).

Metric                        | ViT-B/16                 | ResNet-50v2 BiT
Macro-F1 (PRIMARY)            | 0.7230 [0.7031, 0.7397]  | 0.6786 [0.6598, 0.6944]
Rare-class macro-F1           | 0.2412 [0.2019, 0.2957]  | 0.3028 [0.2085, 0.3751]
Overall accuracy (secondary)  | 0.9343                   | 0.9181

Bracketed values are 95% bootstrap confidence intervals (10,000 resamples). The macro-F1 difference between the two models is statistically significant (p = 0.0002).

Note: Overall accuracy is reported as a secondary sanity check only. It is not the primary metric because it masks rare-class performance differences (see "Limitations" below).
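Macro-F1 and a percentile bootstrap interval can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' evaluation code. It makes these assumptions:

- class indices are integers 0 to 22;
- test samples are resampled with replacement;
- a class with no true or predicted samples in a resample scores F1 = 0;
- the interval is taken from the 2.5th and 97.5th percentiles of the resampled macro-F1 values.

The function names are hypothetical, and the significance test behind the reported p-value is not reproduced here.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 over all n_classes (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # Confusion matrix: rows = true class, columns = predicted class.
    cm = np.bincount(y_true * n_classes + y_pred, minlength=n_classes ** 2)
    cm = cm.reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); classes with an empty denominator score 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())


def bootstrap_ci(y_true, y_pred, n_classes, n_resamples=10_000, alpha=0.05, seed=0):
    """Point macro-F1 plus a percentile bootstrap CI over resampled test samples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        stats[i] = macro_f1(y_true[idx], y_pred[idx], n_classes)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return macro_f1(y_true, y_pred, n_classes), (float(lo), float(hi))
```

Because every class gets equal weight, a class such as Chirp, with very few test samples, moves macro-F1 as much as a common class. This is why the rare-class intervals above are so wide.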

Per-Class Highlights

Class             | ViT F1 | CNN F1 | Difference (ViT − CNN) | Favors
Power_Line        | 0.742  | 0.235  | +0.507                 | ViT
Paired_Doves      | 0.613  | 0.099  | +0.514                 | ViT
Scratchy          | 0.875  | 0.503  | +0.372                 | ViT
Light_Modulation  | 0.859  | 0.691  | +0.168                 | ViT
Chirp             | 0.000  | 0.471  | −0.471                 | CNN
Violin_Mode       | 0.544  | 0.683  | −0.139                 | CNN
Scattered_Light   | 0.719  | 0.811  | −0.092                 | CNN

O4 Generalization

Both models were evaluated on 38,587 O4a Gravity Spy spectrograms:

Metric                        | ViT-B/16                 | ResNet-50v2 BiT
O4 macro-F1                   | 0.6695 [0.6555, 0.6816]  | 0.6674 [0.6567, 0.6765]
Relative degradation from O3  | −7.4%                    | −1.7%

Both models stay within the <20% relative-degradation threshold.
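The degradation figures are consistent with a relative change computed as (O4 macro-F1 − O3 macro-F1) / O3 macro-F1. The short check below uses the point estimates from the two tables and reproduces −7.4% and −1.7%. The helper name is illustrative.

```python
def relative_degradation(f1_o3: float, f1_o4: float) -> float:
    """Signed relative change in macro-F1 from O3 to O4, in percent (negative = drop)."""
    return 100.0 * (f1_o4 - f1_o3) / f1_o3


for name, o3, o4 in [("ViT-B/16", 0.7230, 0.6695), ("ResNet-50v2 BiT", 0.6786, 0.6674)]:
    change = relative_degradation(o3, o4)
    print(f"{name}: {change:.1f}% (within 20% threshold: {abs(change) < 20})")
# ViT-B/16: -7.4% (within 20% threshold: True)
# ResNet-50v2 BiT: -1.7% (within 20% threshold: True)
```

Note that a relative measure depends on the O3 baseline. ViT starts from a higher O3 macro-F1, so its absolute O4 score ends up close to the CNN's despite the larger relative drop.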

Limitations

  • O3-trained only: Not fine-tuned on O4 data. Performance may degrade on later observing runs.
  • Single-view: Uses only the 1.0-second duration Q-transform view (Gravity Spy uses four).
  • 23 classes: New glitch morphologies in O4+ will be misclassified into existing categories.
  • Rare-class: ViT's macro-F1 advantage does not extend to rare classes (< 200 training samples). The rare-class comparison is statistically underpowered (aggregate power = 0.20).
  • Single seed: Results from one training run per architecture. No seed variance reported.
  • Label quality: Training labels from ml_confidence > 0.9 filtering of Gravity Spy citizen science classifications.

Usage

pip install torch timm albumentations numpy Pillow

# Classify a spectrogram with ViT
python src/inference.py --model vit --image path/to/spectrogram.png

# Classify with CNN, showing top-5 predictions
python src/inference.py --model cnn --image path/to/spectrogram.png --top-k 5

# JSON output
python src/inference.py --model vit --image path/to/spectrogram.png --json
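The `--top-k` and `--json` options can be illustrated with a small NumPy sketch that converts model logits into ranked labels. It is not the contents of src/inference.py, and it assumes:

- src/class_labels.json maps string indices such as "0" to label names;
- scores are softmax probabilities;
- the JSON output is a list of label/probability objects.

The function name and output fields are hypothetical.

```python
import json

import numpy as np


def top_k_predictions(logits, index_to_label, k=5):
    """Return the k most probable classes as {"label", "probability"} dicts (hypothetical)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the max logit before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    k = min(k, probs.size)
    top = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [
        {"label": index_to_label[str(i)], "probability": round(float(probs[i]), 4)}
        for i in top
    ]


if __name__ == "__main__":
    # Hypothetical 3-entry mapping; the real class_labels.json has 23 entries.
    labels = {"0": "No_Glitch", "1": "Power_Line", "2": "Scratchy"}
    print(json.dumps(top_k_predictions([0.1, 2.3, -1.0], labels, k=2), indent=2))
```

With the real mapping, `json.load` on src/class_labels.json would replace the three-entry dictionary, and the logits would come from a forward pass of either checkpoint.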

Input Format

  • PNG or JPG image of a Q-transform spectrogram
  • Any resolution (automatically resized to 224x224)
  • RGB color (grayscale images are converted to 3-channel)
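A minimal sketch of this input pipeline, using Pillow and NumPy, runs in four steps:

1. Convert to 3-channel RGB.
2. Resize to 224x224.
3. Scale to [0, 1].
4. Normalize with the standard ImageNet per-channel mean and standard deviation.

Bilinear resampling and the channel-first output layout are assumptions. The locked transforms in src/preprocessing.py remain the reference for reproducing the reported results.

```python
import numpy as np
from PIL import Image

# Standard ImageNet statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image: Image.Image, size: int = 224) -> np.ndarray:
    """Return a (3, size, size) float32 array (hypothetical stand-in for the locked transforms)."""
    # Grayscale, palette, or RGBA inputs all become 3-channel RGB.
    rgb = image.convert("RGB")
    # Any input resolution is resized to size x size; bilinear is an assumption.
    rgb = rgb.resize((size, size), Image.BILINEAR)
    arr = np.asarray(rgb, dtype=np.float32) / 255.0  # (H, W, 3) in [0, 1]
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalization
    return arr.transpose(2, 0, 1)                     # channel-first (C, H, W)
```

For a single file, `preprocess(Image.open("path/to/spectrogram.png"))[None]` adds the batch dimension a PyTorch model expects. Mismatched resampling or normalization relative to training can shift predictions, which is why the project locks its evaluation transforms.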

File Structure

checkpoints/
  vit_b16_gravityspy_o3.pt        # ViT-B/16 weights (~983 MB)
  resnet50v2_gravityspy_o3.pt     # ResNet-50v2 BiT weights (~270 MB)
  checksums.sha256                # SHA-256 checksums
src/
  inference.py                    # Standalone CLI inference script
  preprocessing.py                # Locked evaluation transforms
  class_labels.json               # 23-class index-to-label mapping
  model_config.json               # Architecture and training config
examples/
  expected_output.json            # Validated predictions for test images
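Checkpoint integrity can be checked against checksums.sha256 before loading. The sketch below assumes two things:

- the file uses the common `sha256sum` layout, one `<hex digest>  <relative path>` entry per line;
- paths are relative to the checkpoints/ directory.

If the manifest follows a different layout, the parser would need adjusting. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so ~1 GB checkpoints need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(manifest: Path) -> dict:
    """Map each listed file to True if its digest matches; missing files count as False."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary-mode entries with '*'
        target = manifest.parent / name
        results[name] = target.exists() and sha256_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    manifest = Path("checkpoints/checksums.sha256")
    if manifest.exists():
        for name, ok in verify_checksums(manifest).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

Hashing in chunks keeps memory use flat even for the ~983 MB ViT checkpoint.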

Citation

@article{weigel2026vit_glitch,
  author = {Weigel, Jesse R.},
  title = {Vision Transformer vs.\ CNN for Gravitational Wave Glitch Classification:
           Class-Dependent Architecture Preferences and Implications for
           Continuous Wave Searches},
  year = {2026},
  note = {Manuscript submitted to Classical and Quantum Gravity}
}

License

CC BY 4.0

References

  • Zevin, M. et al. (2017). "Gravity Spy: integrating advanced LIGO detector characterization, machine learning, and citizen science." Classical and Quantum Gravity, 34(6), 064003.
  • Wu, Z. et al. (2025). "Multi-view Attention Fusion for Gravitational Wave Glitch Classification." Classical and Quantum Gravity, 42, 165015.
  • Srivastava, A. and Niedzielski, T. (2025). "Vision Transformer for Transient Noise Classification in Gravitational Wave Data." Acta Astronomica, 74(3).
