Gravity Spy Glitch Classifier: ViT-B/16 and ResNet-50v2 BiT
Two deep learning models for classifying LIGO gravitational-wave detector glitch morphologies from Q-transform spectrograms, trained on Gravity Spy O3 data (Zevin et al. 2017, CQG 34 064003).
Paper: Manuscript submitted to Classical and Quantum Gravity
Code: GitHub
Model Overview
| Property | ViT-B/16 | ResNet-50v2 BiT |
|---|---|---|
| Architecture | Vision Transformer (patch 16, 224px) | Big Transfer ResNet-50v2 |
| Pretrained weights | AugReg ImageNet-21k + ImageNet-1k | ImageNet-21k + ImageNet-1k |
| Parameters | ~86M | ~25M |
| Framework | timm | timm |
Both models classify 224x224 RGB Q-transform spectrograms into 23 Gravity Spy glitch classes (22 glitch morphologies + No_Glitch). The class taxonomy follows Zevin et al. (2017).
Key Finding
Architecture preference is class-dependent: ViT excels on spectrally distinctive classes (Power_Line: +0.507 F1), but there is insufficient evidence that it improves on rare classes (< 200 training samples). Neither architecture uniformly dominates.
Training Data
- Source: Gravity Spy O3 (H1 + L1), filtered to ml_confidence > 0.9
- Total samples: 227,943 training / 48,844 validation / 48,845 test
- Split: Temporal split (70/15/15%) with 60-second gap enforcement (prevents data leakage)
- Classes: 23 (see src/class_labels.json)
- Preprocessing: Q-transform spectrograms resized to 224x224, normalized with ImageNet statistics
- Rare classes: Chirp (11 train), Wandering_Line (30), Helix (33), Light_Modulation (142)
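The temporal split with gap enforcement can be sketched as follows. This is a minimal illustration, not the project's splitting code: the function name `temporal_split`, the use of event times in seconds as the sort key, and the rule of dropping any sample that falls within 60 seconds of the next split's first event are all assumptions.

```python
def temporal_split(times, fractions=(0.70, 0.15, 0.15), gap=60.0):
    """Split sample indices chronologically into (train, val, test).

    Hypothetical sketch: samples closer than `gap` seconds to the first
    event of the following split are dropped, so adjacent splits never
    contain near-simultaneous glitches.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(order)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    chunks = [
        order[:n_train],
        order[n_train:n_train + n_val],
        order[n_train + n_val:],
    ]

    # Trim the tail of each earlier chunk so it ends at least `gap`
    # seconds before the next chunk begins.
    for k in range(len(chunks) - 1):
        if not chunks[k] or not chunks[k + 1]:
            continue
        boundary = times[chunks[k + 1][0]]
        chunks[k] = [i for i in chunks[k] if boundary - times[i] >= gap]
    return tuple(chunks)


# Toy usage: one event every 10 seconds.
event_times = [10.0 * i for i in range(100)]
train, val, test = temporal_split(event_times)
print(len(train), len(val), len(test))
```

Dropping samples near the boundaries means the realized split sizes fall slightly below the nominal 70/15/15 fractions.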
Performance (O3 Test Set)
Primary metric: macro-F1 (averaged equally across all 23 classes).
| Metric | ViT-B/16 | ResNet-50v2 BiT |
|---|---|---|
| Macro-F1 (PRIMARY) | 0.7230 [0.7031, 0.7397] | 0.6786 [0.6598, 0.6944] |
| Rare-class macro-F1 | 0.2412 [0.2019, 0.2957] | 0.3028 [0.2085, 0.3751] |
| Overall accuracy (secondary) | 0.9343 | 0.9181 |
95% bootstrap confidence intervals (10,000 resamples). Macro-F1 difference is statistically significant (p = 0.0002).
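A paired percentile bootstrap of the kind that could produce such intervals and a significance estimate is sketched below. The source only states that 10,000 resamples were used; resampling test examples with replacement, the percentile method, the one-sided p-value definition, and all function names here are assumptions. The metric is passed in as a callable, and plain accuracy is used in the toy example only to keep the block short.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def paired_bootstrap(y_true, pred_a, pred_b, metric,
                     n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile CI and one-sided p-value for metric(A) - metric(B).

    Both models are scored on the same resampled indices, so the
    comparison is paired.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        score_a = metric(yt, [pred_a[i] for i in idx])
        score_b = metric(yt, [pred_b[i] for i in idx])
        diffs.append(score_a - score_b)
    diffs.sort()
    lo = diffs[int(round((alpha / 2) * (n_resamples - 1)))]
    hi = diffs[int(round((1 - alpha / 2) * (n_resamples - 1)))]
    # Share of resamples in which model A does not beat model B.
    p_value = sum(d <= 0 for d in diffs) / n_resamples
    return lo, hi, p_value


# Toy usage: model A is perfect, model B always predicts class 0.
labels = [0, 1] * 50
lo, hi, p = paired_bootstrap(labels, labels, [0] * 100, accuracy,
                             n_resamples=1000)
print(f"difference CI: [{lo:.3f}, {hi:.3f}], p = {p:.4f}")
```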
Note: Overall accuracy is reported as a secondary sanity check only. It is not the primary metric because it masks rare-class performance differences (see "Limitations" below).
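To make the difference between the two metrics concrete, the sketch below computes accuracy and macro-F1 on a deliberately imbalanced toy problem. The helper names and the convention of scoring a class with no true or predicted members as F1 = 0 are assumptions, not taken from the project's evaluation code.

```python
def per_class_f1(y_true, y_pred, n_classes):
    """F1 for each class index in range(n_classes)."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumed convention: an empty class contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(per_class_f1(y_true, y_pred, n_classes)) / n_classes


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 98 common-class samples, 2 rare-class samples; the model ignores the rare class.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred, 2):.3f}")
```

A classifier that never predicts the rare class still reaches 98% accuracy here, while its macro-F1 falls below 0.5 because the rare class contributes an F1 of zero.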
Per-Class Highlights
| Class | ViT F1 | CNN F1 | Difference (ViT − CNN) | Favors |
|---|---|---|---|---|
| Power_Line | 0.742 | 0.235 | +0.507 | ViT |
| Paired_Doves | 0.613 | 0.099 | +0.514 | ViT |
| Scratchy | 0.875 | 0.503 | +0.372 | ViT |
| Light_Modulation | 0.859 | 0.691 | +0.168 | ViT |
| Chirp | 0.000 | 0.471 | −0.471 | CNN |
| Violin_Mode | 0.544 | 0.683 | −0.139 | CNN |
| Scattered_Light | 0.719 | 0.811 | −0.092 | CNN |
O4 Generalization
Both models were evaluated on 38,587 O4a Gravity Spy spectrograms:
| Metric | ViT-B/16 | ResNet-50v2 BiT |
|---|---|---|
| O4 macro-F1 | 0.6695 [0.6555, 0.6816] | 0.6674 [0.6567, 0.6765] |
| Relative degradation from O3 | −7.4% | −1.7% |
Both models stay under the 20% relative-degradation threshold in macro-F1.
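The degradation figures are consistent with a relative change computed as (O4 − O3) / O3 on the macro-F1 point estimates. The short check below reproduces them from the values in the tables; the formula and the names `relative_change` and `THRESHOLD` are inferred for illustration rather than taken from the evaluation code.

```python
def relative_change(o3_score, o4_score):
    """Signed relative change from the O3 score to the O4 score."""
    return (o4_score - o3_score) / o3_score


# Macro-F1 point estimates copied from the O3 and O4 tables.
macro_f1_scores = {
    "ViT-B/16": (0.7230, 0.6695),
    "ResNet-50v2 BiT": (0.6786, 0.6674),
}
THRESHOLD = 0.20  # maximum tolerated relative degradation

for name, (o3, o4) in macro_f1_scores.items():
    change = relative_change(o3, o4)
    print(f"{name}: {change:+.1%}, within threshold: {abs(change) < THRESHOLD}")
```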
Limitations
- O3-trained only: Not fine-tuned on O4 data. Performance may degrade on later observing runs.
- Single-view: Uses only the 1.0-second duration Q-transform view (Gravity Spy uses four).
- 23 classes: New glitch morphologies in O4+ will be misclassified into existing categories.
- Rare-class: ViT's macro-F1 advantage does not extend to rare classes (< 200 training samples). The rare-class comparison is statistically underpowered (aggregate power = 0.20).
- Single seed: Results from one training run per architecture. No seed variance reported.
- Label quality: Training labels are Gravity Spy citizen-science classifications filtered to ml_confidence > 0.9, so any errors that survive that filter carry over into training and evaluation.
Usage
pip install torch timm albumentations numpy Pillow
# Classify a spectrogram with ViT
python src/inference.py --model vit --image path/to/spectrogram.png
# Classify with CNN, showing top-5 predictions
python src/inference.py --model cnn --image path/to/spectrogram.png --top-k 5
# JSON output
python src/inference.py --model vit --image path/to/spectrogram.png --json
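The internals of `src/inference.py` are not shown here, so the following is only a sketch of what a top-k readout over class scores typically involves: a softmax over the raw model outputs followed by a sort. The function name `top_k`, the shortened label list, and the dictionary layout are illustrative assumptions and do not describe the script's actual output format.

```python
import math


def top_k(logits, labels, k=5):
    """Return the k most probable labels with their softmax probabilities."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [{"label": labels[i], "probability": probs[i]} for i in ranked[:k]]


# Toy usage with four of the 23 class names and made-up scores.
example_labels = ["Power_Line", "Chirp", "Scratchy", "No_Glitch"]
example_logits = [2.0, 0.5, 1.0, -1.0]
for entry in top_k(example_logits, example_labels, k=2):
    print(f"{entry['label']}: {entry['probability']:.3f}")
```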
Input Format
- PNG or JPG image of a Q-transform spectrogram
- Any resolution (automatically resized to 224x224)
- RGB color (grayscale images are converted to 3-channel)
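The input handling described above can be approximated as follows. This is a hedged sketch rather than the locked transforms in `src/preprocessing.py`: the bilinear interpolation mode, the function name `preprocess`, and the use of the standard ImageNet mean and standard deviation values are assumptions.

```python
import numpy as np
from PIL import Image

# Standard ImageNet channel statistics (assumed to match the model card).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image, size=224):
    """Convert a PIL image to a float32 array of shape (3, size, size)."""
    rgb = image.convert("RGB")                       # grayscale -> 3 channels
    rgb = rgb.resize((size, size), Image.BILINEAR)   # any resolution -> 224x224
    x = np.asarray(rgb, dtype=np.float32) / 255.0    # HWC, values in [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD           # per-channel normalization
    return np.transpose(x, (2, 0, 1))                # HWC -> CHW


# Toy usage: a uniform gray image standing in for a spectrogram file.
dummy = Image.new("L", (300, 200), color=128)
tensor = preprocess(dummy)
print(tensor.shape, tensor.dtype)
```

For a real file, `Image.open("path/to/spectrogram.png")` would replace the dummy image.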
File Structure
checkpoints/
  vit_b16_gravityspy_o3.pt       # ViT-B/16 weights (~983 MB)
  resnet50v2_gravityspy_o3.pt    # ResNet-50v2 BiT weights (~270 MB)
  checksums.sha256               # SHA-256 checksums
src/
  inference.py                   # Standalone CLI inference script
  preprocessing.py               # Locked evaluation transforms
  class_labels.json              # 23-class index-to-label mapping
  model_config.json              # Architecture and training config
examples/
  expected_output.json           # Validated predictions for test images
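Downloaded checkpoints can be checked against the checksum file before use. The sketch below assumes that `checksums.sha256` follows the common `sha256sum` layout (one `<hex digest>  <filename>` pair per line) and that filenames are relative to the directory containing the checksum file; neither detail is confirmed by the source, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {filename: True/False} for every entry in the checksum file.

    Assumes sha256sum-style lines: '<hex digest>  <filename>'.
    """
    checksum_path = Path(checksum_file)
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # '*' marks binary mode in sha256sum
        actual = sha256_of(checksum_path.parent / name)
        results[name] = actual == expected.lower()
    return results
```

Calling `verify_checksums("checkpoints/checksums.sha256")` would then report, per file, whether the local copy matches the published digest.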
Citation
@article{weigel2026vit_glitch,
author = {Weigel, Jesse R.},
title = {Vision Transformer vs.\ CNN for Gravitational Wave Glitch Classification:
Class-Dependent Architecture Preferences and Implications for
Continuous Wave Searches},
year = {2026},
note = {Manuscript submitted to Classical and Quantum Gravity}
}
License
CC BY 4.0
References
- Zevin, M. et al. (2017). "Gravity Spy: integrating advanced LIGO detector characterization, machine learning, and citizen science." Classical and Quantum Gravity, 34(6), 064003.
- Wu, Z. et al. (2025). "Multi-view Attention Fusion for Gravitational Wave Glitch Classification." Classical and Quantum Gravity, 42, 165015.
- Srivastava, A. and Niedzielski, T. (2025). "Vision Transformer for Transient Noise Classification in Gravitational Wave Data." Acta Astronomica, 74(3).