Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Paper β’ 2401.03407 β’ Published β’ 4
Optimized BiRefNet Lite (Swin-T backbone, 44M parameters) for Intel Xeon W-2145 CPU inference using OpenVINO with NNCF INT8 post-training quantization.
| Path | Format | Size | Resolution | Best For |
|---|---|---|---|---|
openvino_int8/birefnet_lite_1024x1024_int8.xml |
INT8 (NNCF PTQ) | 48 MB | 1024Γ1024 | Maximum quality, fastest on Skylake-SP |
openvino_int8/birefnet_lite_512x512_int8.xml |
INT8 (NNCF PTQ) | 48 MB | 512Γ512 | Best throughput (~3 FPS) |
openvino_fp16/birefnet_lite_*_fp16.xml |
FP16 weights | 85 MB | both | Model size reduction; compute still FP32 on Skylake-SP |
openvino_fp32/birefnet_lite_*.xml |
FP32 | 85 MB | both | Reference accuracy, no quantization |
openvino_int8wo/birefnet_lite_*_int8wo.xml |
INT8 weight-only | 48 MB | both | Alternative compression; minimal latency change on Skylake-SP |
| CPU Architecture | Recommended Variant | Why |
|---|---|---|
| Skylake-SP (Xeon W-2145) | INT8 (NNCF) | ~7Γ faster than PyTorch; INT8 from reduced memory bandwidth (no VNNI on Skylake) |
| Cascade Lake+ (VNNI) | INT8 (NNCF) | ~14Γ faster; VNNI INT8 dot-product doubles throughput over Skylake-SP |
| Sapphire Rapids (AMX) | INT8 or FP16 | ~28-56Γ faster; AMX tile engines give massive INT8/BF16 throughput |
| AMD Zen 4 (AVX-512) | OpenVINO FP32 | No VNNI β INT8 gains limited to memory bandwidth; FP32 graph fusion is main win |
| Any CPU with <16GB RAM | INT8 | 3.5Γ model size reduction (169 MB β 48 MB) |
import openvino as ov
import numpy as np
from PIL import Image
from torchvision import transforms
# Load best variant for Skylake-SP
core = ov.Core()
model = core.read_model("openvino_int8/birefnet_lite_1024x1024_int8.xml")
# Optimal config for Xeon W-2145 (8 physical cores)
config = {
"PERFORMANCE_HINT": "LATENCY",
"NUM_STREAMS": "1",
"INFERENCE_NUM_THREADS": "8",
}
compiled = core.compile_model(model, "CPU", config)
infer_req = compiled.create_infer_request()
# Preprocess
transform = transforms.Compose([
transforms.Resize((1024, 1024)),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open("input.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0).numpy()
# Inference
result = infer_req.infer({0: input_tensor})
mask = 1 / (1 + np.exp(-result[0])) # sigmoid
mask = (mask[0, 0] * 255).astype(np.uint8)
# Save
Image.fromarray(mask).resize(image.size).save("mask.png")
export OMP_NUM_THREADS=8 # physical cores only
export KMP_AFFINITY="granularity=fine,compact,1,0"
export KMP_BLOCKTIME=1
export OMP_WAIT_POLICY=ACTIVE
pip install openvino torchvision pillow
Why OpenVINO is much faster than PyTorch on Xeon W-2145:
INT8 quantization:
Resolution: 1024Γ1024 for maximum quality; 512Γ512 for ~4Γ lower latency with minor quality loss.
swin_v1_t) β 28M params, window size 7@article{BiRefNet,
title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
journal={CAAI Artificial Intelligence Research},
year={2024}
}
Base model
ZhengPeng7/BiRefNet_lite