BiRefNet Lite — Optimized for Intel Xeon W-2145 (Skylake-SP)

Optimized BiRefNet Lite (Swin-T backbone, 44M parameters) for Intel Xeon W-2145 CPU inference using OpenVINO with NNCF INT8 post-training quantization.

Model Variants

Path	Format	Size	Resolution	Best For
`openvino_int8/birefnet_lite_1024x1024_int8.xml`	INT8 (NNCF PTQ)	48 MB	1024×1024	Maximum quality, fastest on Skylake-SP
`openvino_int8/birefnet_lite_512x512_int8.xml`	INT8 (NNCF PTQ)	48 MB	512×512	Best throughput (~3 FPS)
`openvino_fp16/birefnet_lite_*_fp16.xml`	FP16 weights	85 MB	both	Model size reduction; compute still FP32 on Skylake-SP
`openvino_fp32/birefnet_lite_*.xml`	FP32	85 MB	both	Reference accuracy, no quantization
`openvino_int8wo/birefnet_lite_*_int8wo.xml`	INT8 weight-only	48 MB	both	Alternative compression; minimal latency change on Skylake-SP

Which variant to use

CPU Architecture	Recommended Variant	Why
Skylake-SP (Xeon W-2145)	INT8 (NNCF)	~7× faster than PyTorch; INT8 from reduced memory bandwidth (no VNNI on Skylake)
Cascade Lake+ (VNNI)	INT8 (NNCF)	~14× faster; VNNI INT8 dot-product doubles throughput over Skylake-SP
Sapphire Rapids (AMX)	INT8 or FP16	~28-56× faster; AMX tile engines give massive INT8/BF16 throughput
AMD Zen 4 (AVX-512)	OpenVINO FP32	No VNNI → INT8 gains limited to memory bandwidth; FP32 graph fusion is main win
Any CPU with <16GB RAM	INT8	3.5× model size reduction (169 MB → 48 MB)

Quick Start

import openvino as ov
import numpy as np
from PIL import Image
from torchvision import transforms

# Load best variant for Skylake-SP
core = ov.Core()
model = core.read_model("openvino_int8/birefnet_lite_1024x1024_int8.xml")

# Optimal config for Xeon W-2145 (8 physical cores)
config = {
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "INFERENCE_NUM_THREADS": "8",
}
compiled = core.compile_model(model, "CPU", config)
infer_req = compiled.create_infer_request()

# Preprocess
transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("input.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0).numpy()

# Inference
result = infer_req.infer({0: input_tensor})
mask = 1 / (1 + np.exp(-result[0]))  # sigmoid
mask = (mask[0, 0] * 255).astype(np.uint8)

# Save
Image.fromarray(mask).resize(image.size).save("mask.png")

Environment Setup

export OMP_NUM_THREADS=8          # physical cores only
export KMP_AFFINITY="granularity=fine,compact,1,0"
export KMP_BLOCKTIME=1
export OMP_WAIT_POLICY=ACTIVE

pip install openvino torchvision pillow

Optimization Notes

Why OpenVINO is much faster than PyTorch on Xeon W-2145:

AVX-512 SIMD vectorization — 16 FP32 ops/cycle
Graph-level fusion — Conv+BN+Act, attention patterns merged
nChw16c memory layout — aligned to AVX-512 registers, maximizes L3 cache efficiency
Static shape compilation — no dynamic dispatch overhead
Thread affinity — avoids hyperthreading contention on 11 MB shared L3

INT8 quantization:

NNCF mixed-precision PTQ with 50 synthetic calibration images
On Skylake-SP (no VNNI): gain is primarily from memory bandwidth reduction, not compute throughput
On Cascade Lake+ (VNNI): INT8 dot-products give additional 2× throughput
On Sapphire Rapids (AMX): tile engines give additional 4-8× throughput
Model compression: 169 MB (PyTorch FP32) → 48 MB (OpenVINO INT8)

Resolution: 1024×1024 for maximum quality; 512×512 for ~4× lower latency with minor quality loss.

Architecture

Backbone: Swin Transformer Tiny (swin_v1_t) — 28M params, window size 7
Decoder: ASPP with deformable convolutions + bilateral reference blocks
Total: 44.3M parameters

Base Model

ZhengPeng7/BiRefNet_lite
Paper: arxiv:2401.03407

Citation

@article{BiRefNet,
  title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
  author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
  journal={CAAI Artificial Intelligence Research},
  year={2024}
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for ibrhr/BiRefNet-lite-openvino-xeon-w2145

Base model

ZhengPeng7/BiRefNet_lite

Quantized

(3)

this model

Space using ibrhr/BiRefNet-lite-openvino-xeon-w2145 1

Paper for ibrhr/BiRefNet-lite-openvino-xeon-w2145

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

Paper • 2401.03407 • Published Jan 7, 2024 • 4