BiRefNet Lite β€” Optimized for Intel Xeon W-2145 (Skylake-SP)

Optimized BiRefNet Lite (Swin-T backbone, 44M parameters) for Intel Xeon W-2145 CPU inference using OpenVINO with NNCF INT8 post-training quantization.

Model Variants

Path Format Size Resolution Best For
openvino_int8/birefnet_lite_1024x1024_int8.xml INT8 (NNCF PTQ) 48 MB 1024Γ—1024 Maximum quality, fastest on Skylake-SP
openvino_int8/birefnet_lite_512x512_int8.xml INT8 (NNCF PTQ) 48 MB 512Γ—512 Best throughput (~3 FPS)
openvino_fp16/birefnet_lite_*_fp16.xml FP16 weights 85 MB both Model size reduction; compute still FP32 on Skylake-SP
openvino_fp32/birefnet_lite_*.xml FP32 85 MB both Reference accuracy, no quantization
openvino_int8wo/birefnet_lite_*_int8wo.xml INT8 weight-only 48 MB both Alternative compression; minimal latency change on Skylake-SP

Which variant to use

CPU Architecture Recommended Variant Why
Skylake-SP (Xeon W-2145) INT8 (NNCF) ~7Γ— faster than PyTorch; INT8 from reduced memory bandwidth (no VNNI on Skylake)
Cascade Lake+ (VNNI) INT8 (NNCF) ~14Γ— faster; VNNI INT8 dot-product doubles throughput over Skylake-SP
Sapphire Rapids (AMX) INT8 or FP16 ~28-56Γ— faster; AMX tile engines give massive INT8/BF16 throughput
AMD Zen 4 (AVX-512) OpenVINO FP32 No VNNI β†’ INT8 gains limited to memory bandwidth; FP32 graph fusion is main win
Any CPU with <16GB RAM INT8 3.5Γ— model size reduction (169 MB β†’ 48 MB)

Quick Start

import openvino as ov
import numpy as np
from PIL import Image
from torchvision import transforms

# Load best variant for Skylake-SP
core = ov.Core()
model = core.read_model("openvino_int8/birefnet_lite_1024x1024_int8.xml")

# Optimal config for Xeon W-2145 (8 physical cores)
config = {
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "INFERENCE_NUM_THREADS": "8",
}
compiled = core.compile_model(model, "CPU", config)
infer_req = compiled.create_infer_request()

# Preprocess
transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("input.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0).numpy()

# Inference
result = infer_req.infer({0: input_tensor})
mask = 1 / (1 + np.exp(-result[0]))  # sigmoid
mask = (mask[0, 0] * 255).astype(np.uint8)

# Save
Image.fromarray(mask).resize(image.size).save("mask.png")

Environment Setup

export OMP_NUM_THREADS=8          # physical cores only
export KMP_AFFINITY="granularity=fine,compact,1,0"
export KMP_BLOCKTIME=1
export OMP_WAIT_POLICY=ACTIVE

pip install openvino torchvision pillow

Optimization Notes

Why OpenVINO is much faster than PyTorch on Xeon W-2145:

  1. AVX-512 SIMD vectorization β€” 16 FP32 ops/cycle
  2. Graph-level fusion β€” Conv+BN+Act, attention patterns merged
  3. nChw16c memory layout β€” aligned to AVX-512 registers, maximizes L3 cache efficiency
  4. Static shape compilation β€” no dynamic dispatch overhead
  5. Thread affinity β€” avoids hyperthreading contention on 11 MB shared L3

INT8 quantization:

  • NNCF mixed-precision PTQ with 50 synthetic calibration images
  • On Skylake-SP (no VNNI): gain is primarily from memory bandwidth reduction, not compute throughput
  • On Cascade Lake+ (VNNI): INT8 dot-products give additional 2Γ— throughput
  • On Sapphire Rapids (AMX): tile engines give additional 4-8Γ— throughput
  • Model compression: 169 MB (PyTorch FP32) β†’ 48 MB (OpenVINO INT8)

Resolution: 1024Γ—1024 for maximum quality; 512Γ—512 for ~4Γ— lower latency with minor quality loss.

Architecture

  • Backbone: Swin Transformer Tiny (swin_v1_t) β€” 28M params, window size 7
  • Decoder: ASPP with deformable convolutions + bilateral reference blocks
  • Total: 44.3M parameters

Base Model

Citation

@article{BiRefNet,
  title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
  author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
  journal={CAAI Artificial Intelligence Research},
  year={2024}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for ibrhr/BiRefNet-lite-openvino-xeon-w2145

Quantized
(3)
this model

Space using ibrhr/BiRefNet-lite-openvino-xeon-w2145 1

Paper for ibrhr/BiRefNet-lite-openvino-xeon-w2145