
DINOv3 → YOLO11 Distilled OCR Detector

This repository contains a YOLO11-based OCR object detector distilled from a DINOv3 ViT-B/16 teacher using LightlyTrain.
The goal is a lightweight, high-recall text-box detector suitable for OCR, ID scanning, document parsing, and multi-language text extraction.

Model Summary

Teacher: dinov3/vitb16
Student: YOLO11s (custom convolutional backbone)
Method: LightlyTrain distillation (features-only MSE loss)
Data: 1,200 unlabeled resume-like document crops + synthetic webpage/document images
Use-case: OCR region detection (not recognition)
Export Format: Ultralytics .pt
File: exported_models/exported_last.pt
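
To make the "features-only MSE loss" entry concrete, here is a minimal sketch of a mean-squared-error objective between student and teacher features. It is an illustration only, not LightlyTrain's implementation: the function name `feature_mse` and the toy vectors are hypothetical, and any projection needed to match student and teacher feature dimensions is assumed to have happened already.

```python
from typing import Sequence


def feature_mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    if len(student) == 0:
        raise ValueError("feature vectors must be non-empty")
    # Average the squared per-element differences.
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy values for illustration only; real features are high-dimensional tensors.
student_features = [0.5, 1.0, -1.0, 2.0]
teacher_features = [0.0, 1.0, -2.0, 2.0]
loss = feature_mse(student_features, teacher_features)
print(loss)
```

Because the loss depends only on features, no box labels are needed, which is consistent with the unlabeled data listed above.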

Intended Use

This model is trained to detect text regions inside real-world documents:

CVs / resumes
ID cards
Business documents
Screenshots
Webpage fragments
PDF pages (converted to images)

It does not perform OCR itself; recognition should be done with a second-stage model (Tesseract, TrOCR, Nougat, PaddleOCR, VietOCR, etc.).
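
Since recognition happens in a second stage, each detected box usually has to be turned into an image crop first. The sketch below shows one way to pad a box slightly and clip it to the image bounds before cropping. The helper name `to_crop_rect` and the default padding of 2 pixels are assumptions made for illustration, not part of this repository.

```python
import math
from typing import Sequence, Tuple


def to_crop_rect(
    xyxy: Sequence[float],
    image_width: int,
    image_height: int,
    pad: int = 2,
) -> Tuple[int, int, int, int]:
    """Pad an (x1, y1, x2, y2) box and clip it to the image bounds."""
    x1, y1, x2, y2 = xyxy
    # Round outwards so the crop never cuts into the detected text.
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(image_width, math.ceil(x2) + pad)
    bottom = min(image_height, math.ceil(y2) + pad)
    return left, top, right, bottom


# Hypothetical box near the bottom edge of a 60x22 image.
rect = to_crop_rect((10.4, 5.7, 50.2, 20.1), image_width=60, image_height=22)
print(rect)
```

The returned tuple is in (left, upper, right, lower) order, which is the order Pillow's `Image.crop` expects, so the crop can be handed to whichever recognizer is used.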

Example Usage

Python (Ultralytics)

from ultralytics import YOLO

model = YOLO("exported_last.pt")
results = model("/content/example.jpg")

results[0].show()  # visualize text boxes

Extract bounding boxes

boxes = results[0].boxes.xyxy.cpu().numpy()  # (N, 4) array of x1, y1, x2, y2
confs = results[0].boxes.conf.cpu().numpy()  # (N,) array of confidence scores

for xyxy, conf in zip(boxes, confs):
    print(xyxy, conf)
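
Before passing boxes to a recognizer, it is often useful to drop low-confidence detections and put the rest in reading order. The following is a rough sketch under stated assumptions: the function `filter_and_sort`, the threshold `min_conf`, and the row-bucketing heuristic with `row_height` are all hypothetical choices for illustration, and the sample detections are made up rather than model output.

```python
from typing import List, Sequence, Tuple

# One detection: ((x1, y1, x2, y2), confidence)
Detection = Tuple[Sequence[float], float]


def filter_and_sort(
    detections: List[Detection],
    min_conf: float = 0.25,
    row_height: float = 20.0,
) -> List[Detection]:
    """Keep confident boxes and order them top-to-bottom, then left-to-right."""
    kept = [d for d in detections if d[1] >= min_conf]
    # Boxes whose top edge falls in the same row bucket are treated as one line.
    return sorted(kept, key=lambda d: (int(d[0][1] // row_height), d[0][0]))


# Made-up detections standing in for zip(boxes, confs).
detections = [
    ((120, 12, 200, 30), 0.90),
    ((10, 14, 100, 32), 0.80),
    ((10, 60, 90, 80), 0.95),
    ((50, 15, 80, 28), 0.10),
]
ordered = filter_and_sort(detections)
for xyxy, conf in ordered:
    print(xyxy, conf)
```

Fixed-height row buckets are a crude heuristic; documents with mixed font sizes or multiple columns would likely need a layout-aware ordering instead.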

Distillation

import lightly_train

lightly_train.train(
    out="dinov3_yolo11_distilled",         # output directory for checkpoints and exports
    data="/content/unlabeled_idl_images",  # folder of unlabeled training images
    model="yolo11s",                       # student model
    method="distillation",
    method_args={
        "teacher": "dinov3/vitb16",
        "teacher_weights": "/content/dinov3_vitb16_pretrain.pth",
    },
    epochs=2,
    batch_size=4,
    precision="16-mixed",
)
