
DINOv3 → YOLO11 Distilled OCR Detector

This repository contains a YOLO11-based OCR object detector distilled from a DINOv3 ViT-B/16 teacher using LightlyTrain.
The goal: produce a lightweight but high-recall text box detector suitable for OCR, ID scanning, document parsing, and multi-language text extraction.


Model Summary

  • Teacher: dinov3/vitb16
  • Student: YOLO11s (custom convolutional backbone)
  • Method: LightlyTrain distillation (features-only MSE loss)
  • Data: 1,200 unlabeled resume-like document crops + synthetic webpage/document images
  • Use-case: OCR region detection (not recognition)
  • Export Format: Ultralytics .pt
  • File: exported_models/exported_last.pt
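The "features-only MSE loss" named in the Method entry can be read as a mean squared error between student and teacher feature vectors, with no detection labels involved. The following is a minimal sketch of that idea only; the function name `feature_mse` is hypothetical, and LightlyTrain's actual loss may add projection layers, normalization, or spatial pooling that are not shown here.

```python
def feature_mse(student_features, teacher_features):
    """Mean squared error between two equal-length feature vectors.

    Illustrative only: a real distillation loss operates on batched
    feature maps and may project the student features to the teacher's
    dimensionality first.
    """
    if len(student_features) != len(teacher_features):
        raise ValueError("student and teacher features must have the same length")
    squared_errors = [
        (s - t) ** 2 for s, t in zip(student_features, teacher_features)
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy example: the loss shrinks as the student matches the teacher.
teacher = [1.0, 4.0]
print(feature_mse([1.0, 2.0], teacher))  # student is off in one dimension
print(feature_mse([1.0, 4.0], teacher))  # student matches exactly
```

Because the target is the teacher's features rather than box annotations, this kind of objective can be trained on unlabeled images, which is consistent with the unlabeled data listed above.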

Intended Use

This model is trained to detect text regions inside real-world documents:

  • CVs / resumes
  • ID cards
  • Business documents
  • Screenshots
  • Webpage fragments
  • PDF pages (converted to images)

It does not perform OCR itself; recognition should be done with a second-stage model (Tesseract, TrOCR, Nougat, PaddleOCR, VietOCR, etc.).
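Handing detections to a second-stage recognizer usually means cropping each detected region out of the page first. The sketch below shows one way to turn a float box into a padded, clamped integer crop rectangle. It assumes boxes are given as (x1, y1, x2, y2) pixel coordinates; the helper name `to_crop_rect` and the `pad` parameter are hypothetical and not part of this repository.

```python
import math


def to_crop_rect(box_xyxy, image_width, image_height, pad=2):
    """Convert a float (x1, y1, x2, y2) box to an integer crop rectangle.

    The box is expanded outward by `pad` pixels (an assumed default) so
    that character edges are not clipped, then clamped to the image.
    """
    x1, y1, x2, y2 = box_xyxy
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(image_width, math.ceil(x2) + pad)
    bottom = min(image_height, math.ceil(y2) + pad)
    return left, top, right, bottom


# Example: a box on a 100x40 pixel image, padded by 4 pixels.
print(to_crop_rect((10.4, 5.6, 50.2, 20.9), 100, 40, pad=4))
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so each rectangle can be cropped and passed to whichever recognizer is used.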


Example Usage

Python (Ultralytics)

from ultralytics import YOLO

model = YOLO("exported_last.pt")
results = model("/content/example.jpg")

results[0].show()  # visualize text boxes

Extract bounding boxes

# Box corners as (x1, y1, x2, y2) in pixels, plus one confidence score per box
boxes = results[0].boxes.xyxy.cpu().numpy()
confs = results[0].boxes.conf.cpu().numpy()

for xyxy, conf in zip(boxes, confs):
    print(xyxy, conf)
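Before recognition, it is often useful to drop low-confidence boxes and sort the rest into reading order. The following is a rough sketch of one possible approach, not something this model provides. The function name `reading_order`, the `min_conf` threshold, and the `line_tol` pixel tolerance are all assumptions chosen for illustration, and the line grouping is a simple heuristic that will not handle multi-column layouts.

```python
def reading_order(boxes, confs, min_conf=0.25, line_tol=10.0):
    """Filter boxes by confidence, then sort top-to-bottom, left-to-right.

    boxes: iterable of (x1, y1, x2, y2); confs: matching confidences.
    Boxes whose vertical centers lie within `line_tol` pixels of the
    first box of a line are treated as belonging to that line.
    """
    kept = [tuple(b) for b, c in zip(boxes, confs) if c >= min_conf]
    kept.sort(key=lambda b: (b[1] + b[3]) / 2)  # sort by vertical center

    lines = []
    for box in kept:
        center_y = (box[1] + box[3]) / 2
        if lines and abs(center_y - lines[-1]["center_y"]) <= line_tol:
            lines[-1]["boxes"].append(box)
        else:
            lines.append({"center_y": center_y, "boxes": [box]})

    ordered = []
    for line in lines:
        ordered.extend(sorted(line["boxes"], key=lambda b: b[0]))
    return ordered


# Example with plain tuples; arrays from the snippet above work the same way.
example_boxes = [(120, 12, 200, 30), (10, 10, 100, 30), (10, 50, 90, 70), (300, 11, 350, 29)]
example_confs = [0.9, 0.8, 0.95, 0.1]
for box in reading_order(example_boxes, example_confs):
    print(box)
```

The threshold and tolerance should be tuned per document type; dense forms or small fonts will likely need a smaller `line_tol`.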

Distillation

import lightly_train

# Distill DINOv3 ViT-B/16 features into a YOLO11s backbone on unlabeled images
lightly_train.train(
    out="dinov3_yolo11_distilled",
    data="/content/unlabeled_idl_images",
    model="yolo11s",
    method="distillation",
    method_args={
        "teacher": "dinov3/vitb16",
        "teacher_weights": "/content/dinov3_vitb16_pretrain.pth",
    },
    epochs=2,
    batch_size=4,
    precision="16-mixed",
)
