eren23
/

dinov3-yolo11m-distilled-ocr


eren23 committed 16 days ago

Commit d3f6529 · verified · 1 parent: ca085e6

Create README.md

Files changed (1)

README.md +68 -0


	@@ -0,0 +1,68 @@

+# DINOv3 → YOLO11 Distilled OCR Detector
+This repository contains a **YOLO11-based OCR object detector** distilled from a **DINOv3 ViT-B/16 teacher** using **LightlyTrain**.
+The goal: produce a *lightweight but high-recall text box detector* suitable for OCR, ID scanning, document parsing, and multi-language text extraction.
+---
+## Model Summary
+- **Teacher:** `dinov3/vitb16`
+- **Student:** `YOLO11s` (custom convolutional backbone)
+- **Method:** LightlyTrain `distillation` (features-only MSE loss)
+- **Data:** 1,200 unlabeled resume-like document crops + synthetic webpage/document images
+- **Use-case:** OCR region detection (not recognition)
+- **Export Format:** Ultralytics `.pt`
+- **File:** `exported_models/exported_last.pt`
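As an illustration of what a features-only MSE objective measures, here is a minimal sketch in plain Python. It is not LightlyTrain's implementation: the function name `feature_mse` and the toy vectors are hypothetical, and it assumes student and teacher features already have the same size (in practice the student features usually need to be projected to the teacher's dimensionality first, which is omitted here).

```python
def feature_mse(student_feats, teacher_feats):
    """Mean squared error between two equally sized feature vectors."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    if not student_feats:
        raise ValueError("feature vectors must be non-empty")
    # Average the squared element-wise differences.
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)


# Toy values only; real features are high-dimensional tensors.
student = [0.5, 1.0, -1.0, 2.0]
teacher = [0.0, 1.0, 1.0, 2.0]
loss = feature_mse(student, teacher)
print(loss)
```

The loss is zero only when the student reproduces the teacher's features exactly, which is why no labels are needed for this stage.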
+---
+## Intended Use
+This model is trained to **detect text regions** inside real-world documents:
+- CVs / resumes
+- ID cards
+- Business documents
+- Screenshots
+- Webpage fragments
+- PDF pages (converted to images)
+It **does not perform OCR itself**; recognition should be done with a second-stage model (Tesseract, TrOCR, Nougat, PaddleOCR, VietOCR, etc.).
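When handing detections to a second-stage recognizer, it often helps to pad each box slightly and clamp it to the image bounds before cropping. The helper below is a sketch under that assumption; `pad_box` and the default `pad=4` are hypothetical choices, not something this repository prescribes.

```python
import math


def pad_box(xyxy, image_w, image_h, pad=4):
    """Expand an (x1, y1, x2, y2) box by `pad` pixels, clamped to the image."""
    x1, y1, x2, y2 = xyxy
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(image_w, math.ceil(x2) + pad)
    bottom = min(image_h, math.ceil(y2) + pad)
    return left, top, right, bottom


# Example box near the left and right edges of a 100x200 image.
crop_box = pad_box((2.5, 10.2, 98.7, 30.0), image_w=100, image_h=200)
print(crop_box)
```

The returned tuple uses integer pixel coordinates in (left, top, right, bottom) order, which is the layout Pillow's `Image.crop` expects, so it can be passed there directly if Pillow is the cropping tool.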
+---
+## Example Usage
+### Python (Ultralytics)
+```python
+from ultralytics import YOLO
+model = YOLO("exported_last.pt")
+results = model("/content/example.jpg")
+results[0].show()  # visualize text boxes
+```
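+### Extract bounding boxes
+```python
+# `results` comes from the inference call above
+boxes = results[0].boxes.xyxy.cpu().numpy()  # (N, 4) array: x1, y1, x2, y2 in pixels
+confs = results[0].boxes.conf.cpu().numpy()  # (N,) confidence scores
+for xyxy, conf in zip(boxes, confs):
+    print(xyxy, conf)
+```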
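Before passing boxes to a recognizer, one common step is to drop low-confidence detections and sort the rest into reading order (top to bottom, then left to right). The following is a rough sketch of that idea, not part of this repository: `reading_order`, `min_conf`, and `line_tol` are hypothetical names and values, and the line grouping is a simple heuristic that assumes roughly horizontal, unrotated text.

```python
def reading_order(boxes, confs, min_conf=0.25, line_tol=10.0):
    """Filter boxes by confidence, then order them line by line."""
    kept = [(tuple(b), float(c)) for b, c in zip(boxes, confs) if c >= min_conf]
    # Sort by top edge first so boxes on the same text line end up adjacent.
    kept.sort(key=lambda item: (item[0][1], item[0][0]))
    lines = []
    for box, conf in kept:
        # Join the current line if the top edge is close to the line's first box.
        if lines and abs(box[1] - lines[-1][0][0][1]) <= line_tol:
            lines[-1].append((box, conf))
        else:
            lines.append([(box, conf)])
    ordered = []
    for line in lines:
        # Within a line, read left to right.
        ordered.extend(sorted(line, key=lambda item: item[0][0]))
    return ordered


# Toy detections in xyxy format; the last one falls below the threshold.
boxes = [(200, 12, 300, 30), (10, 10, 100, 30), (10, 50, 120, 70),
         (150, 52, 260, 70), (5, 90, 50, 110)]
confs = [0.9, 0.8, 0.7, 0.95, 0.1]
ordered = reading_order(boxes, confs)
for box, conf in ordered:
    print(box, conf)
```

The same function accepts the NumPy arrays produced by `boxes.xyxy.cpu().numpy()` and `boxes.conf.cpu().numpy()`, since it only iterates over rows. For a high-recall detector, a lower `min_conf` keeps more candidate regions at the cost of more false positives.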
+### Distillation
+```python
+import lightly_train
+
+# Distill DINOv3 ViT-B/16 features into the YOLO11s backbone on unlabeled images
+lightly_train.train(
+    out="dinov3_yolo11_distilled",
+    data="/content/unlabeled_idl_images",
+    model="yolo11s",
+    method="distillation",
+    method_args={
+        "teacher": "dinov3/vitb16",
+        "teacher_weights": "/content/dinov3_vitb16_pretrain.pth"
+    },
+    epochs=2,
+    batch_size=4,
+    precision="16-mixed"
+)
+```