DocLayout-YOLO for Document Layout Analysis

This model is based on DocLayout-YOLO, fine-tuned on DocStructBench for document layout detection.

Model Description

DocLayout-YOLO is a real-time, robust layout detection model for diverse document types, built on YOLOv10. It detects the following categories (a local inference sketch follows the list):

  • title - Document titles
  • plain_text - Regular text blocks
  • figure - Images and graphics
  • figure_caption - Captions for figures
  • table - Tables
  • table_caption - Captions for tables
  • table_footnote - Footnotes in tables
  • isolate_formula - Isolated (display) mathematical formulas
  • formula_caption - Captions for formulas
  • abandon - Elements to ignore
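
If you have the weights file and would rather run the model locally, the upstream opendatalab/DocLayout-YOLO repository ships a doclayout_yolo package whose usage looks roughly like the sketch below. The checkpoint filename here is a placeholder, and the predict arguments follow the upstream README, so they may differ slightly for this fine-tuned variant.

import cv2
from doclayout_yolo import YOLOv10

# Placeholder path -- point this at the fine-tuned DocStructBench weights file.
model = YOLOv10("doclayout_yolo_docstructbench.pt")

# Run layout detection on a single page image.
results = model.predict(
    "document.png",   # page image to analyze
    imgsz=1024,       # inference image size
    conf=0.2,         # confidence threshold (same default as the endpoint example below)
    device="cpu",     # or "cuda:0" if a GPU is available
)

# Draw the detected regions and save a visualization.
annotated = results[0].plot(pil=True, line_width=3, font_size=16)
cv2.imwrite("document_annotated.jpg", annotated)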

Usage via Inference Endpoint

import requests
import base64

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load and encode image
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45
        }
    }
)

detections = response.json()
print(detections)
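
Before trusting the parsed JSON, it is worth checking the HTTP status and giving the request a timeout, since dedicated endpoints can return error bodies or cold-start slowly. The wrapper below is only a convenience sketch: detect_layout is a hypothetical helper name, and it reuses the API_URL and headers defined above.

import base64
import requests

def detect_layout(image_path: str, confidence: float = 0.2, iou_threshold: float = 0.45):
    """Send one page image to the endpoint and return the list of detections."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        API_URL,
        headers=headers,
        json={
            "inputs": image_b64,
            "parameters": {"confidence": confidence, "iou_threshold": iou_threshold},
        },
        timeout=60,  # endpoints can cold-start, so allow a generous timeout
    )
    response.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
    return response.json()

detections = detect_layout("document.png")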

Response Format

[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
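
The box values in the example above appear to be absolute pixel coordinates on the submitted image, so detections can be filtered and drawn directly with Pillow. A minimal sketch (the 0.5 score threshold and the colors are arbitrary choices for illustration):

from PIL import Image, ImageDraw

detections = response.json()

# Keep confident detections and drop regions the model marks as "abandon".
keep = [d for d in detections if d["score"] >= 0.5 and d["label"] != "abandon"]

image = Image.open("document.png").convert("RGB")
draw = ImageDraw.Draw(image)

for det in keep:
    box = det["box"]
    # (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner.
    draw.rectangle([box["x1"], box["y1"], box["x2"], box["y2"]], outline="red", width=2)
    draw.text((box["x1"], max(box["y1"] - 12, 0)), f'{det["label"]} {det["score"]:.2f}', fill="red")

image.save("document_annotated.png")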

Credits

Based on opendatalab/DocLayout-YOLO. If you use this model, please cite the original work:

@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}