DocLayout-YOLO for Document Layout Analysis

This model is based on DocLayout-YOLO, fine-tuned on DocStructBench for document layout detection.

Model Description

DocLayout-YOLO is a real-time, robust layout detection model for diverse document types, built on YOLOv10. It detects the following categories (a local inference sketch follows the list):

  • title - Document titles
  • plain_text - Regular text blocks
  • figure - Images and graphics
  • figure_caption - Captions for figures
  • table - Tables
  • table_caption - Captions for tables
  • table_footnote - Footnotes in tables
  • isolate_formula - Isolated (display) mathematical formulas
  • formula_caption - Captions for formulas
  • abandon - Elements to ignore
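
If you have the weights file and would rather run the model locally, the upstream opendatalab/DocLayout-YOLO repository ships a doclayout_yolo package whose usage looks roughly like the sketch below. The checkpoint filename here is a placeholder, and the predict arguments follow the upstream README, so they may differ slightly for this fine-tuned variant.

import cv2
from doclayout_yolo import YOLOv10

# Placeholder path -- point this at the fine-tuned DocStructBench weights file.
model = YOLOv10("doclayout_yolo_docstructbench.pt")

# Run layout detection on a single page image.
results = model.predict(
    "document.png",   # page image to analyze
    imgsz=1024,       # inference image size
    conf=0.2,         # confidence threshold (same default as the endpoint example below)
    device="cpu",     # or "cuda:0" if a GPU is available
)

# Draw the detected regions and save a visualization.
annotated = results[0].plot(pil=True, line_width=3, font_size=16)
cv2.imwrite("document_annotated.jpg", annotated)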

Usage via Inference Endpoint

import requests
import base64

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load and encode image
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45
        }
    }
)

detections = response.json()
print(detections)
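
Before trusting the parsed JSON, it is worth checking the HTTP status and giving the request a timeout, since dedicated endpoints can return error bodies or cold-start slowly. The wrapper below is only a convenience sketch: detect_layout is a hypothetical helper name, and it reuses the API_URL and headers defined above.

import base64
import requests

def detect_layout(image_path: str, confidence: float = 0.2, iou_threshold: float = 0.45):
    """Send one page image to the endpoint and return the list of detections."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = requests.post(
        API_URL,
        headers=headers,
        json={
            "inputs": image_b64,
            "parameters": {"confidence": confidence, "iou_threshold": iou_threshold},
        },
        timeout=60,  # endpoints can cold-start, so allow a generous timeout
    )
    response.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
    return response.json()

detections = detect_layout("document.png")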

Response Format

[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
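
The box values in the example above appear to be absolute pixel coordinates on the submitted image, so detections can be filtered and drawn directly with Pillow. A minimal sketch (the 0.5 score threshold and the colors are arbitrary choices for illustration):

from PIL import Image, ImageDraw

detections = response.json()

# Keep confident detections and drop regions the model marks as "abandon".
keep = [d for d in detections if d["score"] >= 0.5 and d["label"] != "abandon"]

image = Image.open("document.png").convert("RGB")
draw = ImageDraw.Draw(image)

for det in keep:
    box = det["box"]
    # (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner.
    draw.rectangle([box["x1"], box["y1"], box["x2"], box["y2"]], outline="red", width=2)
    draw.text((box["x1"], max(box["y1"] - 12, 0)), f'{det["label"]} {det["score"]:.2f}', fill="red")

image.save("document_annotated.png")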

Credits

Based on opendatalab/DocLayout-YOLO. If you use this model, please cite the original work:

@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}