DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Paper: arXiv:2410.12628
This model is based on DocLayout-YOLO, fine-tuned on DocStructBench for document layout detection.
DocLayout-YOLO is a real-time and robust layout detection model for diverse documents, based on YOLO-v10. It can detect:
import base64

import requests

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load the image and encode it as a base64 string
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Send the encoded image together with the detection parameters
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45,
        },
    },
)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body

detections = response.json()
print(detections)
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
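Once the response is parsed, the detections can be filtered and ordered before further processing. The following is a minimal sketch, assuming the response always has the shape shown above (a list of objects with `label`, `score`, and a `box` holding `x1`, `y1`, `x2`, `y2`). The helper names `filter_detections`, `box_size`, and `reading_order` are hypothetical and not part of the endpoint or of DocLayout-YOLO, and the top-to-bottom sort is a naive ordering that only suits single-column pages.

```python
from typing import Dict, List, Optional, Set, Tuple

Detection = Dict[str, object]


def filter_detections(
    detections: List[Detection],
    min_score: float = 0.5,
    labels: Optional[Set[str]] = None,
) -> List[Detection]:
    """Keep detections at or above min_score, optionally restricted to some labels."""
    kept = []
    for det in detections:
        if det["score"] < min_score:
            continue
        if labels is not None and det["label"] not in labels:
            continue
        kept.append(det)
    return kept


def box_size(box: Dict[str, float]) -> Tuple[float, float]:
    """Return (width, height) of a box given by its corner coordinates."""
    return box["x2"] - box["x1"], box["y2"] - box["y1"]


def reading_order(detections: List[Detection]) -> List[Detection]:
    """Sort by top edge, then left edge (assumption: single-column layout)."""
    return sorted(detections, key=lambda d: (d["box"]["y1"], d["box"]["x1"]))


# Sample data copied from the example response above.
sample = [
    {"label": "plain_text", "score": 0.92,
     "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
    {"label": "title", "score": 0.95,
     "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
]

for det in reading_order(filter_detections(sample, min_score=0.9)):
    width, height = box_size(det["box"])
    print(det["label"], det["score"], width, height)
```

In practice `sample` would be replaced with the `detections` list returned by the request, and `min_score` tuned independently of the `confidence` parameter sent to the endpoint.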
Based on opendatalab/DocLayout-YOLO
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}