metadata
language:
  - ar
license: apache-2.0
library_name: transformers
tags:
  - vision
  - ocr
  - arabic
  - qwen2.5-vl
  - unsloth
  - lora
  - text-recognition
  - image-to-text
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
datasets:
  - oddadmix/qari-0.2.2-news-dataset-large
  - oddadmix/qari-0.2.2-diacritics-dataset-large
pipeline_tag: image-to-text
model-index:
  - name: DIMI-Arabic-OCR
    results:
      - task:
          type: image-to-text
          name: Optical Character Recognition
        dataset:
          name: Combined Arabic OCR Dataset
          type: custom
        metrics:
          - type: wer
            value: 0.40
            name: Word Error Rate
          - type: cer
            value: 0.22
            name: Character Error Rate

DIMI Arabic OCR

Accurate Arabic OCR model for extracting printed Arabic text from images


🧠 Overview

DIMI-Arabic-OCR is a fine-tuned vision-language model (VLM) specialized for Arabic Optical Character Recognition (OCR).
It extracts printed Arabic text from images, including diacritics (tashkeel) and punctuation. Measured error rates are reported in the Evaluation section.

  • 🔤 Language: Arabic
  • 🧩 Base Model: Qwen2.5-VL-7B-Instruct (Unsloth 4-bit bnb build)
  • ⚙️ Task: Image-to-Text / OCR
  • 🪶 Quantization: 4-bit base weights with LoRA adapters, for memory-efficient inference
  • 👨‍💻 Author: Ahmed Zaky

🚀 Quick Start

# IMPORTANT: Import unsloth first!
import unsloth
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

FastVisionModel.for_inference(model)

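# Prepare your image (replace this example path with the path to your own file)
image = Image.open("/content/2.jpg")

# Arabic instruction. English meaning: "Extract the Arabic text and numbers in
# this image with very high accuracy, fully preserving the original order and
# formatting." Keep the prompt in Arabic when calling the model.
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية جدًا، مع الحفاظ الكامل على الترتيب الأصلي والتنسيق."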

# Prepare messages
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},  # Include image here
        {"type": "text", "text": instruction}
    ]}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True,
)

# Tokenize with proper parameters to avoid truncation
inputs = tokenizer(
    text=input_text,
    images=image,  
    return_tensors="pt",
    padding=True,
    truncation=False, 
    max_length=None,   
).to("cuda")

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,
        temperature=None,
        top_p=None,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode the prediction
generated_ids = outputs[0][inputs['input_ids'].shape[1]:]
prediction = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

print("Extracted Arabic Text:")
print(prediction)

🧩 Model Architecture

  • Base: Qwen2.5-VL-7B-Instruct
  • Fine-tuning: LoRA (rank 16)
  • Quantization: 4-bit (bnb)
  • Framework: Unsloth for efficient training/inference
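
To give a sense of what "LoRA (rank 16)" means for model size, here is a minimal sketch of the usual LoRA parameter arithmetic: a frozen weight matrix of shape d_out × d_in is adapted by two small matrices B (d_out × r) and A (r × d_in). The 4096 × 4096 layer shape below is an illustrative assumption, not a layer shape taken from this model, and the card does not state which layers were adapted.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full_matrix_params, lora_adapter_params) for one linear layer.

    LoRA keeps the original d_out x d_in weight frozen and trains
    B (d_out x rank) and A (rank x d_in) instead.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical 4096 x 4096 projection, adapted at the rank stated above (16).
full, lora = lora_param_counts(4096, 4096, 16)
print(f"frozen weights:   {full:,}")
print(f"trainable (LoRA): {lora:,} ({100 * lora / full:.2f}% of the layer)")
```

Under these assumed dimensions the adapter trains well under 1% of the layer's parameters, which is why LoRA pairs naturally with a 4-bit frozen base model.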

📊 Evaluation

| Metric | Description | Score (↓ better) |
|--------|----------------------|------|
| CER | Character Error Rate | 0.22 |
| WER | Word Error Rate | 0.40 |

Evaluation was performed on a test set of about 2.6K images drawn from the combined Arabic OCR datasets (news + diacritics).
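
For readers who want to score their own predictions, the sketch below implements the standard definitions of CER and WER: Levenshtein edit distance divided by the reference length, counted in characters or whitespace-separated words. It is an assumption that the scores above follow these definitions. The card does not say which tool or text normalization was used, and the function names here are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Calling `cer(ground_truth, prediction)` and `wer(ground_truth, prediction)` on each test image and averaging the results gives corpus-level figures. Note that both rates can exceed 1.0 when the prediction is much longer than the reference.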


🧾 Training Data

The model was fine-tuned on 26,000 Arabic text images drawn from two datasets:

  1. oddadmix/qari-0.2.2-news-dataset-large
  2. oddadmix/qari-0.2.2-diacritics-dataset-large

The combined dataset covers Modern Standard Arabic, both with and without diacritics.
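
Because the data mixes diacritized and undiacritized text, it can be useful to compare predictions with the diacritics removed, for example to separate base-letter errors from tashkeel errors. The following is a minimal sketch of that normalization. The helper name and the exact set of code points are assumptions made for illustration, not part of this model's pipeline.

```python
import re

# Arabic tashkeel marks: fathatan through sukun (U+064B to U+0652),
# plus the superscript alef (U+0670). This set is an illustrative choice.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text):
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return TASHKEEL.sub("", text)


# A fully vocalized word (kaf, teh, beh, each followed by fatha)
# reduces to its three bare letters.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(strip_tashkeel(vocalized))
```

Applying `strip_tashkeel` to both the reference and the prediction before scoring gives a diacritic-insensitive error rate alongside the strict one.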


📚 Citation

If you use this model, please cite:

@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}

Built with ❤️ by Ahmed Zaky

Advancing Arabic NLP through state-of-the-art embedding models