---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- vision
- ocr
- arabic
- qwen2.5-vl
- unsloth
- lora
- text-recognition
- image-to-text
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
datasets:
- oddadmix/qari-0.2.2-news-dataset-large
- oddadmix/qari-0.2.2-diacritics-dataset-large
pipeline_tag: image-to-text
model-index:
- name: DIMI-Arabic-OCR
  results:
  - task:
      type: image-to-text
      name: Optical Character Recognition
    dataset:
      name: Combined Arabic OCR Dataset
      type: custom
    metrics:
    - type: wer
      value: 0.40
      name: Word Error Rate
    - type: cer
      value: 0.22
      name: Character Error Rate
---
# DIMI Arabic OCR

*Accurate Arabic OCR model for extracting printed Arabic text from images*

---

## Overview

**DIMI-Arabic-OCR** is a fine-tuned **vision-language model (VLM)** specialized for **Arabic Optical Character Recognition (OCR)**. It extracts **printed Arabic text** from images with high accuracy, including **diacritics (tashkeel)** and punctuation.

- **Language:** Arabic
- **Base Model:** Qwen2.5-VL-7B (via Unsloth 4-bit)
- **Task:** Image-to-Text / OCR
- **Quantization:** 4-bit LoRA for efficient inference
- **Author:** [Ahmed Zaky](https://huggingface.co/AhmedZaky1)

---

## Quick Start
```python
# IMPORTANT: Import unsloth first!
import unsloth
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
FastVisionModel.for_inference(model)

# Prepare your image
image = Image.open("/content/2.jpg")

# Arabic instruction: "Extract the Arabic text and numbers in this image with
# very high accuracy, fully preserving the original order and formatting."
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية جداً، مع الحفاظ الكامل على الترتيب الأصلي والتنسيق."

# Prepare messages
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},  # Include image here
        {"type": "text", "text": instruction},
    ]}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

# Tokenize with proper parameters to avoid truncation
inputs = tokenizer(
    text=input_text,
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=False,
    max_length=None,
).to("cuda")

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,
        temperature=None,
        top_p=None,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
prediction = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

print("Extracted Arabic Text:")
print(prediction)
```
---

## Model Architecture

- **Base:** Qwen2.5-VL-7B-Instruct
- **Fine-tuning:** LoRA (rank 16)
- **Quantization:** 4-bit (bnb)
- **Framework:** Unsloth for efficient training/inference
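
To give a feel for why rank-16 LoRA makes fine-tuning cheap, here is a back-of-the-envelope sketch of the adapter overhead for a single linear layer. The 3584 hidden size matches the Qwen2.5-7B family, but the exact shapes below are illustrative assumptions, not values read from this checkpoint:

```python
# LoRA replaces a frozen weight update with two low-rank factors:
# A (rank x d_in) and B (d_out x rank), so only r*(d_in + d_out)
# parameters per adapted layer are trained.
def lora_params(d_in: int, d_out: int, rank: int = 16) -> int:
    """Trainable parameters added by one LoRA adapter pair."""
    return rank * d_in + d_out * rank

# Example: a square 3584 x 3584 projection (illustrative shape).
full = 3584 * 3584
added = lora_params(3584, 3584, rank=16)
print(f"full weight: {full:,} params, LoRA adds: {added:,} "
      f"({100 * added / full:.2f}%)")
```

At rank 16 the adapters add well under 1% of the base layer's parameters, which is what makes LoRA fine-tuning of a 7B model feasible on a single GPU alongside 4-bit quantization.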
---

## Evaluation

| Metric  | Description          | Score (lower is better) |
|---------|----------------------|-------------------------|
| **CER** | Character Error Rate | `0.22`                  |
| **WER** | Word Error Rate      | `0.40`                  |

Evaluation was performed on a **2.6K-image test set** from the combined Arabic OCR datasets (news + diacritics).
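
Both metrics are normalized edit distances: CER over characters, WER over whitespace-separated words. A minimal, dependency-free sketch of how they are computed (libraries such as `jiwer` provide equivalent implementations):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counting insertions, deletions, and substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

print(cer("كتاب", "كتب"))               # 1 deleted char over 4 -> 0.25
print(wer("النص العربي", "النص عربي"))  # 1 substituted word over 2 -> 0.5
```

A CER of 0.22 therefore means roughly one character-level edit for every five reference characters.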
---

## Training Data

Fine-tuned on **26,000 Arabic text images** combining:
1. [oddadmix/qari-0.2.2-news-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-news-dataset-large)
2. [oddadmix/qari-0.2.2-diacritics-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-diacritics-dataset-large)
The dataset covers modern standard Arabic with and without diacritics.
---

## Citation

If you use this model, please cite:

```bibtex
@misc{dimi-arabic-ocr-2025,
  author       = {Ahmed Zaky},
  title        = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}
```
---

### Related Projects

- [DIMI Models Series](https://huggingface.co/AhmedZaky1): Arabic vision and language models

---

**Built with ❤️ by Ahmed Zaky**

*Advancing Arabic NLP through state-of-the-art vision-language models*