---
base_model: AhmedZaky1/DIMI-Arabic-OCR
library_name: peft
language:
- ar
pipeline_tag: image-text-to-text
tags:
- vision
- ocr
- arabic
- qwen2.5-vl
- lora
- unsloth
- trl
- transformers
license: apache-2.0
datasets:
- oddadmix/qari-0.2.2-news-dataset-large
- oddadmix/qari-0.2.2-diacritics-dataset-large
metrics:
- wer
- cer
---

# DIMI Arabic OCR v2

<div align="center">

<img src="https://cdn-uploads.huggingface.co/production/uploads/65fb3ac20cfe262da2bb0fcc/uOuEn0LNhSVEBbOLwfFUu.jpeg" width="300"/>

*Arabic OCR model (v2) for accurately extracting printed Arabic text from images*

</div>

---

## Model Description

**DIMI Arabic OCR v2** is a specialized Arabic Optical Character Recognition (OCR) model fine-tuned from **Qwen2.5-VL-7B-Instruct** with LoRA adapters. This **second iteration** builds on v1, with improved diacritics handling and lower error rates than v1 on both the test and validation sets.

- **Developed by:** Ahmed Zaky
- **Base Model:** AhmedZaky1/DIMI-Arabic-OCR (v1)
- **Original Base:** Qwen/Qwen2.5-VL-7B-Instruct
- **Model Type:** Vision-Language Model (VLM) for Arabic OCR
- **Language:** Arabic (ar)
- **License:** Apache 2.0
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization

### Key Improvements Over v1

✅ **30% reduction in WER** on diacritics-heavy text  
✅ **Enhanced training dataset** with balanced diacritics representation  
✅ **Improved generalization** across news articles and formal documents  
✅ **Better preservation** of text formatting and structure

## 📊 Performance Metrics

### Test Set Results (500 samples from 2,600)

| Metric | Score | Description |
|--------|-------|-------------|
| **WER** | 0.3049 | Word Error Rate (↓ lower is better) |
| **CER** | 0.1119 | Character Error Rate (↓ lower is better) |
| **Perfect Predictions** | 23% | Exact matches with ground truth |
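
For readers unfamiliar with the metrics, WER and CER are conventionally defined as the edit (Levenshtein) distance between prediction and ground truth, divided by the reference length in words or characters. The sketch below illustrates that standard definition in plain Python. It is not necessarily the evaluation script used for this model card, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits (spaces included)
    divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong letter (hamza dropped from the third word) in a four-word sentence.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

A single wrong character makes a whole word count as an error, which is why WER is typically much higher than CER for the same output. Both rates can exceed 1.0 when the prediction contains many insertions.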

### Validation Set Results (100 samples)

| Metric | Score |
|--------|-------|
| **WER** | 0.2315 |
| **CER** | 0.0776 |

### Comparison with v1

| Model | Test WER | Test CER | Val WER | Val CER |
|-------|----------|----------|---------|---------|
| **v1** | 0.404 | 0.226 | 0.3308 | 0.1820 |
| **v2** | **0.3049** ↓ | **0.1119** ↓ | **0.2315** ↓ | **0.0776** ↓ |

**Improvements (test set, relative to v1):**
- **WER reduced by ~24.5%** (0.404 → 0.3049)
- **CER reduced by ~50.5%** (0.226 → 0.1119)
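
These percentages are relative reductions, and they can be reproduced from the comparison table. A minimal sketch of the arithmetic follows; the helper name `relative_reduction` is illustrative and not part of the model's codebase.

```python
def relative_reduction(old: float, new: float) -> float:
    """Relative reduction from `old` to `new`, in percent."""
    return (old - new) / old * 100


# (v1 score, v2 score) pairs copied from the comparison table.
scores = {
    "Test WER": (0.404, 0.3049),
    "Test CER": (0.226, 0.1119),
    "Val WER": (0.3308, 0.2315),
    "Val CER": (0.1820, 0.0776),
}

for name, (v1, v2) in scores.items():
    print(f"{name}: {v1} -> {v2} ({relative_reduction(v1, v2):.1f}% lower)")
```

Applied to the validation columns, the same formula gives roughly 30.0% for WER and 57.4% for CER.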

## 🎯 Intended Use

### Direct Use

This model is designed for extracting Arabic text from images, including:
- 📰 News articles and printed documents
- 📝 Formal Arabic text with diacritics (تشكيل)
- 🔢 Mixed Arabic text and numbers
- 📄 Scanned documents and screenshots

### Example Use Case
```python
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-v2",
    load_in_4bit=True,
    device_map="auto"
)
FastVisionModel.for_inference(model)

# Load image (convert to RGB so grayscale, RGBA, or palette scans are handled uniformly)
image = Image.open("arabic_document.jpg").convert("RGB")

# Prepare prompt
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False
).to("cuda")

# Generate
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False
    )

# Decode
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids, 
    skip_special_tokens=True
)[0]

print(prediction)
```

## 🧾 Training Data

Fine-tuned on **11,000 Arabic text images** combining:
1. [oddadmix/qari-0.2.2-news-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-news-dataset-large)  
2. [oddadmix/qari-0.2.2-diacritics-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-diacritics-dataset-large)

The combined dataset covers Modern Standard Arabic, both with and without diacritics.
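
When comparing OCR output against text that may or may not carry diacritics, it can help to normalize both sides first. The sketch below strips the common tashkeel marks with a regular expression. The character set (U+064B to U+0652 plus U+0670) and the helper names are assumptions made for illustration, not the preprocessing used to train or evaluate this model.

```python
import re

# Arabic tashkeel marks: U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (superscript alef). Assumed set; extend it if your data needs more.
_TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return `text` with Arabic diacritic marks removed."""
    return _TASHKEEL.sub("", text)


def has_diacritics(text: str) -> bool:
    """True if `text` contains at least one mark from the set above."""
    return _TASHKEEL.search(text) is not None


# The word "kataba" written with a fatha after each of its three letters.
sample = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(strip_diacritics(sample))
print(has_diacritics(sample), has_diacritics(strip_diacritics(sample)))
```

Scoring predictions once with diacritics and once after stripping them from both prediction and reference separates base-letter errors from diacritic errors.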

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}
```

---

### 🔗 Related Projects
- [DIMI Models Series](https://huggingface.co/AhmedZaky1) — Arabic Vision & Language Models


---

<div align="center">

**Built with ❤️ by Ahmed Zaky**

*Advancing Arabic NLP through state-of-the-art vision-language models*

</div>