---
base_model: AhmedZaky1/DIMI-Arabic-OCR
library_name: peft
language:
- ar
pipeline_tag: image-text-to-text
tags:
- vision
- ocr
- arabic
- qwen2.5-vl
- lora
- unsloth
- trl
- transformers
license: apache-2.0
datasets:
- oddadmix/qari-0.2.2-news-dataset-large
- oddadmix/qari-0.2.2-diacritics-dataset-large
metrics:
- wer
- cer
---
# DIMI Arabic OCR v2

*Accurate Arabic OCR model (v2) for extracting printed Arabic text from images*

---
## Model Description
**DIMI Arabic OCR v2** is a specialized Arabic Optical Character Recognition (OCR) model based on **Qwen2.5-VL-7B-Instruct** and fine-tuned with LoRA adapters. It is the **second iteration** of the model: it builds on v1 with improved diacritics handling and higher accuracy across diverse Arabic text scenarios.
- **Developed by:** Ahmed Zaky
- **Base Model:** AhmedZaky1/DIMI-Arabic-OCR (v1)
- **Original Base:** Qwen/Qwen2.5-VL-7B-Instruct
- **Model Type:** Vision-Language Model (VLM) for Arabic OCR
- **Language:** Arabic (ar)
- **License:** Apache 2.0
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization
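
For reference, the standard LoRA formulation keeps each adapted pretrained weight matrix $W_0$ frozen and trains only a low-rank update $BA$, scaled by $\alpha / r$. With 4-bit quantization, $W_0$ is additionally stored in 4-bit precision while the small adapter matrices are trained. This card does not state which layers were adapted or the values of the rank $r$ and scaling factor $\alpha$, so the expression below is only the generic form:

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Only $A$ and $B$ are updated during fine-tuning, which is $r(d + k)$ trainable parameters per adapted matrix instead of $dk$.
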
### Key Improvements Over v1
- **30% reduction in WER** on diacritics-heavy text
- **Enhanced training dataset** with balanced diacritics representation
- **Improved generalization** across news articles and formal documents
- **Better preservation** of text formatting and structure
## Performance Metrics
### Test Set Results (500 samples from 2,600)
| Metric | Score | Description |
|--------|-------|-------------|
| **WER** | 0.3049 | Word Error Rate (lower is better) |
| **CER** | 0.1119 | Character Error Rate (lower is better) |
| **Perfect Predictions** | 23% | Exact matches with ground truth |
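
WER and CER are both normalized edit distances: the minimum number of substitutions, deletions, and insertions needed to turn the prediction into the ground truth, divided by the length of the ground truth (in words for WER, in characters for CER). The sketch below implements these textbook definitions in plain Python. It is an illustration, not the evaluation script used for this card: the card does not state which tool, text normalization, or aggregation (per-sample average vs. corpus-level) produced the reported numbers, so results from this sketch may differ slightly.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (unit cost for each
    substitution, insertion, and deletion)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    # max(..., 1) avoids division by zero for an empty reference
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


# Example: one wrong word out of two (final ي predicted as ى)
ref = "النص العربي"
hyp = "النص العربى"
print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

A single wrong character costs a full word in WER but only one character in CER, which is why CER is consistently lower than WER in the tables here. Note that WER can exceed 1.0 when a prediction contains many inserted words.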
### Validation Set Results (100 samples)
| Metric | Score |
|--------|-------|
| **WER** | 0.2315 |
| **CER** | 0.0776 |
### Comparison with v1
| Model | Test WER | Test CER | Val WER | Val CER |
|-------|----------|----------|---------|---------|
| **v1** | 0.404 | 0.226 | 0.3308 | 0.1820 |
| **v2** | **0.3049** | **0.1119** | **0.2315** | **0.0776** |

**Improvements:**
- **WER reduced by ~24.5%** on the test set (0.404 → 0.3049)
- **CER reduced by ~50.5%** on the test set (0.226 → 0.1119)
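
The percentages above are relative reductions, (v1 - v2) / v1, computed from the comparison table. The short sketch below reproduces that arithmetic and applies it to the validation columns as well; the function name `relative_reduction` is just an illustrative helper.

```python
# Error rates copied from the v1 vs. v2 comparison table: (v1, v2)
scores = {
    "Test WER": (0.404, 0.3049),
    "Test CER": (0.226, 0.1119),
    "Val WER": (0.3308, 0.2315),
    "Val CER": (0.1820, 0.0776),
}


def relative_reduction(old: float, new: float) -> float:
    """Relative reduction of an error rate, as a percentage of the old value."""
    return (old - new) / old * 100


for name, (v1, v2) in scores.items():
    print(f"{name}: {relative_reduction(v1, v2):.1f}% lower than v1")
```

The same calculation on the validation set gives roughly a 30.0% lower WER and a 57.4% lower CER for v2.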
## Intended Use
### Direct Use
This model is designed for extracting Arabic text from images, including:
- News articles and printed documents
- Formal Arabic text with diacritics (tashkeel, تشكيل)
- Mixed Arabic text and numbers
- Scanned documents and screenshots
### Example Use Case
```python
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the model in 4-bit; `tokenizer` is the multimodal processor (text + images)
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-v2",
    load_in_4bit=True,
    device_map="auto",
)
FastVisionModel.for_inference(model)  # switch Unsloth to inference mode

# Load the image (convert to RGB in case the scan is grayscale or RGBA)
image = Image.open("arabic_document.jpg").convert("RGB")

# Prompt (Arabic): "Extract the Arabic text and numbers present in this image with high accuracy."
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Apply the chat template (inserts the image placeholder tokens)
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the text and preprocess the image together
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False,
).to("cuda")

# Generate with greedy decoding for deterministic OCR output
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,
    )

# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)[0]
print(prediction)
```
## Training Data
Fine-tuned on **11,000 Arabic text images** combining:
1. [oddadmix/qari-0.2.2-news-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-news-dataset-large)
2. [oddadmix/qari-0.2.2-diacritics-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-diacritics-dataset-large)

The dataset covers Modern Standard Arabic (MSA) text, both with and without diacritics.

---
## Citation
If you use this model, please cite:
```bibtex
@misc{dimi-arabic-ocr-2025,
author = {Ahmed Zaky},
title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}
```
---
### Related Projects
- [DIMI Models Series](https://huggingface.co/AhmedZaky1): Arabic vision and language models
---
**Built with ❤️ by Ahmed Zaky**
*Advancing Arabic NLP through state-of-the-art vision and language models*