---
language:
- ar
license: apache-2.0
library_name: transformers
tags:
- vision
- ocr
- arabic
- qwen2.5-vl
- unsloth
- lora
- text-recognition
- image-to-text
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
datasets:
- oddadmix/qari-0.2.2-news-dataset-large
- oddadmix/qari-0.2.2-diacritics-dataset-large
pipeline_tag: image-to-text
model-index:
- name: DIMI-Arabic-OCR
  results:
  - task:
      type: image-to-text
      name: Optical Character Recognition
    dataset:
      name: Combined Arabic OCR Dataset
      type: custom
    metrics:
    - type: wer
      value: 0.0XXX
      name: Word Error Rate
    - type: cer
      value: 0.0XXX
      name: Character Error Rate
---

# DIMI Arabic OCR
*Accurate Arabic OCR model for extracting printed Arabic text from images*
---

## 🧠 Overview

**DIMI-Arabic-OCR** is a fine-tuned **vision-language model (VLM)** specialized for **Arabic Optical Character Recognition (OCR)**. It extracts **printed Arabic text** from images with high accuracy — including **diacritics (tashkeel)** and punctuation.

- 🔤 **Language:** Arabic
- 🧩 **Base Model:** Qwen2.5-VL-7B (via Unsloth 4-bit)
- ⚙️ **Task:** Image-to-Text / OCR
- 🪶 **Quantization:** 4-bit LoRA for efficient inference
- 👨‍💻 **Author:** [Ahmed Zaky](https://huggingface.co/AhmedZaky1)

---

## 🚀 Quick Start

```python
# IMPORTANT: import unsloth before transformers so its patches are applied
import unsloth
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the model in 4-bit
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
FastVisionModel.for_inference(model)

# Prepare your image
image = Image.open("/content/2.jpg")

# Arabic instruction, i.e. "Extract the Arabic text and numbers in this image
# with very high accuracy, fully preserving the original order and formatting."
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية جدًا، مع الحفاظ الكامل على الترتيب الأصلي والتنسيق."

# Prepare messages
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},  # include the image here
        {"type": "text", "text": instruction},
    ]}
]

# Apply the chat template
input_text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
)

# Tokenize with proper parameters to avoid truncation
inputs = tokenizer(
    text=input_text,
    images=image,
    return_tensors="pt",
    padding=True,
    truncation=False,
    max_length=None,
).to("cuda")

# Generate (greedy decoding)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,
        temperature=None,
        top_p=None,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
prediction = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

print("Extracted Arabic Text:")
print(prediction)
```

---

## 🧩 Model Architecture

- **Base:** Qwen2.5-VL-7B-Instruct
- **Fine-tuning:** LoRA (rank 16)
- **Quantization:** 4-bit (bitsandbytes)
- **Framework:** Unsloth for efficient training/inference

---

## 📊 Evaluation

| Metric | Description | Score (↓ better) |
|--------|-------------|------------------|
| **CER** | Character Error Rate | `0.22` |
| **WER** | Word Error Rate | `0.40` |

Evaluation was performed on a **2.6K-image test set** drawn from the combined Arabic OCR datasets (news + diacritics).

---

## 🧾 Training Data

Fine-tuned on **26,000 Arabic text images** combining:

1. [oddadmix/qari-0.2.2-news-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-news-dataset-large)
2. [oddadmix/qari-0.2.2-diacritics-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-diacritics-dataset-large)

The datasets cover Modern Standard Arabic with and without diacritics.
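CER and WER can be reproduced with a plain edit-distance computation. The sketch below is illustrative (this card does not publish its exact evaluation script): character-level Levenshtein distance normalized by reference length for CER, word-level for WER, plus an optional helper to strip tashkeel when you want diacritic-insensitive scores:

```python
import re

# Arabic diacritics (tashkeel): fathatan (U+064B) through sukun (U+0652)
TASHKEEL = re.compile(r"[\u064B-\u0652]")

def strip_tashkeel(text: str) -> str:
    """Remove Arabic diacritics so scoring ignores vocalization."""
    return TASHKEEL.sub("", text)

def levenshtein(ref, hyp) -> int:
    """Edit distance between two sequences (characters or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, prediction: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    return levenshtein(reference, prediction) / max(len(reference), 1)

def wer(reference: str, prediction: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words, hyp_words = reference.split(), prediction.split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)
```

Libraries such as `jiwer` compute the same metrics; whichever you use, apply the same normalization (e.g. `strip_tashkeel`) to both reference and prediction so the scores remain comparable.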
---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{dimi-arabic-ocr-2025,
  author       = {Ahmed Zaky},
  title        = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}
```

---

### 🔗 Related Projects

- [DIMI Models Series](https://huggingface.co/AhmedZaky1) — Arabic Vision & Language Models

---
**Built with โค๏ธ by Ahmed Zaky** *Advancing Arabic NLP through state-of-the-art embedding models*