---
base_model: AhmedZaky1/DIMI-Arabic-OCR
library_name: peft
language:
- ar
pipeline_tag: image-text-to-text
tags:
- vision
- ocr
- arabic
- qwen2.5-vl
- lora
- unsloth
- trl
- transformers
license: apache-2.0
datasets:
- oddadmix/qari-0.2.2-news-dataset-large
- oddadmix/qari-0.2.2-diacritics-dataset-large
metrics:
- wer
- cer
---

# DIMI Arabic OCR v2
*Accurate Arabic OCR model (v2) for extracting printed Arabic text from images*
---

## Model Description

**DIMI Arabic OCR v2** is a specialized Arabic Optical Character Recognition (OCR) model fine-tuned from **Qwen2.5-VL-7B-Instruct** using LoRA adapters. This is the **second iteration**: it builds on v1 with improved diacritics handling and higher accuracy across diverse Arabic text scenarios.

- **Developed by:** Ahmed Zaky
- **Base Model:** AhmedZaky1/DIMI-Arabic-OCR (v1)
- **Original Base:** Qwen/Qwen2.5-VL-7B-Instruct
- **Model Type:** Vision-Language Model (VLM) for Arabic OCR
- **Language:** Arabic (ar)
- **License:** Apache 2.0
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) with 4-bit quantization

### Key Improvements Over v1

- ✅ **30% reduction in WER** on the validation set (0.3308 → 0.2315)
- ✅ **Enhanced training dataset** with balanced diacritics representation
- ✅ **Improved generalization** across news articles and formal documents
- ✅ **Better preservation** of text formatting and structure

## 📊 Performance Metrics

### Test Set Results (500 of 2,600 samples)

| Metric | Score | Description |
|--------|-------|-------------|
| **WER** | 0.3049 | Word Error Rate (↓ lower is better) |
| **CER** | 0.1119 | Character Error Rate (↓ lower is better) |
| **Perfect Predictions** | 23% | Exact matches with ground truth |

### Validation Set Results (100 samples)

| Metric | Score |
|--------|-------|
| **WER** | 0.2315 |
| **CER** | 0.0776 |

### Comparison with v1

| Model | Test WER ↓ | Test CER ↓ | Val WER ↓ | Val CER ↓ |
|-------|------------|------------|-----------|-----------|
| **v1** | 0.404 | 0.226 | 0.3308 | 0.1820 |
| **v2** | **0.3049** | **0.1119** | **0.2315** | **0.0776** |

**Improvements (relative, v1 → v2):**
- **Test WER reduced by ~24.5%** (0.404 → 0.3049)
- **Test CER reduced by ~50.5%** (0.226 → 0.1119)
- **Validation WER reduced by ~30.0%** (0.3308 → 0.2315)
- **Validation CER reduced by ~57.4%** (0.1820 → 0.0776)

## 🎯 Intended Use

### Direct Use

This model is designed to extract Arabic text from images, including:
- 📰 News articles and printed documents
- 📝 Formal Arabic text with diacritics (تشكيل, *tashkeel*)
- 🔢 Mixed Arabic text and numbers
- 📄 Scanned documents and screenshots

### Example Use Case

```python
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the fine-tuned model in 4-bit; the second return value is the processor
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-v2",
    load_in_4bit=True,
    device_map="auto",
)
FastVisionModel.for_inference(model)  # enable Unsloth's inference mode

# Load the image (convert to RGB so grayscale or RGBA files also work)
image = Image.open("arabic_document.jpg").convert("RGB")

# Prompt: "Extract the Arabic text and numbers in this image with high accuracy."
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Build the chat-formatted prompt string
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the prompt and the image together
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False,
).to("cuda")

# Greedy decoding for deterministic output
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,
    )

# Drop the prompt tokens and decode only the newly generated text
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids, skip_special_tokens=True
)[0]
print(prediction)
```

## 🧾 Training Data

Fine-tuned on **11,000 Arabic text images** drawn from:
1. [oddadmix/qari-0.2.2-news-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-news-dataset-large)
2. [oddadmix/qari-0.2.2-diacritics-dataset-large](https://huggingface.co/datasets/oddadmix/qari-0.2.2-diacritics-dataset-large)

The dataset covers Modern Standard Arabic with and without diacritics.
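The WER and CER scores in the Performance Metrics section are edit-distance rates: the number of substitutions, insertions and deletions needed to turn a prediction into its reference, divided by the reference length in words (WER) or characters (CER). The card does not say which evaluation tool or text normalization was used, so the following is only a minimal, dependency-free Python sketch under assumed choices: whitespace tokenization for words, no diacritic or punctuation normalization, and a per-sample mean over the evaluation set. The helper names (`edit_distance`, `wer`, `cer`, `evaluate`, `relative_reduction`) are hypothetical and not part of any released evaluation script.

```python
"""Minimal WER/CER sketch (assumed evaluation choices, not the card's own script)."""
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; substitution, insertion and deletion each cost 1."""
    # prev[j] = distance between the first i-1 ref items and the first j hyp items
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate. Assumption: words are whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters (diacritics count as characters)."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Per-sample mean WER/CER plus exact-match rate over (reference, prediction) pairs.

    Assumption: the card does not say whether scores are averaged per sample
    or aggregated over the whole corpus; this sketch uses the per-sample mean.
    """
    n = len(pairs)
    return {
        "wer": sum(wer(r, p) for r, p in pairs) / n,
        "cer": sum(cer(r, p) for r, p in pairs) / n,
        # "Perfect Predictions": exact match after trimming outer whitespace
        "exact_match": sum(r.strip() == p.strip() for r, p in pairs) / n,
    }


def relative_reduction(old: float, new: float) -> float:
    """Relative improvement, e.g. 0.245 means the error rate fell by 24.5%."""
    return (old - new) / old


if __name__ == "__main__":
    pairs = [
        ("النص العربي", "النص العربي"),  # exact match
        ("النص العربي", "النص العربى"),  # final ي misread as ى
    ]
    print(evaluate(pairs))
    # Reproduce the test-set relative reductions reported in the card
    print(f"WER: {relative_reduction(0.404, 0.3049):.1%}")  # → WER: 24.5%
    print(f"CER: {relative_reduction(0.226, 0.1119):.1%}")  # → CER: 50.5%
```

With no normalization, a single misread letter such as ي → ى costs one full word in WER but only one character in CER, which is why CER is typically much lower than WER on the same outputs. Aggregating errors over the whole corpus instead of averaging per sample, as some tools do, can give slightly different numbers.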
---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{dimi-arabic-ocr-2025,
  author       = {Ahmed Zaky},
  title        = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}
```

---

### 🔗 Related Projects

- [DIMI Models Series](https://huggingface.co/AhmedZaky1): Arabic Vision & Language Models

---
**Built with โค๏ธ by Ahmed Zaky** *Advancing Arabic NLP through state-of-the-art embedding models*