Kiri OCR Model
Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering in a compact package.
Key Features
- Lightweight: Compact model optimized for speed and efficiency.
- Bilingual: Native support for English, Khmer, and mixed-language text.
- Document Processing: Automatic text line and word detection.
Dataset
The model is trained on the mrrtmob/khmer_english_ocr_image_line dataset, which contains 12 million synthetic images of Khmer and English text lines.
Usage
Installation
pip install kiri-ocr
Python API
from kiri_ocr import OCR
# Initialize (loads from Hugging Face automatically)
ocr = OCR()
# Extract text
text, results = ocr.extract_text('document.jpg')
print(text)
CLI Tool
kiri-ocr predict path/to/document.jpg --output results/
Model Details
- Architecture: CRNN (CNN + LSTM + CTC)
- Framework: PyTorch
- Input Size: Height 32px (width variable)
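To illustrate two of the details listed above, here is a minimal sketch in plain Python of (1) computing the width a text-line image would take when scaled to the 32px input height while keeping its aspect ratio, and (2) greedy CTC decoding, which collapses repeated per-frame predictions and then drops blanks. This is not Kiri OCR's actual code: the function names, the blank index of 0, and the toy character set are assumptions made for illustration only.

```python
from itertools import groupby
from typing import Sequence

TARGET_HEIGHT = 32  # input height stated in the model details
BLANK = 0           # assumption: class index 0 is the CTC blank symbol


def resized_width(width: int, height: int, target_height: int = TARGET_HEIGHT) -> int:
    """Return the width that preserves aspect ratio at target_height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return max(1, round(width * target_height / height))


def ctc_greedy_decode(frame_ids: Sequence[int], charset: Sequence[str],
                      blank: int = BLANK) -> str:
    """Greedy CTC decoding over per-frame argmax class indices.

    Hypothetical convention: class index i (i >= 1) maps to charset[i - 1].
    """
    # Step 1: merge consecutive duplicates (one character spans several frames).
    collapsed = [class_id for class_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which separate genuinely repeated characters.
    return "".join(charset[i - 1] for i in collapsed if i != blank)


if __name__ == "__main__":
    # A 640x64 line image scaled to height 32 keeps its 10:1 aspect ratio.
    print(resized_width(640, 64))
    # Toy charset "helo": the blank between the two runs of 3 keeps both "l"s.
    print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 0, 3, 4, 4], "helo"))
```

The blank symbol is what lets CTC represent doubled letters: without a blank between the two runs of the same class index, they would be merged into a single character.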
Benchmarks
Results on synthetic test images (10 popular fonts):

