Kiri OCR Model

Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering in a compact package.

✨ Key Features

  • Lightweight: Compact model optimized for speed and efficiency.
  • Bilingual: Native support for English, Khmer, and mixed-script text.
  • Document Processing: Automatic text line and word detection.

📊 Dataset

The model is trained on the mrrtmob/khmer_english_ocr_image_line dataset, which contains 12 million synthetic images of Khmer and English text lines.

💻 Usage

Installation

pip install kiri-ocr

Python API

from kiri_ocr import OCR

# Initialize (loads from Hugging Face automatically)
ocr = OCR()

# Extract text
text, results = ocr.extract_text('document.jpg')
print(text)
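
For a folder of scans, the same call can be wrapped in a small loop. The sketch below is illustrative and not part of the kiri-ocr package: `ocr_folder`, `IMAGE_SUFFIXES`, and the one-text-file-per-image layout are hypothetical choices. It relies only on `extract_text` returning a `(text, results)` pair, as in the example above.

```python
from pathlib import Path

# Hypothetical filter; adjust to the image formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(ocr, input_dir, output_dir):
    """Run OCR on every image in input_dir and write one .txt file per image.

    `ocr` is any object whose extract_text(path) returns (text, results).
    Returns the list of text files that were written.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for image_path in sorted(input_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        text, _results = ocr.extract_text(str(image_path))
        out_path = output_dir / (image_path.stem + ".txt")
        # UTF-8 keeps Khmer characters intact on every platform.
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

With the package installed, this could be called as `ocr_folder(OCR(), "scans/", "results/")`, where both directory names are placeholders.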

CLI Tool

kiri-ocr predict path/to/document.jpg --output results/

Model Details

  • Architecture: CRNN (CNN + LSTM + CTC)
  • Framework: PyTorch
  • Input Size: Height 32px (width variable)
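
To make the details above concrete, here is a minimal sketch of the two steps that typically surround a CRNN with a CTC output: scaling a text line to a fixed height of 32px with variable width, and best-path (greedy) CTC decoding. This is generic CRNN practice, not Kiri OCR's actual code. The function names, the blank index of 0, and the toy character set are assumptions, and the library may decode differently (for example with beam search).

```python
def resize_to_height(width, height, target_height=32):
    """Return the (width, height) that scales an image to target_height
    while preserving its aspect ratio."""
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks.

    frame_ids: the most likely class index for each time step.
    charset:   maps class index to character (index `blank` is unused).
    """
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if idx != blank and idx != previous:
            chars.append(charset[idx])
        previous = idx
    return "".join(chars)


# Toy character set (assumption): index 0 is the CTC blank.
charset = ["", "a", "b"]
print(resize_to_height(640, 128))                             # (160, 32)
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], charset))   # aab
```

The blank between the two runs of `1` is what lets the decoder emit a doubled character: without it, the repeated label would collapse into a single "a".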

📈 Benchmarks

Results on synthetic test images (10 popular fonts):

[Figure: benchmark table (benchmark_table.png)]

[Figure: benchmark graph (benchmark_graph.png)]
