CernisOCR

A vision-language OCR model fine-tuned from Qwen2.5-VL-7B-Instruct that handles mathematical formulas, handwritten text, and structured documents in a single model.

Model Description

CernisOCR is a vision-language model optimized for diverse OCR tasks across multiple document domains. Unlike domain-specific OCR models, CernisOCR unifies three traditionally separate OCR tasks in a single, efficient model:

  • Mathematical LaTeX conversion: Converts handwritten or printed mathematical formulas to LaTeX notation
  • Handwritten text transcription: Transcribes cursive and printed handwriting
  • Structured document extraction: Extracts structured data from invoices and receipts

Key Features:

  • Multi-domain capability in a single model
  • Handles varied image types, layouts, and text styles
  • Extracts both raw text and structured information
  • Robust to noise and variable image quality

Training Details

  • Base Model: Qwen2.5-VL-7B-Instruct
  • Training Data: 10,000 samples from three domains:
    • LaTeX OCR: 3,978 samples (mathematical notation)
    • Invoices & Receipts: 2,043 samples (structured documents)
    • Handwritten Text: 3,978 samples (handwriting transcription)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) via Unsloth; a minimal configuration sketch follows this list
  • Training Loss: Reduced from 4.802 to 0.116 (97.6% improvement)
  • Training Time: ~8.7 minutes on RTX 5090
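
The exact training configuration is not published here, so the snippet below is only a minimal sketch of a comparable Unsloth + TRL LoRA setup. The base repository id, LoRA rank and alpha, batch sizes, and the train_dataset variable are assumptions for illustration, not the released configuration.

# Minimal LoRA fine-tuning sketch with Unsloth + TRL.
# Hyperparameters and `train_dataset` are illustrative assumptions,
# not the exact configuration used for CernisOCR.
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct",  # assumed base checkpoint id
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Attach LoRA adapters to both the vision tower and the language model.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,           # assumed LoRA rank
    lora_alpha=16,  # assumed scaling factor
    lora_dropout=0.0,
)
FastVisionModel.for_training(model)

# `train_dataset` is assumed to yield {"messages": [...]} conversations that
# pair each document image with its target transcription or LaTeX string.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
        # Required for vision fine-tuning with the Unsloth collator:
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()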

Intended Use

This model is designed for:

  • Mathematical formula recognition and LaTeX conversion
  • Handwritten text transcription
  • Invoice and receipt data extraction
  • Multi-domain document processing workflows
  • Applications requiring unified OCR across different document types

How to Use

from unsloth import FastVisionModel
from PIL import Image

# Load model and tokenizer
model, tokenizer = FastVisionModel.from_pretrained(
    "coolAI/cernis-ocr",  # or "coolAI/cernis-vision-ocr" for merged model
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Example 1: LaTeX conversion
image = Image.open("formula.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Write the LaTeX representation for this image."}
    ]
}]

# Example 2: Handwritten transcription (load the corresponding image as above)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Transcribe the handwritten text in this image."}
    ]
}]

# Example 3: Invoice extraction (load the corresponding image as above)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extract and structure all text content from this invoice/receipt image."}
    ]
}]

# Generate: render the chat template, then let the processor pack the text and image together
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, use_cache=True)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
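
If you prefer plain Transformers over Unsloth, the merged checkpoint can be loaded with the standard Qwen2.5-VL classes. The snippet below is a sketch that assumes "coolAI/cernis-vision-ocr" contains full merged weights rather than only the LoRA adapter.

# Sketch: loading the merged checkpoint with plain Transformers
# (assumes "coolAI/cernis-vision-ocr" holds merged weights, not just the adapter).
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "coolAI/cernis-vision-ocr", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("coolAI/cernis-vision-ocr")

image = Image.open("formula.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Write the LaTeX representation for this image."},
    ],
}]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens before decoding so only the model's answer remains.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)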

Citation

If you use this model, please cite:

@misc{cernis-ocr,
  title={CernisOCR: A Unified Multi-Domain OCR Model},
  author={Cernis AI},
  year={2025},
  howpublished={\url{https://huggingface.co/coolAI/cernis-ocr}}
}

Acknowledgments

Built using Unsloth for efficient fine-tuning. Training data sourced from publicly available OCR datasets on Hugging Face.
