---
base_model: deepseek-ai/DeepSeek-OCR
tags:
- text-generation-inference
- transformers
- unsloth
- deepseek_vl_v2
license: apache-2.0
language:
- en
- de
datasets:
- neuralabs/german-synth-ocr
---
# DeepSeek OCR - Fine-tuned for German/Deutsch

This model is a version of DeepSeek OCR fine-tuned on German text for Optical Character Recognition (OCR).

## Model Description

- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0

The model was fine-tuned specifically to recognize German text in images, including German-specific characters (ä, ö, ü, ß) and common German compound words.

## Intended Uses

This model is designed for:

- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Other German text recognition tasks

## How to Use

The snippets below assume the uploaded checkpoint loads through the `TrOCRProcessor` and `VisionEncoderDecoderModel` classes. The base DeepSeek-OCR repository documents loading via `AutoModel` with `trust_remote_code=True`, so adjust the loading code if these classes do not match the uploaded weights.

### Basic Usage

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Load a local image file
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Run OCR and decode the generated token IDs into text
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

### Batch Processing

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Multiple images (image_0.jpg ... image_4.jpg)
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process: one forward pass for all images
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
```

### With GPU Acceleration

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)

image = Image.open("german_text.jpg").convert("RGB")
# Inputs must live on the same device as the model
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

## Training Details

### Training Data

The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:

- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors

**Data Split:**

- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)

### Training Framework

```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",  # named eval_strategy in recent transformers releases
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
```

## Limitations

- **Font coverage:** Performance may vary with handwritten text
- **Image quality:** Works best with clear, high-contrast images
- **Domain specificity:** Best performance on printed German text similar to the training distribution

## Citation

If you use this model, please cite:

```bibtex
@misc{deepseek-ocr-german,
  author = {Santosh Pandit},
  title = {DeepSeek OCR - German Fine-tuned},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}
```

## Model Card Contact

For questions or feedback, please open an issue on the model repository or contact hello@neuralabs.one.

---

### Acknowledgments

- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers

## Uploaded finetuned model

- **Developed by:** neuralabs
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-OCR
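
## Data Split Sketch

The 90/5/5 train/validation/test split listed under Training Data can be reproduced along the following lines. This is a minimal sketch, not the script used to build the dataset: the function name `split_indices`, the seed value, and the choice to shuffle before cutting are all assumptions made for illustration.

```python
import random


def split_indices(n_samples, train_frac=0.90, val_frac=0.05, seed=42):
    """Shuffle sample indices and cut them into train/validation/test lists.

    Hypothetical helper: the seed and the shuffle-then-slice strategy are
    illustrative assumptions, not taken from the original training pipeline.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a fixed seed

    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test split
    return train, val, test


train, val, test = split_indices(200_000)
print(len(train), len(val), len(test))  # 180000 10000 10000
```

Splitting on indices rather than on the images themselves keeps the sketch independent of how the dataset is stored; the resulting lists can be used to select rows from whatever container holds the samples.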