---
base_model: deepseek-ai/DeepSeek-OCR
tags:
- text-generation-inference
- transformers
- unsloth
- deepseek_vl_v2
license: apache-2.0
language:
- en
- de
datasets:
- neuralabs/german-synth-ocr
---
<div align="center">
<img src="https://huggingface.co/neuralabs/deepseek_ocr_de/resolve/main/neuranew.png" alt="DeepSeek OCR"/>
</div>

# DeepSeek OCR - Fine-tuned for German/Deutsch
This model is a version of DeepSeek OCR fine-tuned on German text for optical character recognition (OCR).
## Model Description
- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0
The model has been fine-tuned specifically to recognize German text in images, including German-specific characters (ä, ö, ü, ß) and common compound words.
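When comparing recognized text against reference strings, it can help to normalize both sides first, because an umlaut can be encoded either as a single code point or as a base letter plus a combining diaeresis. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `normalize_german` is illustrative, and this card does not state which form the model emits.

```python
import unicodedata


def normalize_german(text: str) -> str:
    """Return text in Unicode NFC form so equivalent umlaut encodings compare equal."""
    return unicodedata.normalize("NFC", text)


# "ä" written as "a" + combining diaeresis (U+0308) versus precomposed U+00E4
decomposed = "Ma\u0308dchen"
precomposed = "M\u00e4dchen"

print(decomposed == precomposed)                                      # False
print(normalize_german(decomposed) == normalize_german(precomposed))  # True
```

NFC leaves ß unchanged, so no information is lost for German text.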
## Intended Uses
This model is designed for:
- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Other German text recognition tasks
## How to Use
### Basic Usage
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de")

# Load a local image file
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Preprocess the image, generate token IDs, and decode them to text
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
### Batch Processing
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de")

# Multiple images (expects image_0.jpg ... image_4.jpg in the working directory)
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
```
### With GPU Acceleration
```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de").to(device)

# Inputs must live on the same device as the model
image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```
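### Loading via the Base Model's Custom Code

The base model, `deepseek-ai/DeepSeek-OCR`, is distributed with custom modeling code that is loaded through `AutoModel` with `trust_remote_code=True`, so this checkpoint may need the same loading path. The sketch below assumes the fine-tuned repository keeps the base model's `infer` method and prompt format; neither is confirmed by this card. The repo ID is taken from the image URL at the top of the card, and `load_model`, `ocr_image`, and the `ocr_out` directory are illustrative names.

```python
from pathlib import Path

# Repo ID taken from the image URL in this card.
MODEL_ID = "neuralabs/deepseek_ocr_de"
# Prompt format from the base model card; assumed to still apply after fine-tuning.
PROMPT = "<image>\nFree OCR. "


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model through the repository's custom code (needs a CUDA GPU)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True, use_safetensors=True)
    return tokenizer, model.eval().cuda().to(torch.bfloat16)


def ocr_image(tokenizer, model, image_file: str, output_dir: str = "ocr_out"):
    """Run OCR on one image via the `infer` method assumed to come with the remote code."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return model.infer(
        tokenizer,
        prompt=PROMPT,
        image_file=image_file,
        output_path=output_dir,
        base_size=1024,
        image_size=640,
        crop_mode=True,
        save_results=True,
    )
```

Usage would then be `tokenizer, model = load_model()` followed by `ocr_image(tokenizer, model, "german_text.jpg")`. Only enable `trust_remote_code=True` for repositories whose code you have reviewed, since it executes Python from the model repository.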
## Training Details
### Training Data
The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:
- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors
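The variation described in the list above can be pictured as sampling one rendering configuration per image. This is a sketch only, not the generator used for this dataset: the 16-48 pt font size range comes from the list, while every other range, the `RenderConfig` class, and `sample_render_config` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class RenderConfig:
    font_size_pt: int        # 16-48 pt, as stated in the list above
    noise_std: float         # hypothetical range
    blur_radius: float       # hypothetical range
    brightness: float        # multiplicative factor, hypothetical range
    contrast: float          # multiplicative factor, hypothetical range
    text_color: tuple        # dark RGB, hypothetical range
    background_color: tuple  # light RGB, hypothetical range


def sample_render_config(rng: random.Random) -> RenderConfig:
    """Draw one set of rendering and augmentation parameters."""
    return RenderConfig(
        font_size_pt=rng.randint(16, 48),
        noise_std=rng.uniform(0.0, 10.0),
        blur_radius=rng.uniform(0.0, 1.5),
        brightness=rng.uniform(0.8, 1.2),
        contrast=rng.uniform(0.8, 1.2),
        text_color=tuple(rng.randint(0, 100) for _ in range(3)),
        background_color=tuple(rng.randint(180, 255) for _ in range(3)),
    )


rng = random.Random(0)  # fixed seed so the sampled dataset is reproducible
print(sample_render_config(rng))
```
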
**Data Split:**
- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)
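A split with these sizes can be reproduced with a seeded shuffle. The sketch below is an illustration under assumptions: the card does not say how samples were assigned to splits, and `split_indices` and the seed are hypothetical.

```python
import random


def split_indices(n_samples: int, n_val: int, n_test: int, seed: int = 0):
    """Shuffle sample indices and cut them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(200_000, n_val=10_000, n_test=10_000)
print(len(train), len(val), len(test))  # 180000 10000 10000
```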
### Training Framework
```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",  # renamed to `eval_strategy` in recent Transformers releases
    save_total_limit=2,           # keep only the two most recent checkpoints
    fp16=True,                    # mixed-precision training
    predict_with_generate=True,   # run generate() during evaluation
)
```
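To get a feel for what this configuration implies, the number of optimizer steps can be worked out from the training split size. The calculation below assumes a single device and no gradient accumulation, neither of which is stated in this card.

```python
import math

train_samples = 180_000    # training split size from the data split above
per_device_batch_size = 8  # per_device_train_batch_size
num_epochs = 10            # num_train_epochs
eval_every = 1_000         # eval_steps
num_devices = 1            # assumption: single GPU
grad_accum = 1             # assumption: no gradient accumulation

effective_batch = per_device_batch_size * num_devices * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs
num_evals = total_steps // eval_every

print(steps_per_epoch)  # 22500
print(total_steps)      # 225000
print(num_evals)        # 225
```

With more devices or gradient accumulation, the effective batch size grows and the step counts shrink proportionally.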
## Limitations
- **Handwriting:** The training data contains only rendered fonts, so performance on handwritten text may vary
- **Image quality:** Works best with clear, high-contrast images
- **Domain specificity:** Best performance on printed German text similar to training distribution
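Given these limitations, it is worth measuring accuracy on a sample of your own images before relying on the model. One common choice for OCR is the character error rate (CER): the edit distance between reference and prediction, divided by the reference length. The sketch below is a plain-Python illustration; this card does not report which metric, if any, was used during training, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a prediction relative to the reference text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# "ß" misread as "ss": one substitution plus one insertion over six characters
print(round(cer("Straße", "Strasse"), 3))  # 0.333
```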
## Citation
If you use this model, please cite:
```bibtex
@misc{deepseek-ocr-german,
author = {Santosh Pandit},
title = {DeepSeek OCR - German Fine-tuned},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/neuralabs/deepseek_ocr_de}},
}
```
## Model Card Contact
For questions or feedback, please open an issue on the model repository.
---
### Acknowledgments
- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers
## Uploaded fine-tuned model
- **Developed by:** neuralabs
- **License:** apache-2.0
- **Fine-tuned from model:** deepseek-ai/DeepSeek-OCR