---
base_model: deepseek-ai/DeepSeek-OCR
tags:
- text-generation-inference
- transformers
- unsloth
- deepseek_vl_v2
license: apache-2.0
language:
- en
- de
datasets:
- neuralabs/german-synth-ocr
---
<div align="center">
<img src="https://huggingface.co/neuralabs/deepseek_ocr_de/resolve/main/neuranew.png" alt="DeepSeek OCR"/>
</div>
# DeepSeek OCR - Fine-tuned for German/Deutsch

This model is a fine-tuned version of DeepSeek OCR on German text for Optical Character Recognition (OCR) tasks.
## Model Description

- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0

This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.
## Intended Uses

This model is designed for:

- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Other general German text recognition tasks
## How to Use

### Basic Usage
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the model and processor
processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de")

# Load an image containing German text
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Preprocess, generate, and decode
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```
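Continuing from the example above, decoding can be tuned through the standard `generate` arguments; the values below are illustrative and have not been tuned for this model:

```python
# Beam search with an output-length cap; values are illustrative, not tuned.
generated_ids = model.generate(
    pixel_values,
    max_new_tokens=128,   # upper bound on generated tokens
    num_beams=4,          # beam search often reads cleaner than greedy decoding for OCR
    early_stopping=True,  # stop once all beams have finished
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```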
### Batch Processing
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de")

# Load multiple images
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
```
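For larger jobs it is often more practical to walk a folder and process it in fixed-size chunks. A sketch along the same lines; the folder layout, file extensions, and batch size are assumptions:

```python
from pathlib import Path

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de")

def ocr_folder(folder, batch_size=8):
    """OCR every JPEG/PNG in a folder, one batch at a time, and return {path: text}."""
    paths = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    results = {}
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        images = [Image.open(p).convert("RGB") for p in chunk]
        pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
        generated_ids = model.generate(pixel_values)
        texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
        results.update({str(p): t for p, t in zip(chunk, texts)})
    return results

print(ocr_folder("scans/"))
```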
### With GPU Acceleration
```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained("neuralabs/deepseek_ocr_de").to(device)

image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```
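On a GPU, half precision and `torch.inference_mode()` usually cut memory use and latency further. A sketch, assuming the checkpoint runs correctly in fp16 (worth verifying against full-precision output):

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 only makes sense on GPU

processor = TrOCRProcessor.from_pretrained("neuralabs/deepseek_ocr_de")
model = VisionEncoderDecoderModel.from_pretrained(
    "neuralabs/deepseek_ocr_de", torch_dtype=dtype
).to(device)
model.eval()

image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device, dtype=dtype)

with torch.inference_mode():  # disables autograd bookkeeping during generation
    generated_ids = model.generate(pixel_values)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```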
## Training Details

### Training Data

The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:

- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations (a rendering sketch follows after the data split)
- Different text and background colors

**Data Split:**

- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)
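The dataset itself lives at neuralabs/german-synth-ocr; the rendering recipe described above can be sketched with Pillow as follows. The font file, colors, and augmentation strengths are illustrative assumptions, not the exact generation pipeline:

```python
import random
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter, ImageFont

def render_sample(text, font_path="DejaVuSans.ttf"):
    """Render one synthetic OCR sample: German text on a plain background plus light augmentation."""
    font = ImageFont.truetype(font_path, size=random.randint(16, 48))  # 16-48pt, as in the dataset
    left, top, right, bottom = ImageDraw.Draw(Image.new("RGB", (1, 1))).textbbox((0, 0), text, font=font)
    background = tuple(random.randint(200, 255) for _ in range(3))  # light background
    foreground = tuple(random.randint(0, 60) for _ in range(3))     # dark text
    image = Image.new("RGB", (right - left + 40, bottom - top + 40), background)
    ImageDraw.Draw(image).text((20, 20), text, font=font, fill=foreground)
    # Augmentations: blur, brightness, and contrast jitter (pixel noise would be added similarly).
    if random.random() < 0.3:
        image = image.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.3, 1.0)))
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.8, 1.2))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))
    return image

render_sample("Die Straße führt über die Brücke.").save("synthetic_sample.png")
```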
### Training Framework
```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
```
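Character error rate (CER) is the usual metric for OCR checkpoints. One way to wire it into the trainer above, assuming the `evaluate` library (with `jiwer` installed for its `cer` metric) and the processor loaded as in the usage examples; this is a sketch of the wiring, not the exact training script:

```python
import evaluate

cer_metric = evaluate.load("cer")  # character error rate

def compute_metrics(eval_pred):
    """Decode predictions and labels, then score the character error rate."""
    pred_ids, label_ids = eval_pred
    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    # -100 marks ignored label positions; map them back to the pad token before decoding.
    label_ids = [
        [tok if tok != -100 else processor.tokenizer.pad_token_id for tok in seq]
        for seq in label_ids
    ]
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"cer": cer_metric.compute(predictions=pred_str, references=label_str)}
```

The function can then be passed to `Seq2SeqTrainer` through its `compute_metrics` argument.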
## Limitations

- **Font coverage:** Performance may vary with handwritten text
- **Image quality:** Works best with clear, high-contrast images (a preprocessing sketch follows this list)
- **Domain specificity:** Best performance on printed German text similar to the training distribution
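For faded or low-contrast scans, a light Pillow preprocessing pass before OCR sometimes helps; a sketch with illustrative settings, not validated against this model:

```python
from PIL import Image, ImageOps

def preprocess_scan(path):
    """Boost contrast on a faded scan before handing it to the OCR processor."""
    image = Image.open(path).convert("L")            # drop to grayscale
    image = ImageOps.autocontrast(image, cutoff=2)   # stretch the histogram, clipping 2% tails
    return image.convert("RGB")                      # the processor expects RGB input

clean = preprocess_scan("faded_receipt.jpg")
```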
## Citation

If you use this model, please cite:

```bibtex
@misc{deepseek-ocr-german,
  author = {Santosh Pandit},
  title = {DeepSeek OCR - German Fine-tuned},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuralabs/deepseek_ocr_de}},
}
```
## Model Card Contact

For questions or feedback, please open an issue on the model repository or contact [email protected].

---
### Acknowledgments

- Base model: DeepSeek AI
- Training data generation: LM Studio with a local LLM
- Framework: Hugging Face Transformers

# Uploaded finetuned model

- **Developed by:** neuralabs
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-OCR