---
base_model: deepseek-ai/DeepSeek-OCR
tags:
- text-generation-inference
- transformers
- unsloth
- deepseek_vl_v2
license: apache-2.0
language:
- en
- de
datasets:
- neuralabs/german-synth-ocr
---

<div align="center">
    <img src="https://huggingface.co/neuralabs/deepseek_ocr_de/resolve/main/neuranew.png" alt="DeepSeek OCR"/>
</div>

# DeepSeek OCR - Fine-tuned for German/Deutsch

This model is a fine-tuned version of DeepSeek OCR on German text for Optical Character Recognition (OCR) tasks.

## Model Description

- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0

This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.

## Intended Uses

This model is designed for:
- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Any German text recognition tasks

## How to Use

### Basic Usage

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Load image from a local path
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Process
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

### Batch Processing

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Multiple images
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
```

### With GPU Acceleration

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)

image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

## Training Details

### Training Data

The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:
- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors
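The augmentations above can be sketched as a small Pillow/NumPy pipeline. This is an illustrative reconstruction, not the actual generation code; the blur radius, jitter ranges, and noise scale are assumptions.

```python
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def augment(image: Image.Image, rng: np.random.Generator) -> Image.Image:
    # Gaussian blur with a small random radius
    image = image.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.0, 1.5)))
    # Random brightness and contrast jitter
    image = ImageEnhance.Brightness(image).enhance(rng.uniform(0.8, 1.2))
    image = ImageEnhance.Contrast(image).enhance(rng.uniform(0.8, 1.2))
    # Additive Gaussian noise, clipped back to the valid pixel range
    arr = np.asarray(image).astype(np.float32)
    arr += rng.normal(0.0, 5.0, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

rng = np.random.default_rng(0)
augmented = augment(Image.new("RGB", (256, 64), "white"), rng)
```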

**Data Split:**
- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)
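The 90/5/5 split can be reproduced with a simple shuffled index partition; the seed here is an arbitrary assumption, not the one used in training.

```python
import random

# Shuffle sample indices with a fixed seed for reproducibility
indices = list(range(200_000))
random.Random(42).shuffle(indices)

n_train = int(0.90 * len(indices))  # 180,000
n_val = int(0.05 * len(indices))    # 10,000

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```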

### Training Framework

```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
```

## Limitations

- **Handwritten text:** The model was trained on printed fonts, so performance may degrade on handwriting
- **Image quality:** Works best with clear, high-contrast images
- **Domain specificity:** Best performance on printed German text similar to training distribution
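When checking whether your data matches the training distribution, character error rate (CER) is a common OCR metric. A minimal pure-Python sketch, not part of this card's official evaluation code:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit distance normalized by reference length
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# German example: confusing "ß" with "ss" costs two edits
print(cer("Straße", "Strasse"))  # → 0.333…
```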


## Citation

If you use this model, please cite:

```bibtex
@misc{deepseek-ocr-german,
  author = {Santosh Pandit},
  title = {DeepSeek OCR - German Fine-tuned},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}
```

## Model Card Contact

For questions or feedback, please open an issue on the model repository or contact [hello@neuralabs.one].

---

### Acknowledgments

- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers

## Uploaded Fine-tuned Model

- **Developed by:** neuralabs
- **License:** apache-2.0
- **Fine-tuned from model:** deepseek-ai/DeepSeek-OCR