---
base_model: deepseek-ai/DeepSeek-OCR
tags:
- text-generation-inference
- transformers
- unsloth
- deepseek_vl_v2
license: apache-2.0
language:
- en
- de
datasets:
- neuralabs/german-synth-ocr
---
<div align="center">
<img src="https://huggingface.co/neuralabs/deepseek_ocr_de/resolve/main/neuranew.png" alt="DeepSeek OCR"/>
</div>

# DeepSeek OCR - Fine-tuned for German/Deutsch
This model is a version of DeepSeek OCR fine-tuned on German text for Optical Character Recognition (OCR).
## Model Description
- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0
This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.
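
When comparing recognized text against a reference, it can help to normalize both strings first, because an umlaut can be encoded either as one code point or as a base letter plus a combining diaeresis. The following is a minimal sketch using only the Python standard library; the helper name `normalize_german` is hypothetical, and whether this model ever emits decomposed characters is not stated in this card.

```python
import unicodedata


def normalize_german(text: str) -> str:
    """Return text in NFC form, so each umlaut is a single code point."""
    return unicodedata.normalize("NFC", text)


# "ü" written as "u" + combining diaeresis (U+0308), followed by "ß" (U+00DF)
decomposed = "Gru\u0308\u00dfe"
# The same word with a precomposed "ü" (U+00FC)
composed = "Gr\u00fc\u00dfe"

print(decomposed == composed)                    # False: different code points
print(normalize_german(decomposed) == composed)  # True: equal after NFC
```

NFC leaves "ß" unchanged, so the normalization only affects how the umlauts are encoded.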
## Intended Uses
This model is designed for:
- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Other German text recognition tasks
## How to Use
### Basic Usage
```python
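from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Replace the placeholder with the actual model repository ID.
# Note: this snippet uses the TrOCR-style classes. The base model card for
# deepseek-ai/DeepSeek-OCR loads the model through AutoModel with
# trust_remote_code=True, so verify which loader this checkpoint supports.
model_id = "YOUR_USERNAME/deepseek-ocr-german"

# Load model and processor
processor = TrOCRProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)

# Load a local image and convert it to RGB
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Preprocess the image, generate token IDs, and decode them to text
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)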
```
### Batch Processing
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
# Multiple images
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]
# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
for text in generated_texts:
    print(text)
```
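
For larger collections, loading every image into one batch can exhaust memory. A minimal sketch of splitting a list of file paths into fixed-size batches is shown here; the helper name `chunked` and the batch size of 2 are assumptions chosen for illustration, not part of this model's API.

```python
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


# Same hypothetical file names as in the batch example
paths = [f"image_{i}.jpg" for i in range(5)]
batches = list(chunked(paths, batch_size=2))
print(batches)
# [['image_0.jpg', 'image_1.jpg'], ['image_2.jpg', 'image_3.jpg'], ['image_4.jpg']]
```

Each yielded list can then be opened, passed to the processor, and decoded in turn, so only one batch of images is held in memory at a time.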
### With GPU Acceleration
```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)
image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```
## Training Details
### Training Data
The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:
- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors
**Data Split:**
- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)
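
A 90/5/5 split with these sizes can be reproduced with a seeded shuffle of sample indices. This is a sketch under stated assumptions: the card does not describe how the split was actually produced, and the function name `split_indices` and the seed value are hypothetical.

```python
import random


def split_indices(n_samples: int, train_pct: int = 90, val_pct: int = 5, seed: int = 0):
    """Shuffle indices deterministically and cut them into train/val/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(200_000)
print(len(train), len(val), len(test))  # 180000 10000 10000
```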
### Training Framework
```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",  # named eval_strategy in newer Transformers releases
    save_total_limit=2,  # keep only the two most recent checkpoints
    fp16=True,  # mixed-precision training
    predict_with_generate=True,  # run generate() during evaluation
)
```
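
To get a feel for what this configuration implies, the number of optimizer steps can be estimated from the training set size and batch size. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in this card; the helper name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(n_samples: int, effective_batch_size: int) -> int:
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / effective_batch_size)


# 180,000 training samples, batch size 8 (assumed: one device, no accumulation)
per_epoch = steps_per_epoch(180_000, 8)
total_steps = per_epoch * 10             # num_train_epochs=10
eval_points = total_steps // 1000        # eval_steps=1000 and save_steps=1000

print(per_epoch, total_steps, eval_points)  # 22500 225000 225
```

Under these assumptions the run evaluates and saves 225 times, while `save_total_limit=2` keeps only the two most recent checkpoints on disk.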
## Limitations
- **Handwriting and font coverage:** The training data is synthetic rendered text, so performance may degrade on handwritten text
- **Image quality:** Works best with clear, high-contrast images
- **Domain specificity:** Best performance on printed German text similar to training distribution
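
One way to check how these limitations affect your own images is to measure the character error rate (CER) on a small labeled sample. The sketch below is a plain-Python implementation of CER as edit distance divided by reference length; it is an illustration only, since this card does not state which metric was used during training or evaluation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# "ß" misread as "ss": one substitution plus one insertion over 6 characters
print(edit_distance("Straße", "Strasse"))  # 2
print(cer("Größe", "Größe"))               # 0.0
```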
## Citation
If you use this model, please cite:
```bibtex
@misc{deepseek-ocr-german,
author = {Santosh Pandit},
title = {DeepSeek OCR - German Fine-tuned},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}
```
## Model Card Contact
For questions or feedback, please open an issue on the model repository or contact [hello@neuralabs.one](mailto:hello@neuralabs.one).
---
### Acknowledgments
- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers
## Uploaded finetuned model
- **Developed by:** neuralabs
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-OCR