DE-T5-Sci-Transfer-Init

WECHSEL-initialized checkpoint: the English EN-T5-Sci weights combined with the German tokenizer from GermanT5/t5-efficient-gc4-german-base-nl36. The embeddings for the German vocabulary were aligned with WECHSEL, using static embeddings and a bilingual dictionary. The model received no German training after the transfer. The folder includes transfer_metadata.pt, which contains alignment diagnostics.

Model Details

  • Embedding init: Orthogonal Procrustes map (fastText n-gram embeddings) + temperature-weighted mixtures (k-nearest neighbors)
  • Special tokens: <extra_id_0..99> aligned, sentinel behavior preserved
  • Tokenizer: GermanT5 SentencePiece (files bundled here)
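
The embedding initialization listed above can be sketched in NumPy. This is a minimal illustration on toy data, not the code used to build this checkpoint and not the API of the wechsel package: the function names, the random stand-in embeddings, and the values of k and temperature are all assumptions made for the example.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F over paired rows."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def knn_mixture_init(tgt_static, src_static_aligned, src_model_emb,
                     k=10, temperature=0.1):
    """Build each target-token embedding as a softmax-weighted mixture of the
    model embeddings of its k nearest source tokens (cosine similarity)."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    sims = unit(tgt_static) @ unit(src_static_aligned).T   # (n_tgt, n_src)
    nn = np.argsort(-sims, axis=1)[:, :k]                  # k nearest per row
    logits = np.take_along_axis(sims, nn, axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("tk,tkd->td", weights, src_model_emb[nn])

# Toy data (assumption): random vectors stand in for fastText embeddings, and
# the "German" static space is an exact rotation of the "English" one.
rng = np.random.default_rng(0)
en_static = rng.normal(size=(50, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
de_static = en_static @ rotation

# In practice only bilingual dictionary pairs would be used; here every row is a pair.
W = procrustes(en_static, de_static)

en_model_emb = rng.normal(size=(50, 16))   # stand-in for the T5 input embeddings
de_model_emb = knn_mixture_init(de_static, en_static @ W, en_model_emb, k=5)
print(de_model_emb.shape)
```

A lower temperature concentrates each mixture on the single closest source token, while a higher one averages more evenly over the k neighbors.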

Evaluation (Global-MMLU, zero-shot)

Metric              EN      DE
Overall accuracy    0.2434  0.2463
Humanities          0.2485  0.2559
STEM                0.2391  0.2445
Social Sciences     0.2317  0.2307
Other               0.2517  0.2491

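German accuracy matches English accuracy in every category without any German gradient steps. Both languages score close to the 25% chance level for four-option questions, so these numbers show parity between EN and DE rather than strong task performance.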
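
To read the table at a glance, the per-category gaps can be computed directly from the reported scores. The snippet below is only arithmetic on the numbers above; the chance level of 0.25 is an assumption based on the four-option multiple-choice format of MMLU-style benchmarks.

```python
# Reported zero-shot accuracies as (EN, DE) pairs, copied from the table.
scores = {
    "Overall accuracy": (0.2434, 0.2463),
    "Humanities": (0.2485, 0.2559),
    "STEM": (0.2391, 0.2445),
    "Social Sciences": (0.2317, 0.2307),
    "Other": (0.2517, 0.2491),
}

CHANCE = 0.25  # assumption: four answer options per question

for name, (en, de) in scores.items():
    # Gap between languages, and distance of the German score from chance.
    print(f"{name:<18} DE-EN {de - en:+.4f}   DE-chance {de - CHANCE:+.4f}")
```

Every DE-EN gap is below one accuracy point, and every score lies within two points of the assumed chance level.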

Intended Use

Starting point for German continued pretraining or fine-tuning where English scientific knowledge should be retained but a German tokenizer is required.
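
A minimal loading sketch with the transformers library, as a starting point for such training runs. It assumes the checkpoint is published under the repository id rausch/de-t5-sci-transfer-init and loads through the generic Auto classes; the helper names load_checkpoint and encode are hypothetical and exist only for this example.

```python
REPO_ID = "rausch/de-t5-sci-transfer-init"  # assumption: Hub repository id
MAX_TOKENS = 512                            # context limit stated under Limitations

def load_checkpoint(repo_id: str = REPO_ID):
    """Download the bundled German tokenizer and the transferred weights."""
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    return tokenizer, model

def encode(tokenizer, text: str):
    """Tokenize German text, truncating to the 512-token context."""
    return tokenizer(text, truncation=True, max_length=MAX_TOKENS,
                     return_tensors="pt")
```

Calling load_checkpoint() fetches the files from the Hub, and the returned pair can then be passed to whichever continued-pretraining or fine-tuning loop you use.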

Limitations

  • No German data exposure beyond embedding alignment; run continued pretraining on German text (see next model) for best performance.
  • Still limited to 512-token context.