This is an FP8 (dynamic) quantized variant of google/translategemma-4b-it, created by The Kaitchup (newsletter: https://kaitchup.substack.com).

More details (training recipe, benchmarks, and recommended settings) will be added later. In the meantime, here are the current notes and a working inference example.

Status / limitations

  • Quick smoke test only (not fully evaluated).
  • RoPE parameters were removed for compatibility with vLLM. As a result, long-context behavior may be degraded. I have not verified the impact yet.
  • The chat template is not supported (for now). To use the model with vLLM, call the completions endpoint (/v1/completions) and provide a fully formatted Gemma prompt, as shown below.

Serving with vLLM

vllm serve kaitchup/translategemma-4b-it-FP8-Dynamic \
  --max-model-len 2048 \
  --chat-template-content-format openai \
  --served-model-name gemma

curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
    "model": "gemma",
    "prompt": "<bos><start_of_turn>user\nYou are a professional French (fr) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original French text while adhering to English grammar, vocabulary, and cultural sensitivities.\nProduce only the English translation, without any additional explanations or commentary. Please translate the following French text into English:\n\n\nJ'\''aime les pâtes !<end_of_turn>\n<start_of_turn>model\n",
    "temperature": 0,
    "max_tokens": 200,
    "stop": ["<end_of_turn>"]
  }'
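The same request can be made from Python. Below is a minimal sketch using only the standard library: it rebuilds the raw Gemma-style prompt from the curl example above (the `build_prompt` helper and its parameters are illustrative, not part of the model card) and posts it to the completions endpoint, assuming the vLLM server above is running on localhost:8000.

```python
import json
import urllib.request

def build_prompt(src_name: str, src_code: str, tgt_name: str, tgt_code: str, text: str) -> str:
    """Format a raw Gemma-style translation prompt (mirrors the curl example)."""
    instruction = (
        f"You are a professional {src_name} ({src_code}) to {tgt_name} ({tgt_code}) translator. "
        f"Your goal is to accurately convey the meaning and nuances of the original {src_name} text "
        f"while adhering to {tgt_name} grammar, vocabulary, and cultural sensitivities.\n"
        f"Produce only the {tgt_name} translation, without any additional explanations or commentary. "
        f"Please translate the following {src_name} text into {tgt_name}:\n\n\n{text}"
    )
    return f"<bos><start_of_turn>user\n{instruction}<end_of_turn>\n<start_of_turn>model\n"

def translate(text: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST a completions request to the vLLM server and return the translation."""
    payload = {
        "model": "gemma",
        "prompt": build_prompt("French", "fr", "English", "en", text),
        "temperature": 0,
        "max_tokens": 200,
        "stop": ["<end_of_turn>"],
    }
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"].strip()

# translate("J'aime les pâtes !")  # requires the vLLM server above to be running
```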
Model details

  • Format: Safetensors
  • Size: 4B params
  • Tensor types: BF16, F8_E4M3