Quantized translategemma
Quickly tested with vLLM. Not fully compatible yet.
This is a quantized variant of google/translategemma-4b-it, created by The Kaitchup (newsletter: https://kaitchup.substack.com).
More details (training recipe, benchmarks, and recommended settings) will be added later. In the meantime, here are the current notes and a working inference example.
vllm serve kaitchup/translategemma-4b-it-FP8-Dynamic --max-model-len 2048 --chat-template-content-format openai --served-model-name gemma
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "gemma",
"prompt": "<bos><start_of_turn>user\nYou are a professional French (fr) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original French text while adhering to English grammar, vocabulary, and cultural sensitivities.\nProduce only the English translation, without any additional explanations or commentary. Please translate the following French text into English:\n\n\nJ\u0027aime les pâtes !<end_of_turn>\n<start_of_turn>model\n",
"temperature": 0,
"max_tokens": 200,
"stop": ["<end_of_turn>"]
}'
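The same request can be sent from Python, which avoids shell quoting problems with apostrophes in the source text. The following is a minimal sketch, assuming the server started by the `vllm serve` command above is listening on `localhost:8000` and returns the usual OpenAI-style completions response. The helper names (`build_prompt`, `build_payload`, `translate`) are hypothetical and not part of vLLM or the model; the prompt wording is copied from the curl example.

```python
import json
import urllib.request

# Prompt wording copied from the curl example; the language fields are parameterized.
PROMPT_TEMPLATE = (
    "<bos><start_of_turn>user\n"
    "You are a professional {src} ({src_code}) to {tgt} ({tgt_code}) translator. "
    "Your goal is to accurately convey the meaning and nuances of the original {src} text "
    "while adhering to {tgt} grammar, vocabulary, and cultural sensitivities.\n"
    "Produce only the {tgt} translation, without any additional explanations or commentary. "
    "Please translate the following {src} text into {tgt}:\n\n\n"
    "{text}<end_of_turn>\n"
    "<start_of_turn>model\n"
)


def build_prompt(src, src_code, tgt, tgt_code, text):
    """Fill the translation prompt (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        src=src, src_code=src_code, tgt=tgt, tgt_code=tgt_code, text=text
    )


def build_payload(prompt, model="gemma", max_tokens=200):
    """Request body with the same sampling settings as the curl example."""
    return {
        "model": model,  # must match --served-model-name
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": max_tokens,
        "stop": ["<end_of_turn>"],
    }


def translate(text, url="http://localhost:8000/v1/completions"):
    """POST a French-to-English request; the URL is an assumption about the local setup."""
    payload = build_payload(build_prompt("French", "fr", "English", "en", text))
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI-style completions schema: choices[0].text
    return body["choices"][0]["text"].strip()
```

With the server running, `print(translate("J'aime les pâtes !"))` should print the model's English translation. Other language pairs only need different arguments to `build_prompt`.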
Base model: google/translategemma-4b-it