Whisper Medium ta

This model is a fine-tuned version of openai/whisper-medium on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:

Loss: 0.1192
Wer: 28.4963
Cer: 4.8178
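
For reference, WER (word error rate) and CER (character error rate) are edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below is a minimal pure-Python illustration, not the evaluation script used for this model. The function names are hypothetical, the percentage scaling is an assumption based on the magnitude of the values above, and this card does not document what text normalization, if any, was applied before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Because a word counts as wrong when any single character in it is wrong, WER is normally much higher than CER, which fits the gap between the two figures reported above.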

Model description

A fine-tuned version of openai/whisper-medium for automatic speech recognition (ASR) in Tamil. More information needed.

Intended uses & limitations

More information needed
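
As a starting point, transcription with this checkpoint might look like the sketch below. It assumes the `transformers` library with a PyTorch backend is installed, that the repository name matches the one in the citation URL, and that `ffmpeg` is available for decoding audio files. The helper name `transcribe` and the audio path are hypothetical, and the snippet has not been validated against this specific checkpoint.

```python
# Repository name taken from the citation URL in this card.
MODEL_ID = "deepdml/whisper-medium-ta-mix-norm"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file to Tamil text (hypothetical helper)."""
    # Imported lazily so that defining this helper does not load the heavy dependency.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Pin the language and task so Whisper does not auto-detect or translate.
    result = asr(audio_path, generate_kwargs={"language": "tamil", "task": "transcribe"})
    return result["text"]
```

Calling `transcribe("sample.wav")` downloads the model weights on first use. In real code the pipeline would be built once and reused across files rather than recreated per call.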

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.04
training_steps: 18000
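
To make the schedule concrete, the sketch below computes the learning rate implied by the values above: a linear warmup from zero to the peak rate, followed by a linear decay to zero at the final step. It assumes the warmup length is the warmup ratio times the total step count, rounded up, and the names `WARMUP_STEPS` and `lr_at` are hypothetical. It illustrates the schedule shape and is not the training code.

```python
import math

LEARNING_RATE = 1e-05   # learning_rate
TRAINING_STEPS = 18000  # training_steps
WARMUP_RATIO = 0.04     # lr_scheduler_warmup_ratio

# Assumption: warmup length = ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return LEARNING_RATE * (step / WARMUP_STEPS)
    # Decay: fall linearly from the peak rate to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * (remaining / (TRAINING_STEPS - WARMUP_STEPS))
```

Under these assumptions the warmup lasts 720 steps, so the peak rate of 1e-05 is reached before the first evaluation at step 1000.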

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
0.1425	0.0556	1000	0.2033	42.3465	8.2575
0.0992	0.1111	2000	0.1824	38.1999	7.0789
0.0969	0.1667	3000	0.1749	36.7353	6.9087
0.0724	0.2222	4000	0.1716	36.3682	6.8174
0.0605	0.2778	5000	0.1591	34.1861	6.2048
0.0544	0.3333	6000	0.1556	33.2663	5.9622
0.0544	0.3889	7000	0.1504	32.3505	5.7064
0.0459	0.4444	8000	0.1410	32.2191	5.5940
0.0533	0.5	9000	0.1434	31.6562	5.5619
0.0529	0.5556	10000	0.1386	30.9747	5.4560
0.0377	0.6111	11000	0.1435	31.1190	5.5364
0.0367	0.6667	12000	0.1457	30.4170	5.2740
0.0414	0.7222	13000	0.1375	30.3294	5.2244
0.0479	0.7778	14000	0.1338	29.7381	5.0581
0.031	0.8333	15000	0.1362	29.5707	4.9853
0.026	0.8889	16000	0.1341	29.1894	4.9600
0.0399	0.9444	17000	0.1217	28.9021	4.8740
0.0454	1.0	18000	0.1192	28.4963	4.8178
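
The table can be summarized with a few lines of code. The sketch below copies the (step, WER) pairs from the rows above and reports the best checkpoint, the relative WER reduction over training, and the steps at which WER rose compared with the previous evaluation. The helper names are hypothetical.

```python
# (step, WER) pairs copied from the training results table.
WER_BY_STEP = [
    (1000, 42.3465), (2000, 38.1999), (3000, 36.7353), (4000, 36.3682),
    (5000, 34.1861), (6000, 33.2663), (7000, 32.3505), (8000, 32.2191),
    (9000, 31.6562), (10000, 30.9747), (11000, 31.1190), (12000, 30.4170),
    (13000, 30.3294), (14000, 29.7381), (15000, 29.5707), (16000, 29.1894),
    (17000, 28.9021), (18000, 28.4963),
]


def best_checkpoint(rows):
    """Return the (step, WER) pair with the lowest WER."""
    return min(rows, key=lambda row: row[1])


def relative_reduction(first, last):
    """Fractional drop from the first value to the last."""
    return (first - last) / first


def regressions(rows):
    """Steps where WER is higher than at the previous evaluation."""
    return [step for (_, prev), (step, cur) in zip(rows, rows[1:]) if cur > prev]
```

On these values the lowest WER is at the final step (18000), WER falls by roughly a third between steps 1000 and 18000, and the only evaluation where WER rises is step 11000.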

Framework versions

Transformers 4.48.0.dev0
Pytorch 2.5.1+cu121
Datasets 3.6.0
Tokenizers 0.21.0

Citation

Please cite the model using the following BibTeX entry:

@misc{deepdml/whisper-medium-ta-mix-norm,
  title={Fine-tuned Whisper medium ASR model for speech recognition in Tamil},
  author={Jimenez, David},
  howpublished={\url{https://huggingface.co/deepdml/whisper-medium-ta-mix-norm}},
  year={2026}
}

Model size: 0.8B params (Safetensors)
Tensor type: F32
