gemma-3-4b-intent-1128-v3-lr1e05-bs2

Fine-tuned model for two-level (hierarchical) Vietnamese intent classification.

Evaluation Results

Dataset: allganize/viettel-intent-augmented-total-1128 (validation)
Samples: 2,467
Evaluated: 2025-11-28 08:34:43

📊 Overall Metrics

Metric                  Score
Format Accuracy         100.00%
Level 1 Accuracy        99.27%
Level 1 F1 (macro)      0.9884
Level 2 Accuracy        97.97%
Level 2 F1 (macro)      0.9751
Combined Accuracy       97.93%
Combined F1 (macro)     0.9742
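The card does not spell out how these scores are computed. Below is a minimal sketch that assumes the following: each prediction is a (Level 1, Level 2) label pair, accuracy means exact match, and macro F1 is the unweighted mean of per-intent F1. It also assumes that the Combined metrics treat the whole pair as one label, so a sample counts only when both levels are right. All function names and the toy labels are hypothetical, and this is not the evaluation script used for the card.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label exactly matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = []
    for label in set(y_true) | set(y_pred):
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_true[label] if n_true[label] else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def hierarchical_metrics(gold, pred):
    """gold and pred are lists of (level1, level2) label pairs."""
    levels = {
        "level1": ([g[0] for g in gold], [p[0] for p in pred]),
        "level2": ([g[1] for g in gold], [p[1] for p in pred]),
        # Assumption: "Combined" treats the pair as one label, so both levels must match.
        "combined": (list(gold), list(pred)),
    }
    out = {}
    for name, (y_true, y_pred) in levels.items():
        out[f"{name}_accuracy"] = accuracy(y_true, y_pred)
        out[f"{name}_macro_f1"] = macro_f1(y_true, y_pred)
    return out


# Hypothetical toy labels, not taken from the dataset.
gold = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b1")]
pred = [("A", "a1"), ("A", "a1"), ("B", "b1"), ("A", "a2")]
metrics = hierarchical_metrics(gold, pred)
print(metrics["level1_accuracy"], metrics["level2_accuracy"], metrics["combined_accuracy"])  # 0.75 0.5 0.5
```

Under this reading, Combined Accuracy should equal the Both Correct rate in the Hierarchical Breakdown. The card's figures agree, at 97.93% for both.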

[Figure: Overall Metrics]

🎯 Hierarchical Breakdown

Category                          Rate      Count
Both Correct                      97.93%    2416
Level 1 Only (Level 2 wrong)      1.34%     33
Both Wrong                        0.69%     17
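The rates in this breakdown are counts over the 2,467 validation samples. The following arithmetic check assumes that "Level 1 Only" means Level 1 correct and Level 2 wrong. Under that assumption, Both Correct plus Level 1 Only should reproduce the 99.27% Level 1 accuracy. The listed rows cover only 2,466 samples. The 97.97% Level 2 accuracy implies 2,417 correct Level 2 predictions, which would place the remaining sample in a Level-2-correct, Level-1-wrong category that the table does not list. That last step is an inference from rounded percentages, not a count reported on the card.

```python
TOTAL = 2467  # validation samples
counts = {"both_correct": 2416, "level1_only": 33, "both_wrong": 17}


def pct(n, total=TOTAL):
    """Share of the validation set, in percent, rounded to two decimals like the card."""
    return round(100 * n / total, 2)


rates = {name: pct(n) for name, n in counts.items()}

# Assumption: "Level 1 Only" = Level 1 correct, Level 2 wrong.
level1_correct = counts["both_correct"] + counts["level1_only"]
unlisted = TOTAL - sum(counts.values())

# Inference from the rounded 97.97% Level 2 accuracy (not a reported count).
level2_correct = round(TOTAL * 97.97 / 100)
level2_only = level2_correct - counts["both_correct"]

print(rates)                  # {'both_correct': 97.93, 'level1_only': 1.34, 'both_wrong': 0.69}
print(pct(level1_correct))    # 99.27
print(unlisted, level2_only)  # 1 1
```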

πŸ” Detailed Analysis

Confusion Patterns:
[Figure: Confusion Flows]

Precision-Recall Analysis:
[Figure: Precision-Recall Scatter]

Intent Performance Distribution:
[Figure: Intent Issues]

Top Confused Intents:

🎯 High-Performance Analysis

Zoomed Precision-Recall (0.85-1.0):
[Figure: Precision-Recall Zoom]

Relative Performance Ranking (Bottom 20):
[Figure: Relative Ranking]

🎯 Intents for Improvement

Level 1:

  • Chat (P: 97.44%, R: 92.68%, F1: 95.00%)
  • FindMyPhone (P: 100.00%, R: 92.86%, F1: 96.30%)
  • News (P: 97.62%, R: 97.62%, F1: 97.62%)
  • UserQuery (P: 95.92%, R: 100.00%, F1: 97.92%)

Level 2:

  • DeactivateSetting (P: 93.33%, R: 82.35%, F1: 87.50%)
  • NavigateMove (P: 92.86%, R: 86.67%, F1: 89.66%)
  • SuggestContent (P: 81.82%, R: 100.00%, F1: 90.00%)
  • ChannelDown (P: 93.33%, R: 87.50%, F1: 90.32%)
  • StopRinging (P: 83.33%, R: 100.00%, F1: 90.91%)
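Each F1 value in the two lists above is the harmonic mean of the listed precision and recall, F1 = 2PR / (P + R), and each list runs from lowest to highest F1. The short sketch below recomputes F1 from the rounded percentages and re-sorts the intents. The helper name `f1` is illustrative. With these inputs, it reproduces every listed F1 to two decimals and leaves both lists in their original order.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (percent in, percent out)."""
    return 2 * precision * recall / (precision + recall)


# (intent, precision %, recall %) copied from the lists above.
intents = {
    "Level 1": [
        ("Chat", 97.44, 92.68),
        ("FindMyPhone", 100.00, 92.86),
        ("News", 97.62, 97.62),
        ("UserQuery", 95.92, 100.00),
    ],
    "Level 2": [
        ("DeactivateSetting", 93.33, 82.35),
        ("NavigateMove", 92.86, 86.67),
        ("SuggestContent", 81.82, 100.00),
        ("ChannelDown", 93.33, 87.50),
        ("StopRinging", 83.33, 100.00),
    ],
}

ranked = {}
for level, rows in intents.items():
    # Weakest first, matching the "Intents for Improvement" ordering.
    ranked[level] = sorted(rows, key=lambda row: f1(row[1], row[2]))
    print(level)
    for name, p, r in ranked[level]:
        print(f"  {name}: F1 {f1(p, r):.2f}%")
```

Recomputing from rounded precision and recall can, in general, shift the last digit. Here it does not.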

Safetensors · Model size: 4B params · Tensor type: BF16