MALIBA-AI/bambara-asr-v3
Bambara automatic speech recognition model, fine-tuned from openai/whisper-large-v3.
Currently ranked #1 on the Bambara ASR Benchmark Leaderboard, making it the best publicly available Bambara ASR model as of January 2026.
⚠️ Non-commercial use only. One of the training sources (Bible recordings) carries a license that restricts commercial use. See License.
Evaluation
Internal Test Set
Evaluated on oza75/bambara-asr (clean-combined split, 2,088 samples, ~3 hours of audio).
| Condition | WER (%) | CER (%) |
|---|---|---|
| Raw (no normalization) | 17.99 | 8.08 |
| Minimal normalization | 16.94 | 7.74 |
| Normalized (expand mode) | 13.23 | 6.89 |
| Normalized (contract mode) | 13.79 | 6.89 |
Normalization was applied using bambara-text-normalization with the for_wer_evaluation() preset. Expand mode resolves contractions to their full forms (e.g., k'a → ka a), while contract mode collapses expanded forms (e.g., bɛ a → b'a). See the normalizer documentation for details on contraction modes.
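To illustrate why normalization matters for scoring, here is a minimal sketch of the WER comparison. The expand rule below is a toy stand-in built from the card's own example (k'a → ka a); the real logic lives in bambara-text-normalization, and the use of jiwer here is an assumption, not part of the official evaluation code.

```python
import jiwer  # common WER library; an assumption, not the official eval harness

def expand_contractions(text: str) -> str:
    # Toy stand-in for the normalizer's expand mode, covering only the
    # card's own example (k'a -> ka a); the real preset handles many more cases.
    return text.replace("k'", "ka ")

reference = "ka a"   # expanded form in the ground truth
hypothesis = "k'a"   # contracted form in the model output

print(jiwer.wer(reference, hypothesis))  # 1.0: both reference words count as errors
print(jiwer.wer(expand_contractions(reference),
                expand_contractions(hypothesis)))  # 0.0 after normalization
```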
Bambara ASR Leaderboard
Evaluated on the MALIBA-AI/bambara-asr-benchmark — 1 hour of studio-recorded Malian constitutional text (pure Bambara), validated by linguists from Mali's DNENF-LN.
| Metric | Score |
|---|---|
| WER | 45.73% |
| CER | 13.45% |
| Combined (0.5 × WER + 0.5 × CER) | 29.59% |
| Rank | 🏆 1st / 37 models |
See the Leaderboard
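For reference, the combined score is the arithmetic mean of WER and CER; a quick check against the table above:

```python
wer, cer = 45.73, 13.45
combined = 0.5 * wer + 0.5 * cer
print(f"{combined:.2f}")  # 29.59, matching the leaderboard entry
```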
What Changed from v1
| | v1 | v3 |
|---|---|---|
| Base model | whisper-large-v2 | whisper-large-v3 |
| Training data | jeli-asr + Mali-Pense | jeli-asr + Mali-Pense + Bible + Common Voice (fr/en) |
| Benchmark WER | 61.74% | 45.73% |
| Benchmark CER | 17.90% | 13.45% |
| License | Apache 2.0 | CC-BY-NC-4.0 (non-commercial) |
The ~16 point WER improvement comes from both the stronger base model and broader, more diverse training data.
Training Data
| Source | Language | Description |
|---|---|---|
| RobotsMali/jeli-asr | Bambara | Conversational and read speech |
| Mali-Pense | Bambara | Transcribed audio from the Mali-Pense linguistic project |
| Bible recordings | Bambara | Narrated Bible text (formal register); non-commercial license |
| Common Voice | French, English | Supplementary multilingual data |
The mix of conversational (jeli-asr), formal/literary (Bible, Mali-Pense), and multilingual (Common Voice) data gives broader coverage than previous Bambara-only systems. The French and English data helps the model handle code-switching, which is common in everyday Bambara speech in Mali.
Training
- Base model: openai/whisper-large-v3
- Method: LoRA (Low-Rank Adaptation) via PEFT, sketched below
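As a rough sketch of what LoRA fine-tuning via PEFT looks like for Whisper (the rank, alpha, and target modules below are illustrative assumptions, not the published training configuration):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Start from the same base checkpoint as v3.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

lora_config = LoraConfig(
    r=32,                                 # rank of the low-rank update (assumed)
    lora_alpha=64,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice for Whisper
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```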
Usage
```bash
pip install git+https://github.com/sudoping01/whosper.git
```

```python
from whosper import WhosperTranscriber

# Load the model from the Hugging Face Hub.
transcriber = WhosperTranscriber(model_id="MALIBA-AI/bambara-asr-v3")

# Transcribe a local audio file.
result = transcriber.transcribe_audio("path/to/audio.wav")
print(result)
```
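If you prefer plain transformers, a standard ASR pipeline call should also work, assuming the repository hosts full merged weights rather than LoRA adapters only (a sketch, not an officially documented path):

```python
from transformers import pipeline

# Assumes merged weights; an adapter-only repo would need to be loaded
# with peft and merged into the base model first.
asr = pipeline("automatic-speech-recognition", model="MALIBA-AI/bambara-asr-v3")
print(asr("path/to/audio.wav")["text"])
```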
Intended Use
For:
- Research on Bambara and low-resource African language ASR
- Benchmarking and comparison of ASR systems
- Transcription assistance
- Educational and non-profit applications
- Linguistic analysis and documentation
Not for:
- Commercial products or services (license restriction from Bible training data)
- Medical, legal, or safety-critical transcription
License
CC-BY-NC-4.0 — non-commercial use only.
This restriction exists because the Bible audio used in training does not permit commercial use. If you need a commercially licensed Bambara ASR model, see MALIBA-AI/bambara-asr-v1 (Apache 2.0, lower accuracy).
Citation
```bibtex
@misc{maliba_asr_v3,
  author       = {{MALIBA-AI}},
  title        = {MALIBA-AI Bambara ASR v3},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MALIBA-AI/bambara-asr-v3}}
}
```
If reporting benchmark results, please also cite:
```bibtex
@misc{BambaraASRBenchmark2025,
  title        = {Where Are We at with Automatic Speech Recognition for the Bambara Language?},
  author       = {Seydou Diallo and Yacouba Diarra and Mamadou K. Keita and Panga Azazia Kamat{\'e} and Adam Bouno Kampo and Aboubacar Ouattara},
  year         = {2025},
  howpublished = {Hugging Face Datasets},
  url          = {https://huggingface.co/datasets/MALIBA-AI/bambara-asr-benchmark}
}
```