Add pipeline tag and library name
This PR adds the `pipeline_tag: audio-text-to-text` and `library_name: transformers` metadata to improve model discoverability and usability.
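As a rough illustration of what the change amounts to, the sketch below checks that a model card's front matter declares the two new keys. It is a stdlib-only sketch, not the Hub's own parser: `read_front_matter` is a hypothetical helper that only understands flat `key: value` lines and simple `- item` lists, and `CARD` is an abbreviated copy of the updated README header.

```python
# Minimal sketch (assumption: front matter is flat "key: value" lines plus
# simple "- item" lists; this is NOT a full YAML parser).

CARD = """---
base_model:
- Qwen/Qwen2.5-Omni-7B
datasets:
- amaai-lab/MusicBench
license: apache-2.0
pipeline_tag: audio-text-to-text
library_name: transformers
---

# Ke-Omni-R
"""


def read_front_matter(text):
    """Return the front matter as a dict of strings and lists of strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the card
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value means list items follow on the next lines.
            meta[current_key] = value.strip() if value.strip() else []
    return meta


meta = read_front_matter(CARD)
missing = [k for k in ("pipeline_tag", "library_name") if k not in meta]
print("missing keys:", missing)
```

For real model cards a proper YAML parser is the safer choice; the hand-rolled reader here only keeps the example free of dependencies.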
README.md
CHANGED
|
@@ -1,9 +1,11 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
datasets:
|
| 4 |
-
- amaai-lab/MusicBench
|
| 5 |
base_model:
|
| 6 |
- Qwen/Qwen2.5-Omni-7B
|
| 7 |
---
|
| 8 |
|
| 9 |
# Ke-Omni-R: Achieving Advanced Audio Reasoning with a Concise 50-Words Think Process
|
|
@@ -19,7 +21,7 @@ Ke-Omni-R is an advanced audio reasoning model built upon [Qwen2.5-Omni-7B](http
|
|
| 19 |
|
| 20 |
## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
|
| 21 |
| Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
|
| 22 |
-
|
| 23 |
| - | Human\* | 86.31 | - | 78.22 | - | 82.17 | - | 82.23 | - |
|
| 24 |
| Gemini Pro 2.0 Flash | Direct Inference\* | 56.46 | 61.73 | 58.68 | 56.53 | 51.65 | 61.53 | 55.60 | 59.93 |
|
| 25 |
| Audio Flamingo 2 | Direct Inference\* | 61.56 | 65.10 | **73.95** |**72.90**| 30.93 | 40.26 | 55.48 | 59.42 |
|
|
@@ -80,7 +82,9 @@ print(completions)
|
|
| 80 |
|
| 81 |
the output should be
|
| 82 |
```
|
| 83 |
-
["Well, it sounds like there's a car accelerating. You can hear the engine revving up, and there's a bit of a thump or thud sound too. It might be the car hitting something or just a part of the acceleration process. It gives off a sense of speed and power. What do you think about it? Do you have any other audio samples you want to talk about?", '<think>The audio features a vehicle accelerating and revving, which is characteristic of a car. The sound is consistent with a car engine, not an aircraft, tank, or missile.</think
|
| 84 |
```
|
| 85 |
|
| 86 |
## Acknowledgements
|
|
@@ -102,5 +106,4 @@ We express our gratitude to the following projects and teams for their contribut
|
|
| 102 |
publisher = {GitHub},
|
| 103 |
journal = {GitHub Repository},
|
| 104 |
howpublished = {\url{https://github.com/shuaijiang/Ke-Omni-R}},
|
| 105 |
-
}
|
| 106 |
-
```
|
|
|
|
| 1 |
---
|
| 2 |
base_model:
|
| 3 |
- Qwen/Qwen2.5-Omni-7B
|
| 4 |
+
datasets:
|
| 5 |
+
- amaai-lab/MusicBench
|
| 6 |
+
license: apache-2.0
|
| 7 |
+
pipeline_tag: audio-text-to-text
|
| 8 |
+
library_name: transformers
|
| 9 |
---
|
| 10 |
|
| 11 |
# Ke-Omni-R: Achieving Advanced Audio Reasoning with a Concise 50-Words Think Process
|
|
|
|
| 21 |
|
| 22 |
## Performance: Accuracies (%) on MMAU Test-mini and Test benchmark
|
| 23 |
| Model | Method | Sound (Test-mini) | Sound (Test) | Music (Test-mini) | Music (Test) | Speech (Test-mini) | Speech (Test) | Average (Test-mini) | Average (Test) |
|
| 24 |
+
|---------------------------------------|-----------------------|-----------|-------|-----------|-------|-----------|------|------------|-------|
|
| 25 |
| - | Human\* | 86.31 | - | 78.22 | - | 82.17 | - | 82.23 | - |
|
| 26 |
| Gemini Pro 2.0 Flash | Direct Inference\* | 56.46 | 61.73 | 58.68 | 56.53 | 51.65 | 61.53 | 55.60 | 59.93 |
|
| 27 |
| Audio Flamingo 2 | Direct Inference\* | 61.56 | 65.10 | **73.95** |**72.90**| 30.93 | 40.26 | 55.48 | 59.42 |
|
|
|
|
| 82 |
|
| 83 |
the output should be
|
| 84 |
```
|
| 85 |
+
["Well, it sounds like there's a car accelerating. You can hear the engine revving up, and there's a bit of a thump or thud sound too. It might be the car hitting something or just a part of the acceleration process. It gives off a sense of speed and power. What do you think about it? Do you have any other audio samples you want to talk about?", '<think>The audio features a vehicle accelerating and revving, which is characteristic of a car. The sound is consistent with a car engine, not an aircraft, tank, or missile.</think>
|
| 86 |
+
<answer>Car</answer>', "<think>The main source of sound is a buzzing insect, which is consistent with the size and sound of a honeybee. The other options don't match the sound or context.</think>
|
| 87 |
+
<answer>honeybee</answer>"]
|
| 88 |
```
|
| 89 |
|
| 90 |
## Acknowledgements
|
|
|
|
| 106 |
publisher = {GitHub},
|
| 107 |
journal = {GitHub Repository},
|
| 108 |
howpublished = {\url{https://github.com/shuaijiang/Ke-Omni-R}},
|
| 109 |
+
}
|
|
|
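The expected output restored in the diff above wraps the model's reasoning in `<think>` tags and the final choice in `<answer>` tags. A minimal sketch of separating the two is given below; `split_completion` is a hypothetical helper, not part of the repository, and it assumes the tag layout matches the sample output exactly.

```python
import re

# Assumed layout, taken from the sample output in the README:
#   <think>...</think>\n<answer>...</answer>
TAGGED = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def split_completion(text):
    """Return (reasoning, answer); reasoning is None for untagged text."""
    match = TAGGED.search(text)
    if match is None:
        # Plain responses (no tags) are passed through as the answer.
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


# Shortened copies of two tagged completions from the expected output.
completions = [
    "<think>The audio features a vehicle accelerating and revving.</think>\n"
    "<answer>Car</answer>",
    "<think>The main source of sound is a buzzing insect.</think>\n"
    "<answer>honeybee</answer>",
]

answers = [split_completion(c)[1] for c in completions]
print(answers)
```

Untagged completions, such as the first free-form response in the sample output, fall through unchanged, so the same helper can be applied to a mixed list.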