---
pretty_name: Perch
license: apache-2.0
tags:
- audio
- bird
- nature
- science
- vocalization
- bio
- birds-classification
- bioacoustics
base_model:
- cgeorgiaw/Perch
---

# Perch

TFLite and manually optimized ONNX versions of the Perch v2 model.

Source: https://www.kaggle.com/models/google/bird-vocalization-classifier/

- `perch_v2_no_dft.onnx`: ONNX model with the DFT node converted to MatMul using `scripts/convert_dft_to_matmul.py` for additional speedup. It is slightly less accurate: its outputs match the TFLite model within a tolerance of 2e-4, versus 1e-5 without the DFT conversion.
- `perch_v2.onnx`: The converted ONNX model.
- `perch_v2.tflite`: The TFLite model.
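
The DFT-to-MatMul conversion mentioned for `perch_v2_no_dft.onnx` rests on the fact that a discrete Fourier transform is a linear map, so it can be written as a product with fixed cosine and sine matrices. The NumPy sketch below illustrates that identity only; it is not the contents of `scripts/convert_dft_to_matmul.py`, and the frame length of 512 and the helper names are hypothetical. It also prints the float32 error against a float64 reference, which is one plausible reason the converted model needs a looser tolerance.

```python
import numpy as np


def rdft_matrices(n_fft: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (real, imag) DFT matrices of shape (n_fft, n_fft // 2 + 1)."""
    n = np.arange(n_fft)[:, None]            # time index as a column
    k = np.arange(n_fft // 2 + 1)[None, :]   # non-negative frequency bins as a row
    angle = 2.0 * np.pi * n * k / n_fft
    # X[k] = sum_n x[n] * exp(-2*pi*i*n*k/N) = sum_n x[n] * (cos - i*sin)
    return np.cos(angle), -np.sin(angle)


def rdft_via_matmul(frames: np.ndarray, n_fft: int) -> tuple[np.ndarray, np.ndarray]:
    """Compute the real-input DFT of each row of `frames` with two matmuls."""
    real_m, imag_m = rdft_matrices(n_fft)
    real = frames @ real_m.astype(frames.dtype)
    imag = frames @ imag_m.astype(frames.dtype)
    return real, imag


# Hypothetical frame length; the real model's DFT size is not stated here.
n_fft = 512
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, n_fft)).astype(np.float32)

real, imag = rdft_via_matmul(frames, n_fft)

# Float64 FFT as the reference to measure the float32 matmul error against.
ref = np.fft.rfft(frames.astype(np.float64), axis=-1)
max_err = max(np.abs(real - ref.real).max(), np.abs(imag - ref.imag).max())
print(f"max abs error vs np.fft.rfft: {max_err:.2e}")
```

A dense product costs more arithmetic than an FFT, but matrix multiplication kernels are usually heavily optimized in inference runtimes, which may be why the conversion pays off in practice.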

## Model information

```
ONNX Model Information:
Inputs:
  - Name: inputs, Shape: ['batch', 160000], Type: tensor(float)
Outputs:
  - Name: embedding, Shape: ['batch', 1536], Type: tensor(float)
  - Name: spatial_embedding, Shape: ['batch', 16, 4, 1536], Type: tensor(float)
  - Name: spectrogram, Shape: ['batch', 500, 128], Type: tensor(float)
  - Name: label, Shape: ['batch', 14795], Type: tensor(float)

TFLite Model Information:
Inputs:
  - Name: serving_default_inputs:0, Shape: [     1 160000], Type: <class 'numpy.float32'>
Outputs:
  - Name: StatefulPartitionedCall:0, Shape: [   1 1536], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:2, Shape: [   1   16    4 1536], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:3, Shape: [  1 500 128], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:1, Shape: [    1 14795], Type: <class 'numpy.float32'>

Generating random inputs:
  - inputs: shape=(1, 160000), dtype=float32

Running ONNX model inference...
Running TFLite model inference...

================================================================================
COMPARISON RESULTS
================================================================================

Output 0:
  ONNX Runtime shape: (1, 1536), dtype: float32
  TFLite shape:       (1, 1536), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000007208
    Mean difference: 0.0000001543
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 1:
  ONNX Runtime shape: (1, 16, 4, 1536), dtype: float32
  TFLite shape:       (1, 16, 4, 1536), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000131130
    Mean difference: 0.0000005482
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 2:
  ONNX Runtime shape: (1, 500, 128), dtype: float32
  TFLite shape:       (1, 500, 128), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000005960
    Mean difference: 0.0000000100
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 3:
  ONNX Runtime shape: (1, 14795), dtype: float32
  TFLite shape:       (1, 14795), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000152588
    Mean difference: 0.0000014861
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

================================================================================
✅ ALL OUTPUTS MATCH!
================================================================================

Benchmarking ONNX model (10 warmup + 100 test runs)...
Benchmarking TFLite model (10 warmup + 100 test runs)...

================================================================================
BENCHMARK RESULTS
================================================================================

ONNX Model:
  Mean:   66.350 ms
  Median: 66.339 ms
  Std:    2.160 ms
  Min:    61.801 ms
  Max:    74.614 ms

TFLite Model:
  Mean:   608.777 ms
  Median: 606.753 ms
  Std:    11.304 ms
  Min:    602.735 ms
  Max:    684.807 ms

Comparison:
  ONNX Runtime is 9.18x faster than TFLite
  Difference: 542.427 ms
================================================================================
```
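
The input and output names in the dump above are enough to call the ONNX model from Python. The following is a minimal sketch rather than an official usage example. It assumes `onnxruntime` is installed and that `perch_v2.onnx` is in the working directory. The helper names are hypothetical, as is the choice to zero-pad short clips. The model expects 160000 float32 samples per item; the sample rate is not stated on this page, so resampling is left to the caller.

```python
import numpy as np

NUM_SAMPLES = 160_000  # input length reported in the model dump


def to_model_input(waveform: np.ndarray) -> np.ndarray:
    """Trim or zero-pad a mono waveform to NUM_SAMPLES and add a batch axis."""
    wav = np.asarray(waveform, dtype=np.float32).reshape(-1)[:NUM_SAMPLES]
    wav = np.pad(wav, (0, NUM_SAMPLES - wav.shape[0]))  # assumption: pad with zeros
    return wav[None, :]


def run_perch(model_path: str, waveform: np.ndarray) -> dict[str, np.ndarray]:
    """Run one clip through the ONNX model and return outputs keyed by name."""
    import onnxruntime as ort  # imported here so to_model_input works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    names = [out.name for out in session.get_outputs()]
    values = session.run(None, {"inputs": to_model_input(waveform)})
    return dict(zip(names, values))


# Shape check that does not need the model file.
batch = to_model_input(np.zeros(48_000))
print(batch.shape, batch.dtype)
```

With the model file present, `run_perch("perch_v2.onnx", waveform)` should return the `embedding`, `spatial_embedding`, `spectrogram`, and `label` arrays with the shapes listed in the dump.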