---
pretty_name: Perch
license: apache-2.0
tags:
- audio
- bird
- nature
- science
- vocalization
- bio
- birds-classification
- bioacoustics
base_model:
- cgeorgiaw/Perch
---

# Perch

TFLite and manually optimized ONNX formats of the Perch v2 model.

Source: https://www.kaggle.com/models/google/bird-vocalization-classifier/

- `perch_v2_no_dft.onnx`: ONNX model with the DFT node converted to MatMul using `scripts/convert_dft_to_matmul.py` for additional speedup. It is slightly less accurate (tolerance within 2e-4 rather than 1e-5 when compared against the TFLite model).
- `perch_v2.onnx`: The converted ONNX model.
- `perch_v2.tflite`: The TFLite model.

## Model information

```
ONNX Model Information:
Inputs:
  - Name: inputs, Shape: ['batch', 160000], Type: tensor(float)
Outputs:
  - Name: embedding, Shape: ['batch', 1536], Type: tensor(float)
  - Name: spatial_embedding, Shape: ['batch', 16, 4, 1536], Type: tensor(float)
  - Name: spectrogram, Shape: ['batch', 500, 128], Type: tensor(float)
  - Name: label, Shape: ['batch', 14795], Type: tensor(float)

TFLite Model Information:
Inputs:
  - Name: serving_default_inputs:0, Shape: [ 1 160000], Type:
Outputs:
  - Name: StatefulPartitionedCall:0, Shape: [ 1 1536], Type:
  - Name: StatefulPartitionedCall:2, Shape: [ 1 16 4 1536], Type:
  - Name: StatefulPartitionedCall:3, Shape: [ 1 500 128], Type:
  - Name: StatefulPartitionedCall:1, Shape: [ 1 14795], Type:

Generating random inputs:
  - inputs: shape=(1, 160000), dtype=float32

Running ONNX model inference...
Running TFLite model inference...
================================================================================
COMPARISON RESULTS
================================================================================

Output 0:
  ONNX Runtime shape: (1, 1536), dtype: float32
  TFLite shape: (1, 1536), dtype: float32
  ONNX Runtime vs TFLite:
    Max difference: 0.0000007208
    Mean difference: 0.0000001543
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 1:
  ONNX Runtime shape: (1, 16, 4, 1536), dtype: float32
  TFLite shape: (1, 16, 4, 1536), dtype: float32
  ONNX Runtime vs TFLite:
    Max difference: 0.0000131130
    Mean difference: 0.0000005482
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 2:
  ONNX Runtime shape: (1, 500, 128), dtype: float32
  TFLite shape: (1, 500, 128), dtype: float32
  ONNX Runtime vs TFLite:
    Max difference: 0.0000005960
    Mean difference: 0.0000000100
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 3:
  ONNX Runtime shape: (1, 14795), dtype: float32
  TFLite shape: (1, 14795), dtype: float32
  ONNX Runtime vs TFLite:
    Max difference: 0.0000152588
    Mean difference: 0.0000014861
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

================================================================================
✅ ALL OUTPUTS MATCH!
================================================================================

Benchmarking ONNX model (10 warmup + 100 test runs)...
Benchmarking TFLite model (10 warmup + 100 test runs)...
================================================================================
BENCHMARK RESULTS
================================================================================

ONNX Model:
  Mean: 66.350 ms
  Median: 66.339 ms
  Std: 2.160 ms
  Min: 61.801 ms
  Max: 74.614 ms

TFLite Model:
  Mean: 608.777 ms
  Median: 606.753 ms
  Std: 11.304 ms
  Min: 602.735 ms
  Max: 684.807 ms

Comparison:
  ONNX Runtime is 9.18x faster than TFLite
  Difference: 542.427 ms
================================================================================
```
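
## DFT as MatMul (sketch)

The `perch_v2_no_dft.onnx` variant replaces the DFT node with a MatMul. The snippet below is a minimal NumPy sketch of the underlying identity only: a real-input DFT is a multiplication by fixed cosine and sine matrices. It is not the content of `scripts/convert_dft_to_matmul.py`. The function names and the frame length of 512 are hypothetical and chosen purely for illustration. The float32 matrix product typically differs slightly from a float64 FFT, which may be one source of the looser tolerance noted for the no-DFT model.

```python
import numpy as np


def real_dft_matrices(frame_length: int) -> tuple[np.ndarray, np.ndarray]:
    """Return float32 (real, imag) DFT matrices of shape (frame_length, frame_length // 2 + 1)."""
    n = np.arange(frame_length)[:, None]           # sample index
    k = np.arange(frame_length // 2 + 1)[None, :]  # frequency bin index
    angle = 2.0 * np.pi * n * k / frame_length
    # X[k] = sum_n x[n] * (cos(angle) - i * sin(angle))
    return np.cos(angle).astype(np.float32), (-np.sin(angle)).astype(np.float32)


def rfft_via_matmul(frames: np.ndarray) -> np.ndarray:
    """Real-input DFT of each row of `frames`, computed with two matrix products."""
    real_m, imag_m = real_dft_matrices(frames.shape[-1])
    frames = frames.astype(np.float32)
    return (frames @ real_m) + 1j * (frames @ imag_m)


# Hypothetical input: 4 frames of 512 samples (not the model's actual framing).
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 512)).astype(np.float32)

reference = np.fft.rfft(frames.astype(np.float64), axis=-1)  # float64 FFT baseline
approx = rfft_via_matmul(frames)                             # float32 MatMul version

max_abs_diff = float(np.max(np.abs(approx - reference)))
print(f"max abs difference vs np.fft.rfft: {max_abs_diff:.3e}")
```

The matrices depend only on the frame length, so in a converted graph they can be stored as constants and the transform becomes a plain dense product.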