metadata
pretty_name: Perch
license: apache-2.0
tags:
  - audio
  - bird
  - nature
  - science
  - vocalization
  - bio
  - birds-classification
  - bioacoustics
base_model:
  - cgeorgiaw/Perch

Perch

TFLite and manually optimized ONNX versions of the Perch v2 model.

Source: https://www.kaggle.com/models/google/bird-vocalization-classifier/

  • perch_v2_no_dft.onnx: ONNX model with the DFT node converted to MatMul using scripts/convert_dft_to_matmul.py for additional speedup. It is slightly less accurate (tolerance within 2e-4 rather than 1e-5 when compared against the TFLite model).
  • perch_v2.onnx: The converted ONNX model.
  • perch_v2.tflite: The TFLite model.
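
The DFT-to-MatMul conversion mentioned above relies on the fact that a discrete Fourier transform of fixed length is a linear map, so it can be written as a product with constant cosine and sine matrices. The sketch below illustrates that identity in plain Python on a short, made-up signal. It is not the content of scripts/convert_dft_to_matmul.py, and the function names are hypothetical.

```python
import cmath
import math


def dft_matrices(n):
    """Return the real (cos) and imaginary (-sin) DFT basis matrices, each n x n."""
    cos_m = [[math.cos(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    sin_m = [[-math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in range(n)]
    return cos_m, sin_m


def matvec(matrix, vector):
    """Plain matrix-vector product, standing in for a MatMul node."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dft_via_matmul(signal):
    """Compute the DFT of a real signal as two real matrix products."""
    cos_m, sin_m = dft_matrices(len(signal))
    return list(zip(matvec(cos_m, signal), matvec(sin_m, signal)))


def dft_reference(signal):
    """Evaluate the DFT definition directly with complex exponentials."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


# Made-up test signal, only for illustration.
signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]
via_matmul = dft_via_matmul(signal)
reference = dft_reference(signal)
max_error = max(
    abs(complex(re, im) - ref) for (re, im), ref in zip(via_matmul, reference)
)
print(f"max |matmul - reference| = {max_error:.2e}")
```

Because the basis matrices depend only on the transform length, they can be stored as constants in the graph, which is what lets a general DFT node be replaced by MatMul.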

Model information

ONNX Model Information:
Inputs:
  - Name: inputs, Shape: ['batch', 160000], Type: tensor(float)
Outputs:
  - Name: embedding, Shape: ['batch', 1536], Type: tensor(float)
  - Name: spatial_embedding, Shape: ['batch', 16, 4, 1536], Type: tensor(float)
  - Name: spectrogram, Shape: ['batch', 500, 128], Type: tensor(float)
  - Name: label, Shape: ['batch', 14795], Type: tensor(float)

TFLite Model Information:
Inputs:
  - Name: serving_default_inputs:0, Shape: [     1 160000], Type: <class 'numpy.float32'>
Outputs:
  - Name: StatefulPartitionedCall:0, Shape: [   1 1536], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:2, Shape: [   1   16    4 1536], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:3, Shape: [  1 500 128], Type: <class 'numpy.float32'>
  - Name: StatefulPartitionedCall:1, Shape: [    1 14795], Type: <class 'numpy.float32'>
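
Given the input and output names listed above, a minimal inference sketch with ONNX Runtime could look like the following. It assumes `numpy` and `onnxruntime` are installed and that `perch_v2.onnx` is in the working directory. The helper names and the zero-padding of short clips are assumptions for illustration, not part of this repository.

```python
def fit_to_window(samples, window=160000):
    """Trim or zero-pad a list of float samples to the model's input length."""
    samples = list(samples[:window])
    return samples + [0.0] * (window - len(samples))


def run_perch(samples, model_path="perch_v2.onnx"):
    """Run one window through the ONNX model and return outputs keyed by name."""
    # Imported lazily so fit_to_window stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    batch = np.asarray([fit_to_window(samples)], dtype=np.float32)  # shape (1, 160000)
    names = [output.name for output in session.get_outputs()]
    values = session.run(None, {"inputs": batch})  # None requests every output
    return dict(zip(names, values))


# Padding a three-sample clip up to the expected input length.
window = fit_to_window([0.1, -0.2, 0.3])
print(len(window))
```

With the model file present, `run_perch(samples)["embedding"]` should have shape `(1, 1536)` according to the listing above.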

Generating random inputs:
  - inputs: shape=(1, 160000), dtype=float32

Running ONNX model inference...
Running TFLite model inference...

================================================================================
COMPARISON RESULTS
================================================================================

Output 0:
  ONNX Runtime shape: (1, 1536), dtype: float32
  TFLite shape:       (1, 1536), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000007208
    Mean difference: 0.0000001543
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 1:
  ONNX Runtime shape: (1, 16, 4, 1536), dtype: float32
  TFLite shape:       (1, 16, 4, 1536), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000131130
    Mean difference: 0.0000005482
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 2:
  ONNX Runtime shape: (1, 500, 128), dtype: float32
  TFLite shape:       (1, 500, 128), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000005960
    Mean difference: 0.0000000100
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

Output 3:
  ONNX Runtime shape: (1, 14795), dtype: float32
  TFLite shape:       (1, 14795), dtype: float32

  ONNX Runtime vs TFLite:
    Max difference:  0.0000152588
    Mean difference: 0.0000014861
    Relative tolerance: 1e-05
    Absolute tolerance: 1e-05
    ✅ Outputs match within tolerance

================================================================================
✅ ALL OUTPUTS MATCH!
================================================================================
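
Outputs 1 and 3 report maximum differences (about 1.31e-5 and 1.53e-5) that exceed the absolute tolerance of 1e-5 but are still marked as matching. This is consistent with a combined check in the style of `numpy.allclose`, where the allowed error is `atol + rtol * |expected|`. The sketch below assumes that convention, since the comparison script itself is not shown here, and uses made-up values.

```python
def within_tolerance(actual, expected, rtol=1e-5, atol=1e-5):
    """Element-wise check |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


def max_and_mean_difference(actual, expected):
    """Return the largest and the average absolute element-wise difference."""
    diffs = [abs(a - e) for a, e in zip(actual, expected)]
    return max(diffs), sum(diffs) / len(diffs)


# Hypothetical values, not taken from the model: a difference of 1.5e-5 on a
# large-magnitude element passes, while the same difference near zero fails.
expected = [10.0, 0.001]
large_ok = [10.0 + 1.5e-5, 0.001]
small_bad = [10.0, 0.001 + 1.5e-5]

print(within_tolerance(large_ok, expected))
print(within_tolerance(small_bad, expected))
```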

Benchmarking ONNX model (10 warmup + 100 test runs)...
Benchmarking TFLite model (10 warmup + 100 test runs)...

================================================================================
BENCHMARK RESULTS
================================================================================

ONNX Model:
  Mean:   66.350 ms
  Median: 66.339 ms
  Std:    2.160 ms
  Min:    61.801 ms
  Max:    74.614 ms

TFLite Model:
  Mean:   608.777 ms
  Median: 606.753 ms
  Std:    11.304 ms
  Min:    602.735 ms
  Max:    684.807 ms

Comparison:
  ONNX Runtime is 9.18x faster than TFLite
  Difference: 542.427 ms
================================================================================
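
A timing harness that produces statistics of the kind shown above (10 warmup runs, then 100 timed runs) can be sketched with the standard library alone. The function names are hypothetical, the workload is a placeholder for a model call, and the use of the population standard deviation is an assumption, since the original benchmark script is not shown here.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time fn() over `runs` calls after `warmup` untimed calls; results in ms."""
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
        "std": statistics.pstdev(timings_ms),  # assumption: population std
        "min": min(timings_ms),
        "max": max(timings_ms),
    }


def speedup(slow_mean_ms, fast_mean_ms):
    """Ratio of mean latencies, i.e. how many times faster the second one is."""
    return slow_mean_ms / fast_mean_ms


# Placeholder workload standing in for a model inference call.
stats = benchmark(lambda: sum(range(1000)))
print({key: round(value, 4) for key, value in stats.items()})

# Means reported above: TFLite 608.777 ms, ONNX Runtime 66.350 ms.
print(f"{speedup(608.777, 66.350):.2f}x")
```

Warmup runs are excluded so that one-time costs such as lazy initialization and cache population do not skew the reported mean.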