feat: Introduce audio captioning and categorization model with ONNX/ExecuTorch hybrid inference and category embedding generation. 5c8d855
stivenDR14 commited on