Visualize video inference results from multiple models
Generate and transform audio from text prompts