NPU Deployment Issue — Whisper Model (Genio 510)

Hi,
I’m trying to deploy the Whisper model on the MediaTek Genio 510 NPU (MDLA), and I’m stuck at the final step. Below is the exact workflow I followed and where I’m blocked.

Steps Completed So Far

  1. Downloaded the Whisper PyTorch model (.pt) and converted the encoder to TFLite using mtk_pytorch_converter.

  2. I could not convert the encoder and decoder together into a single TFLite file (expected, since the Whisper decoder is not supported for full TFLite conversion).

  3. Quantized the TFLite encoder using calibration data with mtk_pytorch_converter.

  4. Generated the .dla model using:
    ncc-tflite --arch=mdla3.0

  5. Now I have a Whisper encoder in DLA format and I want to execute it on the Genio 510 board’s NPU.
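
For reference while testing step 5: the Whisper tiny/base/small/medium encoders take a fixed-shape log-mel spectrogram input of shape (1, 80, 3000), i.e. 80 mel bins over 3000 frames (30 s of 16 kHz audio). Here is a minimal sketch of building a correctly shaped dummy buffer, which is handy for smoke-testing the compiled .dla before wiring up real audio. The shape constants are for the standard float Whisper checkpoints; if your quantized model takes (u)int8 input, pass the matching dtype.

```python
import numpy as np

# Whisper tiny/base/small/medium encoders expect a log-mel spectrogram of
# shape (1, 80, 3000): 80 mel bins x 3000 frames (30 s at 16 kHz).
# Note: large-v3 uses 128 mel bins instead; adjust N_MELS accordingly.
N_MELS, N_FRAMES = 80, 3000

def make_encoder_input(dtype=np.float32):
    """Return a zero-filled buffer with the encoder's expected layout."""
    return np.zeros((1, N_MELS, N_FRAMES), dtype=dtype)

buf = make_encoder_input()
print(buf.shape, buf.dtype, buf.nbytes)  # (1, 80, 3000) float32 960000
```

The byte count (960,000 for float32) is also the size the runtime will expect for the input buffer, so it is a quick sanity check against whatever the .dla reports as its input tensor size.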

Where I’m Stuck — Step 5

I don’t know how to actually run the .dla file on the Genio 510 board.

My Questions

1) To run a .dla model on the MDLA NPU, do I need to use the C++ Runtime API from the SDK?

The only API I can find in the SDK is the NeuroPilot Runtime API, which appears to be C++-oriented.

2) Can I run the DLA model using Python?

Or is Python completely unsupported for MDLA inference?
If Python is not an option, do I need to port my entire encoder-inference logic to C++ using the runtime SDK?
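
One pattern worth noting while waiting for an authoritative answer: even when a vendor SDK only documents a C/C++ runtime, a C API exported from a shared library can often be driven from Python via ctypes. Below is a sketch of that approach. The library name (`libneuron_runtime.so`) and the `NeuronRuntime_*` entry points are assumptions based on the Neuron Runtime API described in MediaTek's NeuroPilot documentation; the exact names, signatures, and struct arguments (e.g. the `BufferAttribute` parameter on the set-input/output calls, omitted here) must be verified against the `RuntimeAPI.h` header in your SDK version.

```python
import ctypes
from ctypes import byref, c_void_p

class NeuronRuntime:
    """Minimal ctypes wrapper for a NeuronRuntime-style C API.

    Function names are assumptions taken from MediaTek's NeuroPilot
    Neuron Runtime documentation; check RuntimeAPI.h in your SDK for
    the real signatures before using this on the board.
    """

    LIB = "libneuron_runtime.so"  # assumed library name on the Genio image

    def __init__(self, dla_path: str):
        self.lib = ctypes.CDLL(self.LIB)
        self.rt = c_void_p()
        # Assumed: int NeuronRuntime_create(const EnvOptions*, void**)
        self._check(self.lib.NeuronRuntime_create(None, byref(self.rt)), "create")
        self._check(
            self.lib.NeuronRuntime_loadNetworkFromFile(self.rt, dla_path.encode()),
            "loadNetworkFromFile",
        )

    @staticmethod
    def _check(status: int, what: str) -> None:
        # Assumes 0 means success (NEURONRUNTIME_NO_ERROR in the header).
        if status != 0:
            raise RuntimeError(f"NeuronRuntime_{what} failed with status {status}")

    def run(self, input_bytes: bytes, output_size: int) -> bytes:
        buf_in = ctypes.create_string_buffer(input_bytes, len(input_bytes))
        buf_out = ctypes.create_string_buffer(output_size)
        # The real setInput/setOutput also take a BufferAttribute argument;
        # it is elided in this sketch -- consult the SDK header.
        self._check(
            self.lib.NeuronRuntime_setInput(self.rt, 0, buf_in, len(input_bytes)),
            "setInput")
        self._check(
            self.lib.NeuronRuntime_setOutput(self.rt, 0, buf_out, output_size),
            "setOutput")
        self._check(self.lib.NeuronRuntime_inference(self.rt), "inference")
        return buf_out.raw

    def close(self) -> None:
        self.lib.NeuronRuntime_release(self.rt)

# Usage on the board would look roughly like (sizes illustrative):
#   rt = NeuronRuntime("whisper_encoder.dla")   # hypothetical path
#   out = rt.run(mel_bytes, output_size=expected_output_bytes)
#   rt.close()
```

Separately, some Genio images ship a command-line runner for .dla files (neuronrt, if present in your build), which may let you verify the model executes on the MDLA before writing any integration code.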

Summary of What I Want

I want to load and run my Whisper encoder DLA model on the Genio 510 NPU.
I need guidance on how to execute a .dla file, whether from Python or C++, and what the recommended method is.

If you have any suggestions or a proper workflow, I would be truly grateful.