Hi,
I’m trying to deploy the Whisper model on the MediaTek Genio 510 NPU (MDLA), and I’m stuck at the final step. Below is the exact workflow I followed and where I’m blocked.
Steps Completed So Far
1. Downloaded the Whisper PyTorch model (.pt) and converted the encoder to TFLite using mtk_pytorch_converter.
2. Could not convert the encoder and decoder together into a single TFLite file (as expected, the Whisper decoder is not supported for full TFLite conversion).
3. Quantized the TFLite encoder using calibration data with mtk_pytorch_converter.
4. Generated the .dla model using: ncc-tflite --arch=mdla3.0

I now have the Whisper encoder in DLA format and want to execute it on the Genio 510 board's NPU.
Where I’m Stuck — Step 5
I don’t know how to actually run the .dla file on the Genio 510 board.
My Questions
1) To run a .dla model on the MDLA NPU, do I need to use the C++ Runtime API from the SDK?
The only API I can find in the SDK is the NeuroPilot Runtime API, which appears to be C++-oriented.
2) Can I run the DLA model using Python?
Or is Python completely unsupported for MDLA inference?
If Python is not possible, do I need to move my entire encoder-inference logic to C++ using the runtime SDK?
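To make question 2 more concrete: if a Python route exists at all, I imagine it would look something like binding the C runtime library with ctypes. To be clear, everything in this sketch (the library name libneuron_runtime.so and the NeuronRuntime_* symbols) is my guess from skimming SDK material, not something I have verified against the Genio 510 image:

```python
# Sketch only, not working code. It assumes the Neuron runtime ships as a
# C shared library (libneuron_runtime.so) that could be bound via ctypes;
# the library name and all NeuronRuntime_* symbols are unverified guesses.
import ctypes


def load_neuron_runtime(lib_name: str = "libneuron_runtime.so"):
    """Attempt to dlopen the (assumed) Neuron runtime shared library.

    Returns a ctypes handle, or None when the library is not present
    (e.g. when running on a development host instead of the board).
    """
    try:
        return ctypes.CDLL(lib_name)
    except OSError:
        return None


lib = load_neuron_runtime()
if lib is None:
    print("Neuron runtime not found; this would only work on the board itself")
else:
    # Assumed call sequence, mirroring what a C runtime API might offer
    # (function names unverified against the Genio 510 SDK):
    #   create a runtime instance
    #   load the network from the .dla file path
    #   set the input buffer (the Whisper mel-spectrogram)
    #   run inference on the MDLA
    #   read back the encoder hidden states
    #   release the runtime
    pass
```

Is this direction even feasible, or is the C++ runtime SDK the only supported entry point?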
Summary of What I Want
I want to load and run my Whisper encoder DLA model on the Genio 510 NPU.
I need guidance on how to execute a .dla file, whether from Python or C++, and which method is recommended.
If you have any suggestions or a proper workflow, I would be truly grateful.