Need support regarding online compilation for NPU

I have visited the discussion on difference-between-online-compilation-and-offline-compilation and would like to thank @joying.kuo for the wonderful explanation.
Recently I have been trying to run the whisper small model on the MediaTek Genio 510 NPU.

Since I was unaware of online compilation, I had to split my Whisper small model into an encoder and a decoder, just so the encoder (compiled to a DLA) could run on the NPU. Now that I have read about online compilation as an interesting alternative, I would like to try it: run the entire TFLite model rather than the separated encoder-decoder format, compare the performance of both approaches, and make an informed decision.

As of now I have the full TFLite model, which I generated without the help of mtk_converter. I am looking for guidance on using online compilation to run the entire model on the NPU, with CPU fallback for unsupported operations. So far I have not been able to put together code that makes use of online compilation.

Previously I tried using neuron_stable_delegate in the TFLite runtime, but it throws a missing-symbol error. I am not sure if this is the right approach.

I would be sincerely grateful for your guidance on this issue. A code example with any model would be really helpful, as I am struggling with the implementation.

Hi @Murali_M_G!

Thanks a lot for trying Whisper on the Genio 510 NPU and sharing your experience.

At the moment we don't yet have internal experience deploying Whisper on the Genio 510, so your case is very valuable to us. Could you share the exact error messages/logs you are seeing (for example from neuron_stable_delegate or other attempts)? That will help us better understand what is going wrong.

If possible, you can also send your .tflite model to my email: joying.kuo@mediatek.com. We can analyze it in more detail and see what’s feasible on the NPU and where CPU fallback would be needed.

Additionally, starting from Yocto v25.1, ONNX Runtime will be supported. The current v25.1-dev images already include this feature. You may try deploying Whisper via ONNX Runtime on that image and see if it works better for your use case.
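To get started with ONNX Runtime on the image, a minimal sketch of creating a session with an NPU-preferred execution-provider list and an explicit CPU fallback could look like the following. The provider name `NnapiExecutionProvider` and the model path are illustrative assumptions; please check which execution providers the v25.1-dev image actually ships.

```python
def pick_providers(available, preferred=("NnapiExecutionProvider", "CPUExecutionProvider")):
    """Keep only the preferred providers that are actually available,
    always falling back to the CPU provider."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

if __name__ == "__main__":
    try:
        import onnxruntime as ort
        providers = pick_providers(ort.get_available_providers())
        # "whisper_small.onnx" is a placeholder path for your exported model
        sess = ort.InferenceSession("whisper_small.onnx", providers=providers)
        print("Session providers:", sess.get_providers())
    except ImportError:
        print("onnxruntime is not installed on this host.")
```

`sess.get_providers()` reports which providers the session actually ended up with, which is a quick way to confirm whether inference is landing on the accelerator or falling back to CPU.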

Thanks again for your effort on this, and we look forward to your feedback and logs.

Hi @Joying.kuo,

Thanks for taking the time to help me out.

I have built my image (rity-demo-image) using Yocto v25.0 (scarthgap).

For online compilation, I converted the entire Whisper small model to TFLite without using the NeuroPilot SDK (mtk_converter).

In my inference code, I set the interpreter to use libneuron_stable_delegate.so as the delegate, but I am a bit sceptical about my approach. Please correct me if I am on the wrong path, as my goal here is to achieve online compilation.

Below is the code where I load the delegate:

```python
import os
import tensorflow as tf

# Disable XNNPACK so the delegate choice is explicit
os.environ["TFLITE_ENABLE_XNNPACK"] = "0"

# Load the MediaTek Neuron delegate
delegate_path = "/usr/lib/libneuron_stable_delegate.so"
try:
    neuron_delegate = [
        tf.lite.experimental.load_delegate(delegate_path)
    ]
    print("✔ Neuron stable delegate loaded successfully.")
except Exception as e:
    print("Failed to load Neuron delegate, falling back to CPU:", e)
    neuron_delegate = []  # empty list -> interpreter runs on CPU
```
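To make this reproducible off-board, the same fallback logic can be isolated into a small helper (a sketch; the `loader` argument stands in for `tf.lite.experimental.load_delegate`, and the model path is a placeholder):

```python
import os

def load_delegate_list(path, loader):
    """Return a one-element delegate list on success, or an empty list
    so the interpreter silently falls back to CPU."""
    if not os.path.exists(path):
        return []
    try:
        return [loader(path)]
    except Exception:
        return []

# Intended usage on the board (assumes tensorflow or tflite_runtime is installed):
#   delegates = load_delegate_list("/usr/lib/libneuron_stable_delegate.so",
#                                  tf.lite.experimental.load_delegate)
#   interpreter = tf.lite.Interpreter(model_path="whisper_small.tflite",
#                                     experimental_delegates=delegates)
#   interpreter.allocate_tensors()
```

One thing worth verifying: as far as I can tell, `tf.lite.experimental.load_delegate` looks up the `tflite_plugin_create_delegate` symbol, while stable-delegate libraries may expose a different entry point, which could explain a missing-symbol error.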

On execution, my code runs into the following error.

Jotting down some of the issues I ran into during offline compilation, for your reference:

  • The Whisper small encoder can be made NPU compatible with a few modifications.

  • The main challenges arise with the decoder, since it expects dynamic tensors, so conversion was not possible.

So I am hoping that offloading at least a few of the supported decoder operations to the NPU can improve efficiency. I would also like to know: if I switch to the MediaTek Genio 520, will it support dynamic tensors?
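For reference, a common way to sidestep dynamic tensors (not Genio-specific) is to export the decoder with a fixed maximum sequence length and pad the token inputs at runtime. A minimal sketch of the padding side, with hypothetical names:

```python
def pad_tokens(tokens, max_len, pad_id):
    """Pad (or truncate) a token list to a fixed length so the decoder
    input always has a static shape."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    return tokens + [pad_id] * (max_len - len(tokens))
```

With static shapes like this, the decoder graph can be converted without dynamic tensors, at the cost of some wasted compute on the padding positions.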

Your support on this matter will be greatly appreciated.