Hi! I hope the following reply answers your questions. :)
Extending NPU Support for Custom Ops
MediaTek currently does not provide a workflow or tool for adding custom operator (op) support to the NPU backend.
Custom or unsupported ops must be adapted to use supported operators, or models must be redesigned for compatibility.
Recommended Workflow for Supported NPU Deployment
- Prepare your model in TFLite format.
- Avoid dynamic shapes; use fixed-shape tensors to ensure stable conversion and execution.
- Use quantized models for optimal performance on the NPU.
  - Quantization enables hardware-accelerated inference and lower latency.
- Verify supported operators and avoid custom/uncommon ops.
  - Use the official operator list for your platform (see Converter Supported Operators).
  - For NP8, follow the NP8 documentation; for NP6/NP7, consult the respective hardware documentation.
- Use the MediaTek Converter tools for model conversion and validation.
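The operator-verification step above can be sketched as a simple pre-deployment check. Note that both the supported-operator set and the model op names below are hypothetical placeholders for illustration; the authoritative list is the Converter Supported Operators documentation for your platform.

```python
# Hypothetical pre-deployment check: compare the ops a model uses
# against the platform's supported-operator list before converting.
# Both sets below are illustrative placeholders, not MediaTek's real lists.

SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RELU", "SOFTMAX"}

def find_unsupported_ops(model_ops):
    """Return the ops that the NPU backend would reject, sorted."""
    return sorted(set(model_ops) - SUPPORTED_OPS)

model_ops = ["CONV_2D", "RELU", "CUSTOM_NMS", "SOFTMAX"]
unsupported = find_unsupported_ops(model_ops)
if unsupported:
    print(f"Rework these ops before NPU deployment: {unsupported}")
```

Running a check like this before retraining or converting tells you up front which layers need to be redesigned with supported ops.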
Tip:
Before retraining, run the converter tools against the TFLite specification to confirm your model is compatible and supported. This reduces the chance of hitting unsupported-operator failures at final deployment.
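As a concrete illustration of the workflow above (fixed input shape plus full-integer quantization), here is a minimal sketch using the standard TensorFlow Lite converter API. It assumes TensorFlow is installed, and the tiny Keras model stands in for your real network; the MediaTek Converter tools then take the resulting `.tflite` file as input.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model with a FIXED input shape (no dynamic dimensions).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="relu"),
])

def representative_dataset():
    # Calibration samples for full-integer quantization; use real
    # preprocessed inputs from your dataset in practice.
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to built-in int8 ops so nothing falls outside the quantized set.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Restricting `target_spec.supported_ops` to `TFLITE_BUILTINS_INT8` makes the conversion fail early if any op cannot be fully quantized, which is preferable to discovering it on the device.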
Fallback Mechanism: Offline vs Online Inference
- Offline Inference (using precompiled binaries) does not support fallback to the CPU/GPU; if any operator is unsupported, model inference fails.
- Online Inference (using the TFLite Interpreter on-device) may fall back to the CPU for unsupported operators, provided the TFLite runtime and your platform support this feature.
For a detailed explanation, see:
What is the difference between Online Compilation and Offline Compilation and which one should be preferred and why?
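The offline/online difference can be illustrated with a small sketch. The op names and supported-operator set are hypothetical, and in reality the runtime (not user code) does the partitioning, but the failure behavior matches the description above.

```python
SUPPORTED_OPS = {"CONV_2D", "ADD", "SOFTMAX"}  # hypothetical NPU op set

def offline_inference(ops):
    # Precompiled-binary path: a single unsupported op fails the model.
    unsupported = [op for op in ops if op not in SUPPORTED_OPS]
    if unsupported:
        raise RuntimeError(f"offline compile failed: {unsupported}")
    return "runs entirely on NPU"

def online_inference(ops):
    # Interpreter path: unsupported ops may fall back to the CPU
    # (when the TFLite runtime and platform support fallback).
    return {op: ("NPU" if op in SUPPORTED_OPS else "CPU") for op in ops}
```

So a model containing, say, a custom NMS op fails outright offline, while online inference can still run it with that op placed on the CPU, at the cost of extra data transfer between the two processors.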
Summary Table
| Approach | Custom Op Support | Fallback to CPU/GPU | Recommended Use Case |
|---|---|---|---|
| Offline Inference | No | No | Model fully supported by NPU |
| Online Inference | No | Yes (CPU only for supported TFLite ops) | Mixed operator support or rapid prototyping |
MediaTek Recommendation
- Redesign or reimplement custom operators using supported TFLite ops for NPU deployment.
- Prefer offline inference for production, provided your model is fully supported.
- Use online inference for prototyping or mixed workloads needing fallback.
- Always consult the latest supported operator list and platform documentation.