How to resolve inference failures due to unsupported operations on NPU?

During model deployment, I have encountered compilation failures with ncc-tflite because the NPU (Neural Processing Unit) on my device does not support certain custom or uncommon operations (also known as custom ops or unsupported ops). This causes the entire inference process to fail rather than falling back to the CPU or GPU, which significantly impacts deployment feasibility.

How can I enable or extend NPU support for these custom operations? Are there common workflows or tools that allow for adding custom op support to an NPU backend, or at least a way to allow graceful fallback to other computation units instead of a complete inference failure?

Hi! Hope the following reply answers your questions :)

Extending NPU Support for Custom Ops

MediaTek currently does not provide a workflow or tool for adding custom operator (op) support to the NPU backend.
Custom or unsupported ops must be adapted to use supported operators, or models must be redesigned for compatibility.

Recommended Workflow for Supported NPU Deployment
  1. Prepare your model in TFLite format.
  2. Avoid dynamic shapes—use fixed shape tensors to ensure stable conversion and execution.
  3. Use quantized models for optimal performance on the NPU.
    • Quantization improves compatibility with hardware-accelerated inference and lowers latency.
  4. Verify supported operators and avoid custom/uncommon ops.
    • Use the official operator list for your platform (see Converter Supported Operators).
    • For NP8: follow the NP8 documentation. For NP6/NP7, consult the respective hardware documentation.
  5. Use MediaTek Converter tools for model conversion and validation.
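The pre-deployment checks in steps 2 and 4 above can be sketched as a small validation helper. This is only an illustration of the logic, not the NeuroPilot API: the SUPPORTED_OPS set, op names, and shape format below are all assumptions, and in practice the supported-op list must come from the official Converter Supported Operators documentation for your platform.

```python
# Illustrative pre-deployment check: fixed shapes and supported ops only.
# SUPPORTED_OPS is a placeholder; consult the official Converter Supported
# Operators list for your platform (NP6/NP7/NP8) in practice.
SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RELU", "SOFTMAX"}

def check_fixed_shapes(input_shapes):
    """Return inputs with dynamic dims (None or -1), which should be avoided."""
    return [
        (name, shape)
        for name, shape in input_shapes.items()
        if any(dim is None or dim == -1 for dim in shape)
    ]

def check_ops(model_ops, supported=SUPPORTED_OPS):
    """Return the ops the NPU backend would reject."""
    return sorted(set(model_ops) - supported)

# Example: a model with one dynamic batch dimension and one custom op.
shapes = {"input": (1, 224, 224, 3), "mask": (-1, 224, 224, 1)}
ops = ["CONV_2D", "ADD", "MY_CUSTOM_OP", "SOFTMAX"]

print(check_fixed_shapes(shapes))  # [('mask', (-1, 224, 224, 1))]
print(check_ops(ops))              # ['MY_CUSTOM_OP']
```

Running a check like this before quantization and retraining makes it cheap to discover which ops need to be redesigned with supported operators.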

Tip:
Before retraining, use the converter tools with the TFLite specification to confirm your model is compatible and supported. This reduces the chance of running into unsupported operator failures in final deployment.

Fallback Mechanism: Offline vs Online Inference

Summary Table

  Approach            Custom Op Support   Fallback to CPU/GPU                       Recommended Use Case
  Offline Inference   No                  No                                        Model fully supported by NPU
  Online Inference    No                  Yes (CPU only, for supported TFLite ops)  Mixed operator support or rapid prototyping
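The behavioral difference between the two approaches can be illustrated with a toy dispatch sketch. This is not MediaTek's runtime, just a minimal model of the decision logic, assuming a hypothetical NPU_SUPPORTED set:

```python
# Toy illustration of offline vs online inference dispatch. NPU_SUPPORTED
# is a hypothetical set; the real list comes from platform documentation.
NPU_SUPPORTED = {"CONV_2D", "ADD", "RELU", "SOFTMAX"}

def dispatch_offline(model_ops):
    """Offline: the whole graph must run on the NPU, or compilation fails."""
    unsupported = [op for op in model_ops if op not in NPU_SUPPORTED]
    if unsupported:
        raise RuntimeError(f"compilation failed, unsupported ops: {unsupported}")
    return {op: "NPU" for op in model_ops}

def dispatch_online(model_ops):
    """Online: unsupported (but valid TFLite) ops fall back to the CPU."""
    return {op: ("NPU" if op in NPU_SUPPORTED else "CPU") for op in model_ops}

ops = ["CONV_2D", "MY_CUSTOM_OP", "SOFTMAX"]
print(dispatch_online(ops))  # MY_CUSTOM_OP maps to CPU, the rest to NPU
# dispatch_offline(ops) would raise RuntimeError instead of falling back.
```

This is why online inference suits mixed workloads: one unsupported op degrades performance for that op only, instead of failing the whole compilation as in the offline path.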

MediaTek Recommendation

  • Redesign or reimplement custom operators using supported TFLite ops for NPU deployment.
  • Prefer offline inference for production, provided your model is fully supported.
  • Use online inference for prototyping or mixed workloads needing fallback.
  • Always consult the latest supported operator list and platform documentation.