How to resolve inference failures due to unsupported operations on NPU?

During model deployment, I have encountered compilation failures with ncc-tflite because the NPU (Neural Processing Unit) on my device does not support certain custom or uncommon operations (also known as custom ops or unsupported ops). This causes the entire inference process to fail rather than falling back to the CPU or GPU, which significantly impacts deployment feasibility.

How can I enable or extend NPU support for these custom operations? Are there common workflows or tools that allow for adding custom op support to an NPU backend, or at least a way to allow graceful fallback to other computation units instead of a complete inference failure?

Hi! Hope the following reply answers your questions :)

Extending NPU Support for Custom Ops

MediaTek currently does not provide a workflow or tool for adding custom operator (op) support to the NPU backend.
Custom or unsupported ops must be adapted to use supported operators, or models must be redesigned for compatibility.

Recommended Workflow for Supported NPU Deployment
  1. Prepare your model in TFLite format.
  2. Avoid dynamic shapes—use fixed shape tensors to ensure stable conversion and execution.
  3. Use quantized models for optimal performance on the NPU.
    • Quantization improves compatibility with hardware-accelerated inference and lowers latency.
  4. Verify supported operators and avoid custom/uncommon ops.
    • Use the official operator list for your platform (see Converter Supported Operators).
    • For NP8: follow the NP8 documentation. For NP6/NP7, consult the respective hardware documentation.
  5. Use MediaTek Converter tools for model conversion and validation.
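The pre-deployment checks in steps 2 and 4 above can be sketched as a small validation helper. This is only an illustration of the logic, not the NeuroPilot API: the SUPPORTED_OPS set, op names, and shape format below are all assumptions, and in practice the supported-op list must come from the official Converter Supported Operators documentation for your platform.

```python
# Illustrative pre-deployment check: fixed shapes and supported ops only.
# SUPPORTED_OPS is a placeholder; consult the official Converter Supported
# Operators list for your platform (NP6/NP7/NP8) in practice.
SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RELU", "SOFTMAX"}

def check_fixed_shapes(input_shapes):
    """Return inputs with dynamic dims (None or -1), which should be avoided."""
    return [
        (name, shape)
        for name, shape in input_shapes.items()
        if any(dim is None or dim == -1 for dim in shape)
    ]

def check_ops(model_ops, supported=SUPPORTED_OPS):
    """Return the ops the NPU backend would reject."""
    return sorted(set(model_ops) - supported)

# Example: a model with one dynamic batch dimension and one custom op.
shapes = {"input": (1, 224, 224, 3), "mask": (-1, 224, 224, 1)}
ops = ["CONV_2D", "ADD", "MY_CUSTOM_OP", "SOFTMAX"]

print(check_fixed_shapes(shapes))  # [('mask', (-1, 224, 224, 1))]
print(check_ops(ops))              # ['MY_CUSTOM_OP']
```

Running a check like this before quantization and retraining makes it cheap to discover which ops need to be redesigned with supported operators.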

Tip:
Before retraining, use the converter tools with the TFLite specification to confirm your model is compatible and supported. This reduces the chance of running into unsupported operator failures in final deployment.

Fallback Mechanism: Offline vs Online Inference

Summary Table

  Approach            Custom Op Support   Fallback to CPU/GPU                       Recommended Use Case
  Offline Inference   No                  No                                        Model fully supported by NPU
  Online Inference    No                  Yes (CPU only, for supported TFLite ops)  Mixed operator support or rapid prototyping
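The behavioral difference between the two approaches can be illustrated with a toy dispatch sketch. This is not MediaTek's runtime, just a minimal model of the decision logic, assuming a hypothetical NPU_SUPPORTED set:

```python
# Toy illustration of offline vs online inference dispatch. NPU_SUPPORTED
# is a hypothetical set; the real list comes from platform documentation.
NPU_SUPPORTED = {"CONV_2D", "ADD", "RELU", "SOFTMAX"}

def dispatch_offline(model_ops):
    """Offline: the whole graph must run on the NPU, or compilation fails."""
    unsupported = [op for op in model_ops if op not in NPU_SUPPORTED]
    if unsupported:
        raise RuntimeError(f"compilation failed, unsupported ops: {unsupported}")
    return {op: "NPU" for op in model_ops}

def dispatch_online(model_ops):
    """Online: unsupported (but valid TFLite) ops fall back to the CPU."""
    return {op: ("NPU" if op in NPU_SUPPORTED else "CPU") for op in model_ops}

ops = ["CONV_2D", "MY_CUSTOM_OP", "SOFTMAX"]
print(dispatch_online(ops))  # MY_CUSTOM_OP maps to CPU, the rest to NPU
# dispatch_offline(ops) would raise RuntimeError instead of falling back.
```

This is why online inference suits mixed workloads: one unsupported op degrades performance for that op only, instead of failing the whole compilation as in the offline path.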

MediaTek Recommendation

  • Redesign or reimplement custom operators using supported TFLite ops for NPU deployment.
  • Prefer offline inference for production, provided your model is fully supported.
  • Use online inference for prototyping or mixed workloads needing fallback.
  • Always consult the latest supported operator list and platform documentation.