How to Enable NPU Support for Custom Operations and Resolve Inference Failures Due to Unsupported Ops?

During model deployment, I have run into inference failures because the NPU (Neural Processing Unit) on my device does not support some custom or uncommon operations (custom ops). Instead of falling back to the CPU or GPU, the entire inference run fails, which makes deployment largely impractical.
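For context, this is roughly how I am applying the delegate via the TFLite Python API; `libvendor_npu_delegate.so` and `model.tflite` are placeholders for my actual vendor library and converted model:

```python
import tensorflow as tf

# Placeholder names: substitute the vendor's delegate library and the
# converted model from your own setup.
npu_delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")

interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[npu_delegate],
)
# This call is where the run aborts whenever the graph contains an op
# the NPU delegate does not recognize.
interpreter.allocate_tensors()
```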

How can I enable or extend NPU support for these custom operations? Are there established workflows or tools for adding custom op support to an NPU backend, or at least a way to fall back gracefully to other compute units instead of failing the whole inference? The best I have managed so far is a manual application-level fallback (sketch below).
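By "graceful fallback" I mean something like the following, which I currently hand-roll at the application level (a minimal sketch, reusing the placeholder delegate name from above):

```python
import tensorflow as tf

def make_interpreter(model_path: str) -> tf.lite.Interpreter:
    """Try the NPU delegate first; fall back to CPU kernels on failure."""
    try:
        npu_delegate = tf.lite.experimental.load_delegate(
            "libvendor_npu_delegate.so"
        )
        interpreter = tf.lite.Interpreter(
            model_path=model_path,
            experimental_delegates=[npu_delegate],
        )
        interpreter.allocate_tensors()
        return interpreter
    except (OSError, ValueError, RuntimeError):
        # The delegate either failed to load or rejected the graph.
        # Retry with the default CPU kernels instead of aborting.
        interpreter = tf.lite.Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        return interpreter
```

This works, but it is all-or-nothing: one unsupported op pushes the entire graph off the NPU, so I would prefer per-op fallback handled by the delegate or runtime itself.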

Additionally, what are the recommended steps for:

  • Identifying which ops the NPU does not support, either during model conversion or at execution time? (The first sketch after this list shows what I have tried.)
  • Implementing or registering custom op kernels so they can run on the NPU, if that is possible at all? (See the second sketch after this list for where I am stuck.)
  • Modifying the model, runtime, or delegate so that unsupported ops are handled gracefully rather than aborting the run?
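On the first point, the closest I have come is inspecting the model and tightening the conversion, which only tells me what TFLite itself supports, not what the NPU delegate supports; a sketch assuming a SavedModel at a placeholder path:

```python
import tensorflow as tf

# Dump every op in the flatbuffer. With gpu_compatibility=True the
# analyzer also flags ops the GPU delegate cannot take; I have not found
# an equivalent switch for vendor NPU delegates.
tf.lite.experimental.Analyzer.analyze(model_path="model.tflite")

# At conversion time, restricting the target op set makes the converter
# fail loudly and name every op without a TFLite builtin equivalent.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()
```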
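On the second point, I can get the custom op through conversion, but not onto the NPU; a sketch with the same placeholder paths (the vendor-SDK side is exactly where I am stuck):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
# Emit my custom op into the flatbuffer as-is instead of rejecting it.
converter.allow_custom_ops = True
tflite_model = converter.convert()

with open("model_with_custom_op.tflite", "wb") as f:
    f.write(tflite_model)

# The flatbuffer now carries the custom op, but the runtime still needs
# a kernel for it: on CPU that means a C++ TfLiteRegistration added to
# the op resolver; for the NPU I assume some vendor SDK hook is needed,
# and that is what I am asking about.
```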