What is the difference between Online Compilation and Offline Compilation, and which one should be preferred and why?

In the NeuroPilot documentation, offline compilation is described as the preferred pathway, but it has no fallback to the CPU: if the model is not entirely supported on the NPU, compilation fails.
The online pathway partly resolves this, but if an op is supported on the NPU and yet a particular instance of it does not fit the NPU's op constraints, model execution again fails instead of falling back to the CPU.
So in this scenario, which method is preferred?

Thank you for this question.

MediaTek NeuroPilot supports two main model execution pathways on supported platforms: online compilation and offline compilation. Each approach has its own advantages and limitations in terms of operator fallback, performance, and deployment flexibility.

Online Compilation

  • Process: Both model compilation (via the TFLite Interpreter) and inference are performed on the device at runtime.
  • Pros:
    • Automatically falls back to the CPU for operators that the NPU (Neural Processing Unit) does not support at all.
    • Useful for rapid prototyping or for models with mixed operator support.
  • Cons:
    • Incurs compilation overhead on the device before inference can start, which can slow startup, especially for large models.
    • Performance may be suboptimal compared with offline compilation.
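The CPU fallback in online compilation can be pictured as a partitioning step. The Python sketch below is a simplified model, not NeuroPilot's actual implementation: it assumes the model is a linear chain of ops, and the op names (including the hypothetical `CUSTOM_POSTPROCESS`) and the supported-op set are made up for illustration. Ops the NPU supports are grouped into NPU segments, and everything else is assigned to the CPU.

```python
from itertools import groupby


def partition_ops(ops, npu_supported):
    """Split a linear chain of ops into contiguous (device, ops) segments.

    Simplified model of delegate partitioning: any op not in npu_supported
    is assigned to the CPU, and runs of same-device ops are grouped together.
    """
    def device_for(op):
        return "NPU" if op in npu_supported else "CPU"

    return [(device, list(group)) for device, group in groupby(ops, key=device_for)]


# Hypothetical model and supported-op set, for illustration only.
model_ops = ["CONV_2D", "RELU", "CUSTOM_POSTPROCESS", "CONV_2D", "SOFTMAX"]
npu_ops = {"CONV_2D", "RELU", "SOFTMAX"}

for device, group in partition_ops(model_ops, npu_ops):
    print(device, group)
```

Here the sketch yields three segments: NPU, CPU, NPU. Each boundary between segments implies a hand-off of intermediate tensors between devices, which is one plausible contributor to the lower performance noted above.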

Offline Compilation

  • Process: Model compilation is performed in advance using the ncc-tflite tool on a development host PC; only inference is done on-device.
  • Pros:
    • Provides optimized model binaries for the target NPU (MDLA) hardware, resulting in faster inference and lower runtime latency.
    • Avoids runtime compilation cost, which is especially beneficial for production or repeated deployments.
  • Cons:
    • Only supports models and operators that are fully compatible with the selected NPU/MDLA hardware version.
    • Operators not supported by the target NPU cannot be executed, and there is no fallback to the CPU: if unsupported operators are present, offline compilation fails.
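By contrast, the offline rule is all-or-nothing. This sketch uses a hypothetical `check_offline_compatible` helper (not part of ncc-tflite) and made-up op names; it rejects the whole model as soon as a single op is missing from the supported set, and reports which ops caused the failure.

```python
class OfflineCompileError(Exception):
    """Raised when the model cannot be compiled entirely for the NPU."""


def check_offline_compatible(ops, npu_supported):
    """Mimic the all-or-nothing rule of offline compilation.

    There is no CPU fallback, so every op must be supported; otherwise the
    whole compile is rejected and the offending ops are reported.
    """
    unsupported = sorted(set(ops) - set(npu_supported))
    if unsupported:
        raise OfflineCompileError(f"unsupported ops: {', '.join(unsupported)}")
    return list(ops)  # the entire model stays on the NPU


# Hypothetical op names and supported-op set, for illustration only.
npu_ops = {"CONV_2D", "RELU", "SOFTMAX"}

print(check_offline_compatible(["CONV_2D", "RELU", "SOFTMAX"], npu_ops))
try:
    check_offline_compatible(["CONV_2D", "CUSTOM_POSTPROCESS", "SOFTMAX"], npu_ops)
except OfflineCompileError as err:
    print("offline compile fails:", err)
```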

Best Practice and MediaTek Recommendation

  • Initial Testing: Use offline compilation to test your model’s compatibility and to validate NPU support. If your model is fully supported, this method is recommended for deployment due to its superior performance.
  • Fallback Handling: If offline compilation fails due to unsupported operators, you may try online compilation to leverage CPU fallback for those specific operators, provided that your version of TFLite supports them.
  • Issue Reporting: If online inference fails because an operator violates NPU constraints and is not executed on the CPU instead, and you believe fallback should be possible, please report the issue to MediaTek for further analysis and toolchain improvement.
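The recommended order can be summarized as a small decision function. In this Python sketch, `Op`, `NPU_CAPS`, `classify`, and `choose_pathway` are hypothetical names, and the rank limit on `SOFTMAX` is an invented constraint used only to illustrate the case raised in the question: an op that appears on the supported list but whose particular instance does not fit the NPU.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    attrs: dict = field(default_factory=dict)


# Hypothetical capability table: op name -> constraint check on the op's attributes.
# Real op names and constraints come from the platform's supported-operations list.
NPU_CAPS = {
    "CONV_2D": lambda attrs: True,
    "SOFTMAX": lambda attrs: attrs.get("rank", 4) <= 4,  # invented rank limit
}


def classify(op):
    """Return 'npu', 'unsupported', or 'constraint_violation' for one op."""
    check = NPU_CAPS.get(op.name)
    if check is None:
        return "unsupported"  # not on the NPU list at all
    if not check(op.attrs):
        return "constraint_violation"  # listed, but this instance does not fit
    return "npu"


def choose_pathway(ops):
    """Recommended order: offline if fully supported, else online, else report."""
    status = {classify(op) for op in ops}
    if status <= {"npu"}:
        return "offline"  # every op fits: deploy the offline-compiled model
    if "constraint_violation" in status:
        # Online execution may fail here instead of falling back to the CPU.
        return "online, report if it fails"
    return "online"  # only unsupported ops: rely on CPU fallback


# Hypothetical example models.
print(choose_pathway([Op("CONV_2D"), Op("SOFTMAX", {"rank": 2})]))
print(choose_pathway([Op("CONV_2D"), Op("CUSTOM_POSTPROCESS")]))
print(choose_pathway([Op("CONV_2D"), Op("SOFTMAX", {"rank": 5})]))
```

The third case matches the scenario in the question: because such an op may fail online rather than fall back to the CPU, the sketch flags it for verification and, if needed, reporting.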

Additional Notes

  • The supported operator (op) list is documented for each platform; for example, see the Genio-700 supported operations.
  • Offline compilation provides the most reliable and performant pathway, but always verify full operator coverage for your model.
  • Online compilation is useful for development and for handling models with partial NPU support.

For detailed steps and additional guidance, refer to the MediaTek NeuroPilot offline tool documentation.