What is the difference between Online Compilation and Offline Compilation, and which one should be preferred and why?

Suyash_Narain · June 17, 2025, 10:48pm

The NeuroPilot documentation presents offline compilation as the preferred pathway, but it has no fallback to CPU: if the model is not supported entirely on the NPU, compilation fails.
The online pathway partly resolves this, but if an op is supported on the NPU and its constraints don't fit within the NPU, model execution still fails instead of falling back to CPU.
So in this scenario, which method is preferred?

joying.kuo · September 23, 2025, 10:02pm

Thank you for this question.

MediaTek NeuroPilot supports two main model execution pathways on supported platforms: online compilation and offline compilation. Each approach offers unique advantages as well as limitations regarding operator fallback, performance, and deployment flexibility.

Online Compilation

Process: Both model compilation (through the TFLite Interpreter) and inference are performed directly on the device at runtime.
Pros:
- Automatically falls back to CPU for any operators not supported by the NPU (Neural Processing Unit).
- Useful for rapid prototyping or for models with mixed operator support.
Cons:
- Incurs additional compilation overhead at inference runtime, which can lead to slower startup, especially for large models.
- Performance may be suboptimal compared with offline compilation.

Offline Compilation

Process: Model compilation is performed in advance using the ncc-tflite tool on a development host PC; only inference is done on-device.
Pros:
- Provides optimized model binaries for the target NPU (MDLA) hardware, resulting in faster inference and lower runtime latency.
- Avoids runtime compilation cost, which is especially beneficial for production or repeated deployments.
Cons:
- Only supports models and operators that are fully compatible with the selected NPU/MDLA hardware version.
- Provides no fallback to CPU: operators not supported by the target NPU cannot be executed, so offline compilation fails if any are present.

Best Practice and MediaTek Recommendation

Initial Testing: Use offline compilation to test your model’s compatibility and to validate NPU support. If your model is fully supported, this method is recommended for deployment due to its superior performance.
Fallback Handling: If offline compilation fails due to unsupported operators, you may try online compilation to leverage CPU fallback for those specific operators, provided that your version of TFLite supports them.
Issue Reporting: If a model fails online inference because of operator constraints that cannot fall back to the CPU, and you believe fallback should be possible, please report the issue to MediaTek for further analysis and toolchain improvement.
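The order of these three steps can be written down as a small decision helper. This is only a sketch: `pick_pathway`, `compile_offline`, and `run_online` are hypothetical names, not part of NeuroPilot. The two callables stand in for whatever you use to invoke ncc-tflite on the host and to run the model through the TFLite Interpreter on the device, and each is assumed to return True on success.

```python
from typing import Callable


def pick_pathway(
    compile_offline: Callable[[str], bool],
    run_online: Callable[[str], bool],
    model_path: str,
) -> str:
    """Return 'offline', 'online', or 'report' for a model.

    Both callables are hypothetical wrappers supplied by the caller;
    each takes a model path and returns True on success.
    """
    # Step 1: prefer offline compilation when the whole model fits the NPU.
    if compile_offline(model_path):
        return "offline"
    # Step 2: otherwise try the online pathway, which can fall back to CPU.
    if run_online(model_path):
        return "online"
    # Step 3: neither pathway works, so this is the case to report.
    return "report"


# Stub callables used only to exercise the three outcomes.
print(pick_pathway(lambda p: True, lambda p: True, "model.tflite"))
print(pick_pathway(lambda p: False, lambda p: True, "model.tflite"))
print(pick_pathway(lambda p: False, lambda p: False, "model.tflite"))
```

With the stubs above, the three calls return "offline", "online", and "report" in that order. Passing the two steps in as callables keeps the helper independent of any particular toolchain version or command line.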

Additional Notes

The supported operator (op) list is documented for each platform; for example, see the Genio-700 supported operations.
Offline compilation provides the most reliable and performant pathway, but always verify full operator coverage for your model.
Online compilation is useful for development and for handling models with partial NPU support.
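A quick way to check operator coverage before attempting offline compilation is to compare the model's operator names against the supported list for your platform. The sketch below assumes you have already extracted both lists yourself. The function name `unsupported_ops` and the operator names in the example are illustrative only and are not taken from any real platform list.

```python
from typing import Iterable, List


def unsupported_ops(model_ops: Iterable[str], npu_supported_ops: Iterable[str]) -> List[str]:
    """Return the model ops missing from the NPU list, deduplicated, in first-seen order."""
    supported = set(npu_supported_ops)
    missing: List[str] = []
    for op in model_ops:
        if op not in supported and op not in missing:
            missing.append(op)
    return missing


# Hypothetical data: not a real supported-operations list.
npu_ops = {"CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "RELU"}
model_ops = ["CONV_2D", "RELU", "TOPK_V2", "ADD", "TOPK_V2"]

missing = unsupported_ops(model_ops, npu_ops)
print(missing)  # ['TOPK_V2']
```

An empty result only means every operator name appears in the list. It does not check per-operator constraints, so a model can still fail on the NPU in the way described in the original question.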

For detailed steps and additional guidance, refer to the MediaTek NeuroPilot offline tool documentation.
