During MediaTek DLA model conversion with `ncc-tflite`, enabling the `--opt-accuracy` flag produces results that closely match CPU execution, while omitting it can lead to significant accuracy deviations.

- What changes does `--opt-accuracy` introduce in the conversion process?
- For which models is it necessary?
- What is the impact on inference performance?
- The `--opt-accuracy` flag in `ncc-tflite` instructs Neuron to apply additional optimization methods that improve inference accuracy, aligning results more closely with CPU execution.
- Without `--opt-accuracy`, the tool prioritizes performance and may apply quantization or operation simplifications whose outputs differ significantly from the FP32 reference, especially when using `--relax-fp32` (which forces FP16 execution on the MDLA, since FP32 is generally unsupported there).
- With `--opt-accuracy`, the compiler compensates for known accuracy issues:
  - Converts `int16` data to `fp16`.
  - Sets `conv2d` layer biases to zero and introduces a channel-wise addition for improved precision.
  - Increases the average-pooling (`avgpool2d`) cascade depth.
  - Other operations may gain extra steps to improve output fidelity.
- These changes may slightly increase model complexity and inference latency, but often yield near-reference accuracy.
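The `conv2d` bias rewrite above can be illustrated with a small numpy sketch. This is a hypothetical demonstration of the math, not Neuron's actual implementation: a convolution with its bias fused is numerically equivalent to the same convolution with a zero bias followed by a channel-wise addition, which is what lets the compiler perform the addition at higher precision than the convolution's accumulator.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 8, 3)).astype(np.float32)   # NHWC input
w = rng.standard_normal((3, 3, 3, 16)).astype(np.float32)  # HWIO kernel
b = rng.standard_normal(16).astype(np.float32)             # per-channel bias

def conv2d(x, w, bias):
    """Naive valid-padding conv2d, for demonstration only."""
    n, h, wid, cin = x.shape
    kh, kw, _, cout = w.shape
    out = np.zeros((n, h - kh + 1, wid - kw + 1, cout), dtype=np.float32)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw, :]  # (n, kh, kw, cin)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out + bias

fused = conv2d(x, w, b)                       # bias fused into the conv
split = conv2d(x, w, np.zeros_like(b)) + b    # zero bias + channel-wise add

print(np.allclose(fused, split))  # the rewrite is numerically equivalent
```

On real hardware the two paths are *not* bit-identical, which is exactly the point: moving the bias out of the quantized accumulator into a separate channel-wise add is what recovers precision.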
- When to use `--opt-accuracy`:
  - Strongly recommended whenever output accuracy is critical to your application, or when initial conversion or testing results differ from the expected CPU outputs.
  - Particularly important for models containing sensitive operations (`conv2d`, `avgpool2d`, or custom ops sensitive to quantization).
- Performance impact:
  - In practical tests (e.g., YOLOv3-tiny on MDLA 3.0), the average per-inference latency difference was minimal (9.19 ms vs. 9.23 ms per run), indicating negligible performance loss for most real-world use cases.
  - Model size or runtime may increase slightly due to the extra steps introduced for accuracy.
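To put the reported YOLOv3-tiny numbers in perspective, the relative overhead works out as follows (a trivial back-of-envelope calculation using only the figures quoted above):

```python
# Reported per-inference latency, with and without --opt-accuracy:
without_flag_ms = 9.19
with_flag_ms = 9.23

overhead = (with_flag_ms - without_flag_ms) / without_flag_ms
print(f"{overhead:.2%}")  # well under 1% added latency
```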
Recommendation:
Always validate results both with and without `--opt-accuracy`. When in doubt, prefer `--opt-accuracy` to ensure alignment with CPU inference behavior, especially for production or customer-facing AI pipelines.
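A minimal sketch of that validation step, assuming you can obtain output tensors from both the CPU (TFLite) reference and the DLA build; the synthetic arrays below stand in for real model outputs, and the tolerance value is an illustrative choice, not a MediaTek-specified threshold:

```python
import numpy as np

def compare_outputs(cpu_out: np.ndarray, dla_out: np.ndarray,
                    atol: float = 1e-2) -> dict:
    """Summarize the deviation between reference and accelerator outputs."""
    diff = np.abs(cpu_out.astype(np.float32) - dla_out.astype(np.float32))
    return {
        "max_abs_err": float(diff.max()),
        "mean_abs_err": float(diff.mean()),
        "within_tol": bool(diff.max() <= atol),
    }

# Synthetic stand-ins for the CPU reference and the DLA result:
rng = np.random.default_rng(1)
cpu = rng.standard_normal((1, 1000)).astype(np.float32)
dla = cpu + rng.normal(scale=1e-3, size=cpu.shape).astype(np.float32)

report = compare_outputs(cpu, dla)
print(report["within_tol"])  # small perturbation stays within tolerance
```

Running this once per build (with and without `--opt-accuracy`) on a few representative inputs gives a quick, quantitative basis for the comparison the recommendation calls for.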