How does the `--opt-accuracy` flag affect ncc-tflite (TFLite-to-DLA) conversion results and performance?

During MediaTek DLA model conversion with ncc-tflite, enabling the --opt-accuracy flag produces results that closely match CPU execution, while omitting this flag can lead to significant accuracy deviations.

  • What changes does --opt-accuracy introduce in the conversion process?

  • For which models is it necessary?

  • What is the impact on inference performance?

  • The --opt-accuracy flag in ncc-tflite instructs the Neuron compiler to apply additional optimization passes that improve inference accuracy, aligning DLA results more closely with CPU execution.

    • Without --opt-accuracy, the tool prioritizes performance; quantization and operation simplifications can then produce results that differ significantly from the FP32 reference, especially when --relax-fp32 is used (it forces FP16 execution on the MDLA, since FP32 is generally unsupported there).
    • With --opt-accuracy, the compiler adjusts for known accuracy issues:
      • Converts int16 data to fp16.
      • Sets conv2d layer biases to zero and introduces a channel-wise addition for improved precision.
      • Increases average pooling (avgpool2d) cascade depth.
      • Other operations may gain extra steps to improve output fidelity.
    • These changes may slightly increase model complexity and inference latency, but often result in near-reference accuracy.
  • When to use --opt-accuracy:

    • Strongly recommended whenever output accuracy is critical for your application, or if initial conversion/testing results differ from expected CPU outputs.
    • Particularly important for models with sensitive operations (conv2d, avgpool2d, or custom ops sensitive to quantization).
  • Performance impact:

    • In practical tests (example: YOLOv3-tiny on MDLA 3.0), average per-inference latency was nearly identical with and without the flag (9.19 ms vs. 9.23 ms per run), indicating negligible performance loss for most real-world use cases.
    • The model’s size or runtime may slightly increase due to extra steps introduced for accuracy.
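The conv2d rewrite above can be illustrated numerically: folding the bias to zero and re-adding it as a channel-wise addition is mathematically equivalent to the fused form. This is a minimal pure-Python sketch with invented shapes and values; the actual compiler pass operates on the TFLite graph, not on Python lists.

```python
def conv2d_single_channel(x, w):
    """Valid 2D cross-correlation of one input map with one kernel."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + di][j + dj] * w[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

def conv2d(x, weights, biases):
    """One output channel per (kernel, bias) pair; bias added per channel."""
    return [[[v + b for v in row] for row in conv2d_single_channel(x, w)]
            for w, b in zip(weights, biases)]

x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
weights = [[[1.0, 0.0], [0.0, -1.0]],   # two 2x2 kernels -> two channels
           [[0.5, 0.5], [0.5, 0.5]]]
biases = [0.25, -1.0]

# Fused form: bias is part of the convolution.
fused = conv2d(x, weights, biases)

# Rewritten form: zero-bias convolution, then channel-wise addition.
zero_bias = conv2d(x, weights, [0.0, 0.0])
rewritten = [[[v + b for v in row] for row in ch]
             for ch, b in zip(zero_bias, biases)]

assert fused == rewritten
```

On hardware, splitting the bias out this way lets the addition run at a precision that avoids the accumulation error of the fused path; in exact arithmetic, as shown, the two forms are identical.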

Recommendation:
Always validate results both with and without --opt-accuracy. When in doubt, prefer --opt-accuracy to ensure alignment with CPU inference behavior, especially for production or customer-facing AI pipelines.
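The validation step can be as simple as comparing flattened output tensors from the CPU reference run and the converted DLA model. This is a minimal sketch with invented tensor values and an arbitrary 0.02 pass/fail threshold; in practice the arrays would come from the TFLite CPU interpreter and the DLA runtime.

```python
def max_abs_diff(ref, out):
    """Largest element-wise absolute deviation between two flat tensors."""
    return max(abs(r - o) for r, o in zip(ref, out))

def mean_abs_diff(ref, out):
    """Mean element-wise absolute deviation between two flat tensors."""
    return sum(abs(r - o) for r, o in zip(ref, out)) / len(ref)

cpu_ref = [0.12, 0.87, 0.05, 0.33]   # flattened CPU reference output
dla_out = [0.11, 0.88, 0.05, 0.34]   # flattened DLA output

print(f"max abs diff:  {max_abs_diff(cpu_ref, dla_out):.4f}")
print(f"mean abs diff: {mean_abs_diff(cpu_ref, dla_out):.4f}")

# Simple pass/fail gate; tune the threshold to your accuracy budget.
assert max_abs_diff(cpu_ref, dla_out) < 0.02
```

Running the comparison once per conversion variant (with and without --opt-accuracy) makes the accuracy/latency trade-off explicit before committing a model to production.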