Hi everyone,
I’m working on a model that includes an Instance Normalization layer. I’m using the MTK converter tool to convert it into a quantized TFLite model. By default, the converter maps the layer to the MTKEXT_INSTANCE_NORMALIZATION operator.
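For context, my conversion setup looks roughly like this (a simplified sketch: I’m assuming a PyTorch frontend here, and the exact mtk_converter option names may vary between NeuronPilot releases; paths and shapes are placeholders):

    import mtk_converter

    # Simplified sketch of the quantized conversion.
    converter = mtk_converter.PyTorchConverter.from_script_module_file(
        'model.pt', input_shapes=[[1, 3, 256, 256]]
    )
    converter.quantize = True                     # produce a quantized TFLite model
    converter.input_value_ranges = [(-1.0, 1.0)]  # calibration range for the input
    converter.convert_to_tflite(output_file='model_quant.tflite')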
However, in the quantized model this operator slows execution down significantly, because the converter inserts a Dequantize op before it and a Quantize op after it. As a result, the network runs very slowly under the neuronrt tool.
Using the ncc-tflite tool, I noticed that the model is partitioned into many subgraphs, which leads to inefficient execution. I also tried enabling several of ncc-tflite's optimization options, but none of them improved the execution speed in neuronrt.
The MTK converter provides the option converter.decompose_instance_normalization_ops. When I enable it, the converter replaces the instance normalization with basic operations (mean, multiply, etc.), which lets ncc-tflite compile the model into far fewer subgraphs.
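Concretely, I enable it like this before converting (continuing the sketch above; the flag name is exactly what the converter exposes):

    # Decompose the fused instance-norm op into primitive mean/sub/mul/add ops.
    converter.decompose_instance_normalization_ops = True
    converter.convert_to_tflite(output_file='model_quant_decomposed.tflite')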
The problem is that even though the model now compiles successfully to a .dla file, running it with neuronrt fails with an error related to parallel execution.
I would appreciate your help with the following questions:
- Is there a recommended way to build the graph efficiently when using an instance normalization layer with the ncc-tflite tool? (The sketch after these questions shows the kind of manual decomposition I have in mind.)
- Why does neuronrt report a parallel-execution failure when I replace the instance normalization layer with simpler operators, even though ncc-tflite compiles the model successfully?
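For reference, this is the kind of hand-written decomposition I could also try at the model level instead of relying on the converter flag. It is a minimal sketch assuming a PyTorch NCHW model; the class name is mine:

    import torch

    class DecomposedInstanceNorm(torch.nn.Module):
        """Instance norm spelled out with primitive ops (mean, sub, mul, add),
        so no fused MTKEXT_INSTANCE_NORMALIZATION op is needed."""
        def __init__(self, num_channels, eps=1e-5):
            super().__init__()
            self.eps = eps
            self.weight = torch.nn.Parameter(torch.ones(1, num_channels, 1, 1))
            self.bias = torch.nn.Parameter(torch.zeros(1, num_channels, 1, 1))

        def forward(self, x):
            # Per-instance, per-channel statistics over the spatial dims (H, W).
            mean = x.mean(dim=(2, 3), keepdim=True)
            var = (x - mean).pow(2).mean(dim=(2, 3), keepdim=True)
            x = (x - mean) * torch.rsqrt(var + self.eps)
            return x * self.weight + self.bias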
Any insights or guidance on handling instance normalization efficiently would be greatly appreciated.
Thank you in advance!