Hi,
I have a TFLite model with a 518x518 input shape. When I execute the model through the stable_delegate, I get the apusys memImport error below:
[apusys][info]construct: Cmd v5(0xaaaae002c9c0): total_vlm_size(516096)
INFO: Explicitly applied STABLE_DELEGATE delegate, and the model graph will be partially executed by the delegate w/ 25 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 99.1302
INFO: Initialized session in 4630.01ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
[apusys][error]memImport: import mem(89) fail(Cannot allocate memory)
[apusys][error]memImport: import mem(0x0/45045600) handle(89) flags(0) fail(Cannot allocate memory)
[apusys][error]memImport: import memory(89/45045600) fail
ERROR: APUSysEngine::MemImportV1() failed to import handle 89
ERROR: APUSys imports ION buffer fd = 89 failed
ERROR: HintFrontendBuffer() failed on input #0, buffer addr = 0xfffe4fd76000 on DMM = neuron::platforms::apusys::V2_0::APUSysEngine
ERROR: Neuron returned error NEURON_BAD_DATA at line 1506 while associating NNAPI execution input with a memory object.
ERROR: Node number 827 (DELEGATE) failed to invoke.
INFO: count=1 curr=2151260
ERROR: Benchmarking failed.
I wrote a quick script to find the tensors consuming the most memory in the model (a sketch of the script follows the output below); the result is:
— Top 10 Largest Tensors in model.tflite —
Tensor Index: 230 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 32 (BATCH_MATMUL)
Tensor Index: 231 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 33 (SOFTMAX)
Tensor Index: 289 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 91 (BATCH_MATMUL)
Tensor Index: 290 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 92 (SOFTMAX)
Tensor Index: 348 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 150 (BATCH_MATMUL)
Tensor Index: 349 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 151 (SOFTMAX)
Tensor Index: 407 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 209 (BATCH_MATMUL)
Tensor Index: 408 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 210 (SOFTMAX)
Tensor Index: 466 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 268 (BATCH_MATMUL)
Tensor Index: 467 Size: 42.96 MB Shape: [6, 1370, 1370] Produced by: Op 269 (SOFTMAX)
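For reference, this is roughly the script I used. It is a minimal sketch built on the standard tf.lite.Interpreter API; the model path is a placeholder, and the "Produced by" column shown above comes from walking the flatbuffer operators, which is omitted here for brevity:

import numpy as np
import tensorflow as tf

# Load the model and list every tensor in the graph (model path is a placeholder).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
tensors = interpreter.get_tensor_details()

# Compute each tensor's size in bytes from its shape and dtype.
sized = []
for t in tensors:
    shape = t["shape"]
    if shape is None or len(shape) == 0:
        continue
    size_bytes = int(np.prod(shape)) * np.dtype(t["dtype"]).itemsize
    sized.append((size_bytes, t["index"], list(shape)))

# Print the 10 largest tensors, largest first.
for size_bytes, index, shape in sorted(sized, reverse=True)[:10]:
    print(f"Tensor Index: {index}  Size: {size_bytes / 2**20:.2f} MB  Shape: {shape}")

(Note that 6 x 1370 x 1370 float32 values come to 45,045,600 bytes, which matches the buffer size in the memImport failure above.)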
The issue is that if I use the offline pathway instead, the model compiles to a DLA with ncc-tflite without any problem. With the --show-memory-summary flag, I get the following output for the model:
DRAM Usage:
Target Input Output Temp Static Code Total
[ 0]: eDMA 3.6 3.1M 0 0 0 64 3.1M
[ 1]: MDLA 5.3 0 1.0M 0 47M 171K 48M
Total Memory:
DRAM: 51M + 50M (Shared) = 102M
L1: 896K
TargetExecutionOrder:
APUSYS_2_0
Here the static memory usage is again 47 MB, and the model compiles to a DLA easily.
So why do we end up with the memImport error when using online compilation? Does this error occur at the compilation stage or at execution, and what are the NPU memory limits for the G720?