Hello,
I am trying to execute a Whisper TFLite model on the Genio-700 using the stable delegate, but it fails with an apusys memory error.
I get the log below:
[apusys][error]memAlloc: alloc mem(160/0/0/0x5) fail(Cannot allocate memory)
[apusys][error]construct: Cmd v2(0xaaaafc834160): alloc execInfo fail
ERROR: APUSysEngine::BuildCmd() failed
ERROR: Failed to build APUSys command.
ERROR: Fail to rewrite DLA
ERROR: Fail to run APUSysRewritePass
WARNING: Fail to run runtime CompiledGraph pipeline
ERROR: Cannot prepare execution.
ERROR: Neuron returned error NEURON_BAD_STATE at line 1393 while creating Neuron execution.
What is this apusys memory error? My understanding is that the model's ops are requesting far more memory than APUSys has available, hence the error.
Is there any way to add a rule so that ops with large memory requirements fall back to the CPU for execution?
Thank you for this question.
The MediaTek Genio series only guarantees support for transformer-based models on platforms with MDLA version 5 or higher.
The Genio-700 platform uses MDLA3, so deployment and execution of large transformer models (such as Whisper) are not guaranteed to work on Genio-700.
APUSys Memory Allocation Error
The APUSys memory error indicates that the memory requested for the model's operators exceeds what APUSys can allocate, so the command build fails (the memAlloc failure in your log). This is commonly encountered with large models or with individual operations that have high memory requirements.
Operator Delegation and Hardware Assignment
MediaTek’s AI stack automatically decides the optimal hardware resource for each model operation (operator), delegating to the NPU (MDLA) or CPU according to model structure and system constraints.
Currently, there is no method available to manually assign specific operators to run on the CPU for large-memory scenarios. Hardware delegation is handled by the toolkit’s automatic optimization and cannot be overridden by user configuration.
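For reference, here is a minimal sketch of how a delegate is typically attached through the Python TFLite API. The model path and the delegate library path (libneuron_delegate.so) are assumptions, not confirmed paths for your board image; the point is that the delegate claims the graph partitions it supports and the runtime keeps the rest on CPU kernels, without a user-facing knob to force specific ops off the NPU.

```python
import tensorflow as tf

# Placeholder paths -- adjust to your model and the delegate library
# shipped with your board image (assumed location shown here).
MODEL_PATH = "whisper.tflite"
DELEGATE_PATH = "/usr/lib/libneuron_delegate.so"

# The delegate claims the partitions of the graph it supports; whatever it
# does not claim stays on the built-in CPU kernels. That partitioning is
# decided by the delegate/runtime, not by user configuration.
delegate = tf.lite.experimental.load_delegate(DELEGATE_PATH)
interpreter = tf.lite.Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```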
Recommendations
- For transformer models such as Whisper, use a Genio platform with MDLA5 or above for full compatibility and performance.
- There is no user-level control for delegating large memory ops to the CPU; rely on the AI toolkit’s built-in automatic delegation.
- If hardware constraints prevent successful execution, consider reducing the model size (for example via post-training quantization, see the sketch below) or switching to a platform with sufficient AI hardware resources.
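As one way to reduce the model's memory footprint, a sketch of post-training dynamic-range quantization with the standard TFLite converter is shown below. It assumes you have a TensorFlow SavedModel export of the model at the (hypothetical) path used here; whether the quantized graph then fits APUSys memory on Genio-700 still depends on the model and is not guaranteed.

```python
import tensorflow as tf

# Assumption: a TensorFlow SavedModel export of the model is available here.
SAVED_MODEL_DIR = "whisper_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
# Post-training dynamic-range quantization stores weights as int8 instead of
# float32, roughly quartering the weight footprint of the .tflite file.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("whisper_quant.tflite", "wb") as f:
    f.write(tflite_model)
```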