Hello,
I am trying to execute a Whisper TFLite model on the Genio-700 using the stable delegate, but it fails with an apusys memory error.
I get the log below:
[apusys][error]memAlloc: alloc mem(160/0/0/0x5) fail(Cannot allocate memory)
[apusys][error]construct: Cmd v2(0xaaaafc834160): alloc execInfo fail
ERROR: APUSysEngine::BuildCmd() failed
ERROR: Failed to build APUSys command.
ERROR: Fail to rewrite DLA
ERROR: Fail to run APUSysRewritePass
WARNING: Fail to run runtime CompiledGraph pipeline
ERROR: Cannot prepare execution.
ERROR: Neuron returned error NEURON_BAD_STATE at line 1393 while creating Neuron execution.
What is this apusys memory error? My understanding is that the model's ops are requesting far more memory than APUSys has available, hence the error.
Is there any way to add a rule so that ops with large memory requirements fall back to the CPU for execution?
Thank you for this question.
The MediaTek Genio series only guarantees support for transformer-based models on platforms with MDLA version 5 or higher.
The Genio-700 platform uses MDLA3, so deployment and execution of large transformer models (such as Whisper) are not guaranteed to work on Genio-700.
APUSys Memory Allocation Error
The APUSys memory error indicates that the memory requested for the model's operators exceeds what APUSys can allocate, so the command build fails (the memAlloc failure in your log). This is commonly encountered with large models or with individual operations that have high memory requirements.
Operator Delegation and Hardware Assignment
MediaTek’s AI stack automatically decides the optimal hardware resource for each model operation (operator), delegating to the NPU (MDLA) or CPU according to model structure and system constraints.
Currently, there is no method available to manually assign specific operators to run on the CPU for large-memory scenarios. Hardware delegation is handled by the toolkit’s automatic optimization and cannot be overridden by user configuration.
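For reference, here is a minimal sketch of how a delegate is typically attached through the Python TFLite API. The model path and the delegate library path (libneuron_delegate.so) are assumptions, not confirmed paths for your board image; the point is that the delegate claims the graph partitions it supports and the runtime keeps the rest on CPU kernels, without a user-facing knob to force specific ops off the NPU.

```python
import tensorflow as tf

# Placeholder paths -- adjust to your model and the delegate library
# shipped with your board image (assumed location shown here).
MODEL_PATH = "whisper.tflite"
DELEGATE_PATH = "/usr/lib/libneuron_delegate.so"

# The delegate claims the partitions of the graph it supports; whatever it
# does not claim stays on the built-in CPU kernels. That partitioning is
# decided by the delegate/runtime, not by user configuration.
delegate = tf.lite.experimental.load_delegate(DELEGATE_PATH)
interpreter = tf.lite.Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```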
Recommendations
- For transformer models such as Whisper, use a Genio platform with MDLA5 or above for full compatibility and performance.
- There is no user-level control for delegating large memory ops to the CPU; rely on the AI toolkit’s built-in automatic delegation.
- If hardware constraints prevent successful execution, consider reducing the model size (for example via post-training quantization, see the sketch below) or switching to a platform with sufficient AI hardware resources.
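As one way to reduce the model's memory footprint, a sketch of post-training dynamic-range quantization with the standard TFLite converter is shown below. It assumes you have a TensorFlow SavedModel export of the model at the (hypothetical) path used here; whether the quantized graph then fits APUSys memory on Genio-700 still depends on the model and is not guaranteed.

```python
import tensorflow as tf

# Assumption: a TensorFlow SavedModel export of the model is available here.
SAVED_MODEL_DIR = "whisper_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
# Post-training dynamic-range quantization stores weights as int8 instead of
# float32, roughly quartering the weight footprint of the .tflite file.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("whisper_quant.tflite", "wb") as f:
    f.write(tflite_model)
```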