Hi,
I am working on running a Whisper encoder model on the MediaTek NPU (MDLA). The model is converted to DLA format using ncc-tflite with --relax-fp32. Most of the time, the NPU executes successfully and produces the expected output.bin.
However, intermittently the NPU crashes and prints an EL3 exception like the following:
INFO: dlopen libneuronusdk_runtime.mtk.so
Unhandled Exception in EL3.
x30 = 0x0000000054603e58
x0 = 0x000000000d298000
x1 = 0x0000000000000000
x2 = 0x0000000000000001
x3 = 0x0000000000000000
x4 = 0x0000000000000000
x5 = 0x0000000000000001
x6 = 0x0000000054604e04
x7 = 0x0000000000000001
x8 = 0x0000000000000000
x9 = 0x0000000041453060
x10 = 0x0000000000000000
x11 = 0x0000000054661cb0
x12 = 0x0000000054664040
x13 = 0x3232206c6c616320
x14 = 0x0000000054670e22
x15 = 0x000000005460cc70
x16 = 0x0000000000000042
x17 = 0xffff80008002aeb4
x18 = 0x0000000000000735
x19 = 0x000000001900100c
x20 = 0x0000000054670e18
x21 = 0x0000000000000001
x22 = 0xffff0000c0b3a410
x23 = 0xffff800085b639d8
x24 = 0x0000000000000010
x25 = 0xffff0000c7827e80
x26 = 0xffff0000ff6b9ee8
x27 = 0xffff80007c6d4000
x28 = 0xffff0000ca3074d0
x29 = 0x0000000054663fc0
scr_el3 = 0x0000000000000735
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000003c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000004002cc
elr_el3 = 0x0000000054603e6c
ttbr0_el3 = 0x0000000054670f01
esr_el3 = 0x0000000096000046
far_el3 = 0x000000000d298000
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x00000000cec7f220
spsr_und = 0x00000000fadfff22
spsr_irq = 0x00000000fedff3f2
spsr_fiq = 0x00000000fe1fe3f9
sctlr_el1 = 0x0000000030500801
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000300000
csselr_el1 = 0x0000000000000000
sp_el1 = 0x0000000000000000
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000066d88000
ttbr1_el1 = 0x0000000066d8b000
mair_el1 = 0x000000040044ffff
amair_el1 = 0x0000000000000000
tcr_el1 = 0x000000f2b5d03590
tpidr_el1 = 0x0000000000000000
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000081000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x0000000000000000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x0000000000000000
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000000
sp_el0 = 0x0000000054663fc0
isr_el1 = 0x0000000000000040
cpuectlr_el1 = 0x000000002808bc00
The failure is intermittent: the same DLA file sometimes runs to completion and produces the correct result, and other times crashes with this EL3 exception during NPU execution.
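To make the dump easier to read, here is a minimal sketch that decodes the esr_el3 value using the standard ARMv8-A ESR_ELx field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for data aborts, WnR is ISS bit 6 and DFSC is ISS bits 5:0). The helper name decode_esr and the lookup tables are my own and cover only the codes relevant here; nothing in it is MediaTek-specific.

```python
# Decode an AArch64 ESR_ELx value (field layout from the Arm architecture manual).
# decode_esr and the two tables below are illustrative helpers, not a vendor API.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# Only the translation-fault status codes are listed; other DFSC values exist.
DFSC_NAMES = {
    0x04: "Translation fault, level 0",
    0x05: "Translation fault, level 1",
    0x06: "Translation fault, level 2",
    0x07: "Translation fault, level 3",
}


def decode_esr(esr):
    """Split ESR_ELx into EC / IL / ISS and, for data aborts, WnR / DFSC."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    info = {"ec": ec, "il": il, "iss": iss}
    if ec in (0x24, 0x25):       # data abort encodings
        info["wnr"] = (iss >> 6) & 0x1   # 1 = write, 0 = read
        info["dfsc"] = iss & 0x3F        # data fault status code
    return info


if __name__ == "__main__":
    fields = decode_esr(0x96000046)      # esr_el3 from the dump
    print("EC   = 0x%02x (%s)" % (fields["ec"], EC_NAMES.get(fields["ec"], "other")))
    print("WnR  = %d" % fields["wnr"])
    print("DFSC = 0x%02x (%s)" % (fields["dfsc"], DFSC_NAMES.get(fields["dfsc"], "other")))
```

For this dump the fields come out as EC 0x25, WnR 1 and DFSC 0x06, which reads as a write that hit a level 2 translation fault while already executing in EL3. far_el3 is 0x0d298000, the same value as x0, so the secure monitor code at elr_el3 appears to be writing to an address that is not mapped in the EL3 translation tables.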
Can you please help me understand:
- What typically causes EL3-level exceptions when executing MDLA/DLA workloads?
- Does this point to a memory access violation inside the MDLA driver or firmware?
- Is this likely due to
  - incorrect DLA conversion,
  - missing input alignment,
  - buffer size mismatch,
  - firmware instability, or
  - a known MDLA runtime issue?
- How can I debug this further?
  - Are there logs I should enable (e.g., MDLA runtime / RPC logs)?
  - Any recommended checks for input/output tensor alignment or address boundaries?
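On the buffer size and alignment question, this is the kind of host-side sanity check I can run before each inference. It is only a sketch under assumptions: the function name check_tensor_file is hypothetical, the (1, 80, 3000) float32 shape in the usage note is the usual Whisper log-mel input and may not match my converted model, and the 64-byte alignment is a placeholder rather than a documented MDLA requirement.

```python
import math
import os


def check_tensor_file(path, shape, itemsize, alignment=64):
    """Compare a raw tensor file against the byte size implied by shape and itemsize.

    alignment is an assumed value for illustration; the real MDLA requirement
    (if any) has to come from the vendor documentation.
    """
    expected = math.prod(shape) * itemsize   # bytes the model input should occupy
    actual = os.path.getsize(path)           # bytes actually present on disk
    return {
        "expected_bytes": expected,
        "actual_bytes": actual,
        "size_matches": actual == expected,
        "size_is_aligned": actual % alignment == 0,
    }
```

For example, check_tensor_file("input.bin", (1, 80, 3000), 4) would flag an input file whose size differs from 1 * 80 * 3000 * 4 bytes. This only validates file sizes on the host; it says nothing about the physical addresses the runtime hands to the NPU.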
Any guidance to help resolve or debug this intermittent crash would be greatly appreciated.
Thank you.