Critical GPU Hang and System Freeze on G720 EVK - Ubuntu 24.04

Dear MediaTek Support Team,

We are currently working with the G720 EVK platform and are encountering a critical system hang.

Environment:

  • Hardware: G720 EVK (MT8391)

  • OS: Ubuntu 24.04

Issue Description: The entire system hangs and becomes completely unresponsive during operation. The only way to recover the system is via a manual hardware reset. This issue occurs randomly.

Error Message: The following error is captured via serial console/dmesg right before the system freeze:

[ 5177.082998] mali 13000000.gpu: Failed to soft-reset GPU (timed out after 500 ms), now attempting a hard reset


root@ubuntu:/home/ubuntu# dmesg | grep -i mali | tail -n 20
[    0.630658] mali 13000000.gpu: Kernel DDK version r48p0-01eac0
[    0.630669] mali 13000000.gpu: Looking for irq names JOB in lower case
[    0.630707] mali 13000000.gpu: Looking for irq names MMU in lower case
[    0.630722] mali 13000000.gpu: Looking for irq names GPU in lower case
[    0.630746] mali 13000000.gpu: No supply-names in gpu node, use default names
[    0.631237] mali 13000000.gpu: Platform: mt8391, MC2
[    0.631241] mali 13000000.gpu: Setup for 2 cores/pm domains
[    0.633319] mali: [kbase_gpuprops_parse_gpu_id] Detected G-57 variant with GPU ID 0x90930010. Changing the ID to 0x90910010.
[    0.633337] mali 13000000.gpu: Register LUT 00090000 initialized for GPU arch 0x00090009
[    0.633344] mali 13000000.gpu: mali [kbase_backend_gpuprops_get] Detected G-57 variant with GPU ID 0x90930010. Changing the ID to 0x90910010.
[    0.633360] mali 13000000.gpu: GPU identified as 0x1 arch 9.0.9 r0p1 status 0
[    0.633516] mali 13000000.gpu: No priority control manager is configured
[    0.633519] mali 13000000.gpu: Large page support was disabled at compile-time!
[    0.633565] mali 13000000.gpu: No memory group manager is configured
[    0.634229] mali 13000000.gpu: recalculation of power model mali-simple-power-model returned error -517
[    0.634268] mali 13000000.gpu: IPA initialization failed
[    0.634270] mali 13000000.gpu: Continuing without devfreq
[    0.634646] mali 13000000.gpu: * MALI kbase_mmap_min_addr compiled to CONFIG_DEFAULT_MMAP_MIN_ADDR, no runtime update possible! *
[    0.634649] mali 13000000.gpu: Probed as mali0
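
The boot-time lines above do not include the reset timeout itself, so it may help to filter a saved log for reset messages only. Below is a minimal sketch, assuming the kernel log has been saved to a file; the helper name find_gpu_resets and the file path in the usage line are hypothetical, and the pattern is based only on the single error line quoted earlier.

```shell
# Hypothetical helper (not part of any MediaTek or Mali tooling):
# print only the Mali GPU reset messages found in a saved kernel log.
# Usage: find_gpu_resets <logfile>
find_gpu_resets() {
    # Case-insensitive match on Mali lines mentioning a soft or hard reset.
    grep -iE 'mali .*(soft-reset|hard reset)' "$1"
}
```

Example usage, assuming the log is saved first: dmesg > /tmp/dmesg.txt, then find_gpu_resets /tmp/dmesg.txt. The timestamps of the matching lines can then be compared against the time of each freeze.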

Could you provide guidance on how to further debug the GPU state once this timeout occurs?

Please let us know what additional logs or configuration files (such as Device Tree or Kernel Config) you require to investigate this further.

Best Regards

Hi Percy,

Thank you for reporting this issue.

We have already addressed this issue in our latest development cycle. The fix is integrated into the upcoming Beta version.

The release process will take a short while to complete. We will notify you as soon as the Beta version is live.

Thanks

OK, thank you.