[Qwen2.5-7B] No .dla file generated when running compile_generative.sh on Genio 520

Environment:

  • Platform: Genio-520
  • Toolkit: GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1
  • Neuron SDK: 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz

Hi MediaTek team and community,

With Qwen 2.5, I’m still encountering the issue: “No .dla file generated when running compile_generative.sh”, even after applying the adjustments you suggested.

BACKEND="mdla5.3,edma3.6"
L1_SIZE_KB="256"
NUM_MDLA="1"

Here are the logs/images when running compile_generative.sh:

[Screenshot 2026-03-30 100027]

After running compile_generative.sh, it does not return any logs or generate any .dla file. This issue only occurs with compile_generative.sh, while compile_prompt_qwen2.5_0.5B_7B.sh still successfully generates .dla files and produces some log output such as:

(py38_test) bkav@bkav-Super-Server:~/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/compile$ ./compile_prompt_qwen2.5_0.5B_7B.sh 
/home/bkav/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/post_training_quantize/tflite/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_128t2048c/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_7layer_128t2048c_2.tflite 
/home/bkav/Downloads/SDK/Neuropilot_SDK/20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release/neuron_sdk
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
Patch done!
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
DRAM Usage:
Target          Input  Output  Temp  Static  Code  Total
[ 0]: MDLA 5.3    29M    2.6M   20M    781M  765K   834M

Total Memory:
DRAM: 834M + 0 (Shared) = 834M
L1:   256K
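
Since compile_generative.sh prints nothing, it may help to capture its exit code and combined output explicitly and then look for new .dla files. Below is a minimal Python sketch under stated assumptions: the helper name run_and_list_dla is hypothetical, bash is assumed to be available on the host, and the helper only wraps the existing script without changing how the toolkit compiles.

```python
import subprocess
from pathlib import Path


def run_and_list_dla(script, workdir):
    """Run a compile script; return (exit code, combined output, new/updated .dla files).

    Hypothetical helper: it does not know anything about the toolkit itself,
    it only runs the script with bash and scans workdir for .dla files.
    """
    workdir = Path(workdir)
    # Remember existing .dla files so overwritten ones can be detected by mtime.
    before = {p: p.stat().st_mtime for p in workdir.rglob("*.dla")}
    result = subprocess.run(
        ["bash", str(script)],
        cwd=str(workdir),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so silent failures become visible
        universal_newlines=True,
    )
    new_or_updated = sorted(
        str(p.relative_to(workdir))
        for p in workdir.rglob("*.dla")
        if p not in before or p.stat().st_mtime > before[p]
    )
    return result.returncode, result.stdout, new_or_updated
```

A non-zero exit code together with an empty file list would suggest the script stops before the compiler writes its output, while exit code 0 with an empty list would point to the compiler being skipped or writing elsewhere.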


Thank you very much for your time and support.

Best regards,
dmd955

I added some flags to the compile_generative.sh script:

--verbose
--show-exec-plan
--show-memory-summary
--check-target-only

After running compile_generative.sh, I found logs like this:

OP[431]: CUSTOM: MTKEXT_RMS_NORMALIZATION
├ MDLA: Cannot support Float32 input
├ MDLA: Cannot support Float32 output
├ EDMA: unsupported operation
OP[432]: QUANTIZE
├ MDLA: Cannot support Float32 input
├ EDMA: unsupported operation
OP[435]: DEQUANTIZE
├ MDLA: Cannot support Float32 output
├ EDMA: unsupported operation
OP[436]: CUSTOM: MTKEXT_SILU
├ MDLA: Cannot support Float32 input
├ MDLA: Cannot support Float32 output
├ EDMA: unsupported operation
OP[437]: QUANTIZE
├ MDLA: Cannot support Float32 input
├ EDMA: unsupported operation
[ncc-tflite] Success
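
To see at a glance which ops are rejected and why, the log above can be grouped per op. This is a minimal Python sketch that assumes the log format is exactly what the excerpt shows (an `OP[n]: NAME` line followed by `├ BACKEND: reason` lines); the function names are hypothetical and not part of the Neuron SDK.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt only; other SDK versions may differ.
OP_LINE = re.compile(r"^OP\[(\d+)\]:\s*(.+)$")
REASON_LINE = re.compile(r"^[├└│]\s*(\w+):\s*(.+)$")


def parse_unsupported_ops(log_text):
    """Map (op index, op name) to a list of (backend, reason) pairs."""
    ops = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        op_match = OP_LINE.match(line)
        if op_match:
            current = (int(op_match.group(1)), op_match.group(2).strip())
            ops[current] = []
            continue
        reason_match = REASON_LINE.match(line)
        if reason_match and current is not None:
            ops[current].append((reason_match.group(1), reason_match.group(2).strip()))
    return ops


def count_reasons(ops):
    """Count how often each (backend, reason) pair occurs across all ops."""
    return Counter(pair for reasons in ops.values() for pair in reasons)
```

Feeding the saved compiler output to parse_unsupported_ops and then count_reasons gives a quick tally, for example how many ops are rejected by MDLA because of Float32 inputs or outputs.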

Hi dmd955,

I am able to successfully build qwen2.5-0.5b-1.5b-7b-v0.1 and deploy 0.5b onto my Genio 720 with Android 15.

Here are my configurations for your reference.

  • Install NeuroPilot 8.0.7 and the offline tools included in it
  • Install the associated Neuron SDK (20250203_Neuron_SDK_v1.2506.01_neuron-8.0-release.tar.gz)
  • Install mtk_llm_sdk 2.8.2
  • During the inference stage
    • Please use build_all_usdk instead of build_all. You need to load the libneuron.so from the Neuron SDK, NOT the one in the Android vendor partition, due to security constraints
    • You also need to adb push the libraries in 20250203_Neuron_SDK_v1.2506.01_neuron-8.0-release_release/mt8189/lib onto your Android device
      • Also set LD_LIBRARY_PATH accordingly when executing main on your Android device
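
The push and LD_LIBRARY_PATH steps above could look roughly like the following. This is only a sketch in Python that builds the adb command lines without running them; the device directory /data/local/tmp/llm_sdk and the function name are assumptions for illustration, and any arguments that main needs are omitted.

```python
import shlex


def build_deploy_commands(sdk_lib_dir, device_dir="/data/local/tmp/llm_sdk", binary="main"):
    """Return adb commands (as argument lists) to push SDK libs and run the binary.

    device_dir is a hypothetical location; adjust it to wherever main was pushed.
    """
    # Pushing a directory into an existing directory places it at <device_dir>/lib
    # (behavior of current adb versions; older ones may differ).
    device_lib = device_dir + "/lib"
    run = "cd {d} && LD_LIBRARY_PATH={l} ./{b}".format(
        d=shlex.quote(device_dir), l=shlex.quote(device_lib), b=shlex.quote(binary)
    )
    return [
        ["adb", "shell", "mkdir", "-p", device_dir],
        ["adb", "push", sdk_lib_dir, device_dir + "/"],
        ["adb", "shell", run],
    ]
```

Each returned list can be passed to subprocess.run once the paths have been checked; keeping the commands as data first makes it easy to print and review them before touching the device.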

Regards

~Sting

Hi Sting_Cheng,

Could you confirm whether you modified the compile_generative.sh file with the following settings?

BACKEND="mdla5.3,edma3.6"
L1_SIZE_KB="256"
NUM_MDLA="1"

I encountered an issue where the .dla file could not be generated after applying these configurations.

Regards,
dmd955

Hi dmd955,

Yes, I’ve patched my compile_generative.sh with the same settings as yours.

BTW, I’ve also modified “Qwen2.5-0.5B-Instruct/config.json” and added the following at the end.

  "mask_value": -10000,
  "rotate": true,
  "rotate_mode": "ortho"
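
If editing the JSON by hand is error-prone (a missing comma before the new keys is easy to introduce), the same change can be scripted. A minimal Python sketch, assuming config.json is a plain JSON object; the function name patch_config is hypothetical, and the three keys and values are the ones listed above.

```python
import json
from pathlib import Path

# Keys and values taken from the post above.
EXTRA_KEYS = {"mask_value": -10000, "rotate": True, "rotate_mode": "ortho"}


def patch_config(config_path, extra=EXTRA_KEYS):
    """Add the extra keys to a model config.json and write it back in place."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(extra)  # new keys are appended after the existing ones
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, patch_config("Qwen2.5-0.5B-Instruct/config.json") would update that file; keeping a backup copy first is advisable since the file is rewritten.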

Regards

Sting

Hello @dmd955
These configs are correct. Can you confirm the version of NeuroPilot that you are using? We recommend v8.0.7 for G720.

Hi Suyash_Narain,

I am currently using NeuroPilot version 8.0.7.

Details are as follows:

  • Neuron SDK: 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release_release
    (There are two folders: MT8189 and host. I am using the host folder.)

  • Offline tools: neuropilot-sdk-basic-8.0.7-build20250122

Best regards,
dmd955