[Qwen2.5-0.5B] compile_generative.sh with --num-mdla=1 skips generative DLA output (no .dla file) on Genio 520

Environment:
- Platform: Genio-520
- Toolkit: GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1
- Neuron SDK: 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release

Hi MediaTek team and community,

I’m using GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1 on Genio 520.

When I compile the generative part with MDLA_NUM=1 (correct for 1x MDLA5.3), the script runs quickly, maps tensors (curr_keys_0 to … , MTKEXT_FULLY_CONNECTED, SILU, etc.), and then finishes without generating any .dla file for the generative model.

- With MDLA_NUM=4: compilation passes, but inference outputs repetitive “!!!”.

How can I fix this to make generative compilation work with --num-mdla=1 on Genio 520?

Thanks a lot!

I tried running the ncc-tflite command directly while removing some of the flags from the original script to test, and I got the error as shown in the attached log.

Hi dmd955,

A warm welcome to the MTK Community. We’re excited to have you here!


Reference Documentation

Regarding your question, please refer to the following documentation for detailed guidance:

Compiling the Model


Compile Script for Prompt-Based and Generative TFLite Models

For prompt-based and generative TFLite models, we provide a corresponding compile script within the toolkit.

Please modify the script parameters as follows:

BACKEND="mdla5.3,edma3.6"
L1_SIZE_KB="256"
NUM_MDLA="1"

Example Compilation Command

bash compile_generative.sh \
/path/to/Qwen2.5-0.5B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_24layer_1t2048c_0.tflite \
/path/to/20250203_Neuron_SDK_v1.2506.01_neuron-8.0-release_release
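
As a rough sketch, the three settings could be applied to the script with sed instead of editing by hand. This assumes the script defines BACKEND, L1_SIZE_KB and NUM_MDLA as plain assignments at the start of a line, which has not been verified against the toolkit; `set_compile_params` is a hypothetical helper name, and the backend value uses the comma-separated form quoted later in this thread.

```shell
# Hypothetical helper: rewrite the three assignments in place (a .bak copy is kept).
# Assumes each variable is assigned at the start of a line in the script.
set_compile_params() {
  local script="$1"
  sed -i.bak \
    -e 's/^BACKEND=.*/BACKEND="mdla5.3,edma3.6"/' \
    -e 's/^L1_SIZE_KB=.*/L1_SIZE_KB="256"/' \
    -e 's/^NUM_MDLA=.*/NUM_MDLA="1"/' \
    "$script"
  # Print the resulting assignments so the change can be confirmed by eye.
  grep -E '^(BACKEND|L1_SIZE_KB|NUM_MDLA)=' "$script"
}
```

Calling `set_compile_params compile_generative.sh` prints the three lines as they now stand in the script. If nothing is printed, the assumption about how the variables are defined does not hold and the script should be edited manually.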

If you encounter any issues during the process, please feel free to let us know.

Best regards,
Jing

Hi Jing,

While waiting for support from the community, I tried switching to LLaMA 3.2-1B and was able to successfully generate the DLA file. However, when deploying it on the device, I encountered the error shown below.

I would sincerely appreciate it if you could help me identify the cause of this issue and suggest how to resolve it. After fixing this problem, I plan to try the same workflow with Qwen 2.5 and will report the results back to you.

Thank you very much for your time and support. I really appreciate your help.

Best regards,
dmd955

Hi dmd955,

I would like to first align on the process you followed. During the DLA conversion stage, did you use the script provided in the toolkit compile directory, or did you manually convert the model using the ncc-tflite tool?

To avoid potential inference issues caused by missing compilation parameters, we strongly recommend using the official script provided in the toolkit for model conversion.

You can download the toolkit here:

GAI-Deployment-Toolkit-v2.0.6_llama3.2-1b-3b-v0.1.tar.gz

For detailed instructions, please refer to the following documentation:

https://neuropilot.mediatek.com/sphinx/neuropilot-8-basic-gai-full/8.0.7/l1_gai_toolkit/l2_model_deployment/tutorial_llm.html#compilation

Best regards,
Jing

Hi Jing,

I’ve checked again and confirmed that I can successfully compile and run LLaMA 3.2. However, with Qwen 2.5, I’m still encountering the issue: “No .dla file generated when running compile_generative.sh”, even after applying the adjustments you suggested.

BACKEND="mdla5.3,edma3.6"
L1_SIZE_KB="256"
NUM_MDLA="1"

I am currently using the following SDK:

20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release

Here are the logs/images when running compile_generative.sh:

Screenshot 2026-03-30 100027

After running compile_generative.sh, it does not return any logs or generate any .dla file. This issue only occurs with compile_generative.sh, while compile_prompt_qwen2.5_0.5B_7B.sh still successfully generates .dla files and produces some log output such as:

(py38_test) bkav@bkav-Super-Server:~/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/compile$ ./compile_prompt_qwen2.5_0.5B_7B.sh 
/home/bkav/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/post_training_quantize/tflite/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_128t2048c/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_7layer_128t2048c_2.tflite 
/home/bkav/Downloads/SDK/Neuropilot_SDK/20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release/neuron_sdk
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
Patch done!
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
DRAM Usage:
Target           Input  Output  Temp  Static  Code  Total
[ 0]: MDLA 5.3   29M    2.6M    20M   781M    765K  834M

Total Memory:
DRAM: 834M + 0 (Shared) = 834M
L1:   256K
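
When a compile script exits without any output, one way to narrow it down is to run it with shell tracing and then check whether any new .dla file appeared. The wrapper below is only a sketch: `run_and_check_dla` and the log name `compile_trace.log` are hypothetical, and the directory where the toolkit writes .dla files is an assumption passed in as the fourth argument.

```shell
# Hypothetical wrapper: trace a compile script and report newly created .dla files.
# Usage: run_and_check_dla <script> <model.tflite> <sdk_dir> [output_dir]
run_and_check_dla() {
  local script="$1" model="$2" sdk="$3" outdir="${4:-.}"
  local stamp log="compile_trace.log" found
  stamp="$(mktemp)"                      # timestamp marker created before the run
  bash -x "$script" "$model" "$sdk" >"$log" 2>&1
  local status=$?
  echo "exit status: $status (trace in $log)"
  # Only files modified after the marker count as output of this run.
  found="$(find "$outdir" -name '*.dla' -newer "$stamp")"
  rm -f "$stamp"
  if [ -z "$found" ]; then
    echo "no new .dla files under $outdir"
    return 1
  fi
  echo "$found"
}
```

The trace in `compile_trace.log` shows every command the script ran, including the last one executed before it stopped, which may help when the script itself prints nothing.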


Thank you very much for your time and support.

Best regards,
dmd955

Hi dmd955,

I’ve posted a topic about issues when “Deploying Qwen2.5-0.5B onto Genio-720 with Android 15”. Could you please share which operating system you are running on your Genio 520?

If you’re working on Android, are you using the libneuron.so in the vendor partition, or the library from the SDK? My understanding is that the “shell” user in Android does NOT have the right to load libneuron.so from the vendor partition. How did you solve this problem?

Regards

Sting

Hi Sting,

I’m currently working on Genio 520 (mt8189) with Android 15.

I haven’t tested with Qwen2.5 yet because I haven’t been able to successfully generate the DLA file. However, I was able to run Llama 3.2 successfully and previously encountered an issue similar to yours.

Regarding libneuron.so, I initially placed it under the vendor partition, but my app (running as the shell user) was unable to load it due to access restrictions.

To resolve this, I moved the related files (including configs) to the /system_ext partition (e.g., /system_ext/llm_sdk/...). Although it is still a system partition, it does not have the same access restrictions as the vendor partition, so the app can load and run successfully.

So in my case, I am not loading libneuron.so from vendor, but instead using a setup under system_ext, which works for the shell user.
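
For readers who want to try the same layout, the steps might look roughly like the sketch below. It is not the exact procedure used here: `stage_to_system_ext` is a hypothetical helper, the `adb root` and `adb remount` steps assume a build that allows remounting system partitions, and the default destination simply mirrors the example path above. By default the function only prints the commands; set RUN=1 to execute them.

```shell
# Hypothetical helper: print (or run, when RUN=1) adb steps for staging a file
# under /system_ext. Assumes the device image permits `adb root` and `adb remount`.
stage_to_system_ext() {
  local src="$1" dest="${2:-/system_ext/llm_sdk}"
  local run="echo"                 # dry run by default: commands are only printed
  [ "${RUN:-0}" = "1" ] && run=""
  $run adb root
  $run adb remount
  $run adb shell mkdir -p "$dest"
  $run adb push "$src" "$dest/"
  $run adb shell ls -l "$dest"     # confirm the file landed where expected
}
```

Running `stage_to_system_ext ./libneuron.so` without RUN=1 lists the commands so they can be reviewed before touching the device.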

Regards,
dmd955

Hi dmd955,

Based on your input, I assume you’re using the library from “Neuron_SDK_v1.2517.03_neuron-8.0-release_mt8171.tar.gz”. Is that correct?

I have a few further questions and hope you can provide more input for my reference.

  • Are you using the essential MTK pre-built Android image, or did you build Android from scratch?
  • Which version of the MTK offline tools are you using? Are you using this one?

~Sting

Hi Sting,

I am using “20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz” for MT8189.

I am currently working with the MTK pre-built Android image.

For the offline tools, I am using “neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz”.

Regards,
dmd955

Hi dmd955,

To avoid unexpected compatibility issues, it would be helpful if you could share the following system configuration with the community.

  1. Host OS:
    Version: Are you using Ubuntu?
    Note: Ubuntu 20.04, 22.04, or other?

  2. Python:
    Version: Which Python version are you using?

  3. Offline tools:
    Version: neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz
    Note: Where did you download neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz from? I can find only neuropilot-sdk-basic-8.0.9-build20250611.tar.gz in the NeuroPilot Online Documentation.

  4. libneuron.so and related:
    Version: 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz
    Note: from NeuroPilot Online Documentation

  5. LLM SDK:
    Version: 3.4.2 or 2.8.2?
    Note: from NeuroPilot Online Documentation

  6. ncc-tflite:
    Version: Are you using the one in neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz, or the one in 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz?

Regards

~Sting

Hi Sting_Cheng,

Here are my current system configurations:

  1. Host OS:
    Ubuntu 22.04

  2. Python:
    Python 3.8.20

  3. Offline tools:
    neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz
    (downloaded from NeuroPilot Online Documentation)

  4. libneuron.so and related:
    From 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz
    (downloaded from NeuroPilot Online Documentation)

  5. LLM SDK:
    mtk_llm_sdk 2.8.2

  6. ncc-tflite:
    Using the version from
    20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz
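
For anyone comparing setups, the host-side part of this list can be gathered with a short script. This is a minimal sketch: `report_env` is a hypothetical helper, and it assumes the ncc-tflite binary sits somewhere under the extracted SDK directory passed as the first argument.

```shell
# Hypothetical helper: print host OS, Python version, and the ncc-tflite path
# found under a given SDK directory (directory layout is an assumption).
report_env() {
  local sdk_dir="${1:-}"
  # /etc/os-release is present on most Linux distributions; fall back to uname.
  echo "Host OS: $( (. /etc/os-release 2>/dev/null && echo "$PRETTY_NAME") || uname -sr)"
  echo "Python: $( (command -v python3 >/dev/null && python3 --version 2>&1) || echo 'not found')"
  if [ -n "$sdk_dir" ]; then
    echo "ncc-tflite: $(find "$sdk_dir" -type f -name ncc-tflite | head -n 1)"
  fi
}
```

Running `report_env /path/to/neuron_sdk` prints three lines that can be pasted into a reply. The Python line reflects whichever `python3` is first on PATH, so it should be run inside the same environment used for compilation.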

Please let me know if you need any additional information.

Regards,