I’m using GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1 on Genio 520.
When compiling the generative part with MDLA_NUM=1 (correct for 1x MDLA5.3), the script runs quickly, maps tensors (curr_keys_0 to … , MTKEXT_FULLY_CONNECTED, SILU, etc.), then finishes without generating any .dla file for the generative part.
- With MDLA_NUM=4: compilation passes, but inference outputs a repetitive “!!!”.
How can I fix this to make generative compilation work with --num-mdla=1 on Genio 520?
Thanks a lot!
To test, I ran the ncc-tflite command directly with some of the flags from the original script removed, and got the error shown in the attached log.
While waiting for support from the community, I tried switching to LLaMA 3.2-1B and was able to successfully generate the DLA file. However, when deploying it on the device, I encountered the error shown below.
I would sincerely appreciate it if you could help me identify the cause of this issue and suggest how to resolve it. After fixing this problem, I plan to try the same workflow with Qwen 2.5 and will report the results back to you.
Thank you very much for your time and support. I really appreciate your help.
I would like to first align on the process you followed. During the DLA conversion stage, did you use the script provided in the toolkit compile directory, or did you manually convert the model using the ncc-tflite tool?
To avoid potential inference issues caused by missing compilation parameters, we strongly recommend using the official script provided in the toolkit for model conversion.
I’ve checked again and confirmed that I can successfully compile and run LLaMA 3.2. However, with Qwen 2.5, I’m still encountering the issue: “No .dla file generated when running compile_generative.sh”, even after applying the adjustments you suggested.
Here are the logs/images when running compile_generative.sh:
After running compile_generative.sh, it does not return any logs or generate any .dla file. This issue only occurs with compile_generative.sh, while compile_prompt_qwen2.5_0.5B_7B.sh still successfully generates .dla files and produces some log output such as:
(py38_test) bkav@bkav-Super-Server:~/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/compile$ ./compile_prompt_qwen2.5_0.5B_7B.sh
/home/bkav/Downloads/GAI_Toolkits/GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1/post_training_quantize/tflite/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_128t2048c/Qwen2.5-7B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_7layer_128t2048c_2.tflite
/home/bkav/Downloads/SDK/Neuropilot_SDK/20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release/neuron_sdk
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
WARNING: SMP is skipped since all backend targets are single-core or unknown.
Patch done!
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
WARNING: 16A4W FC signed input activation but not has 128 offset may cause the input value of sw workaround saturated.
DRAM Usage:
Target          Input  Output  Temp  Static  Code  Total
[ 0]: MDLA 5.3  29M    2.6M    20M   781M    765K  834M
Total Memory:
DRAM: 834M + 0 (Shared) = 834M
L1: 256K
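Since compile_generative.sh exits without printing anything, one way to see where it stops is to run it under bash tracing and then check whether any .dla file appeared. The following is only a minimal sketch: the function name run_and_check_dla and its three arguments are hypothetical helpers, not part of the toolkit, and the output directory is an assumption you need to adjust to wherever the script writes its results.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): run a compile script with
# command tracing, keep the full log, and report whether .dla files exist.

run_and_check_dla() {
    local script="$1"     # script to run, e.g. ./compile_generative.sh
    local out_dir="$2"    # directory expected to contain .dla files (assumption)
    local log_file="$3"   # file that receives stdout, stderr, and the trace

    # bash -x echoes each command before executing it, so a silent early
    # exit shows up as the last traced line in the log.
    bash -x "$script" >"$log_file" 2>&1
    local rc=$?
    echo "exit status: $rc"

    # Count .dla files below the output directory.
    local count
    count=$(find "$out_dir" -type f -name '*.dla' | wc -l | tr -d '[:space:]')
    echo "dla files found: $count"

    # Succeed only if the script exited cleanly and produced output.
    [ "$rc" -eq 0 ] && [ "$count" -gt 0 ]
}
```

For example, `run_and_check_dla ./compile_generative.sh . generative_trace.log` leaves the trace in generative_trace.log; the last traced command before the script returns is usually the most useful line to share in the thread.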
If you’re working on Android, are you using the libneuron.so in the vendor partition, or the library from the SDK? My understanding is that the “shell” user in Android does NOT have permission to load libneuron.so from the vendor partition. How did you solve this problem?
I’m currently working on Genio 520 (mt8189) with Android 15.
I haven’t tested with Qwen2.5 yet because I haven’t been able to successfully generate the DLA file. However, I was able to run Llama 3.2 successfully, and I previously ran into an issue similar to yours.
Regarding libneuron.so, I initially placed it under the vendor partition, but my app (running as the shell user) was unable to load it due to access restrictions.
To resolve this, I moved the related files (including configs) to the /system_ext partition (e.g., /system_ext/llm_sdk/...). Although it is still a system partition, it does not have the same access restrictions as the vendor partition, so the app can load and run successfully.
So in my case, I am not loading libneuron.so from vendor, but instead using a setup under system_ext, which works for the shell user.
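For reference, the copy step can be sketched roughly as below. This is an illustration under assumptions, not the exact procedure: it assumes a userdebug or eng build where `adb root` and `adb remount` are permitted, and the helper names run and push_to_system_ext, as well as the DRY_RUN switch, are hypothetical. The destination directory is passed in by the caller rather than hard-coded.

```shell
#!/usr/bin/env bash
# Sketch: copy a local library file to a directory under /system_ext.
# Assumes a userdebug/eng build that allows `adb root` and `adb remount`.
# With DRY_RUN=1 the adb commands are only printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_to_system_ext() {
    local src="$1"    # local file, e.g. ./libneuron.so
    local dest="$2"   # target directory on the device, chosen by the caller

    run adb root &&
    run adb remount &&
    run adb shell mkdir -p "$dest" &&
    run adb push "$src" "$dest/" &&
    run adb shell ls -l "$dest"
}
```

Running `DRY_RUN=1 push_to_system_ext ./libneuron.so /system_ext/llm_sdk` prints the five adb commands so they can be reviewed before touching the device.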
Where did you download neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz from? I can only find neuropilot-sdk-basic-8.0.9-build20250611.tar.gz in the NeuroPilot Online Documentation.
Are you using the one in neuropilot-sdk-basic-8.0.9-build20250611-001.tar.gz, or the one in 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release.tar.gz?