[Qwen2.5-0.5B] compile_generative.sh with --num-mdla=1 skips generative DLA output (no .dla file) on Genio 520

Environment:
- Platform: Genio-520
- Toolkit: GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1
- Neuron SDK: 20250423_Neuron_SDK_v1.2517.03_neuron-8.0-release

Hi MediaTek team and community,

I’m using GAI-Deployment-Toolkit-v2.0.8_qwen2.5-0.5b-1.5b-7b-v0.1 on Genio 520.

When compiling the generative part with MDLA_NUM=1 (correct for 1x MDLA5.3), the script runs quickly, maps tensors (curr_keys_0 to … , MTKEXT_FULLY_CONNECTED, SILU, etc.), then finishes without generating any .dla file for the generative model.

- With MDLA_NUM=4: compilation passes, but inference outputs repetitive “!!!”.

How can I fix this to make generative compilation work with --num-mdla=1 on Genio 520?

Thanks a lot!

I also tried running the ncc-tflite command directly, with some of the flags from the original script removed, and got the error shown in the attached log.

Hi dmd955,

A warm welcome to the MTK Community! We’re excited to have you here.


Reference Documentation

Regarding your question, please refer to the following documentation for detailed guidance:

Compiling the Model


Compile Script for Prompt-Based and Generative TFLite Models

For prompt-based and generative TFLite models, we provide a corresponding compile script within the toolkit.

Please modify the script parameters as follows:

BACKEND="mdla5.3medma3.6"
L1_SIZE_KB="256"
NUM_MDLA="1"
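
If you prefer not to edit the file by hand, the three values above can be applied with a small helper. This is only a sketch: set_param is a hypothetical function (not part of the toolkit), and it assumes the compile script defines each parameter as a NAME="value" assignment at the start of a line. Please check your copy of the script before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): rewrite NAME="VALUE" in a script.
# Assumption: the target file assigns the variable at the start of a line.
# Values containing '|' or '&' are not supported by this simple sed expression.
set_param() {
    local file="$1" name="$2" value="$3"
    # Fail loudly if the variable is not defined in the file.
    if ! grep -q "^${name}=" "$file"; then
        echo "error: ${name}= not found in ${file}" >&2
        return 1
    fi
    # Replace the whole assignment line; a .bak copy of the original is kept.
    sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file"
}

# Example usage (uncomment and adjust the script path to your toolkit):
# set_param compile_generative.sh BACKEND    mdla5.3medma3.6
# set_param compile_generative.sh L1_SIZE_KB 256
# set_param compile_generative.sh NUM_MDLA   1
```

The helper refuses to write anything if the variable name is missing, so a renamed parameter in a different toolkit version shows up as an error instead of a silent no-op.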

Example Compilation Command

bash compile_generative.sh \
/path/to/Qwen2.5-0.5B-Instruct_asym4W_sym16A_Overall_hessian_wgt_opt_cum_layer_error_rotate_ortho_0_24layer_1t2048c_0.tflite \
/path/to/20250203_Neuron_SDK_v1.2506.01_neuron-8.0-release_release
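
Since the reported symptom is that the script finishes without producing a .dla file, it can help to check for the output explicitly after the compile step. The sketch below uses a hypothetical helper, check_dla_output, and the directory to search is an assumption: pass whichever directory your copy of the script writes its output to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed and list the files if DIR (default: current
# directory) contains at least one .dla file; otherwise print a message and fail.
check_dla_output() {
    local dir="${1:-.}"
    local found
    # Search recursively, since the output location depends on the script.
    found=$(find "$dir" -type f -name '*.dla')
    if [ -z "$found" ]; then
        echo "no .dla file found under ${dir}" >&2
        return 1
    fi
    printf '%s\n' "$found"
}

# Example usage after running the compile script (directory is an assumption):
# check_dla_output /path/to/output_dir || echo "generative compile produced no DLA"
```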

If you encounter any issues during the process, please feel free to let us know.

Best regards,
Jing

Hi Jing,

While waiting for support from the community, I tried switching to LLaMA 3.2-1B and was able to successfully generate the DLA file. However, when deploying it on the device, I encountered the error shown below.

I would sincerely appreciate it if you could help me identify the cause of this issue and suggest how to resolve it. After fixing this problem, I plan to try the same workflow with Qwen 2.5 and will report the results back to you.

Thank you very much for your time and support. I really appreciate your help.

Best regards,
dmd955

Hi dmd955,

I would like to first align on the process you followed. During the DLA conversion stage, did you use the script provided in the toolkit compile directory, or did you manually convert the model using the ncc-tflite tool?

To avoid potential inference issues caused by missing compilation parameters, we strongly recommend using the official script provided in the toolkit for model conversion.

You can download the toolkit here:

GAI-Deployment-Toolkit-v2.0.6_llama3.2-1b-3b-v0.1.tar.gz

For detailed instructions, please refer to the following documentation:

https://neuropilot.mediatek.com/sphinx/neuropilot-8-basic-gai-full/8.0.7/l1_gai_toolkit/l2_model_deployment/tutorial_llm.html#compilation

Best regards,
Jing