[LLaMA3-8B] [PTQ] PTQ process is extremely slow

I am trying to deploy LLaMA3-8B on my Genio 720, but the PTQ process (1_make_ptq_calibration_dataset.sh) is extremely slow. I am not sure whether my system configuration is correct.

My system configuration is listed below. Any hints would be appreciated.

  • Host
    • Ubuntu 20.04
    • NVIDIA RTX A5000 * 2
  • Python packages
    • mtk-converter 8.13.0 (from NeuroPilot 8.0.7)
    • mtk_llm_sdk 2.8.2 (from mtk_llm_sdk 2.8.2)
    • mtk-quantization 8.2.0 (from NeuroPilot 8.0.7)
    • transformers 4.41.2
    • nvidia-cuda-runtime-cu12 12.1.105
  • Model
    • meta-llama/Meta-Llama-3-8B-Instruct (cloned from Hugging Face)
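
In case it helps with diagnosis, here is a minimal sketch of how the information above can be collected and how GPU visibility can be checked from the same Python environment. It uses only the standard library plus an optional PyTorch check. The function names are my own, and it assumes that nvidia-smi is on PATH and that PyTorch (not listed above) is what the calibration step runs on. It cannot confirm whether the PTQ script itself actually uses the GPU.

```python
import shutil
import subprocess
from importlib import metadata

# Package names copied from the list above.
PACKAGES = [
    "mtk-converter",
    "mtk_llm_sdk",
    "mtk-quantization",
    "transformers",
    "nvidia-cuda-runtime-cu12",
]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def visible_gpus():
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def torch_cuda_status():
    """Return (cuda_available, device_count), or None if PyTorch is missing."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return None
    return torch.cuda.is_available(), torch.cuda.device_count()


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
    print("GPUs seen by nvidia-smi:", visible_gpus())
    print("PyTorch CUDA status:", torch_cuda_status())
```

If the PyTorch check reports that CUDA is unavailable, or nvidia-smi shows no GPU activity while the script runs, the calibration may be running on the CPU, which would be one possible explanation for the slowness.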