I am trying to deploy LLaMA3-8B onto my Genio 720, but the PTQ process (1_make_ptq_calibration_dataset.sh) is extremely slow, and I am not sure whether my system configuration is correct.
My system configuration is listed below. I hope you can provide some hints.
- Host
  - Ubuntu 20.04
  - NVIDIA RTX A5000 * 2
- Python packages
  - mtk-converter 8.13.0 (from NeuroPilot 8.0.7)
  - mtk_llm_sdk 2.8.2 (from mtk_llm_sdk 2.8.2)
  - mtk-quantization 8.2.0 (from NeuroPilot 8.0.7)
  - transformers 4.41.2
  - nvidia-cuda-runtime-cu12 12.1.105
- Model
  - meta-llama/Meta-Llama-3-8B-Instruct (cloned from huggingface)
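
In case it helps narrow things down, here is a minimal sketch of how GPU visibility on the host could be checked from the same Python environment. It assumes the calibration step runs on PyTorch, which is my guess and not something confirmed by the SDK documentation. The function name `describe_cuda_environment` is just a hypothetical helper for this post. If `cuda_available` comes back `False`, or `torch_cuda_build` is `None`, the script may be falling back to the CPU, which could explain the slowness.

```python
import os


def describe_cuda_environment():
    """Collect basic facts about GPU visibility as seen from Python.

    Hypothetical helper: it assumes the PTQ script relies on PyTorch,
    which is an assumption and not confirmed by the SDK.
    """
    info = {
        # If this is set to an empty string or "-1", the GPUs are hidden.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "devices": [],
    }

    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; nothing more to check.
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only PyTorch builds.
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        info["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key}: {value}")
```

Running this in the environment that launches 1_make_ptq_calibration_dataset.sh, and watching `nvidia-smi` while the script runs, should show whether the two A5000 cards are actually being used.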