[Qwen2.5-7B] [On-device Inference] Repetitive Output During Inference

Environment Information

Issue Description:
The Qwen2.5-7B inference interface often returns responses in which the same content is repeated.

Solution:

  • Upgrade to NP8 and check whether the issue persists.
  • Confirm that the model is still correct after post-training quantization (PTQ).
  • Adjust the configuration according to the NP8 documentation. In config.json, set:
    "mask_value": -10000
  • For Qwen2.5 models quantized with PTQ, enable rotation:
    "rotate": true,
    "rotate_mode": "ortho"
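
The configuration changes above can also be applied with a small script. This is a minimal sketch, assuming config.json is a JSON object whose top level holds `mask_value`, `rotate`, and `rotate_mode`. The entry only states that `mask_value` lives in config.json, so placing the rotation keys there too is an assumption. The function name `apply_np8_workaround` and its `is_ptq` switch are hypothetical and not part of NP8.

```python
import json
from pathlib import Path


def apply_np8_workaround(config_path: str, is_ptq: bool) -> dict:
    """Set the options from this entry in config.json and write the file back.

    Assumption: the keys sit at the top level of the JSON object. Adjust the
    lookups if your NP8 config nests them elsewhere.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Value taken from the NP8 documentation, as quoted in this entry.
    config["mask_value"] = -10000

    if is_ptq:
        # Rotation settings for Qwen2.5 models quantized with PTQ.
        config["rotate"] = True
        config["rotate_mode"] = "ortho"

    # Overwrites the file in place; existing keys and their order are preserved.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `apply_np8_workaround("path/to/config.json", is_ptq=True)` updates a PTQ model's config. Because the file is overwritten in place, keep a backup of the original config.json.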

This solution was tested and verified on July 21, 2025.