During model inference, how can developers monitor real-time NPU (APU) resource utilization or workload distribution? Are there any recommended tools, methods, or official documentation for observing NPU usage or debugging performance issues?
Thank you for your question.
MediaTek provides several options and recommendations for monitoring NPU resource utilization, analyzing workload distribution, and debugging AI inference performance on different Genio platforms.
Neuron Studio Profiler (Genio-720/Genio-520, Yocto/Android)
Starting later this year (by the end of 2025), MediaTek will support Neuron Studio Profiler for the Genio-720 and Genio-520 platforms (both Yocto and Android).
Neuron Studio Profiler is a comprehensive profiling tool that helps developers analyze and optimize AI models running on the NPU:
Key Features:
- Real-time visualization of NPU load, frequency, and bandwidth utilization
- Detailed performance breakdown by model and operator
- Exportable reports for profiling and optimization
Documentation:
For an overview and feature guide, see the Neuron Studio documentation (MediaTek internal access required).
Note: Neuron Studio Profiler will not be provided for legacy platforms such as Genio-350, Genio-510, Genio-700, or Genio-1200.
Monitoring CPU and DRAM Utilization
For a system-level overview, developers may use classic Linux tools such as:
- `top`: Monitor CPU and DRAM usage in real time.
- `htop` (if installed): Offers advanced filtering and visualization.

These tools help identify bottlenecks when NPU performance is impacted by overall system load.
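For scripted monitoring on the target (for example, logging system load alongside an inference run), the same information `top` displays can be read directly from `/proc`. A minimal sketch, assuming a standard Linux `/proc` layout (`mem_used_percent` is an illustrative helper name, not a MediaTek API):

```shell
#!/bin/sh
# Minimal sketch: compute overall memory pressure from /proc/meminfo,
# the same source `top` uses. Values are in kB.

mem_used_percent() {
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  # Used = total - available, expressed as a percentage of total.
  echo $(( (total - avail) * 100 / total ))
}

echo "Memory used: $(mem_used_percent)%"
```

Running this periodically (e.g. under `watch` or in a logging loop) gives a lightweight record of DRAM pressure during inference without needing an interactive terminal.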
Monitoring GPU Utilization
Platform- and OS-specific methods are recommended for observing GPU activity:
- On IoT Yocto, query aggregate GPU utilization with:
  `cat /sys/devices/platform/soc/13000000.mali/gpu_utilization`
- On Android, query the GPU load with:
  `cat /sys/module/ged/parameters/gpu_loading`
  The result is a single utilization value (0–100) representing total system GPU load.
Tip: Per-process GPU utilization is not currently supported. To estimate the usage attributable to a single application:
- Measure the utilization before launching the application.
- Measure again with only your application running.
- Subtract the baseline to estimate your app’s GPU impact (some variability may exist).
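The baseline-subtraction steps above can be sketched as a small shell helper. This is an illustrative sketch, not an official tool: `read_gpu_load` and `app_gpu_share` are hypothetical names, and the default node is the Android path mentioned above (pass the Yocto `gpu_utilization` path as the first argument instead when on IoT Yocto):

```shell
#!/bin/sh
# Estimate a single app's GPU share by baseline subtraction.
# NODE defaults to the Android ged node; override via $1 for Yocto.
NODE="${1:-/sys/module/ged/parameters/gpu_loading}"

read_gpu_load() { cat "$NODE"; }

# Subtract the idle baseline from the loaded reading; clamp at 0,
# since readings fluctuate and the difference can go slightly negative.
app_gpu_share() {
  diff=$(( $2 - $1 ))
  [ "$diff" -lt 0 ] && diff=0
  echo "$diff"
}

# Usage on the target device:
#   baseline=$(read_gpu_load)   # before launching the application
#   loaded=$(read_gpu_load)     # with only your application running
#   echo "Estimated app GPU share: $(app_gpu_share "$baseline" "$loaded")%"
```

Because total load is the only value exposed, this remains an estimate; averaging several samples for both readings reduces the effect of fluctuation.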
Best Practices & Recommendations
- For NPU workload profiling and debugging on Genio-720/520, use Neuron Studio Profiler.
- For legacy platforms or for system resource monitoring, rely on the provided sysfs nodes and standard Linux tools.
- Track overall system workload, as CPU, DRAM, and GPU bottlenecks may also impact real-world AI inference performance.
Summary Table
| Platform | NPU Monitoring Tool | GPU Monitoring | CPU/DRAM Monitoring | Notes |
|---|---|---|---|---|
| Genio-720/520 (Yocto/Android) | Neuron Studio Profiler (upcoming) | Sysfs: gpu_utilization (Yocto), gpu_loading (Android) | top, htop | Full-featured, model/operator-level profiling |
| Genio-350/510/700/1200 | Not supported | Sysfs: gpu_utilization (Yocto), gpu_loading (Android) | top, htop | No dedicated NPU profiler |
For future updates and in-depth profiling, refer to the Neuron Studio documentation.