During model inference, how can developers monitor real-time NPU (APU) resource utilization or workload distribution? Are there any recommended tools, methods, or official documentation for observing NPU usage or debugging performance issues?
Thank you for your question.
MediaTek provides several options and recommendations for monitoring NPU resource utilization, analyzing workload distribution, and debugging AI inference performance on different Genio platforms.
Neuron Studio Profiler (Genio-720/Genio-520, Yocto/Android)
Starting later this year (by the end of 2025), MediaTek will support Neuron Studio Profiler for the Genio-720 and Genio-520 platforms (both Yocto and Android).
Neuron Studio Profiler is a comprehensive profiling tool that helps developers analyze and optimize AI models running on the NPU:
Key Features:
- Real-time visualization of NPU load, frequency, and bandwidth utilization
- Detailed performance breakdown by model and operator
- Exportable reports for profiling and optimization
Documentation:
For an overview and feature guide, see the Neuron Studio documentation (MediaTek internal access required).
Note: Neuron Studio Profiler will not be provided for legacy platforms such as Genio-350, Genio-510, Genio-700, or Genio-1200.
Monitoring CPU and DRAM Utilization
For a system-level overview, developers may use classic Linux tools such as:
- `top`: Monitor CPU and DRAM usage in real time.
- `htop` (if installed): Offers advanced filtering and visualization.

These tools help identify bottlenecks when NPU performance is impacted by overall system load.
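For scripted monitoring on the target (for example, logging system load alongside an inference run), the same information `top` displays can be read directly from `/proc`. A minimal sketch, assuming a standard Linux `/proc` layout (`mem_used_percent` is an illustrative helper name, not a MediaTek API):

```shell
#!/bin/sh
# Minimal sketch: compute overall memory pressure from /proc/meminfo,
# the same source `top` uses. Values are in kB.

mem_used_percent() {
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  # Used = total - available, expressed as a percentage of total.
  echo $(( (total - avail) * 100 / total ))
}

echo "Memory used: $(mem_used_percent)%"
```

Running this periodically (e.g. under `watch` or in a logging loop) gives a lightweight record of DRAM pressure during inference without needing an interactive terminal.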
Monitoring GPU Utilization
Platform- and OS-specific methods are recommended for observing GPU activity:
- On IoT Yocto, query aggregate GPU utilization with:
  `cat /sys/devices/platform/soc/13000000.mali/gpu_utilization`
- On Android, query the GPU load with:
  `cat /sys/module/ged/parameters/gpu_loading`
  The result is a single utilization value (0–100) representing total system GPU load.
Tip: Per-process GPU utilization is not currently supported. To estimate the usage attributable to a single application:
- Measure the utilization before launching the application.
- Measure again with only your application running.
- Subtract the baseline to estimate your app’s GPU impact (some variability may exist).
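The baseline-subtraction steps above can be sketched as a small shell helper. This is an illustrative sketch, not an official tool: `read_gpu_load` and `app_gpu_share` are hypothetical names, and the default node is the Android path mentioned above (pass the Yocto `gpu_utilization` path as the first argument instead when on IoT Yocto):

```shell
#!/bin/sh
# Estimate a single app's GPU share by baseline subtraction.
# NODE defaults to the Android ged node; override via $1 for Yocto.
NODE="${1:-/sys/module/ged/parameters/gpu_loading}"

read_gpu_load() { cat "$NODE"; }

# Subtract the idle baseline from the loaded reading; clamp at 0,
# since readings fluctuate and the difference can go slightly negative.
app_gpu_share() {
  diff=$(( $2 - $1 ))
  [ "$diff" -lt 0 ] && diff=0
  echo "$diff"
}

# Usage on the target device:
#   baseline=$(read_gpu_load)   # before launching the application
#   loaded=$(read_gpu_load)     # with only your application running
#   echo "Estimated app GPU share: $(app_gpu_share "$baseline" "$loaded")%"
```

Because total load is the only value exposed, this remains an estimate; averaging several samples for both readings reduces the effect of fluctuation.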
Best Practices & Recommendations
- For NPU workload profiling and debugging on Genio-720/520, use Neuron Studio Profiler.
- For legacy platforms or for system resource monitoring, rely on the provided sysfs nodes and standard Linux tools.
- Track overall system workload, as CPU, DRAM, and GPU bottlenecks may also impact real-world AI inference performance.
Summary Table
| Platform | NPU Monitoring Tool | GPU Monitoring | CPU/DRAM Monitoring | Notes |
|---|---|---|---|---|
| Genio-720/520 (Yocto/Android) | Neuron Studio Profiler (upcoming) | Sysfs: gpu_utilization (Yocto), gpu_loading (Android) | top, htop | Full-featured, model/operator-level profiling |
| Genio-350/510/700/1200 | Not supported | Sysfs: gpu_utilization (Yocto), gpu_loading (Android) | top, htop | No dedicated NPU profiler |
For future updates and in-depth profiling, refer to the Neuron Studio documentation.