Hello,
I would like to run an LLM on the Genio-720 under Yocto Linux via the Offline Inference Path, as described in the Gen AI Workflow below.
To get the best performance, I want to use Offline Inference to access and operate the NPU directly through the Neuron runtime, rather than going through the ONNX Runtime or the TFLite interpreter.
However, I have not been able to find any documentation or sample code specific to Yocto Linux. Could you please point me to any guides or tutorials for deploying Gen AI workflows on Yocto Linux?
Thank you in advance for your help.
