Hi Jiahao_Zhao,
Below is a step-by-step guide on how to load a DLA and perform inference using the Neuron Adapter API. Each step is accompanied by the corresponding code snippet for your reference.
Note: The code snippets below are simplified examples for clarity, focusing on the core flow of loading a DLA and running inference. Please adapt the code to your project’s needs.
Parameter Definitions (recommended to place at the top of your file)
#define DLA_INPUT_BIN_SIZE 376832
#define DLA_OUTPUT_BIN_SIZE 1536000
#define RESTORE_DLA_EXTENSION_OPERAND_TYPE 0x0100
#define RESTORE_DLA_EXTENSION_OPERATION_TYPE 0x0000
#define RESTORE_DLA_EXTENSION_NAME "com.mediatek.compiled_network"
| Macro | Description |
| --- | --- |
| DLA_INPUT_BIN_SIZE | Byte size of the model input tensor; please adjust according to your DLA spec |
| DLA_OUTPUT_BIN_SIZE | Byte size of the model output tensor; please adjust according to your DLA spec |
| RESTORE_DLA_EXTENSION_OPERAND_TYPE | Operand type ID of the MTK extension “DLA Raw Data” (fixed value) |
| RESTORE_DLA_EXTENSION_OPERATION_TYPE | Operation type ID of the MTK extension “Load DLA” (fixed value) |
| RESTORE_DLA_EXTENSION_NAME | Extension name string (fixed value) |
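If you know the tensor shape and element size from your DLA spec, the two size macros can be derived instead of hard-coded. Below is a minimal sketch; the helper name tensorByteSize and the example shape are hypothetical and not part of the Neuron API.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper (not part of the Neuron API): byte size of a dense
// tensor, computed as the product of its dimensions times bytes per element.
inline std::size_t tensorByteSize(std::initializer_list<std::uint32_t> dims,
                                  std::size_t bytesPerElement) {
    std::size_t total = bytesPerElement;
    for (std::uint32_t d : dims) {
        total *= d;
    }
    return total;
}
```

For example, a uint8 tensor of shape 1x224x224x3 gives tensorByteSize({1, 224, 224, 3}, 1) == 150528. Substitute the shape from your own DLA spec.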
Step 1. Read the DLA file into memory
std::ifstream input_dla(dla_path, std::ios::binary);
if (!input_dla) { /* handle error: the DLA file could not be opened */ }
input_dla.seekg(0, std::ios::end);
const size_t length = static_cast<size_t>(input_dla.tellg());
input_dla.seekg(0, std::ios::beg);
char* buffer = static_cast<char*>(malloc(length));
input_dla.read(buffer, static_cast<std::streamsize>(length));
input_dla.close();
For APK deployment, point dla_path at a file inside the app’s private directory (the directory returned by Context.getFilesDir() in Java/Kotlin), so that the file can be read without root permission.
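If you prefer not to manage the malloc'd buffer by hand, the file can be read into a std::vector, which also makes failures explicit. This is only a sketch; readBinaryFile is a hypothetical helper name, not something provided by the Neuron headers.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole binary file into memory.
// Throws std::runtime_error if the file cannot be opened or fully read.
inline std::vector<char> readBinaryFile(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = static_cast<std::streamsize>(file.tellg());
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);
    std::vector<char> data(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(data.data(), size)) {
        throw std::runtime_error("cannot read " + path);
    }
    return data;
}
```

With this helper, Step 6 would pass data.data() and data.size() in place of buffer and length, and the vector must stay alive until the compilation has finished.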
Step 2. Create a NeuronModel
NeuronModel* model = nullptr;
NeuronModel_create(&model);
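The snippets in this guide ignore return values for brevity. In real code it is worth checking every call. The sketch below assumes that the Neuron calls return an int status and that 0 means success (NEURON_NO_ERROR in NeuronAdapter.h); checkStatus is a hypothetical helper, not part of the API.

```cpp
#include <stdexcept>
#include <string>

// Assumption: the API returns an int status code and 0 means success.
// checkStatus is a hypothetical helper that turns a failure into an exception.
inline void checkStatus(int status, const char* what) {
    if (status != 0) {
        throw std::runtime_error(std::string(what) + " failed with code " +
                                 std::to_string(status));
    }
}
```

Usage would look like checkStatus(NeuronModel_create(&model), "NeuronModel_create"); and the same pattern applies to the calls in the later steps.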
Step 3. Define the Input / Output tensor types
// Input
NeuronOperandType tensorInputType;
tensorInputType.type = NEURON_TENSOR_QUANT8_ASYMM;
tensorInputType.scale = 1.0f;
tensorInputType.zeroPoint = 0;
tensorInputType.dimensionCount = 1;
uint32_t dims_input[1] = {DLA_INPUT_BIN_SIZE};
tensorInputType.dimensions = dims_input;
// Output
NeuronOperandType tensorOutputType;
tensorOutputType.type = NEURON_TENSOR_QUANT8_ASYMM;
tensorOutputType.scale = 1.0f;
tensorOutputType.zeroPoint = 0;
tensorOutputType.dimensionCount = 1;
uint32_t dims_output[1] = {DLA_OUTPUT_BIN_SIZE};
tensorOutputType.dimensions = dims_output;
Step 4. Get the DLA extension operand type
int32_t operandType = 0;
NeuronModel_getExtensionOperandType(model,
RESTORE_DLA_EXTENSION_NAME,
RESTORE_DLA_EXTENSION_OPERAND_TYPE,
&operandType);
NeuronOperandType extenOperandType;
extenOperandType.type = operandType;
extenOperandType.scale = 0.0f;
extenOperandType.zeroPoint = 0;
extenOperandType.dimensionCount = 0;
Step 5. Add operands to the model
NeuronModel_addOperand(model, &tensorInputType); // 0: model input 1
NeuronModel_addOperand(model, &tensorInputType); // 1: model input 2
NeuronModel_addOperand(model, &extenOperandType); // 2: DLA Raw Data
NeuronModel_addOperand(model, &tensorOutputType); // 3: model output
| Index | Role |
| --- | --- |
| 0 | Input tensor 1 |
| 1 | Input tensor 2 |
| 2 | DLA raw data (extension operand) |
| 3 | Output tensor |
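Because the same indices are reused in Steps 6 to 8, naming them once can help avoid mismatches. A small sketch follows; all constant names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical named constants for the operand indices listed above.
// The values follow the order of the NeuronModel_addOperand calls.
enum OperandIndex : std::uint32_t {
    kInput0 = 0,      // model input 1
    kInput1 = 1,      // model input 2
    kDlaRawData = 2,  // extension operand holding the DLA bytes
    kOutput0 = 3,     // model output
};

// Index lists as used by the operation (Step 7) and the model (Step 8).
const std::uint32_t kOperationInputs[3] = {kInput0, kInput1, kDlaRawData};
const std::uint32_t kModelInputs[2] = {kInput0, kInput1};
const std::uint32_t kModelOutputs[1] = {kOutput0};
```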
Step 6. Feed the DLA buffer into the model
NeuronModel_setOperandValue(model, 2, buffer, length);
This is the key step for “loading the precompiled DLA”: it attaches the DLA bytes read in Step 1 to operand 2, the extension operand.
Step 7. Get the DLA extension operation type and add the operation
int32_t operationType = 0;
NeuronModel_getExtensionOperationType(model,
RESTORE_DLA_EXTENSION_NAME,
RESTORE_DLA_EXTENSION_OPERATION_TYPE,
&operationType);
uint32_t addInputIndexes[3] = {0, 1, 2};
uint32_t addOutputIndexes[1] = {3};
NeuronModel_addOperation(model,
static_cast<NeuronOperationType>(operationType),
3, addInputIndexes,
1, addOutputIndexes);
Step 8. Identify the model’s inputs/outputs and finish the model
uint32_t modelInputIndexes[2] = {0, 1};
uint32_t modelOutputIndexes[1] = {3};
NeuronModel_identifyInputsAndOutputs(model, 2, modelInputIndexes, 1, modelOutputIndexes);
NeuronModel_finish(model);
Step 9. Create and finish the Compilation
NeuronCompilation* compilation = nullptr;
NeuronCompilation_create(model, &compilation);
NeuronCompilation_finish(compilation);
free(buffer); // The DLA buffer can be released after compilation is finished
Step 10. Create the Execution
NeuronExecution* run = nullptr;
NeuronExecution_create(compilation, &run);
Step 11. Set the input via AHardwareBuffer
const uint32_t input_size = DLA_INPUT_BIN_SIZE;
const auto usage = AHARDWAREBUFFER_USAGE_CPU_READ_OFTEN
| AHARDWAREBUFFER_USAGE_CPU_WRITE_OFTEN;
AHardwareBuffer_Desc iDesc{
.width = input_size,
.height = 1,
.layers = 1,
.format = AHARDWAREBUFFER_FORMAT_BLOB,
.usage = usage,
.stride = input_size,
};
AHardwareBuffer* inputAhwb = nullptr;
AHardwareBuffer_allocate(&iDesc, &inputAhwb);
void* inputBuffer = nullptr;
AHardwareBuffer_lock(inputAhwb, usage, -1, nullptr, &inputBuffer);
// input_buffer: your prepared input data (at least input_size bytes), supplied by the caller
memcpy(inputBuffer, input_buffer, input_size);
AHardwareBuffer_unlock(inputAhwb, nullptr);
NeuronMemory* iMemory = nullptr;
NeuronMemory_createFromAHardwareBuffer(inputAhwb, &iMemory);
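// Simplified: both model inputs read the same memory here; give each input its own buffer or offset if their data differs
NeuronExecution_setInputFromMemory(run, 0, nullptr, iMemory, 0, input_size);
NeuronExecution_setInputFromMemory(run, 1, nullptr, iMemory, 0, input_size);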
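A plain memcpy into the locked buffer silently overruns if the source data is not the expected size. The sketch below shows one way to guard the copy; copyExact is a hypothetical helper and the choice of std::vector<std::uint8_t> for the host data is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy host data into a mapped buffer only when the
// source size matches the destination size exactly.
inline void copyExact(void* dst, std::size_t dstSize,
                      const std::vector<std::uint8_t>& src) {
    if (dst == nullptr) {
        throw std::invalid_argument("destination is null");
    }
    if (src.size() != dstSize) {
        throw std::length_error("input size does not match buffer size");
    }
    std::memcpy(dst, src.data(), dstSize);
}
```

Between the lock and unlock calls, copyExact(inputBuffer, input_size, hostData) would replace the memcpy, where hostData holds your input bytes.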
Step 12. Set the output via AHardwareBuffer
const uint32_t output_size = DLA_OUTPUT_BIN_SIZE;
AHardwareBuffer_Desc oDesc{
.width = output_size,
.height = 1,
.layers = 1,
.format = AHARDWAREBUFFER_FORMAT_BLOB,
.usage = usage,
.stride = output_size,
};
AHardwareBuffer* outputAhwb = nullptr;
AHardwareBuffer_allocate(&oDesc, &outputAhwb);
NeuronMemory* oMemory = nullptr;
NeuronMemory_createFromAHardwareBuffer(outputAhwb, &oMemory);
NeuronExecution_setOutputFromMemory(run, 0, nullptr, oMemory, 0, output_size);
The output buffer does not need to be locked or written in advance; simply read it after inference completes.
Step 13. Run inference
NeuronExecution_compute(run);
Step 14. Read the output result
void* outputBuffer = nullptr;
AHardwareBuffer_lock(outputAhwb, usage, -1, nullptr, &outputBuffer);
// TODO: Read the inference result from outputBuffer and pass it to the downstream pipeline
AHardwareBuffer_unlock(outputAhwb, nullptr);
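How the output bytes should be interpreted depends on your model. The sketch below assumes the real output is asymmetric-quantized uint8 data and converts it to float; dequantize is a hypothetical helper, and the scale and zero point must come from your model, not from the placeholder values in Step 3.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert asymmetric-quantized uint8 values to float
// using real = scale * (q - zeroPoint).
inline std::vector<float> dequantize(const void* data, std::size_t byteCount,
                                     float scale, std::int32_t zeroPoint) {
    const auto* q = static_cast<const std::uint8_t*>(data);
    std::vector<float> out(byteCount);
    for (std::size_t i = 0; i < byteCount; ++i) {
        out[i] = scale *
                 static_cast<float>(static_cast<std::int32_t>(q[i]) - zeroPoint);
    }
    return out;
}
```

Call it while the buffer is still locked, for example dequantize(outputBuffer, output_size, scale, zeroPoint), so the returned vector remains valid after the unlock.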
Step 15. Release resources
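// Release in reverse order of creation: execution, then memories, then the buffers they wrap
NeuronExecution_free(run);
NeuronMemory_free(iMemory);
NeuronMemory_free(oMemory);
AHardwareBuffer_release(inputAhwb);
AHardwareBuffer_release(outputAhwb);
free(input_buffer); // only if input_buffer was allocated with malloc
NeuronCompilation_free(compilation);
NeuronModel_free(model);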
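Manual cleanup is easy to miss on early-return paths. One option is to hold each handle in a std::unique_ptr with a custom deleter. The sketch below uses a stand-in FakeModel type and FakeModel_free function (both hypothetical) so that it compiles without the Neuron headers; in real code you would substitute NeuronModel and NeuronModel_free, assuming the free function takes the handle pointer and returns void.

```cpp
#include <memory>

// Stand-ins so the sketch compiles without NeuronAdapter.h (hypothetical).
struct FakeModel { int id; };
inline int& freedCount() { static int n = 0; return n; }
inline void FakeModel_free(FakeModel* m) { delete m; ++freedCount(); }

// Deleter that forwards to a C-style free function.
template <typename T, void (*FreeFn)(T*)>
struct HandleDeleter {
    void operator()(T* p) const {
        if (p != nullptr) {
            FreeFn(p);
        }
    }
};

// Owning handle: the free function runs when the handle goes out of scope.
template <typename T, void (*FreeFn)(T*)>
using Handle = std::unique_ptr<T, HandleDeleter<T, FreeFn>>;

using ModelHandle = Handle<FakeModel, FakeModel_free>;
```

Handles declared in creation order are destroyed in reverse order, which matches the usual release order for model, compilation, and execution.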
Best,
Jun