Running BERT Model on LicheePi 4A TH1520 NPU (HHB Quantization & Inference)

This article describes how to deploy and run the BERT model on the TH1520 NPU of the LicheePi 4A, covering the complete quantization and inference workflow with the HHB tool.

BERT Model Introduction

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model widely used in natural language processing.

This tutorial shows how to use the HHB (Heterogeneous Honey Badger) toolchain on the LicheePi 4A TH1520 development board to compile and run the BERT model for a reading comprehension inference task.

1. Environment Preparation

1.1. Ensure HHB is Installed

After setting up the NPU-related environment as described in the documentation, start a container from the HHB Docker image and enter it.

1.2. Download the BERT Model and Sample Code

First, obtain the model. This tutorial uses a BERT model from the Google BERT repository that has been converted to ONNX. Download it into the /home/example/c920/bert_small directory with the following commands:

cd /home/example/c920/bert_small

wget https://github.com/zhangwm-pt/bert/releases/download/onnx/bert_small_int32_input.onnx

2. Compile the BERT Model Using HHB

To cross-compile the ONNX model into an executable for the board, use the hhb command. The NPU supports only 8-bit or 16-bit fixed-point operations; the command in this tutorial passes `--board c920` with the `float16` quantization scheme. Run the command from the example directory.

2.1. Enter the BERT Directory

cd /home/example/c920/bert_small

2.2. Run HHB Compilation

Add the Xuantie cross-compilation toolchain to PATH before running hhb; otherwise the compiled binary will not run on the LicheePi 4A.

export PATH=/tools/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1-light.1/bin/:$PATH

hhb --model-file bert_small_int32_input.onnx --input-name "input_ids;input_mask;segment_ids" --input-shape '1 384;1 384;1 384' --output-name "output_start_logits;output_end_logits" --board c920 --quantization-scheme "float16" --postprocess save_and_top5 -D --without-preprocess
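The `float16` value passed to `--quantization-scheme` implies that values are held as 16-bit half-precision floats. As a rough illustration of the precision this gives (not HHB's actual conversion code), the sketch below round-trips Python floats through IEEE 754 half precision using only the standard library. `round_to_float16` is a hypothetical helper name, and the sample values are taken from the numbers that appear later in this article.

```python
import struct


def round_to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    # "<e" is the struct format code for a little-endian 16-bit float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


for value in (0.1, 3.826172, 1713.15491):
    stored = round_to_float16(value)
    print(f"{value!r} -> {stored!r} (abs error {abs(stored - value):.6g})")
```

Half precision keeps roughly three significant decimal digits, which is why small logit differences can shift slightly compared with a float32 run.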

2.3. Option Descriptions

Option	Description
`-D`	Generate executable file
`--model-file`	Specify ONNX BERT model
`--input-name`	Model input name
`--output-name`	Model output name
`--input-shape`	Input data shape
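`--board`	Specify target platform (`c920` in this example)
`--quantization-scheme`	Quantization method (`float16` in this example; int8 is also available)
`--postprocess`	Post-processing mode; `save_and_top5` saves the output and prints the top 5 values
`--without-preprocess`	Do not generate preprocessing code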
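The multi-input options are easy to get wrong because names and shapes are separate semicolon-delimited strings that must line up one to one. The following sketch assembles the same argument list as the command above from a name-to-shape mapping, so the two strings cannot drift apart. `build_hhb_command` is a hypothetical helper; it uses only the flags that already appear in this tutorial.

```python
import shlex


def build_hhb_command(model_file, inputs, outputs, board, scheme):
    """Assemble the hhb argument list used in this tutorial.

    inputs maps each input tensor name to its shape (a tuple of ints).
    """
    names = ";".join(inputs)
    shapes = ";".join(" ".join(str(d) for d in shape) for shape in inputs.values())
    return [
        "hhb",
        "--model-file", model_file,
        "--input-name", names,
        "--input-shape", shapes,
        "--output-name", ";".join(outputs),
        "--board", board,
        "--quantization-scheme", scheme,
        "--postprocess", "save_and_top5",
        "-D",
        "--without-preprocess",
    ]


cmd = build_hhb_command(
    "bert_small_int32_input.onnx",
    {"input_ids": (1, 384), "input_mask": (1, 384), "segment_ids": (1, 384)},
    ["output_start_logits", "output_end_logits"],
    board="c920",
    scheme="float16",
)
# Print a shell-quoted command line that can be pasted into the container.
print(shlex.join(cmd))
```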

3. Generated Files

After HHB finishes, an hhb_out/ directory is created in the current directory. It contains:

hhb_out/
├── hhb.bm               # Quantized model file
├── hhb_runtime          # Executable inference program
├── main.c               # Reference example entry point
├── model.c              # Model structure code
├── model.params         # Quantized weight data
├── io.c / io.h          # File I/O helper code
└── process.c / process.h # Preprocessing functions
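Before copying anything to the board, it can help to confirm that the compilation produced every file in the listing above. This is a minimal sketch; `missing_outputs` is a hypothetical helper, and the file list is simply the one shown in this section.

```python
from pathlib import Path

# File names taken from the hhb_out/ listing in this section.
EXPECTED_FILES = [
    "hhb.bm",
    "hhb_runtime",
    "main.c",
    "model.c",
    "model.params",
    "io.c",
    "io.h",
    "process.c",
    "process.h",
]


def missing_outputs(out_dir):
    """Return the expected files that are not present in out_dir."""
    root = Path(out_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("hhb_out")
    print("missing:", missing if missing else "none")
```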

4. Transfer to the Development Board

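Copy the compiled model and files from the Docker container to the host machine. The container ID and destination path below are from the author's machine; replace them with your own: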

docker cp  65f872394fa5837ef2c24ade731b152da074ac6091f0766c04ac54092ff32780:/home/example/c920/bert_small C:\Users\knifefire\Downloads\

Then upload the bert_small directory to the development board and run the following on the board:

cd ~/bert_small
chmod +x hhb_out/hhb_runtime  # Grant execution permission

5. Run Inference

python3 inference.py

6. Expected Output

BERT processing questions from the SQuAD dataset:

The reference input in this example comes from SQuAD, a reading comprehension dataset of questions posed on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. The input for this example is shown below: the passage describes a football game, and the question asks which team took part in it.

[Context]:  Super Bowl 50 was an American football game...
[Question]:  Which NFL team represented the AFC at Super Bowl 50?
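The model takes three inputs of shape 1 x 384: `input_ids`, `input_mask`, and `segment_ids`. The sketch below shows the conventional BERT packing for question answering, `[CLS] question [SEP] context [SEP]`, padded to 384 positions. The whitespace tokenization and the toy vocabulary are assumptions made for illustration only; the source does not show how inference.py tokenizes, and a real pipeline would use the model's WordPiece vocabulary. `pack_inputs` is a hypothetical helper name.

```python
MAX_SEQ_LEN = 384  # matches the '1 384' input shape passed to hhb


def pack_inputs(question_tokens, context_tokens, vocab, max_len=MAX_SEQ_LEN):
    """Build input_ids, input_mask and segment_ids for one question/context pair."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len")
    # Segment 0 covers [CLS] + question + [SEP]; segment 1 covers context + [SEP].
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    input_mask = [1] * len(tokens)  # 1 marks real tokens, 0 marks padding
    pad = max_len - len(tokens)
    input_ids += [vocab["[PAD]"]] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return input_ids, input_mask, segment_ids


# Toy vocabulary and whitespace tokenization, for illustration only.
question = "which team represented the afc".split()
context = "the afc champion denver broncos defeated the panthers".split()
words = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"] + sorted(set(question + context))
vocab = {w: i for i, w in enumerate(words)}

input_ids, input_mask, segment_ids = pack_inputs(question, context, vocab)
print(len(input_ids), sum(input_mask), sum(segment_ids))  # prints 384 16 9
```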

BERT Output Answer

Based on the reading comprehension result, the expected output is Denver Broncos:

[Answer]: Denver Broncos

Execution Time

Run graph execution time: 1713.15491ms, FPS=0.58
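The FPS figure is the reciprocal of the per-inference latency. A short check of the arithmetic, using the latency reported above (`fps_from_latency_ms` is a hypothetical helper name):

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


latency_ms = 1713.15491  # value reported by the runtime in this article
print(f"FPS={fps_from_latency_ms(latency_ms):.2f}")  # prints FPS=0.58
```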

Reference Output:

# python3 inference.py
 ********** preprocess test **********
[Context]:  Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
[Question]:  Which NFL team represented the AFC at Super Bowl 50?
 ******* run bert *******
Run graph execution time: 1713.15491ms, FPS=0.58

=== tensor info ===
shape: 1 384 
data pointer: 0x183d60

=== tensor info ===
shape: 1 384 
data pointer: 0x185380

=== tensor info ===
shape: 1 384 
data pointer: 0x1869a0

=== tensor info ===
shape: 1 384 
data pointer: 0x2a8610
The max_value of output: 3.826172
The min_value of output: -9.968750
The mean_value of output: -8.412353
The std_value of output: 5.128320
 ============ top5: ===========
 46: 3.826172
 57: 3.142578
 39: 1.303711
 38: 1.179688
 27: 0.624512

=== tensor info ===
shape: 1 384 
data pointer: 0x2a8300
The max_value of output: 3.617188
The min_value of output: -9.625000
The mean_value of output: -7.798176
The std_value of output: 4.820137
 ============ top5: ===========
 47: 3.617188
 58: 3.482422
 32: 2.523438
 29: 1.541992
 41: 1.473633
 ********** postprocess **********
[Answer]:  Denver Broncos
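The two output tensors can be read as per-token start and end scores, and the answer is the token span with the best combined score. The sketch below illustrates that selection using the top-5 values from the log above. It rests on several assumptions: the first tensor is taken to be `output_start_logits` and the second `output_end_logits` (the order given to `--output-name`), all other positions are filled with a placeholder low value, and `max_answer_len=30` is an arbitrary limit. The source does not show the post-processing in inference.py, so this is not necessarily what that script does; `best_span` is a hypothetical helper name.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest start + end score."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best, best_score


SEQ_LEN = 384
FILL = -10.0  # placeholder for positions not listed in the top-5 output

start_logits = [FILL] * SEQ_LEN
end_logits = [FILL] * SEQ_LEN
# Top-5 entries copied from the reference output above.
for idx, val in {46: 3.826172, 57: 3.142578, 39: 1.303711, 38: 1.179688, 27: 0.624512}.items():
    start_logits[idx] = val
for idx, val in {47: 3.617188, 58: 3.482422, 32: 2.523438, 29: 1.541992, 41: 1.473633}.items():
    end_logits[idx] = val

span, score = best_span(start_logits, end_logits)
print(span)  # prints (46, 47)
```

Mapping token positions 46 to 47 back to text requires the tokenizer that produced the inputs; in the reference run this span corresponds to the printed answer.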

With this, you have successfully run BERT quantized inference on the LicheePi 4A development board! 🚀

Reference document: https://wiki.sipeed.com/hardware/zh/lichee/th1520/lpi4a/8_application.html
