Basic Environment Setup
This section describes how to set up an x86 computer and a LicheePi development board to prepare for running models on the NPU or CPU of the board.
To run models on the development board, it is first necessary to use the hhb tool on an x86 machine to convert general models such as ONNX into computation graphs and glue code executable by the board's CPU/NPU. The glue code and related application code must then be cross-compiled into binaries that can run on the board.
Since x86 machines generally offer higher performance and the hhb tool only supports the x86 architecture, an additional x86 computer is required for model conversion.
Development Board Setup
Lichee Pi 4A is a development board launched by Sipeed based on the TH1520 chip. For basic out-of-the-box configuration, refer to the official hardware documentation. The recommended operating system is RevyOS version 20250526. For instructions on flashing RevyOS, see the installation guide.
Installing Basic Tools
RevyOS does not include pip and other basic tools by default. Install them as follows:
debian@revyos-lpi4a:~$ sudo apt install python3-pip python3-venv wget curl git # Install required packages
debian@revyos-lpi4a:~$ mkdir npu # Create a working directory
debian@revyos-lpi4a:~$ cd npu
debian@revyos-lpi4a:~/npu$ python3 -m venv .venv # Create a Python virtual environment
debian@revyos-lpi4a:~/npu$ . .venv/bin/activate # Activate the virtual environment
(.venv) debian@revyos-lpi4a:~/npu$ # Now inside the virtual environment
If the installation and activation are successful, the command prompt is prefixed with (.venv), indicating the active virtual environment.
Unlike the original Yuque documentation, this guide adds a section on creating a virtual environment. In the original, packages such as shl-python are installed system-wide, which may conflict with system Python packages managed by apt (e.g., python3-*). Using a virtual environment isolates packages installed via pip and prevents conflicts with apt. For details, see PEP 668.
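Before installing anything with pip, it can help to confirm that the virtual environment is really active in the current shell. The sketch below is a minimal check, not part of the original guide; it assumes the standard venv activate script, which exports the VIRTUAL_ENV variable. The function name in_venv is a hypothetical helper.

```shell
# Returns success when a virtual environment is active in this shell.
# Relies on VIRTUAL_ENV, which the venv activate script exports.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "virtual environment active: $VIRTUAL_ENV"
else
    echo "no virtual environment active; run: . .venv/bin/activate"
fi
```

Running pip only when in_venv succeeds keeps packages out of the system-wide Python installation managed by apt.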
Installing the SHL Library
SHL (Structure of Heterogeneous Library) is a high-performance heterogeneous computing library provided by T-Head. Its main function interfaces use the CSI-NN2 API for the C-SKY CPU platform, and it provides a series of optimized binary libraries. For details, see the User Manual.
Install the SHL library using pip within the virtual environment.
The shl-python package is updated frequently, and the latest version may not be compatible with all boards or example code. Install a version of shl-python that matches your board and the examples you are following. For example, for the LPI4A board, version 2.6.17 is known to be compatible with most provided examples. Check your board documentation or example requirements for the recommended version.
To install a specific compatible version (e.g., 2.6.17):
(.venv) debian@revyos-lpi4a:~/npu$ pip install shl-python==2.6.17
The default PyPI server is located overseas, which may cause network issues in mainland China. Use the -i option to specify a temporary PyPI mirror. For more information, refer to the Tsinghua University PyPI Mirror Usage Guide.
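As an illustration of the -i option, the following sketch wraps the pinned install in a small function. The function name install_shl and the variables SHL_VERSION, PIP_MIRROR, and DRY_RUN are hypothetical helpers introduced here, and the mirror URL is an assumption based on the Tsinghua mirror's usual simple index address; check the mirror's usage guide before relying on it.

```shell
# Version taken from the text above; override SHL_VERSION if your board
# documentation recommends a different one.
SHL_VERSION="${SHL_VERSION:-2.6.17}"
# Assumption: the simple index URL of the Tsinghua PyPI mirror.
PIP_MIRROR="${PIP_MIRROR:-https://pypi.tuna.tsinghua.edu.cn/simple}"

# Install the pinned shl-python version from the mirror.
# Set DRY_RUN=1 to print the command instead of running it.
install_shl() {
    ${DRY_RUN:+echo} pip install -i "$PIP_MIRROR" "shl-python==${SHL_VERSION}"
}

# Usage (inside the activated virtual environment):
#   install_shl
```

Passing -i on the command line affects only that one invocation, so the default index remains unchanged for later installs.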
After installation, use the SHL module's --whereis option to check the installation path.
- NPU
(.venv) debian@revyos-lpi4a:~/npu$ python3 -m shl --whereis th1520
/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/th1520
- CPU Only
(.venv) debian@revyos-lpi4a:~/npu$ python3 -m shl --whereis c920
/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/c920
Based on the output path, set the LD_LIBRARY_PATH environment variable to specify the dynamic library search path. The libraries are in the lib subdirectory of the reported path. For example:
- NPU
$ export SHL_PATH=/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/th1520/lib # Use the path from above
- CPU Only
$ export SHL_PATH=/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/c920/lib # Use the path from above
Add it to the environment variables:
$ export LD_LIBRARY_PATH=$SHL_PATH:$LD_LIBRARY_PATH
To make this setting persistent, add the above export commands to your ~/.bashrc or ~/.profile.
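One way to make the setting persistent without duplicating it on repeated runs is sketched below. The helper append_once is hypothetical (not part of SHL or RevyOS), and the usage lines assume that python3 -m shl --whereis prints a single path, as in the output shown earlier.

```shell
# Append a line to a file only if that exact line is not already present,
# so running the setup twice does not duplicate the entry.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on the board (NPU variant; use c920 instead of th1520 for CPU only):
#   SHL_PATH="$(python3 -m shl --whereis th1520)/lib"
#   append_once "export LD_LIBRARY_PATH=$SHL_PATH:\$LD_LIBRARY_PATH" ~/.bashrc
```

The escaped \$LD_LIBRARY_PATH is written literally into ~/.bashrc, so it is expanded each time a new shell starts rather than once at setup time.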
Installing HHB-onnxruntime
HHB-onnxruntime integrates the SHL backend (execution providers), enabling onnxruntime to utilize SHL's high-performance code optimized for C-SKY CPUs.
- NPU
$ wget https://github.com/zhangwm-pt/prebuilt_whl/raw/refs/heads/python3.11/numpy-1.25.0-cp311-cp311-linux_riscv64.whl
$ wget https://github.com/zhangwm-pt/onnxruntime/releases/download/riscv_whl_v2.6.0/hhb_onnxruntime_th1520-2.6.0-cp311-cp311-linux_riscv64.whl
$ pip install numpy-1.25.0-cp311-cp311-linux_riscv64.whl hhb_onnxruntime_th1520-2.6.0-cp311-cp311-linux_riscv64.whl
- CPU Only
$ wget https://github.com/zhangwm-pt/prebuilt_whl/raw/refs/heads/python3.11/numpy-1.25.0-cp311-cp311-linux_riscv64.whl
$ wget https://github.com/zhangwm-pt/onnxruntime/releases/download/riscv_whl_v2.6.0/hhb_onnxruntime_c920-2.6.0-cp311-cp311-linux_riscv64.whl
$ pip install numpy-1.25.0-cp311-cp311-linux_riscv64.whl hhb_onnxruntime_c920-2.6.0-cp311-cp311-linux_riscv64.whl
If you experience network issues accessing GitHub from mainland China, consider using a network proxy tool to accelerate access.
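The two variants differ only in the target name inside the wheel file name. As a sketch, a setup script could select the download URL from the target. The function hhb_ort_wheel_url and the variable ORT_BASE are hypothetical; the URL pieces are copied from the commands above.

```shell
# Release base URL copied from the wget commands above.
ORT_BASE="https://github.com/zhangwm-pt/onnxruntime/releases/download/riscv_whl_v2.6.0"

# Print the HHB-onnxruntime wheel URL for a target:
# th1520 (NPU) or c920 (CPU only).
hhb_ort_wheel_url() {
    case "$1" in
        th1520|c920)
            echo "$ORT_BASE/hhb_onnxruntime_$1-2.6.0-cp311-cp311-linux_riscv64.whl" ;;
        *)
            echo "unknown target: $1 (expected th1520 or c920)" >&2
            return 1 ;;
    esac
}

# Usage:
#   wget "$(hhb_ort_wheel_url th1520)"
```

The cp311 tag in the file name indicates a wheel built for CPython 3.11, which matches the python3.11 directory in the virtual environment paths shown earlier.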
NPU Driver Configuration
Compared to CPU execution, NPU inference requires the NPU driver module vha to be loaded. Use lsmod to list loaded drivers:
$ lsmod | grep vha
vha 970752 0
img_mem 827392 1 vha
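If the vha module is not listed, load it manually. The -a option tells modprobe to treat every argument as a module name; without it, the arguments after the first are passed to vha as module parameters:
$ sudo modprobe -a vha vha_info img_mem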
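The check-then-load step can be scripted. The sketch below reads /proc/modules, which is the listing that lsmod formats; module_loaded is a hypothetical helper, and its optional second argument exists only so the check can be tried against a saved listing.

```shell
# Return success if a kernel module appears in a /proc/modules-style listing.
# The first field of each line is the module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage on the board:
#   module_loaded vha || sudo modprobe vha
```

Matching on the whole first field avoids the false positive that a plain grep for vha would give when only a differently named module such as vha_info is loaded.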
In addition to the kernel driver, user-space drivers are also required for NPU execution. Check the /usr/lib directory for the following libraries:
libimgdnn.so
libnnasession.so
libimgdnn_execute.so
These libraries are pre-installed in RevyOS and generally do not require manual installation.
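A quick way to perform this check is sketched below. The function check_npu_libs is a hypothetical helper; the library names and the default directory come from the list above.

```shell
# Report which of the NPU user-space libraries are missing from a directory.
# Prints one line per missing library and returns non-zero if any is missing.
check_npu_libs() {
    dir=${1:-/usr/lib}
    missing=0
    for lib in libimgdnn.so libnnasession.so libimgdnn_execute.so; do
        if [ ! -e "$dir/$lib" ]; then
            echo "missing: $dir/$lib"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the board:
#   check_npu_libs && echo "all NPU user-space libraries found"
```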
NPU Device Permission Configuration
After loading the NPU driver, the device /dev/vha0 may require permission adjustment for user access. For convenience, set the device permission to 0666 (read/write for all users):
$ sudo chmod 0666 /dev/vha0 # Set device permission to 0666
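A chmod on the device node does not survive a reboot or a driver reload, and 0666 grants access to every user. For security and persistence, it is recommended to configure udev rules for device management. Consult AI or the udev documentation for configuration details.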
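As a starting point, a udev rule for the device might look like the sketch below. This is an assumption-laden example rather than a rule shipped with RevyOS: it matches on the kernel device name only, keeps the 0666 mode used above, and the rules file name 99-vha.rules is a hypothetical choice. The function vha_udev_rule is a hypothetical helper.

```shell
# Print a udev rule that applies mode 0666 to the vha0 device node.
# Assumption: matching on the kernel device name alone is sufficient.
# A stricter setup could set GROUP= and a narrower MODE= instead.
vha_udev_rule() {
    printf '%s\n' 'KERNEL=="vha0", MODE="0666"'
}

# Usage on the board (the rules file name is a hypothetical choice):
#   vha_udev_rule | sudo tee /etc/udev/rules.d/99-vha.rules
#   sudo udevadm control --reload
#   sudo udevadm trigger
```

With a rule in place, udev applies the permission each time the device node is created, so the manual chmod is no longer needed after a reboot.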
x86 Machine Setup
Unlike the original Yuque documentation, this guide uses a pre-built Docker image for installation. Please ensure Docker is installed on your computer.
Obtaining and Running the HHB Image
$ docker pull hhb4tools/hhb:2.6.17 # Pull the HHB image
$ docker run --rm -it hhb4tools/hhb:2.6.17 bash # Start a temporary HHB container
The HHB image is large (~7GB). Downloading may take some time. Please ensure sufficient disk space.
After running the above commands, the container will start and you can operate within it.
The container filesystem is temporary. All files created inside the container will be deleted upon exit. It is recommended to copy generated files to the host or use a persistent container.
For more information, refer to the Docker Official Documentation or consult AI.
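One common way to keep generated files is to bind-mount a host directory into the container. The sketch below uses Docker's -v and -w options. The mount point /workspace, the function name hhb_shell, and the variables HHB_IMAGE and DRY_RUN are hypothetical choices made for this example.

```shell
# Image tag taken from the docker commands above.
HHB_IMAGE="${HHB_IMAGE:-hhb4tools/hhb:2.6.17}"

# Start a temporary HHB container with a host directory mounted at /workspace
# (a hypothetical mount point), so files written there survive container exit.
# Set DRY_RUN=1 to print the command instead of running it.
hhb_shell() {
    host_dir=${1:-$PWD}
    ${DRY_RUN:+echo} docker run --rm -it \
        -v "$host_dir:/workspace" -w /workspace \
        "$HHB_IMAGE" bash
}

# Usage:
#   hhb_shell ~/hhb-work
```

Anything written under /workspace inside the container lands in the host directory, while the rest of the container filesystem is still discarded on exit because of --rm.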
Obtaining Example Code
The example code for this tutorial is available on GitHub. Clone it locally using:
$ git clone https://github.com/zhangwm-pt/lpi4a-example.git
Obtaining OpenCV
This tutorial uses OpenCV 4.5, optimized for the C920 RISC-V vector spec 0.7.1. Precompiled binaries are available, and the source code can be downloaded from the OCC download page.
The precompiled C++ binaries are hosted on GitHub and referenced by the example repository as a submodule. To fetch the submodule, run:
$ # Assuming you are in the repository root
$ git submodule update --init --