Basic Environment Setup
This guide configures an x86 computer and a Lichee Pi development board so that models can run on the board's NPU or CPU.
Before a model can run on the development board, you must first use the hhb tool on an x86 machine to convert generic model formats such as ONNX into computation graphs and glue code that are executable on the board's CPU/NPU. The glue code and related application code must then be cross-compiled into binaries for the board.
Because x86 machines generally offer higher performance—and because the hhb tool only supports the x86 architecture—an additional x86 computer is required to complete the model conversion workflow.
Development Board Setup
Lichee Pi 4A is a development board produced by Sipeed that features the TH1520 SoC. For the basic out-of-the-box setup, consult the official hardware guide. Flash RevyOS version 20250729 onto the board. See the image flashing guide for details.
Installing Basic Tools
By default, RevyOS does not include pip and several other fundamental tools, so install them first:
debian@revyos-lpi4a:~$ sudo apt install python3-pip python3-venv wget curl git # Install required packages
debian@revyos-lpi4a:~$ mkdir npu # Create a directory for the Python virtual environment
debian@revyos-lpi4a:~$ cd npu
debian@revyos-lpi4a:~/npu$ python3 -m venv .venv
debian@revyos-lpi4a:~/npu$ . .venv/bin/activate # Activate the virtual environment
(.venv) debian@revyos-lpi4a:~/npu$ # Prompt shows the active virtual environment
After activation, the shell prompt should display (.venv) to indicate the current virtual environment.
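If you want to double-check that the shell is really using the interpreter from the virtual environment, you can compare `sys.prefix` with `sys.base_prefix`; this is a standard-library check and prints True only inside an activated venv:

```shell
# Prints True when the current python3 comes from a virtual environment
python3 -c 'import sys; print(sys.prefix != sys.base_prefix)'
```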
Unlike the original Yuque article, this document adds a dedicated section for virtual environments. The original guide installs packages such as shl-python system-wide, which conflicts with the system Python packages managed by apt (e.g. python3-*). Using a virtual environment isolates packages installed via pip and avoids clashes with apt. See PEP 668 for details.
Installing the SHL Library
SHL (Structure of Heterogeneous Library) is a high-performance heterogeneous computing library provided by T-Head. Its main function interfaces adopt the CSI-NN2 API for the XuanTie CPU platform and come with a collection of optimised binary libraries. Refer to the user manual for more information.
Install SHL within the virtual environment using pip:
(.venv) debian@revyos-lpi4a:~/npu$ pip3 install shl-python -i https://pypi.tuna.tsinghua.edu.cn/simple
...
Successfully installed shl-python-3.2.2
The default PyPI infrastructure is hosted overseas, which can lead to connectivity issues in mainland China. The -i argument lets you select an alternative mirror. For more instructions, see the Tsinghua University PyPI mirror guide.
After installation, query SHL's location with the --whereis option:
NPU:

(.venv) debian@revyos-lpi4a:~/npu$ python3 -m shl --whereis th1520
/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/th1520

CPU only:

(.venv) debian@revyos-lpi4a:~/npu$ python3 -m shl --whereis c920
/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/c920
Based on the path printed by the command above, set LD_LIBRARY_PATH so the dynamic linker can find SHL's shared libraries:
NPU:

$ export SHL_PATH=/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/th1520/lib # Use the path printed above

CPU only:

$ export SHL_PATH=/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/c920/lib # Use the path printed above

Then, in either case:

$ export LD_LIBRARY_PATH=$SHL_PATH:$LD_LIBRARY_PATH
Append the export command to ~/.bashrc or ~/.profile if you need the setting to persist.
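If the export lands in ~/.bashrc, it may be sourced more than once, and LD_LIBRARY_PATH will accumulate duplicate entries. A small guard avoids that; this is a sketch assuming the c920 path shown above:

```shell
# Path is an example; use the one printed by `python3 -m shl --whereis`
SHL_PATH=/home/debian/npu/.venv/lib/python3.11/site-packages/shl/install_nn2/c920/lib
case ":$LD_LIBRARY_PATH:" in
  *":$SHL_PATH:"*) ;;  # already present; do nothing
  *) export LD_LIBRARY_PATH="$SHL_PATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
```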
Installing HHB-onnxruntime
HHB-onnxruntime integrates SHL as an onnxruntime execution provider, allowing onnxruntime to reuse SHL's high-performance kernels tuned for XuanTie CPUs.
NPU:

$ wget https://github.com/zhangwm-pt/prebuilt_whl/raw/refs/heads/python3.11/numpy-1.25.0-cp311-cp311-linux_riscv64.whl
$ wget https://github.com/zhangwm-pt/onnxruntime/releases/download/riscv_whl_v2.6.0/hhb_onnxruntime_th1520-2.6.0-cp311-cp311-linux_riscv64.whl
$ pip install numpy-1.25.0-cp311-cp311-linux_riscv64.whl hhb_onnxruntime_th1520-2.6.0-cp311-cp311-linux_riscv64.whl

CPU only:

$ wget https://github.com/zhangwm-pt/prebuilt_whl/raw/refs/heads/python3.11/numpy-1.25.0-cp311-cp311-linux_riscv64.whl
$ wget https://github.com/zhangwm-pt/onnxruntime/releases/download/riscv_whl_v2.6.0/hhb_onnxruntime_c920-2.6.0-cp311-cp311-linux_riscv64.whl
$ pip install numpy-1.25.0-cp311-cp311-linux_riscv64.whl hhb_onnxruntime_c920-2.6.0-cp311-cp311-linux_riscv64.whl
If accessing GitHub from mainland China is slow or unreliable, consider enabling a network proxy to improve connectivity.
NPU Driver Configuration
Compared with CPU execution, NPU inference requires that the vha kernel module be loaded. Check the loaded modules with:
$ lsmod | grep vha
vha 970752 0
img_mem 827392 1 vha
If the vha line is absent from the output, load the required modules manually (modprobe needs the -a flag to load several modules in one invocation):

$ sudo modprobe -a vha vha_info img_mem
Besides the kernel driver, user-space components are required. Verify that /usr/lib already contains the following libraries:
libimgdnn.so
libnnasession.so
libimgdnn_execute.so
These libraries are preinstalled in RevyOS, so manual installation is normally unnecessary.
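A short loop can verify that all three libraries are actually present; this sketch only reports what it finds and changes nothing:

```shell
# Check that the NPU user-space libraries exist under /usr/lib
for lib in libimgdnn.so libnnasession.so libimgdnn_execute.so; do
  if [ -e "/usr/lib/$lib" ]; then
    echo "$lib: found"
  else
    echo "$lib: MISSING"
  fi
done
```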
NPU Device Permission Configuration
After the driver is loaded, you might need to adjust the permissions on /dev/vha0 so regular users can access the device. For convenience, grant read/write access to all users:
$ sudo chmod 0666 /dev/vha0 # Set the device permission to 0666
For better security, configure udev rules to manage the device permissions; you can ask an AI assistant how to write the rules.
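For reference, a udev rule along the following lines would apply the permissions automatically at boot. The file name and the device match are assumptions based on the /dev/vha0 node above; verify them on your board:

```
# /etc/udev/rules.d/99-vha.rules (file name is an example)
KERNEL=="vha0", MODE="0666"
```

After creating the file, apply it with `sudo udevadm control --reload-rules` followed by `sudo udevadm trigger`.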
x86 Machine Setup
Unlike the original Yuque tutorial, this guide relies on a pre-built Docker image. Make sure Docker is installed on your x86 machine.
Obtaining and Running the HHB Image
$ docker pull hhb4tools/hhb:2.6.17 # Pull the HHB image
$ docker run --rm -it hhb4tools/hhb:2.6.17 bash # Start a temporary HHB container
The HHB image is large (about 7 GB). Downloading it may take time, so please wait patiently and confirm that you have sufficient disk space.
After running the commands above, the container starts and you can work inside it.
The container filesystem is ephemeral. All files created inside the container are removed when it exits. Therefore, copy the generated files back to the host after you finish, or run a persistent container if required.
For more information about Docker, see the official documentation or consult an AI assistant.
Obtaining Example Code
The example code accompanying this tutorial is hosted on GitHub. Clone it locally:
$ git clone https://github.com/zhangwm-pt/lpi4a-example.git
Obtaining OpenCV
This tutorial uses OpenCV 4.5, which has been tuned for the C920 processor and RISC-V vector specification 0.7.1. Prebuilt binaries are supplied, and the source code can be downloaded from the OCC download page.
The precompiled C++ binaries are stored on GitHub. To update the submodule in the example program, run:
$ # Assume you are in the repository root
$ git submodule update --init --recursive