目录
RKNN
OEC/OEC-Turbo 使用的芯片是 RK3566/RK3568, 这个系列是内建神经网络处理器 NPU 的, 利用 RKNN 可以部署运行 AI 模型利用 NPU 硬件加速模型推理. 要使用 NPU, 首先需要在电脑使用 RKNN-Toolkit2 将训练好的模型转换为 RKNN 格式的模型, 然后在传到 OEC/OEC-Turbo盒子上使用 RKNN C API 或 Python API进行推断.
涉及的工具有:
- RKNN-Toolkit2 是一个软件开发工具包, 供用户在 PC 和 Rockchip NPU 平台上执行模型转换、推断和性能评估
- RKNN-Toolkit-Lite2 为 Rockchip NPU 平台提供了 Python 编程接口, 帮助用户部署 RKNN 模型并加速实施 AI 应用
- RKNN Runtime 为 Rockchip NPU 平台提供了 C/C++ 编程接口, 帮助用户部署 RKNN 模型并加速实施 AI 应用
- RKNPU 内核驱动负责与 NPU 硬件交互
RKNN-Toolkit2 和 RKNN-Toolkit-Lite2 都在同一个GitHub仓库 https://github.com/airockchip/rknn-toolkit2
下面以 RKNN-Toolkit2 自带的 rknpu2 示例项目为例, 说明如何编译并在 OEC/OEC-Turbo 上运行 RKNN 项目.
准备GCC工具链
注意: 刷机的固件系统自带的 glibc 版本是 GLIBC_2.36, 因此对应的 gcc 版本最高到 12.2. 如果用 gcc 12.3 编译, 产生的二进制在板子上执行会报"/lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.38’ not found" 这样的错误.
从 https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads 下载 gcc 12.2版本工具链.
Arm GCC 12.2系列的最后一个版本 12.2.MPACBTI-Rel1 没有支持 host 为 Linux X86-64 的工具链, 支持 Linux X86-64 的最后一个版本是 12.2.Rel1, 需要下载这个版本,
在 x86_64 Linux hosted cross toolchains 下面找到 arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu.tar.xz, 注意不是 elf, 也不是 big-endian, 不要下错了.
下载之后解压, 并移动到 /opt/gcc-arm/ 备用
准备 rknn-toolkit2
从GitHub导出 airockchip/rknn-toolkit2
git clone https://github.com/airockchip/rknn-toolkit2.git
这个仓库有2个多GB, 需要耐心等待导出完成.
注意: 导出的文件中存在一处链接错误, 需要手动修复一下, 不然后面编译rknn程序的时候会报错. 到 rknpu2/examples/3rdparty/mpp/Linux/aarch64 目录下
rknn-toolkit2/rknpu2/examples/3rdparty/mpp/Linux/aarch64$ ll
total 2268
lrwxrwxrwx 1 milton milton 8 Jun 2 20:38 librockchip_mpp.so -> ''$'\177''ELF'$'\002\001\001\003'
-rw-rw-r-- 1 milton milton 2321616 Jun 2 20:38 librockchip_mpp.so.0
lrwxrwxrwx 1 milton milton 8 Jun 2 20:38 librockchip_mpp.so.1 -> ''$'\177''ELF'$'\002\001\001\003'
删除这两个软链, 同时将 librockchip_mpp.so.0 复制为 librockchip_mpp.so 和 librockchip_mpp.so.1. 不用软链, 是因为使用软链的话, 传输到盒子的时候会出错.
$ rm librockchip_mpp.so
$ rm librockchip_mpp.so.1
$ cp librockchip_mpp.so.0 librockchip_mpp.so
$ cp librockchip_mpp.so.0 librockchip_mpp.so.1
编译示例代码 rknn_yolov5_demo
到 rknpu2/examples/rknn_yolov5_demo 目录下, 先将两个sh文件设为可执行
$ chmod +x *.sh
执行编译, 将下面的/opt/gcc-arm/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
换成你刚才解压的gcc路径以及文件前缀
GCC_COMPILER=/opt/gcc-arm/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu ./build-linux.sh -t rk3566 -a aarch64 -b Release
编译输出
./build-linux.sh -t rk3566 -a aarch64 -b Release
/opt/gcc-arm/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
===================================
TARGET_SOC=RK3566_RK3568
TARGET_ARCH=aarch64
BUILD_TYPE=Release
BUILD_DIR=/home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/build/build_RK3566_RK3568_linux_aarch64_Release
CC=/opt/gcc-arm/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
CXX=/opt/gcc-arm/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
===================================
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/build/build_RK3566_RK3568_linux_aarch64_Release
[ 10%] Linking CXX executable rknn_yolov5_video_demo
[ 50%] Built target rknn_yolov5_demo
[100%] Built target rknn_yolov5_video_demo
[ 40%] Built target rknn_yolov5_demo
[100%] Built target rknn_yolov5_video_demo
Install the project...
-- Install configuration: "Release"
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./rknn_yolov5_demo
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/lib/librknnrt.so
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/lib/librga.so
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./model/RK3566_RK3568
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./model/RK3566_RK3568/yolov5s-640-640.rknn
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./model/bus.jpg
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./model/coco_80_labels_list.txt
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/./rknn_yolov5_video_demo
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/lib/librockchip_mpp.so
-- Installing: /home/milton/WorkLR3576/rknn-toolkit2/rknpu2/examples/rknn_yolov5_demo/install/rknn_yolov5_demo_Linux/lib/libmk_api.so
编译产生的文件在 install 目录下
└── rknn_yolov5_demo_Linux
├── lib
│ ├── libmk_api.so
│ ├── librga.so
│ ├── librknnrt.so
│ └── librockchip_mpp.so
├── model
│ ├── bus.jpg
│ ├── coco_80_labels_list.txt
│ └── RK3566_RK3568
│ └── yolov5s-640-640.rknn
├── rknn_yolov5_demo
└── rknn_yolov5_video_demo
运行 rknn_yolov5_demo
将上面的 rknn_yolov5_demo_Linux 整个目录复制到 OEC/OEC-Turbo 文件系统里, 在 rknn_yolov5_demo_Linux 目录下执行以下命令
LD_LIBRARY_PATH=./lib ./rknn_yolov5_demo model/RK3566_RK3568/yolov5s-640-640.rknn model/bus.jpg
输出
post process config: box_conf_threshold = 0.25, nms_threshold = 0.45
Loading mode...
sdk version: 2.3.2 (429f97ae6b@2025-04-09T09:09:27) driver version: 0.9.8
model input num: 1, output num: 3
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 640, size_with_stride=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=0, name=output0, n_dims=4, dims=[1, 255, 80, 80], n_elems=1632000, size=1632000, w_stride = 0, size_with_stride=1638400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=1, name=286, n_dims=4, dims=[1, 255, 40, 40], n_elems=408000, size=408000, w_stride = 0, size_with_stride=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=2, name=288, n_dims=4, dims=[1, 255, 20, 20], n_elems=102000, size=102000, w_stride = 0, size_with_stride=122880, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
model is NHWC input fmt
model input height=640, width=640, channel=3
Read model/bus.jpg ...
img width = 640, img height = 640
once run use 55.863000 ms
loadLabelName ./model/coco_80_labels_list.txt
person @ (209 243 286 510) 0.879723
person @ (479 238 560 526) 0.870588
person @ (109 238 231 534) 0.839831
bus @ (91 129 555 464) 0.692042
person @ (79 353 121 517) 0.300961
save detect result to ./out.jpg
loop count = 10 , average run 48.848400 ms
将产生的 out.jpg 传回本地电脑, 就能看到已经标记上识别结果
编译和运行 rknn_benchmark
在 rknpu2/examples/rknn_benchmark 目录下, 编译命令和上面的示例是一样的, 编译完成后传输到 OEC/OEC-Turbo 后, 假定之前执行过 rknn_yolov5_demo 这个例子, 并且都在同一个目录下, 执行下面的命令
LD_LIBRARY_PATH=./lib ./rknn_benchmark ../rknn_yolov5_demo_Linux/model/RK3566_RK3568/yolov5s-640-640.rknn ../rknn_yolov5_demo_Linux/model/bus.jpg
输出
rknn_api/rknnrt version: 2.0.0b0 (35a6907d79@2024-03-24T10:31:14), driver version: 0.9.8
total weight size: 7299584, total internal size: 10585600
total dma used size: 26521600
model input num: 1, output num: 3
input tensors:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, w_stride = 640, size_with_stride=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
index=0, name=output0, n_dims=4, dims=[1, 255, 80, 80], n_elems=1632000, size=1632000, w_stride = 0, size_with_stride=1638400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=1, name=286, n_dims=4, dims=[1, 255, 40, 40], n_elems=408000, size=408000, w_stride = 0, size_with_stride=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=2, name=288, n_dims=4, dims=[1, 255, 20, 20], n_elems=102000, size=102000, w_stride = 0, size_with_stride=122880, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
custom string:
Warmup ...
0: Elapse Time = 40.69ms, FPS = 24.57
1: Elapse Time = 40.19ms, FPS = 24.88
2: Elapse Time = 40.11ms, FPS = 24.93
3: Elapse Time = 40.19ms, FPS = 24.88
4: Elapse Time = 40.74ms, FPS = 24.54
Begin perf ...
0: Elapse Time = 41.03ms, FPS = 24.37
1: Elapse Time = 41.12ms, FPS = 24.32
2: Elapse Time = 41.20ms, FPS = 24.27
3: Elapse Time = 41.17ms, FPS = 24.29
4: Elapse Time = 41.11ms, FPS = 24.32
5: Elapse Time = 41.17ms, FPS = 24.29
6: Elapse Time = 41.08ms, FPS = 24.34
7: Elapse Time = 41.09ms, FPS = 24.34
8: Elapse Time = 41.25ms, FPS = 24.24
9: Elapse Time = 41.10ms, FPS = 24.33
Avg Time 41.13ms, Avg FPS = 24.312
Save output to rt_output0.npy
Save output to rt_output1.npy
Save output to rt_output2.npy
---- Top5 ----
0.984299 - 17902
0.984299 - 1122607
0.984299 - 1122705
0.984299 - 1122706
0.984299 - 1122707
---- Top5 ----
0.999985 - 280992
0.996063 - 9032
0.996063 - 280970
0.996063 - 280993
0.996063 - 281010
---- Top5 ----
1.000000 - 36255
1.000000 - 36256
0.996078 - 2236
0.996078 - 2245
0.996078 - 2255