How do NPU and CPU inference speeds compare? Tests on the MYD-JX8MPQ development board based on the i.MX 8M Plus processor

MYIR (米尔电子) 2022-05-09 16:46

References

https://www.toradex.cn/blog/nxp-imx8ji-yueiq-kuang-jia-ce-shi-machine-learning

IMX-MACHINE-LEARNING-UG.pdf


Image classification on the CPU and NPU

cd /usr/bin/tensorflow-lite-2.4.0/examples

Running on the CPU

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: invoked

INFO: average time: 50.66 ms

INFO: 0.780392: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0156863: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit


Running with GPU/NPU acceleration

./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt -a 1

INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite delegate for NNAPI.

INFO: Applied NNAPI delegate.

INFO: invoked

INFO: average time: 2.775 ms

INFO: 0.768627: 653 military uniform

INFO: 0.105882: 907 Windsor tie

INFO: 0.0196078: 458 bow tie

INFO: 0.0117647: 466 bulletproof vest

INFO: 0.00784314: 835 suit
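
The CPU and NPU runs report slightly different scores for the same classes (0.780392 versus 0.768627 for "military uniform"). Because the model is quantized, each score looks like an 8-bit output value divided by 255. That interpretation is an assumption drawn from the numbers; the log itself does not state it. A minimal Python sketch that recovers the underlying integer values:

```python
# Scores copied from the two label_image runs above.
cpu_scores = [0.780392, 0.105882, 0.0156863, 0.0117647, 0.00784314]
npu_scores = [0.768627, 0.105882, 0.0196078, 0.0117647, 0.00784314]


def to_uint8(score):
    """Map a printed score to the nearest 8-bit value (assumes score = q / 255)."""
    return round(score * 255)


cpu_q = [to_uint8(s) for s in cpu_scores]
npu_q = [to_uint8(s) for s in npu_scores]

for c, n in zip(cpu_q, npu_q):
    print(f"CPU {c:3d}/255   NPU {n:3d}/255   difference {n - c:+d}")
```

Under that assumption the two runs differ by at most three quantization steps, and the ranking of the top five classes is unchanged.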

USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so

Running from Python

python3 label_image.py

INFO: Created TensorFlow Lite delegate for NNAPI.

Applied NNAPI delegate.

Warm-up time: 6628.5 ms

Inference time: 2.9 ms

0.870588: military uniform

0.031373: Windsor tie

0.011765: mortarboard

0.007843: bow tie

0.007843: bulletproof vest
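
The Python run reports a warm-up time of several seconds followed by an inference time of a few milliseconds, so the first call has to be timed separately from the rest. The sketch below shows one way to do that. `time_inference` and the dummy workload are hypothetical names used for illustration; they are not taken from label_image.py.

```python
import time


def time_inference(run_once, runs=50):
    """Time the first call separately, then average the following calls.

    run_once is any zero-argument callable, for example a wrapper around
    a model invocation. Returns (warmup_ms, average_ms).
    """
    start = time.perf_counter()
    run_once()  # first call: includes any one-time setup cost
    warmup_ms = (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    average_ms = (time.perf_counter() - start) * 1000.0 / runs
    return warmup_ms, average_ms


if __name__ == "__main__":
    # Dummy workload standing in for a real model invocation.
    warmup_ms, average_ms = time_inference(lambda: sum(range(10000)))
    print(f"Warm-up time: {warmup_ms:.1f} ms")
    print(f"Inference time: {average_ms:.1f} ms")
```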


Benchmark

CPU, single core

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite

STARTING!

Log parameter values verbosely: [0]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 15.076ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=4 first=166743 curr=161124 min=161054 max=166743 avg=162728 std=2347

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=161039 curr=161030 min=160877 max=161292 avg=161039 std=94

Inference timings in us: Init: 15076, First inference: 166743, Warmup (avg): 162728, Inference (avg): 161039

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.65234 overall=9.00391
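
The count=... lines printed by benchmark_model are plain key=value pairs in microseconds, so they are easy to pull into a script for comparison. A small sketch follows; `parse_stats` is a hypothetical helper, and the regular expression also tolerates pairs that have lost the space between them.

```python
import re

# key=value pairs such as "count=50" or "avg=45518.9"; spaces between pairs are optional.
_PAIR = re.compile(r"([a-z]+)=([0-9]+(?:\.[0-9]+)?)")


def parse_stats(line):
    """Return the key=value pairs of a benchmark_model statistics line as floats."""
    return {key: float(value) for key, value in _PAIR.findall(line)}


line = "count=50 first=161039 curr=161030 min=160877 max=161292 avg=161039 std=94"
stats = parse_stats(line)
print(f"average inference: {stats['avg'] / 1000:.2f} ms over {int(stats['count'])} runs")
```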

CPU, multiple cores

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4

With four cores, setting --num_threads to 4 gives the best performance.

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Loaded model mobilenet_v1_1.0_224_quant.tflite

The input model file size (MB): 4.27635

Initialized session in 2.536ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=11 first=48722 curr=44756 min=44597 max=49397 avg=45518.9 std=1679

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=50 first=44678 curr=44591 min=44590 max=50798 avg=44965.2 std=1170

Inference timings in us: Init: 2536, First inference: 48722, Warmup (avg): 45518.9, Inference (avg): 44965.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=1.38281 overall=8.69922
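
The benchmark runs in this article differ only in their flags, so the command lines can be generated from one helper, which makes it easy to sweep the thread count. This is a sketch that assumes it is run from the examples directory on the board. `benchmark_cmd` is a hypothetical helper; the flags are the ones used in this article.

```python
import shlex

MODEL = "mobilenet_v1_1.0_224_quant.tflite"


def benchmark_cmd(graph, num_threads=None, use_nnapi=False):
    """Build the argument list for one benchmark_model run."""
    cmd = ["./benchmark_model", f"--graph={graph}"]
    if num_threads is not None:
        cmd.append(f"--num_threads={num_threads}")
    if use_nnapi:
        cmd.append("--use_nnapi=true")
    return cmd


# Print the command for each thread count to see where CPU performance peaks.
for threads in (1, 2, 4):
    print(shlex.join(benchmark_cmd(MODEL, num_threads=threads)))
```

On the board, each list can be passed to `subprocess.run(cmd, check=True)` instead of being printed.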

GPU/NPU acceleration

./benchmark_model --graph=mobilenet_v1_1.0_224_quant.tflite --num_threads=4 --use_nnapi=true

STARTING!

Log parameter values verbosely: [0]

Num threads: [4]

Graph: [mobilenet_v1_1.0_224_quant.tflite]

#threads used for CPU inference: [4]

Use NNAPI: [1]

NNAPI accelerators available: [vsi-npu]

Loaded model mobilenet_v1_1.0_224_quant.tflite

INFO: Created TensorFlow Lite delegate for NNAPI.

Explicitly applied NNAPI delegate, and the model graph will be completely executed by the delegate.

The input model file size (MB): 4.27635

Initialized session in 3.968ms.

Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

count=1 curr=6611085

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.

count=369 first=2715 curr=2623 min=2572 max=2776 avg=2634.2 std=20

Inference timings in us: Init: 3968, First inference: 6611085, Warmup (avg): 6.61108e+06, Inference (avg): 2634.2

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.

Peak memory footprint (MB): init=2.42188 overall=28.4062
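
The NPU run spends about 6.6 s on its first inference, so acceleration only pays off once enough inferences follow. A rough estimate from the "Inference timings" lines above, assuming every inference after the first takes the reported average:

```python
# Timings in microseconds, copied from the "Inference timings" lines above.
CPU4_FIRST, CPU4_AVG = 48722, 44965.2   # four CPU threads
NPU_FIRST, NPU_AVG = 6611085, 2634.2    # NNAPI delegate on the NPU


def total_us(first, avg, n):
    """Total time for n inferences: the first one plus n - 1 at the average."""
    return first + (n - 1) * avg


def break_even(limit=100000):
    """Smallest n for which the NPU total is below the four-thread CPU total."""
    for n in range(1, limit + 1):
        if total_us(NPU_FIRST, NPU_AVG, n) < total_us(CPU4_FIRST, CPU4_AVG, n):
            return n
    return None


print(break_even())  # → 157
```

Under this simple model the NPU overtakes the four-thread CPU run after 157 inferences, which is about 7 seconds of continuous work.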

Comparison of results

                        CPU          CPU, multi-core/multi-thread   NPU acceleration
Image classification    50.66 ms     -                              2.775 ms
Benchmark               161039 µs    44965.2 µs                     2634.2 µs
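
The speedups implied by these measurements can be worked out directly. A short sketch using only the numbers reported above:

```python
# Average benchmark_model inference times in microseconds, from the runs above.
timings_us = {
    "CPU, 1 thread": 161039.0,
    "CPU, 4 threads": 44965.2,
    "NPU (NNAPI)": 2634.2,
}

baseline = timings_us["CPU, 1 thread"]
speedups = {name: baseline / t for name, t in timings_us.items()}

for name, factor in speedups.items():
    print(f"{name:15s} {timings_us[name] / 1000:8.2f} ms   {factor:5.1f}x")

# label_image: 50.66 ms on the CPU versus 2.775 ms with NNAPI.
label_image_speedup = 50.66 / 2.775
print(f"label_image speedup: {label_image_speedup:.1f}x")
```

In the benchmark the NPU is roughly 61 times faster than a single CPU core and about 17 times faster than four threads; in the label_image test the speedup is about 18 times.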

OpenCV DNN

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

Downloading the models

cd /usr/share/opencv4/testdata/dnn/

python3 download_models_basic.py

Image classification

cd /usr/share/OpenCV/samples/bin

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet



Enter the following address in a file browser's address bar to download the archive:

ftp://ftp.toradex.cn/Linux/i.MX8/eIQ/OpenCV/Image_Classification.zip

Unzip it to obtain models.yml and squeezenet_v1.1.caffemodel.

cd /usr/share/OpenCV/samples/bin

Copy these files to the /usr/share/OpenCV/samples/bin directory on the board.

$ cp /usr/share/opencv4/testdata/dnn/dog416.png /usr/share/OpenCV/samples/bin/
$ cp /usr/share/opencv4/testdata/dnn/squeezenet_v1.1.prototxt /usr/share/OpenCV/samples/bin/
$ cp /usr/share/OpenCV/samples/data/dnn/classification_classes_ILSVRC2012.txt /usr/share/OpenCV/samples/bin/
$ cd /usr/share/OpenCV/samples/bin/
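
The sample depends on several files being present in the working directory. A small check like the one below can confirm that before the sample is run. `missing_files` is a hypothetical helper, and the file list is taken from the steps above.

```python
from pathlib import Path

# Files the squeezenet classification example expects, as listed in the steps above.
REQUIRED = [
    "models.yml",
    "squeezenet_v1.1.caffemodel",
    "squeezenet_v1.1.prototxt",
    "classification_classes_ILSVRC2012.txt",
    "dog416.png",
]


def missing_files(directory, required=REQUIRED):
    """Return the names in `required` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("/usr/share/OpenCV/samples/bin")
    print("missing:", ", ".join(missing) if missing else "nothing")
```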

Image input

./example_dnn_classification --input=dog416.png --zoo=models.yml squeezenet

Error

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --input=dog416.png --zoo=model.yml squeezenet

ERRORS:

Missing parameter: 'mean'

Missing parameter: 'rgb'

Add the --rgb and --mean=1 parameters.

If the error persists, also add the --mode parameter.

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1

root@myd-jx8mp:/usr/share/OpenCV/samples/bin# ./example_dnn_classification --rgb --mean=1 --input=dog416.png --zoo=models.yml squeezenet --mode

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (898) open OpenCV | GStreamer warning: unable to query duration of stream

[ WARN:0] global /usr/src/debug/opencv/4.4.0.imx-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1

Video input

./example_dnn_classification --device=2 --zoo=models.yml squeezenet

Problems

If the testdata directory contains no files, search for them in the Yocto build tree:

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto$ find . -name "dog416.png"

./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/dnn/dog416.png

Then copy the corresponding files to the board.

cd ./build-xwayland/tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/extra/testdata/

tar -cvf /mnt/e/dnn.tar ./dnn/

cd /usr/share/opencv4/testdata (create the directory first if it does not exist)

Upload dnn.tar with rz.

Unpack it:

tar -xvf dnn.tar

terminate called after throwing an instance of 'cv::Exception'

what(): OpenCV(4.4.0) /usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp: error: (-215:Assertion failed) !model.empty() in function 'main'

Aborted

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ find . -name classification.cpp

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$ cp ./tmp/work/cortexa53-crypto-mx8mp-poky-linux/opencv/4.4.0.imx-r0/packages-split/opencv-src/usr/src/debug/opencv/4.4.0.imx-r0/git/samples/dnn/classification.cpp /mnt/e

lhj@DESKTOP-BINN7F8:~/myd-jx8mp-yocto/build-xwayland$

YOLO object detection

cd /usr/share/OpenCV/samples/bin

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo



Download the cfg and weights files from https://pjreddie.com/darknet/yolo/

cd /usr/share/OpenCV/samples/bin/

Copy the downloaded files into this directory.

cp /usr/share/OpenCV/samples/data/dnn/object_detection_classes_yolov3.txt /usr/share/OpenCV/samples/bin/

cp /usr/share/opencv4/testdata/dnn/yolov3.cfg /usr/share/OpenCV/samples/bin/

./example_dnn_object_detection --width=1024 --height=1024 --scale=0.00392 --input=dog416.png --rgb --zoo=models.yml yolo
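
The --scale=0.00392 argument is approximately 1/255, which suggests the 8-bit input pixels are being normalized from the 0 to 255 range to roughly 0 to 1. That reading is an assumption; the article does not explain the value. The arithmetic can be checked quickly:

```python
SCALE = 0.00392  # value passed as --scale in the command above

# 8-bit pixel values span 0..255, so multiplying by about 1/255 maps them to about 0..1.
print(f"1/255       = {1 / 255:.5f}")
print(f"255 * scale = {255 * SCALE:.4f}")
```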

Classic machine learning with OpenCV

cd /usr/share/OpenCV/samples/bin

Linear SVM

./example_tutorial_introduction_to_svm


Non-linear SVM

./example_tutorial_non_linear_svms


PCA analysis

./example_tutorial_introduction_to_pca ../data/pca_test1.jpg


Logistic regression

./example_cpp_logistic_regression
