eBPF in Practice: Observability for virtio-net NIC Queues

Linux阅码场 · Source: Linux阅码场 · 2024-11-14 11:18

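In systems work, the hardest problems are usually about deciding which side of a component boundary a fault lies on, and the boundary between the virtio-net front end and back end is especially hard to pin down. The path a packet takes from the kernel to the virtio-net back end, or from the back end to the kernel, is difficult to observe, and some complex network jitter problems may well be caused by a NIC queue that is not working properly. To address this class of problems, we used eBPF to extend the observability of NIC queues, so that telling front-end faults from back-end faults on a virtio NIC is no longer a struggle.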

Introduction to the virtio-net front-end and back-end drivers

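virtio-net (referred to below as the virtio NIC) usually consists of two components: the virtio driver (also called the virtio front end) and the virtio device (also called the virtio back end). The front end runs in the guest kernel, while the back end can be provided by the host kernel. A virtio NIC usually supports multiple queues, including transmit queues and receive queues. Each queue is implemented with three rings: the avail ring, the used ring, and the desc ring. The rest of this section walks through how the virtio NIC front end sends and receives packets, to make the overall workflow easier to follow.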

Packet transmission in the virtio NIC front end

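The main steps when the virtio NIC front end sends a packet are:

a. start_xmit: the transmit entry function of the virtio NIC driver. It first cleans up packets that have already been sent, calling free_old_xmit_skbs to release the packets held in completed descriptors until the driver's last_used_idx has caught up with used->idx;

b. xmit_skb: adds the vnet_hdr header to the packet and represents the skb as a scatter-gather list that records the address and length of the packet data;

c. virtqueue_add_outbuf: performs the DMA mapping, adds the addresses and lengths recorded in the scatter-gather list to the desc ring, and increments avail->idx;

d. virtqueue_notify: notifies the back end when the transmit queue contains data.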
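
The index bookkeeping behind steps a to d can be illustrated with a toy model. This is only a sketch and not driver code: the class name ToyTxQueue and its methods are hypothetical, it assumes every packet occupies exactly one descriptor, and it ignores DMA mapping and notification entirely. It only shows how avail_idx, used_idx, and the driver's private last_used_idx move relative to each other.

```python
from dataclasses import dataclass

U16 = 0x10000  # ring indices are 16-bit counters that wrap around


@dataclass
class ToyTxQueue:
    """Toy model of the indices on one transmit queue (hypothetical, not driver code)."""
    size: int = 256
    avail_idx: int = 0      # advanced by the front end when it posts a packet
    used_idx: int = 0       # advanced by the back end when it finishes a packet
    last_used_idx: int = 0  # front end's private cursor into the used ring

    def add_outbuf(self) -> bool:
        """Front end posts one packet (step c); fails when the ring is full."""
        in_flight = (self.avail_idx - self.last_used_idx) % U16
        if in_flight >= self.size:
            return False
        self.avail_idx = (self.avail_idx + 1) % U16
        return True

    def backend_consume(self, n: int) -> int:
        """Back end transmits up to n pending packets and marks them used."""
        pending = (self.avail_idx - self.used_idx) % U16
        done = min(n, pending)
        self.used_idx = (self.used_idx + done) % U16
        return done

    def free_old_xmit(self) -> int:
        """Front end reclaims every completed packet (step a)."""
        freed = (self.used_idx - self.last_used_idx) % U16
        self.last_used_idx = self.used_idx
        return freed


q = ToyTxQueue(size=4)
for _ in range(4):
    q.add_outbuf()           # ring is now full
q.backend_consume(3)         # back end sends three packets
print(q.free_old_xmit())     # front end reclaims them
```

If the back end stops calling backend_consume in this model, avail_idx keeps growing until the ring fills while used_idx and last_used_idx stay frozen, which is the symptom the later sections look for.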

[Figure: 7aaa4c30-9069-11ef-a511-92fbcf53809c.png]

Packet reception in the virtio NIC front end

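The main steps when the virtio NIC front end receives packets are:

a. NIC hard interrupt: the hard-interrupt handler adds the napi to the CPU's processing queue, enables interrupt suppression, and raises the softirq;

b. net_rx_action: the entry function of the network softirq;

c. virtnet_poll: the NAPI poll callback of the virtio NIC. If the current queue is a transmit queue, it cleans the transmit queue by calling virtnet_poll_cleantx. If the current queue is a receive queue, it receives packets;

d. virtnet_receive: reads packet data from the descriptor ring according to used->idx and updates last_used_idx. The kernel allocates an skb for the packet data and passes it into GRO, where packets are merged;

e. try_fill_recv: adds empty buffers to the desc ring and increments avail->idx, so that the receive queue always has memory available;

f. virtqueue_napi_complete: when fewer packets are received than the budget (typically 64), there is no more data to receive. virtqueue_napi_complete is then called to mark this NAPI pass as finished, and virtqueue_enable_cb_prepare turns interrupt suppression off again.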
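
The budget rule in step f can be sketched as follows. The function napi_poll and the deque standing in for the receive ring are hypothetical illustrations, not kernel interfaces; the sketch only shows the decision "received fewer than budget, so the pass is complete".

```python
from collections import deque


def napi_poll(rx_ring, budget=64):
    """Take up to `budget` packets from rx_ring.

    Returns (received, completed). `completed` is True when fewer packets
    than the budget were available, which is the point where a real driver
    would finish the NAPI pass and re-enable interrupts.
    """
    received = []
    while rx_ring and len(received) < budget:
        received.append(rx_ring.popleft())
    completed = len(received) < budget
    return received, completed


ring = deque(range(100))        # 100 packets waiting in the toy ring
first, done1 = napi_poll(ring)  # consumes the whole budget, must poll again
second, done2 = napi_poll(ring) # drains the remainder, pass is complete
print(len(first), done1, len(second), done2)
```

Polling in bounded batches keeps one busy queue from monopolising the softirq, and interrupts stay suppressed until a pass comes back short of its budget.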

[Figure: 7adc430c-9069-11ef-a511-92fbcf53809c.png]

NIC queue observability

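From the analysis above, the important parameters of a virtio NIC queue are avail->idx, used->idx, and last_used_idx. With them we can tell how many packets a queue currently holds and derive the following observability metrics:

a. Transmit queue packet count: the number of packets the virtio NIC back end has not yet sent, computed as avail->idx - used->idx;

b. Receive queue packet count: the number of packets the virtio NIC front end has not yet received, computed as used->idx - last_used_idx;

c. last_used_idx of the queue: indicates how far the virtio NIC back end has progressed in processing packets;

d. Queue saturation: the current utilisation of the queue, computed as queue packet count / queue length.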
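
The formulas above can be written out directly. One detail worth making explicit is that the ring indices are 16-bit counters, so the subtraction has to be done modulo 2^16 to stay correct across wraparound. The function names below are hypothetical, and saturation is returned as a percentage to match the way the tool output later in the article is printed.

```python
U16_MASK = 0xFFFF  # avail->idx, used->idx and last_used_idx are 16-bit counters


def ring_delta(newer, older):
    """Distance from `older` to `newer` on a wrapping 16-bit counter."""
    return (newer - older) & U16_MASK


def tx_pending(avail_idx, used_idx):
    """Packets posted by the front end that the back end has not sent yet."""
    return ring_delta(avail_idx, used_idx)


def rx_pending(used_idx, last_used_idx):
    """Packets delivered by the back end that the front end has not received yet."""
    return ring_delta(used_idx, last_used_idx)


def saturation_percent(pending, queue_len):
    """Queue utilisation as a percentage: packet count / queue length * 100."""
    return 100.0 * pending / queue_len


# avail_idx has wrapped past 65535 while used_idx has not: 5 packets pending.
print(tx_pending(avail_idx=3, used_idx=65534))
print(saturation_percent(64, 256))
```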

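How it works

We integrated the observability code into rtrace, a network diagnosis and analysis tool in SysAK, the system tool set released by the OpenAnolis (龙蜥) community. How rtrace itself works will be covered in a later article. For the eBPF code, see: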

https://gitee.com/anolis/sysak/blob/opensource_branch_sync/source/tools/detect/net/rtrace/src/bpf/virtio.bpf.c

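The main flow for collecting virtio NIC queue metrics is as follows:

a. rtrace attaches the eBPF collection program to the kernel functions dev_id_show and dev_port_show;

b. rtrace periodically reads the two files /sys/class/net/[interface]/dev_id and /sys/class/net/[interface]/dev_port. Reading dev_id signals that transmit queue information should be collected, and reading dev_port signals that receive queue information should be collected;

c. Reading these files makes the kernel execute dev_id_show and dev_port_show. Because the eBPF collection program is attached to them, the kernel runs the eBPF program first;

d. The eBPF collection program parses the struct net_device input of dev_id_show and dev_port_show to reach the NIC queue's vring, then extracts avail idx, used idx, the queue length, and last_used_idx from the vring;

e. The data is sent to rtrace for further processing.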
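
The user-space half of steps b and c amounts to reading two sysfs files on a timer. The sketch below shows only that trigger; it is not rtrace's actual code, the function names are hypothetical, and the sysfs_root parameter exists purely so the function can be pointed at a test directory. The eBPF side (steps a, d, and e) is not shown.

```python
from pathlib import Path


def trigger_files(interface, sysfs_root="/sys/class/net"):
    """Map each direction to the sysfs file whose read triggers collection."""
    base = Path(sysfs_root) / interface
    return {
        "tx": base / "dev_id",    # read -> kernel runs dev_id_show
        "rx": base / "dev_port",  # read -> kernel runs dev_port_show
    }


def trigger_once(interface, sysfs_root="/sys/class/net"):
    """Read both files once and return their contents.

    The contents are irrelevant to the collection; the read itself is what
    causes the kernel to call the *_show functions the probes are attached to.
    """
    return {
        direction: path.read_text().strip()
        for direction, path in trigger_files(interface, sysfs_root).items()
    }
```

A collector would call trigger_once("eth0") in a loop with a sleep between iterations, where "eth0" is a placeholder interface name. Using an existing sysfs read as the trigger means no packet has to flow for a sample to be taken, so a completely stuck queue can still be observed.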

[Figure: 7af93be2-9069-11ef-a511-92fbcf53809c.png]

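Fault detection

Below is the NIC queue information collected by rtrace. Each entry has the form saturation/last_used_idx.

At 0926, the saturation and last_used_idx of transmit queue 1 are 0.05%/3593, and at 0928 they are 0.07%/3593. The saturation of the transmit queue is rising, yet last_used_idx stays unchanged across several collection periods. We can therefore conclude that transmit queue 1 has failed.

We then fixed the fault on transmit queue 1. At 0906, the saturation and last_used_idx of transmit queue 1 are 0.00%/3599: no packets remain stuck in the queue, and it has returned to normal.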

0924
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2443 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/88 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/301 0.00%/275
0925
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/89 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/303 0.00%/275
0926
SendQueue 0.05%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/91 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13297 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/305 0.00%/275
0927
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/410 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2444 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/148 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/93 0.00%/85 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/200 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8796 0.00%/117 0.00%/307 0.00%/275
0928
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/96 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/309 0.00%/275
0929
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/98 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/311 0.00%/275
0930
SendQueue 0.07%/3593 0.00%/852 0.00%/4506 0.00%/1600 0.00%/457 0.00%/509 0.00%/3140 0.00%/1352 0.00%/386 0.00%/414 0.00%/1714 0.00%/1758 0.00%/1619 0.00%/446 0.00%/3577 0.00%/2445 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/146 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/84 0.00%/78 0.00%/56 0.00%/87 0.00%/100 0.00%/87 0.00%/52
RecvQueue 0.00%/2805 0.00%/13298 0.00%/475 0.00%/367 0.00%/12378 0.00%/130 0.00%/222 0.00%/11120 0.00%/355 0.00%/3016 0.00%/133 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/650 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/169 0.00%/1042 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/229 0.00%/1402 0.00%/8797 0.00%/118 0.00%/313 0.00%/275
//... omitted
0906
SendQueue 0.00%/3599 0.00%/856 0.00%/4511 0.00%/1602 0.00%/465 0.00%/510 0.00%/3140 0.00%/1352 0.00%/386 0.00%/420 0.00%/1716 0.00%/1766 0.00%/1619 0.00%/448 0.00%/3578 0.00%/2451 0.00%/46 0.00%/94 0.00%/212 0.00%/231 0.00%/148 0.00%/149 0.00%/226 0.00%/64 0.00%/109 0.00%/85 0.00%/87 0.00%/56 0.00%/87 0.00%/101 0.00%/103 0.00%/52
RecvQueue 0.00%/2807 0.00%/13299 0.00%/477 0.00%/369 0.00%/12378 0.00%/140 0.00%/223 0.00%/11120 0.00%/355 0.00%/3032 0.00%/142 0.00%/180 0.00%/12980 0.00%/10363 0.00%/2825 0.00%/652 0.00%/151 0.00%/505 0.00%/5180 0.00%/205 0.00%/26670 0.00%/170 0.00%/1057 0.00%/9820 0.00%/9586 0.00%/3374 0.00%/230 0.00%/1414 0.00%/8800 0.00%/118 0.00%/327 0.00%/275
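
The rule applied by eye in this section (packets sitting in the queue while last_used_idx does not move for several collection periods) can be automated. The sketch below is one possible way to do that and is not part of rtrace. The function names and the min_periods threshold are hypothetical. The parser assumes every saturation value has a single digit before the decimal point, which holds for the output above and is what allows entries printed without separators to be split.

```python
import re

# One entry looks like "0.05%/3593". The lazy index match plus lookahead lets
# entries be separated even when they are printed back to back.
CELL = re.compile(r"(\d\.\d{2})%/(\d+?)(?=\d\.\d{2}%/|$)")


def parse_row(row):
    """Turn one SendQueue/RecvQueue row into [(saturation_percent, last_used_idx), ...]."""
    compact = "".join(row.split())
    compact = re.sub(r"^(SendQueue|RecvQueue)", "", compact)
    return [(float(sat), int(idx)) for sat, idx in CELL.findall(compact)]


def stalled_queues(rows, min_periods=3):
    """Return 1-based queue numbers that look stuck over the last `min_periods` rows.

    A queue is flagged when it held packets (saturation > 0) in every one of
    those rows while its last_used_idx never changed.
    """
    if len(rows) < min_periods:
        return []
    window = rows[-min_periods:]
    stalled = []
    for q in range(len(window[0])):
        cells = [row[q] for row in window]
        idx_frozen = len({idx for _, idx in cells}) == 1
        has_backlog = all(sat > 0 for sat, _ in cells)
        if idx_frozen and has_backlog:
            stalled.append(q + 1)  # the article counts the first column as queue 1
    return stalled


# First three transmit queues from the 0926, 0927 and 0928 samples above.
history = [
    parse_row("SendQueue0.05%/35930.00%/8520.00%/4506"),
    parse_row("SendQueue0.07%/35930.00%/8520.00%/4506"),
    parse_row("SendQueue0.07%/35930.00%/8520.00%/4506"),
]
print(stalled_queues(history))
```

Requiring a non-zero saturation is what separates a stuck queue from an idle one: queues 2 and 3 in the sample also keep the same last_used_idx, but they hold no packets, so they are not flagged.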

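Summary

In a virtio NIC, the front end and back end communicate through shared NIC queues. By observing metrics such as avail idx, used idx, and last_used_idx, we can evaluate and tune the performance of a virtio NIC. These metrics also give a detailed view of queue state, which helps with troubleshooting and performance tuning.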


Original title: eBPF 技术实践之 virtio-net 网卡队列可观测

Source: WeChat official account Linux阅码场 (WeChat ID: LinuxDev). Please credit the source when reposting.
