Exploring BPF LSM on aarch64 with ftrace

Linux阅码场 | Source: Linux阅码场 | 2024-01-25 09:30

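Translator's note

While developing eBPF programs in a Linux virtual machine on a MacBook M2, I found that some LSM-type eBPF programs would not run, even on Ubuntu Server with a 5.15 kernel. Clearly, kernel functionality differs between aarch64 and x86_64. While trying to pin down these differences I came across this article, which gives a concrete picture of how LSM eBPF support differs between the two architectures.

Original article

This blog post is a summary of our internal research into `BPF LSM` support on `aarch64` in Linux. Getting started with the kernel sources is hard if you are not familiar with the code base, so we decided to publish this post and show our approach, as it may help anyone who wants to explore kernel internals.

Introduction

On x86_64 we already use BPF LSM, while on aarch64 we rely on kprobes, so we wanted to know which kernel features are missing that would make BPF LSM usable on aarch64.

We have dug into the kernel sources many times, but usually we search for something that already exists in order to understand how it works. This time we were looking for something that does not exist: we were chasing code that returns an error because a feature is not implemented.

Recalling Steven Rostedt's talk on how to start learning the Linux kernel, we began with ftrace (and the tools built on top of the tracing infrastructure) to understand what happens when we load an unsupported BPF program into the kernel.

The problem

This is the output of our software, pulsar[2], when we try to load a BPF LSM program into an aarch64 Linux 5.15 kernel: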

    root@pine64-1:/home/exein# ./pulsar-enterprise-exec pulsard
    [2023-02-16T1445Z INFO  pulsar::daemon] Starting module process-monitor
    [2023-02-16T1445Z INFO  pulsar::daemon] Starting module file-system-monitor
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module network-monitor
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module logger
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module rules-engine
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module desktop-notifier
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in file-system-monitor: failed program attach lsm path_mknod

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module anomaly-detection
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module malware-detection
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in malware-detection: /var/lib/pulsar/malware_detection/models/parameters.json not found
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module platform-connector
    [2023-02-16T1446Z INFO  platform_connector::client] Connected to https://platform-dev-instance.exein.io:8001/
    [2023-02-16T1446Z INFO  pulsar::daemon] Starting module threat-response
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in network-monitor: failed program attach lsm socket_bind

    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)

While trying to load the BPF program attached to the path_mknod LSM hook, pulsar fails with error 524, i.e. ENOTSUPP. Let's dig into the problem.
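Error 524 is the kernel-internal `ENOTSUPP` from `include/linux/errno.h`. It has no entry in the userspace errno tables, which is presumably why the log can only say "No error information". As a small illustrative sketch (the helper name `errno_name` is ours, and the table covers only a few codes), a shell function can make such values readable:

```shell
#!/bin/sh
# Map a few errno values to symbolic names.
# 524 is ENOTSUPP, defined only inside the kernel (include/linux/errno.h);
# 95 is the userspace-visible EOPNOTSUPP on most Linux architectures.
errno_name() {
    case "$1" in
        2)   echo "ENOENT" ;;
        95)  echo "EOPNOTSUPP" ;;
        524) echo "ENOTSUPP (kernel-internal)" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

errno_name 524
```

Seeing 524 rather than 95 is a useful hint in itself: the error comes from a kernel code path that returns `-ENOTSUPP` directly.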

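Note: at the time of this research we could not find a prebuilt aarch64 kernel with BPF and BTF enabled, so we had to compile a custom kernel. We also enabled the tracing options and the function_graph plugin in order to use the tools below.
All experiments were run on a Pine A64 with a custom Armbian[3] image.
These images ship a custom kernel with a standard Ubuntu 22.04 LTS (Jammy) userspace.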
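Before rebuilding a kernel it can help to check what the running one already provides. The sketch below is only an illustration: the helper name `check_kconfig` is ours, the option list is our assumption about what "BPF, BTF and tracing" maps to, and the config path is the usual Ubuntu location rather than something stated in the article.

```shell
#!/bin/sh
# check_kconfig FILE OPTION...: report whether each option is set to "y"
# in a kernel config file.
check_kconfig() {
    _cfg="$1"; shift
    for _opt in "$@"; do
        if grep -q "^${_opt}=y" "$_cfg"; then
            echo "$_opt: enabled"
        else
            echo "$_opt: missing"
        fi
    done
}

# Typical location on Ubuntu-like systems (assumption); adjust as needed.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    check_kconfig "$cfg" CONFIG_BPF_SYSCALL CONFIG_BPF_LSM \
        CONFIG_DEBUG_INFO_BTF CONFIG_FUNCTION_GRAPH_TRACER
fi
```

Options built as modules (`=m`) are reported as missing by this sketch, which is acceptable here because the options listed are boolean.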

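Tools

To investigate the problem we used the following tools:

bpftrace[4]: a BPF-based tool that attaches probes dynamically using a custom C-like language.

trace-cmd[5]: a wrapper around the tracefs filesystem that interacts with the ftrace infrastructure.

To use these tools you need to enable some options in the Linux kernel; see the official documentation for the full list of requirements.

Note: other tools can do the same job, for example funcgraph and kprobe from perf-tools[6].

Linux 5.15

Let's now use these tools to see what happens in kernel 5.15 when we try to load our BPF program.

From this point to the end of the article we use the probe binary instead of pulsar, because it is simpler. To summarize how it works, here is its command-line help: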

    exein@pine64-1:~$ ./probe
    Test runner for eBPF programs

    Usage: probe [OPTIONS]

    Commands:
      file-system-monitor  Watch file creations
      process-monitor      Watch process events (fork/exec/exit)
      network-monitor      Watch network events
      help                 Print this message or the help of the given subcommand(s)

    Options:
      -v, --verbose
      -h, --help     Print help
      -V, --version  Print version

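In these examples we will try to load the file-system-monitor probe.

Running the following commands shows the function graph of the calls made under __sys_bpf, the entry point of the BPF system call:

    trace-cmd record -p function_graph -g __sys_bpf ./probe file-system-monitor
    trace-cmd report

The output is a very large function graph, too big to paste here. Since we hit an error, we are interested in the last functions executed before the program stopped. These are the last lines of the trace-cmd report output: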

    ...
    tokio-runtime-w-1666 [003] 1318.058019: funcgraph_entry:              |       bpf_trampoline_link_prog() {
    tokio-runtime-w-1666 [003] 1318.058020: funcgraph_entry: 2.292 us     |         bpf_attach_type_to_tramp();
    tokio-runtime-w-1666 [003] 1318.058024: funcgraph_entry: 1.250 us     |         mutex_lock();
    tokio-runtime-w-1666 [003] 1318.058028: funcgraph_entry:              |         bpf_trampoline_update() {
    tokio-runtime-w-1666 [003] 1318.058030: funcgraph_entry:              |           kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058031: funcgraph_entry: 1.167 us     |             should_failslab();
    tokio-runtime-w-1666 [003] 1318.058036: funcgraph_exit:  6.792 us     |           }
    tokio-runtime-w-1666 [003] 1318.058039: funcgraph_entry:              |           kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058042: funcgraph_entry: 2.750 us     |             should_failslab();
    tokio-runtime-w-1666 [003] 1318.058046: funcgraph_exit:  6.417 us     |           }
    tokio-runtime-w-1666 [003] 1318.058048: funcgraph_entry: 2.708 us     |           bpf_jit_charge_modmem();
    tokio-runtime-w-1666 [003] 1318.058053: funcgraph_entry:              |           bpf_jit_alloc_exec_page() {
    tokio-runtime-w-1666 [003] 1318.058055: funcgraph_entry:              |             bpf_jit_alloc_exec() {
    tokio-runtime-w-1666 [003] 1318.058057: funcgraph_entry:              |               vmalloc() {
    tokio-runtime-w-1666 [003] 1318.058059: funcgraph_entry:              |                 __vmalloc_node() {
    tokio-runtime-w-1666 [003] 1318.058061: funcgraph_entry:              |                   __vmalloc_node_range() {
    tokio-runtime-w-1666 [003] 1318.058064: funcgraph_entry:              |                     __get_vm_area_node.constprop.64() {
    tokio-runtime-w-1666 [003] 1318.058067: funcgraph_entry:              |                       kmem_cache_alloc_node_trace() {
    tokio-runtime-w-1666 [003] 1318.058069: funcgraph_entry: 1.459 us     |                         should_failslab();
    tokio-runtime-w-1666 [003] 1318.058073: funcgraph_exit:  6.292 us     |                       }
    tokio-runtime-w-1666 [003] 1318.058075: funcgraph_entry:              |                       alloc_vmap_area() {
    tokio-runtime-w-1666 [003] 1318.058077: funcgraph_entry:              |                         kmem_cache_alloc_node() {
    tokio-runtime-w-1666 [003] 1318.058079: funcgraph_entry: 1.167 us     |                           should_failslab();
    tokio-runtime-w-1666 [003] 1318.058085: funcgraph_exit:  7.625 us     |                         }
    tokio-runtime-w-1666 [003] 1318.058088: funcgraph_entry:              |                         kmem_cache_alloc_node() {
    tokio-runtime-w-1666 [003] 1318.058089: funcgraph_entry: 1.208 us     |                           should_failslab();
    tokio-runtime-w-1666 [003] 1318.058092: funcgraph_exit:  4.584 us     |                         }
    tokio-runtime-w-1666 [003] 1318.058104: funcgraph_entry:              |                         kmem_cache_free() {
    tokio-runtime-w-1666 [003] 1318.058107: funcgraph_entry: 2.084 us     |                           __slab_free();
    tokio-runtime-w-1666 [003] 1318.058110: funcgraph_exit:  5.667 us     |                         }
    tokio-runtime-w-1666 [003] 1318.058112: funcgraph_entry: 6.375 us     |                         insert_vmap_area.constprop.74();
    tokio-runtime-w-1666 [003] 1318.058119: funcgraph_exit:  +44.667 us   |                       }
    tokio-runtime-w-1666 [003] 1318.058122: funcgraph_exit:  +58.250 us   |                     }
    tokio-runtime-w-1666 [003] 1318.058124: funcgraph_entry:              |                     __kmalloc_node() {
    tokio-runtime-w-1666 [003] 1318.058125: funcgraph_entry: 1.625 us     |                       kmalloc_slab();
    tokio-runtime-w-1666 [003] 1318.058128: funcgraph_entry: 1.167 us     |                       should_failslab();
    tokio-runtime-w-1666 [003] 1318.058131: funcgraph_exit:  7.208 us     |                     }
    tokio-runtime-w-1666 [003] 1318.058133: funcgraph_entry:              |                     alloc_pages() {
    tokio-runtime-w-1666 [003] 1318.058135: funcgraph_entry: 1.583 us     |                       get_task_policy.part.48();
    tokio-runtime-w-1666 [003] 1318.058138: funcgraph_entry: 1.500 us     |                       policy_node();
    tokio-runtime-w-1666 [003] 1318.058141: funcgraph_entry: 1.209 us     |                       policy_nodemask();
    tokio-runtime-w-1666 [003] 1318.058143: funcgraph_entry:              |                       __alloc_pages() {
    tokio-runtime-w-1666 [003] 1318.058145: funcgraph_entry: 1.458 us     |                         should_fail_alloc_page();
    tokio-runtime-w-1666 [003] 1318.058147: funcgraph_entry:              |                         get_page_from_freelist() {
    tokio-runtime-w-1666 [003] 1318.058150: funcgraph_entry: 1.583 us     |                           prep_new_page();
    tokio-runtime-w-1666 [003] 1318.058153: funcgraph_exit:  5.459 us     |                         }
    tokio-runtime-w-1666 [003] 1318.058154: funcgraph_exit:  +10.542 us   |                       }
    tokio-runtime-w-1666 [003] 1318.058155: funcgraph_exit:  +22.083 us   |                     }
    tokio-runtime-w-1666 [003] 1318.058157: funcgraph_entry:              |                     __cond_resched() {
    tokio-runtime-w-1666 [003] 1318.058158: funcgraph_entry: 1.833 us     |                       rcu_all_qs();
    tokio-runtime-w-1666 [003] 1318.058161: funcgraph_exit:  4.167 us     |                     }
    tokio-runtime-w-1666 [003] 1318.058166: funcgraph_entry: 5.542 us     |                     vmap_pages_range_noflush();
    tokio-runtime-w-1666 [003] 1318.058173: funcgraph_exit:  !112.375 us  |                   }
    tokio-runtime-w-1666 [003] 1318.058175: funcgraph_exit:  !116.000 us  |                 }
    tokio-runtime-w-1666 [003] 1318.058176: funcgraph_exit:  !119.292 us  |               }
    tokio-runtime-w-1666 [003] 1318.058177: funcgraph_exit:  !122.542 us  |             }
    tokio-runtime-w-1666 [003] 1318.058179: funcgraph_entry:              |             find_vm_area() {
    tokio-runtime-w-1666 [003] 1318.058180: funcgraph_entry: 1.375 us     |               find_vmap_area();
    tokio-runtime-w-1666 [003] 1318.058183: funcgraph_exit:  4.333 us     |             }
    tokio-runtime-w-1666 [003] 1318.058185: funcgraph_entry:              |             set_memory_x() {
    tokio-runtime-w-1666 [003] 1318.058186: funcgraph_entry:              |               change_memory_common() {
    tokio-runtime-w-1666 [003] 1318.058188: funcgraph_entry:              |                 find_vm_area() {
    tokio-runtime-w-1666 [003] 1318.058189: funcgraph_entry: 1.333 us     |                   find_vmap_area();
    tokio-runtime-w-1666 [003] 1318.058192: funcgraph_exit:  3.875 us     |                 }
    tokio-runtime-w-1666 [003] 1318.058193: funcgraph_entry:              |                 vm_unmap_aliases() {
    tokio-runtime-w-1666 [003] 1318.058194: funcgraph_entry:              |                   _vm_unmap_aliases.part.58() {
    tokio-runtime-w-1666 [003] 1318.058196: funcgraph_entry: 1.542 us     |                     rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058199: funcgraph_entry: 1.208 us     |                     rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058202: funcgraph_entry: 1.166 us     |                     rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058205: funcgraph_entry: 1.208 us     |                     rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058207: funcgraph_entry: 1.208 us     |                     mutex_lock();
    tokio-runtime-w-1666 [003] 1318.058210: funcgraph_entry:              |                     purge_fragmented_blocks_allcpus() {
    tokio-runtime-w-1666 [003] 1318.058212: funcgraph_entry: 1.500 us     |                       rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058214: funcgraph_entry: 1.500 us     |                       rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058217: funcgraph_entry: 1.500 us     |                       rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058220: funcgraph_entry: 1.167 us     |                       rcu_read_unlock_strict();
    tokio-runtime-w-1666 [003] 1318.058222: funcgraph_exit:  +11.917 us   |                     }
    tokio-runtime-w-1666 [003] 1318.058224: funcgraph_entry:              |                     __purge_vmap_area_lazy() {
    tokio-runtime-w-1666 [003] 1318.058232: funcgraph_entry:              |                       kmem_cache_free() {
    tokio-runtime-w-1666 [003] 1318.058234: funcgraph_entry: 1.250 us     |                         __slab_free();
    tokio-runtime-w-1666 [003] 1318.058237: funcgraph_exit:  4.791 us     |                       }
    tokio-runtime-w-1666 [003] 1318.058241: funcgraph_entry: 1.209 us     |                       __cond_resched_lock();
    tokio-runtime-w-1666 [003] 1318.058244: funcgraph_exit:  +19.625 us   |                     }
    tokio-runtime-w-1666 [003] 1318.058245: funcgraph_entry: 1.167 us     |                     mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058247: funcgraph_exit:  +53.042 us   |                   }
    tokio-runtime-w-1666 [003] 1318.058248: funcgraph_exit:  +55.625 us   |                 }
    tokio-runtime-w-1666 [003] 1318.058250: funcgraph_entry:              |                 __change_memory_common() {
    tokio-runtime-w-1666 [003] 1318.058251: funcgraph_entry:              |                   apply_to_page_range() {
    tokio-runtime-w-1666 [003] 1318.058253: funcgraph_entry:              |                     __apply_to_page_range() {
    tokio-runtime-w-1666 [003] 1318.058255: funcgraph_entry: 1.250 us     |                       pud_huge();
    tokio-runtime-w-1666 [003] 1318.058258: funcgraph_entry: 1.166 us     |                       pmd_huge();
    tokio-runtime-w-1666 [003] 1318.058260: funcgraph_entry: 1.208 us     |                       change_page_range();
    tokio-runtime-w-1666 [003] 1318.058263: funcgraph_exit:  9.834 us     |                     }
    tokio-runtime-w-1666 [003] 1318.058264: funcgraph_exit:  +12.709 us   |                   }
    tokio-runtime-w-1666 [003] 1318.058266: funcgraph_exit:  +15.459 us   |                 }
    tokio-runtime-w-1666 [003] 1318.058268: funcgraph_exit:  +80.791 us   |               }
    tokio-runtime-w-1666 [003] 1318.058270: funcgraph_exit:  +84.834 us   |             }
    tokio-runtime-w-1666 [003] 1318.058272: funcgraph_exit:  !218.500 us  |           }
    tokio-runtime-w-1666 [003] 1318.058274: funcgraph_entry:              |           __alloc_percpu_gfp() {
    tokio-runtime-w-1666 [003] 1318.058276: funcgraph_entry:              |             pcpu_alloc() {
    tokio-runtime-w-1666 [003] 1318.058281: funcgraph_entry: 2.250 us     |               mutex_lock_killable();
    tokio-runtime-w-1666 [003] 1318.058290: funcgraph_entry:              |               pcpu_find_block_fit() {
    tokio-runtime-w-1666 [003] 1318.058293: funcgraph_entry: 2.833 us     |                 pcpu_next_fit_region.constprop.38();
    tokio-runtime-w-1666 [003] 1318.058299: funcgraph_exit:  9.084 us     |               }
    tokio-runtime-w-1666 [003] 1318.058301: funcgraph_entry:              |               pcpu_alloc_area() {
    tokio-runtime-w-1666 [003] 1318.058315: funcgraph_entry: 4.000 us     |                 pcpu_block_update_hint_alloc();
    tokio-runtime-w-1666 [003] 1318.058320: funcgraph_entry: 2.208 us     |                 pcpu_chunk_relocate();
    tokio-runtime-w-1666 [003] 1318.058324: funcgraph_exit:  +22.625 us   |               }
    tokio-runtime-w-1666 [003] 1318.058327: funcgraph_entry: 1.208 us     |               mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058332: funcgraph_entry: 1.584 us     |               pcpu_memcg_post_alloc_hook();
    tokio-runtime-w-1666 [003] 1318.058335: funcgraph_exit:  +58.833 us   |             }
    tokio-runtime-w-1666 [003] 1318.058336: funcgraph_exit:  +61.834 us   |           }
    tokio-runtime-w-1666 [003] 1318.058338: funcgraph_entry:              |           kmem_cache_alloc_trace() {
    tokio-runtime-w-1666 [003] 1318.058339: funcgraph_entry: 1.167 us     |             should_failslab();
    tokio-runtime-w-1666 [003] 1318.058342: funcgraph_exit:  4.458 us     |           }
    tokio-runtime-w-1666 [003] 1318.058359: funcgraph_entry:              |           bpf_image_ksym_add() {
    tokio-runtime-w-1666 [003] 1318.058360: funcgraph_entry:              |             bpf_ksym_add() {
    tokio-runtime-w-1666 [003] 1318.058363: funcgraph_entry: 1.583 us     |               __local_bh_enable_ip();
    tokio-runtime-w-1666 [003] 1318.058366: funcgraph_exit:  5.750 us     |             }
    tokio-runtime-w-1666 [003] 1318.058369: funcgraph_exit:  9.834 us     |           }
    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry: 1.250 us     |           arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry: 2.292 us     |           kfree();
    tokio-runtime-w-1666 [003] 1318.058377: funcgraph_exit:  !348.625 us  |         }
    tokio-runtime-w-1666 [003] 1318.058379: funcgraph_entry: 1.250 us     |         mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058382: funcgraph_exit:  !363.167 us  |       }
    tokio-runtime-w-1666 [003] 1318.058384: funcgraph_entry:              |       bpf_link_cleanup() {
    tokio-runtime-w-1666 [003] 1318.058386: funcgraph_entry:              |         bpf_link_free_id.part.30() {
    tokio-runtime-w-1666 [003] 1318.058392: funcgraph_entry:              |           call_rcu() {
    tokio-runtime-w-1666 [003] 1318.058396: funcgraph_entry: 1.834 us     |             rcu_segcblist_enqueue();
    tokio-runtime-w-1666 [003] 1318.058401: funcgraph_exit:  9.333 us     |           }
    tokio-runtime-w-1666 [003] 1318.058403: funcgraph_entry: 1.542 us     |           __local_bh_enable_ip();
    tokio-runtime-w-1666 [003] 1318.058406: funcgraph_exit:  +19.542 us   |         }
    tokio-runtime-w-1666 [003] 1318.058408: funcgraph_entry:              |         fput() {
    tokio-runtime-w-1666 [003] 1318.058409: funcgraph_entry:              |           fput_many() {
    tokio-runtime-w-1666 [003] 1318.058411: funcgraph_entry:              |             task_work_add() {
    tokio-runtime-w-1666 [003] 1318.058414: funcgraph_entry: 1.625 us     |               kick_process();
    tokio-runtime-w-1666 [003] 1318.058418: funcgraph_exit:  6.750 us     |             }
    tokio-runtime-w-1666 [003] 1318.058419: funcgraph_exit:  +10.333 us   |           }
    tokio-runtime-w-1666 [003] 1318.058420: funcgraph_exit:  +12.708 us   |         }
    tokio-runtime-w-1666 [003] 1318.058422: funcgraph_entry: 2.250 us     |         put_unused_fd();
    tokio-runtime-w-1666 [003] 1318.058426: funcgraph_exit:  +41.416 us   |       }
    tokio-runtime-w-1666 [003] 1318.058428: funcgraph_entry: 1.292 us     |       mutex_unlock();
    tokio-runtime-w-1666 [003] 1318.058430: funcgraph_entry: 1.250 us     |       kfree();
    tokio-runtime-w-1666 [003] 1318.058433: funcgraph_exit:  !567.458 us  |     }
    tokio-runtime-w-1666 [003] 1318.058435: funcgraph_entry: 2.125 us     |     __bpf_prog_put.isra.47();
    tokio-runtime-w-1666 [003] 1318.058438: funcgraph_exit:  !602.291 us  |   }
    tokio-runtime-w-1666 [003] 1318.058439: funcgraph_exit:  !631.791 us  | }
This is the source code in `kernel/bpf/trampoline.c` for `bpf_trampoline_update`, the last function executed:

```c
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
    struct bpf_tramp_image *im;
    struct bpf_tramp_progs *tprogs;
    u32 flags = BPF_TRAMP_F_RESTORE_REGS;
    bool ip_arg = false;
    int err, total;

    tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
    if (IS_ERR(tprogs))
        return PTR_ERR(tprogs);

    if (total == 0) {
        err = unregister_fentry(tr, tr->cur_image->image);
        bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = NULL;
        tr->selector = 0;
        goto out;
    }

    im = bpf_tramp_image_alloc(tr->key, tr->selector);
    if (IS_ERR(im)) {
        err = PTR_ERR(im);
        goto out;
    }

    if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
        tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
        flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

    if (ip_arg)
        flags |= BPF_TRAMP_F_IP_ARG;

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, flags, tprogs,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);
    if (err)
        goto out;
    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    kfree(tprogs);
    return err;
}
```

From the previous output we can see:

    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry: 1.250 us     |           arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry: 2.292 us     |           kfree();

There are no other function calls between arch_prepare_bpf_trampoline and kfree, so most likely the first function returned an error code in the err variable. Let's verify it!
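As an aside, spotting the last functions entered does not require reading the whole report by eye. The following is a minimal sketch, assuming the `trace-cmd report` line layout quoted above; the helper name `last_entries` is ours and not part of trace-cmd.

```shell
#!/bin/sh
# last_entries N: read `trace-cmd report` output on stdin and print the names
# of the last N functions entered (funcgraph_entry events), oldest first.
last_entries() {
    grep 'funcgraph_entry' | tail -n "${1:-5}" |
        sed -e 's/.*|[[:space:]]*//' -e 's/(.*//'
}

# Example input: the two entries quoted above.
last_entries 2 <<'EOF'
tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry: 1.250 us | arch_prepare_bpf_trampoline();
tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry: 2.292 us | kfree();
EOF
```

On a real recording this would be used as `trace-cmd report | last_entries 20`. Cleanup functions such as kfree tend to come last, so the interesting call is usually the one just before them.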

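Starting bpftrace in a shell as follows, we can capture the return value of arch_prepare_bpf_trampoline and print it to the console:

    bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvallink:%d\n", retval); }'

After launching probe in another terminal, bpftrace printed the following:

    root@pine64-1:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvallink:%d\n", retval); }'
    Attaching 1 probe...
    retvallink:-524

This happens because kernel 5.15 lacks an aarch64 implementation of arch_prepare_bpf_trampoline and falls back to the default placeholder (weak) implementation: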

    int __weak
    arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_links *tlinks,
                                void *orig_call)
    {
        return -ENOTSUPP;
    }

So this feature is not supported on this kernel version. The good news is that, thanks to this patch[7], it has been implemented in the 6.x kernels.
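One way to confirm which architectures provide their own implementation is to search a kernel source tree for the symbol. This is only a sketch: it assumes a source tree is checked out locally, and both the helper name `tramp_arches` and the `KSRC` variable are hypothetical.

```shell
#!/bin/sh
# tramp_arches SRC: list the arch/ subdirectories of a kernel source tree
# that mention arch_prepare_bpf_trampoline.
tramp_arches() {
    grep -rl 'arch_prepare_bpf_trampoline' "$1/arch" 2>/dev/null |
        sed -e "s|^$1/arch/||" -e 's|/.*||' | sort -u
}

# KSRC is a hypothetical variable pointing at the checked-out tree.
tramp_arches "${KSRC:-$HOME/linux}"
```

On a 5.15 tree we would expect arm64 to be absent from the list, which matches the -524 returned by the weak fallback.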

Let's move on to a 6.x kernel.

Linux 6.1

If we try to run probe on kernel 6.1, we get the following output:

    root@pine64:/home/exein# ./probe file-system-monitor
    thread 'main' panicked at 'initialization failed: ProgramAttachError { program: "lsm path_mknod", program_error: SyscallError { call: "bpf_raw_tracepoint_open", io_error: Os { code: 524, kind: Uncategorized, message: "No error information" } } }', src/bin/probe.rs43
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With kernel 6.1 we still hit the same error as on 5.15! Let's find out why.

Running bpftrace on arch_prepare_bpf_trampoline this time, we get:

    root@pine64:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retvaltplink:%d\n", retval); }'
    Attaching 1 probe...
    retvaltplink:284

So the problem is not here: this function no longer returns an error. Let's go back to the function call graph.

This time we start trace-cmd and skip some functions to get cleaner output:

    trace-cmd record \
        -p function_graph \
        -g bpf_trampoline_link_prog \
        -n bpf_jit_alloc_exec \
        -n kmalloc_trace \
        -n arch_prepare_bpf_trampoline \
        -n generic_handle_domain_irq \
        -n do_interrupt_handler \
        -n irq_exit_rcu \
        ./probe file-system-monitor

trace-cmd report gives us the following output:

    root@pine64:/home/exein# trace-cmd report
    CPU 0 is empty
    CPU 1 is empty
    CPU 3 is empty
    cpus=4
    tokio-runtime-w-11886 [002] 193385.056283: funcgraph_entry:              | bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056321: funcgraph_entry: +15.042 us   |   mutex_lock();
    tokio-runtime-w-11886 [002] 193385.056373: funcgraph_entry:              |   __bpf_trampoline_link_prog() {
    tokio-runtime-w-11886 [002] 193385.056395: funcgraph_entry: +14.833 us   |     bpf_attach_type_to_tramp();
    tokio-runtime-w-11886 [002] 193385.056428: funcgraph_entry:              |     bpf_trampoline_update.isra.23() {
    tokio-runtime-w-11886 [002] 193385.056459: funcgraph_entry: 2.917 us     |       bpf_jit_charge_modmem();
    tokio-runtime-w-11886 [002] 193385.056531: funcgraph_entry:              |       find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.056540: funcgraph_entry: 3.000 us     |         find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.056547: funcgraph_exit:  +16.208 us   |       }
    tokio-runtime-w-11886 [002] 193385.056554: funcgraph_entry:              |       __alloc_percpu_gfp() {
    tokio-runtime-w-11886 [002] 193385.056563: funcgraph_entry:              |         pcpu_alloc() {
    tokio-runtime-w-11886 [002] 193385.056568: funcgraph_entry: 4.875 us     |           mutex_lock_killable();
    tokio-runtime-w-11886 [002] 193385.056591: funcgraph_entry:              |           pcpu_find_block_fit() {
    tokio-runtime-w-11886 [002] 193385.056599: funcgraph_entry: 8.625 us     |             pcpu_next_fit_region.constprop.38();
    tokio-runtime-w-11886 [002] 193385.056608: funcgraph_exit:  +17.166 us   |           }
    tokio-runtime-w-11886 [002] 193385.056610: funcgraph_entry:              |           pcpu_alloc_area() {
    tokio-runtime-w-11886 [002] 193385.056639: funcgraph_entry: 9.167 us     |             pcpu_block_update();
    tokio-runtime-w-11886 [002] 193385.056656: funcgraph_entry: 7.667 us     |             pcpu_block_update_hint_alloc();
    tokio-runtime-w-11886 [002] 193385.056671: funcgraph_entry: 7.750 us     |             pcpu_chunk_relocate();
    tokio-runtime-w-11886 [002] 193385.056679: funcgraph_exit:  +69.667 us   |           }
    tokio-runtime-w-11886 [002] 193385.056682: funcgraph_entry: 7.042 us     |           mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.056703: funcgraph_entry: 2.792 us     |           pcpu_memcg_post_alloc_hook();
    tokio-runtime-w-11886 [002] 193385.056712: funcgraph_exit:  !148.709 us  |         }
    tokio-runtime-w-11886 [002] 193385.056719: funcgraph_exit:  !165.250 us  |       }
    tokio-runtime-w-11886 [002] 193385.056866: funcgraph_entry:              |       bpf_image_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056873: funcgraph_entry:              |         bpf_ksym_add() {
    tokio-runtime-w-11886 [002] 193385.056882: funcgraph_entry: 2.750 us     |           __local_bh_disable_ip();
    tokio-runtime-w-11886 [002] 193385.056897: funcgraph_entry: 4.625 us     |           __local_bh_enable_ip();
    tokio-runtime-w-11886 [002] 193385.056905: funcgraph_exit:  +32.459 us   |         }
    tokio-runtime-w-11886 [002] 193385.056922: funcgraph_entry: 7.584 us     |         perf_event_ksymbol();
    tokio-runtime-w-11886 [002] 193385.056944: funcgraph_exit:  +78.417 us   |       }
    tokio-runtime-w-11886 [002] 193385.057492: funcgraph_entry:              |       set_memory_ro() {
    tokio-runtime-w-11886 [002] 193385.057501: funcgraph_entry:              |         change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057504: funcgraph_entry:              |           find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057506: funcgraph_entry: 8.875 us     |             find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057518: funcgraph_exit:  +14.250 us   |           }
    tokio-runtime-w-11886 [002] 193385.057522: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057531: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057538: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057544: funcgraph_entry: +12.791 us   |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057559: funcgraph_entry: 2.708 us     |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057574: funcgraph_entry: +15.125 us   |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.057591: funcgraph_exit:  +53.792 us   |               }
    tokio-runtime-w-11886 [002] 193385.057597: funcgraph_exit:  +66.083 us   |             }
    tokio-runtime-w-11886 [002] 193385.057610: funcgraph_exit:  +88.125 us   |           }
    tokio-runtime-w-11886 [002] 193385.057619: funcgraph_entry:              |           vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057622: funcgraph_entry:              |             _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057625: funcgraph_entry: 9.125 us     |               mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057637: funcgraph_entry: 3.084 us     |               purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057643: funcgraph_entry:              |               __purge_vmap_area_lazy() {
    tokio-runtime-w-11886 [002] 193385.057687: funcgraph_entry:              |                 kmem_cache_free() {
    tokio-runtime-w-11886 [002] 193385.057693: funcgraph_entry: +13.250 us   |                   __slab_free();
    tokio-runtime-w-11886 [002] 193385.057705: funcgraph_exit:  +18.750 us   |                 }
    tokio-runtime-w-11886 [002] 193385.057718: funcgraph_entry: 7.416 us     |                 __cond_resched_lock();
    tokio-runtime-w-11886 [002] 193385.057733: funcgraph_exit:  +90.042 us   |               }
    tokio-runtime-w-11886 [002] 193385.057741: funcgraph_entry: 2.792 us     |               mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057747: funcgraph_exit:  !124.666 us  |             }
    tokio-runtime-w-11886 [002] 193385.057749: funcgraph_exit:  !130.291 us  |           }
    tokio-runtime-w-11886 [002] 193385.057756: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057759: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057765: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057768: funcgraph_entry: 4.125 us     |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057778: funcgraph_entry: 8.750 us     |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057790: funcgraph_entry: 4.625 us     |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.057797: funcgraph_exit:  +31.958 us   |               }
    tokio-runtime-w-11886 [002] 193385.057803: funcgraph_exit:  +44.375 us   |             }
    tokio-runtime-w-11886 [002] 193385.057817: funcgraph_exit:  +61.208 us   |           }
    tokio-runtime-w-11886 [002] 193385.057820: funcgraph_exit:  !319.292 us  |         }
    tokio-runtime-w-11886 [002] 193385.057826: funcgraph_exit:  !333.667 us  |       }
    tokio-runtime-w-11886 [002] 193385.057840: funcgraph_entry:              |       set_memory_x() {
    tokio-runtime-w-11886 [002] 193385.057847: funcgraph_entry:              |         change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057855: funcgraph_entry:              |           find_vm_area() {
    tokio-runtime-w-11886 [002] 193385.057858: funcgraph_entry: 2.917 us     |             find_vmap_area();
    tokio-runtime-w-11886 [002] 193385.057870: funcgraph_exit:  +14.375 us   |           }
    tokio-runtime-w-11886 [002] 193385.057876: funcgraph_entry:              |           vm_unmap_aliases() {
    tokio-runtime-w-11886 [002] 193385.057879: funcgraph_entry:              |             _vm_unmap_aliases.part.77() {
    tokio-runtime-w-11886 [002] 193385.057882: funcgraph_entry: 3.959 us     |               mutex_lock();
    tokio-runtime-w-11886 [002] 193385.057893: funcgraph_entry: 3.000 us     |               purge_fragmented_blocks_allcpus();
    tokio-runtime-w-11886 [002] 193385.057900: funcgraph_entry: 2.791 us     |               __purge_vmap_area_lazy();
    tokio-runtime-w-11886 [002] 193385.057907: funcgraph_entry: 2.709 us     |               mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.057913: funcgraph_exit:  +33.708 us   |             }
    tokio-runtime-w-11886 [002] 193385.057915: funcgraph_exit:  +43.000 us   |           }
    tokio-runtime-w-11886 [002] 193385.057922: funcgraph_entry:              |           __change_memory_common() {
    tokio-runtime-w-11886 [002] 193385.057925: funcgraph_entry:              |             apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057930: funcgraph_entry:              |               __apply_to_page_range() {
    tokio-runtime-w-11886 [002] 193385.057933: funcgraph_entry: 4.292 us     |                 pud_huge();
    tokio-runtime-w-11886 [002] 193385.057945: funcgraph_entry: 8.750 us     |                 pmd_huge();
    tokio-runtime-w-11886 [002] 193385.057956: funcgraph_entry: 3.958 us     |                 change_page_range();
    tokio-runtime-w-11886 [002] 193385.058037: funcgraph_exit:  +32.083 us   |               }
    tokio-runtime-w-11886 [002] 193385.058089: funcgraph_entry: 7.667 us     |               irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058233: funcgraph_exit:  !308.041 us  |             }
    tokio-runtime-w-11886 [002] 193385.058239: funcgraph_exit:  !316.709 us  |           }
    tokio-runtime-w-11886 [002] 193385.058247: funcgraph_exit:  !400.417 us  |         }
    tokio-runtime-w-11886 [002] 193385.058255: funcgraph_exit:  !415.000 us  |       }
    tokio-runtime-w-11886 [002] 193385.058555: funcgraph_entry: 8.250 us     |       irq_enter_rcu();
    tokio-runtime-w-11886 [002] 193385.058958: funcgraph_entry:              |       kallsyms_lookup_size_offset() {
    tokio-runtime-w-11886 [002] 193385.058974: funcgraph_entry: +36.333 us   |         get_symbol_pos();
    tokio-runtime-w-11886 [002] 193385.059017: funcgraph_exit:  +59.750 us   |       }
    tokio-runtime-w-11886 [002] 193385.059043: funcgraph_entry:              |       kfree() {
    tokio-runtime-w-11886 [002] 193385.059057: funcgraph_entry: 3.000 us     |         __kmem_cache_free();
    tokio-runtime-w-11886 [002] 193385.059065: funcgraph_exit:  +22.833 us   |       }
    tokio-runtime-w-11886 [002] 193385.059073: funcgraph_exit:  #2644.708 us |     }
    tokio-runtime-w-11886 [002] 193385.059079: funcgraph_exit:  #2706.292 us |   }
    tokio-runtime-w-11886 [002] 193385.059095: funcgraph_entry: 2.792 us     |   mutex_unlock();
    tokio-runtime-w-11886 [002] 193385.059101: funcgraph_exit:  #2870.416 us | }

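This time the program got past arch_prepare_bpf_trampoline, set_memory_ro and set_memory_x, and the last function we see before the cleanup is kallsyms_lookup_size_offset.

As we can see in the bpf_trampoline_update function in kernel/bpf/trampoline.c, there is no explicit call to kallsyms_lookup_size_offset: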

```c
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
{

    // ... OTHER CODE ...

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
again:
    if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
        (tr->flags & BPF_TRAMP_F_CALL_ORIG))
        tr->flags |= BPF_TRAMP_F_ORIG_STACK;
#endif

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, tr->flags, tlinks,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    set_memory_ro((long)im->image, 1);
    set_memory_x((long)im->image, 1);

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
    if (err == -EAGAIN) {
        /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
         * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
         * trampoline again, and retry register.
         */
        /* reset fops->func and fops->trampoline for re-register */
        tr->fops->func = NULL;
        tr->fops->trampoline = 0;

        /* reset im->image memory attr for arch_prepare_bpf_trampoline */
        set_memory_nx((long)im->image, 1);
        set_memory_rw((long)im->image, 1);
        goto again;
    }
#endif
    if (err)
        goto out;

    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    /* If any error happens, restore previous flags */
    if (err)
        tr->flags = orig_flags;
    kfree(tlinks);
    return err;
}
```

Note: the implementation of `bpf_trampoline_update` differs slightly from the one in kernel 5.15.

The call to `kallsyms_lookup_size_offset` is hidden inside another function, which we do not see in the function graph because the compiler inlined it.

It looks like `kallsyms_lookup_size_offset` is called by `ftrace_location`:

```c
unsigned long ftrace_location(unsigned long ip)
{
    struct dyn_ftrace *rec;
    unsigned long offset;
    unsigned long size;

    rec = lookup_rec(ip, ip);
    if (!rec) {
        if (!kallsyms_lookup_size_offset(ip, &size, &offset))
            goto out;

        /* map sym+0 to __fentry__ */
        if (!offset)
            rec = lookup_rec(ip, ip + size - 1);
    }

    if (rec)
        return rec->ip;

out:
    return 0;
}
```

ftrace_location is called by register_fentry, which, right after that call, checks the fops field of struct bpf_trampoline *tr:

    /* first time registering */
    static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
    {
        void *ip = tr->func.addr;
        unsigned long faddr;
        int ret;

        faddr = ftrace_location((unsigned long)ip);
        if (faddr) {
            if (!tr->fops)
                return -ENOTSUPP;
            tr->func.ftrace_managed = true;
        }

        if (bpf_trampoline_module_get(tr))
            return -ENOENT;

        if (tr->func.ftrace_managed) {
            ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
            ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
        } else {
            ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
        }

        if (ret)
            bpf_trampoline_module_put(tr);
        return ret;
    }

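Indeed, when ftrace_location finds the address and tr->fops is NULL, the function returns the error -ENOTSUPP.

Let's find out where tr->fops is initialized.

If we are right, the trampoline should be created inside the bpf_trampoline_lookup function: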

    static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
    {
        struct bpf_trampoline *tr;
        struct hlist_head *head;
        int i;

        mutex_lock(&trampoline_mutex);
        head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
        hlist_for_each_entry(tr, head, hlist) {
            if (tr->key == key) {
                refcount_inc(&tr->refcnt);
                goto out;
            }
        }
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
            goto out;
    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
        if (!tr->fops) {
            kfree(tr);
            tr = NULL;
            goto out;
        }
        tr->fops->private = tr;
        tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
    #endif

        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
        hlist_add_head(&tr->hlist, head);
        refcount_set(&tr->refcnt, 1);
        mutex_init(&tr->mutex);
        for (i = 0; i < BPF_TRAMP_MAX; i++)
            INIT_HLIST_HEAD(&tr->progs_hlist[i]);
    out:
        mutex_unlock(&trampoline_mutex);
        return tr;
    }

After the allocation, the fops field of the trampoline is populated only if CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is defined. That option depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS, which is not available on aarch64 in this kernel version.
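Whether a given kernel was built with this option can be checked from its configuration. The sketch below reads a config from stdin so that any source can be piped in; the helper name `kconfig_get` is ours, and the presence of `/proc/config.gz` depends on CONFIG_IKCONFIG_PROC being enabled.

```shell
#!/bin/sh
# kconfig_get OPTION: read a kernel config on stdin and print the option's
# value, or "unset" when the option is absent or commented out.
kconfig_get() {
    sed -n "s/^$1=//p" | head -n 1 | grep . || echo "unset"
}

# /proc/config.gz is only present when the kernel exposes its config.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | kconfig_get CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
fi
```

A result of "unset" on aarch64 corresponds to the situation described above, where tr->fops stays NULL and register_fentry returns -ENOTSUPP.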

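Conclusion

As things stand, BPF LSM cannot be used on aarch64 because the ftrace direct calls feature is missing. Fortunately, a patch[8] that enables LSM programs (and other features) on aarch64 has already been merged into the current mainline branch.

These changes are expected to ship in the upcoming 6.4 release of the Linux kernel.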
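For a quick first check on a target machine, the running kernel version can be compared against 6.4. This is only a rough sketch: the helper name `version_ge` is ours, it relies on `sort -V` (GNU coreutils), and a new enough version does not guarantee that the required options are enabled in a given build.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip any distribution suffix such as "-generic" before comparing.
kver=$(uname -r | cut -d- -f1)
if version_ge "$kver" 6.4; then
    echo "kernel $kver: new enough, still check the kernel config"
else
    echo "kernel $kver: older than 6.4"
fi
```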

Reviewing editor: 汤梓红


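Original title: 探索aarch64架构上使用ftrace的BPF LSM (Exploring BPF LSM on aarch64 with ftrace)

Source: WeChat official account Linux阅码场 (WeChat ID: LinuxDev). Please credit the source when reposting.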
