
Progress and Challenges in Open-Domain Long-Form Question Answering: An Overview

Source: Tensorflowers | 2021-05-31 10:02

Posted by Aurko Roy, Researcher, Google Research

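Open-domain long-form question answering (LFQA) is a fundamental challenge in natural language processing (NLP). It involves retrieving documents relevant to a given question and using them to generate an elaborate, paragraph-length answer. In factoid open-domain question answering (QA), by contrast, a simple phrase or entity is enough to answer a question. While there has been remarkable recent progress in factoid QA, far less has been done for long-form QA. Nonetheless, LFQA is an important task, especially because it provides a testbed for measuring the factuality of generative text models. But are current benchmarks and evaluation metrics actually suitable for making progress on LFQA?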

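In "Hurdles to Progress in Long-form Question Answering" (to appear at NAACL 2021), we present a new system for open-domain long-form question answering that leverages two recent advances in NLP:

1. State-of-the-art sparse attention models, such as the Routing Transformer (RT), which allow attention-based models to scale to long sequences;

2. Retrieval-based models, such as REALM, which facilitate retrieval of Wikipedia articles related to a given query.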

Routing Transformer

https://www.mitpressjournals.org/doi/full/10.1162/tacl_a_00353

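To encourage more factual grounding, our system combines information from several retrieved Wikipedia articles related to a given question before generating an answer. It achieves a new state of the art on ELI5, the only large-scale publicly available dataset for long-form question answering.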
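One straightforward way to combine several retrieved articles before generation is to concatenate the question and the passages into a single conditioning sequence, which a long-sequence model like RT can afford to attend over. The sketch below assumes whitespace tokenization, a hypothetical `[SEP]` separator and a hypothetical token budget; it is an illustration, not the paper's exact input format.

```python
def build_context(question, passages, max_tokens=64, sep="[SEP]"):
    """Join the question and retrieved passages (most relevant first) into one
    token sequence, cutting off once the token budget is used up.

    Assumptions: whitespace tokens stand in for subword tokens, and `sep` is a
    hypothetical separator. The question itself is never truncated here.
    """
    tokens = question.split() + [sep]
    for passage in passages:
        room = max_tokens - len(tokens)
        if room <= 0:
            break  # budget exhausted: lower-ranked passages are dropped
        tokens += (passage.split() + [sep])[:room]
    return tokens

ctx = build_context("why are boats white",
                    ["white paint reflects sunlight", "boats float on water"],
                    max_tokens=10)
print(" ".join(ctx))
```

Ordering passages by retrieval score before truncating means the budget is spent on the most relevant evidence first.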

ELI5

https://ai.facebook.com/blog/longform-qa/

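However, while our system tops the public leaderboard, we discovered several troubling trends with the ELI5 dataset and its associated evaluation metrics. In particular, we found 1) little evidence that models actually use the retrievals on which they condition; 2) that trivial baselines (e.g., input copying) beat modern systems such as RAG and BART+DPR; and 3) that there is a significant train/validation overlap in the dataset. Our paper suggests mitigation strategies for each of these issues.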

Input copying

https://eval.ai/web/challenges/challenge-page/689/leaderboard/1908#leaderboardrank-6

Text Generation

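The backbone of modern NLP models is the Transformer architecture, in which each token of a sequence attends to every other token, so the cost of attention grows quadratically with sequence length. The RT model introduces a dynamic, content-based sparse attention mechanism that reduces the complexity of attention in the Transformer from O(n^2) to O(n^1.5), where n is the sequence length, which lets it scale to long sequences. This allows each word to attend to other relevant words anywhere in the entire text, unlike methods such as Transformer-XL, where a word can only attend to words in its immediate vicinity.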
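As a rough illustration of where n^1.5 comes from, assume the n tokens are split into √n equally sized clusters and each token scores only the keys in its own cluster. This idealized count ignores RT's local attention and uneven cluster sizes, so the numbers are indicative only.

```python
import math

def full_attention_scores(n: int) -> int:
    """Dense self-attention: every token scores every token -> n * n."""
    return n * n

def routed_attention_scores(n: int) -> int:
    """Idealized routing (assumption): sqrt(n) clusters of ~sqrt(n) tokens.

    Each token scores only the ~sqrt(n) tokens in its own cluster,
    so the total is n * sqrt(n) = n ** 1.5.
    """
    cluster_size = math.isqrt(n)  # exact when n is a perfect square
    return n * cluster_size

for n in (1024, 4096, 16384):
    print(n, full_attention_scores(n), routed_attention_scores(n))
```

At n = 16384 the idealized routed count is 128 times (that is, √n times) smaller than the dense one, and the gap keeps widening as sequences get longer.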

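The key insight behind RT is that having each token attend to every other token is often redundant and can be approximated by a combination of local and global attention. Local attention lets each token build up a local representation over several layers of the model, with each token attending to a local neighborhood, which promotes local consistency and fluency. Complementing local attention, the RT model also uses mini-batch k-means clustering so that each token attends only to a set of the most relevant tokens.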
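The combination of clustering-based global attention and local attention can be sketched in NumPy. This is a simplified illustration, not the RT implementation: queries, keys and values share one matrix, a single mini-batch k-means step uses a hypothetical exponential-moving-average decay of 0.9, and the allowed pairs are held in a dense boolean mask (a real implementation sorts tokens by cluster and attends block by block precisely to avoid materializing an n × n matrix).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries hold -inf and get weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def minibatch_kmeans_step(centroids, x, decay=0.9):
    """Assign each row of x to its nearest centroid, then move every centroid
    toward the mean of its members (exponential moving average)."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
    assign = dists.argmin(axis=1)
    updated = centroids.copy()
    for c in range(centroids.shape[0]):
        members = x[assign == c]
        if len(members) > 0:
            updated[c] = decay * centroids[c] + (1 - decay) * members.mean(axis=0)
    return updated, assign

def routing_mask(assign, window):
    """Allowed (query i, key j) pairs: same cluster (global, content-based)
    OR within `window` positions (local), restricted to j <= i (causal)."""
    idx = np.arange(len(assign))
    same_cluster = assign[:, None] == assign[None, :]
    local = np.abs(idx[:, None] - idx[None, :]) <= window
    causal = idx[None, :] <= idx[:, None]
    return (same_cluster | local) & causal

def masked_attention(q, k, v, mask):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # disallowed pairs get zero weight
    return softmax(scores) @ v

# Toy run: 16 tokens, 8-dim vectors, 4 clusters, local window of 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
centroids = x[rng.choice(16, size=4, replace=False)]
centroids, assign = minibatch_kmeans_step(centroids, x)
mask = routing_mask(assign, window=2)
out = masked_attention(x, x, x, mask)  # shared query/key/value for brevity
print(out.shape, mask.sum(), "of", mask.size, "pairs scored")
```

The diagonal is always allowed (a token is in its own window), so every row of the softmax has at least one finite score.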

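We pre-trained an RT model on the Project Gutenberg (PG-19) dataset with a language modeling objective: given all the previous words, the model learns to predict the next word, which enables it to generate fluent, paragraph-long text.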
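The language modeling objective is the average negative log-likelihood of each next token. A minimal pure-Python sketch, assuming the model's per-position vocabulary scores (logits) are already given as plain lists:

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def next_token_nll(logits, tokens):
    """Average negative log-likelihood of each next token.

    logits[t] holds the model's scores over the vocabulary for tokens[t + 1],
    computed from tokens[: t + 1]; so len(logits) == len(tokens) - 1.
    """
    targets = tokens[1:]
    losses = [-log_softmax(row)[y] for row, y in zip(logits, targets)]
    return sum(losses) / len(losses)

tokens = [0, 2, 1, 3]
uniform = [[0.0] * 4 for _ in range(len(tokens) - 1)]
print(next_token_nll(uniform, tokens))  # about log(4): no better than guessing
```

Training lowers this loss; a model that assigns all its probability to the correct next word drives it toward 0.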

Project Gutenberg (PG-19)

https://deepmind.com/blog/article/A_new_model_and_dataset_for_long-range_memory

Information Retrieval

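To demonstrate the effectiveness of the RT model on the LFQA task, we combine it with retrievals from REALM. The REALM model (Guu et al., 2020) is a retrieval-based model that uses maximum inner product search to retrieve Wikipedia articles relevant to a particular query or question. The model was fine-tuned for factoid question answering on the Natural Questions dataset. REALM uses a BERT model to learn good representations of a question and uses ScaNN to retrieve Wikipedia articles with high topical similarity to the question representation. The system is then trained end-to-end to maximize the log-likelihood on the QA task.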
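Maximum inner product search returns the documents whose embeddings have the largest dot product with the query embedding. ScaNN performs this search approximately at Wikipedia scale; the brute-force version below is an exact stand-in for illustration, with hypothetical toy 2-D embeddings in place of REALM's BERT encoders.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mips_top_k(query, doc_embeddings, k):
    """Exact (brute-force) maximum inner product search: indices of the k
    documents with the largest dot product with the query, best first."""
    return heapq.nlargest(k, range(len(doc_embeddings)),
                          key=lambda i: dot(query, doc_embeddings[i]))

docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(mips_top_k([1.0, 0.0], docs, k=2))  # [0, 2]
```

Brute force costs one dot product per document per query, which is why approximate libraries such as ScaNN are needed for millions of articles.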

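We further improved the quality of REALM's retrievals by using a contrastive loss. The idea is to pull the representation of a question close to its ground-truth answer and push it away from the other answers in its mini-batch. This ensures that when the system retrieves relevant items using this question representation, it returns articles that are "similar" to ground-truth answers. We call this retriever contrastive-REALM, or c-REALM.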
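One common way to write such a loss is a softmax cross-entropy over in-batch negatives: for question i, answer i is the positive and every other answer in the mini-batch is a negative. The sketch below assumes that formulation with plain dot-product similarity and a temperature of 1.0; the exact loss, similarity and temperature used for c-REALM are not given here.

```python
import math

def in_batch_contrastive_loss(questions, answers, temperature=1.0):
    """questions[i] and answers[i] are a matched pair of embedding vectors;
    the other answers in the batch act as negatives for question i."""
    total = 0.0
    for i, q in enumerate(questions):
        logits = [sum(a * b for a, b in zip(q, ans)) / temperature for ans in answers]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # -log p(correct answer | question)
    return total / len(questions)

q = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # answer i matches question i
shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # every pair mismatched
print(in_batch_contrastive_loss(q, aligned), in_batch_contrastive_loss(q, shuffled))
```

Minimizing this loss raises each question's similarity to its own answer relative to the others, which is what makes retrieval with the learned question representation return answer-like articles.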

Contrastive loss

https://towardsdatascience.com/contrastive-loss-explaned-159f2d4a87ec

Evaluation

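We tested the model on long-form question answering using the ELI5 dataset, which is part of the KILT benchmark and is the only publicly available large-scale LFQA dataset. KILT measures text retrieval quality with R-precision (R-Prec) and text generation quality with ROUGE-L. The two are combined into a KILT R-L score, which determines a model's ranking on the leaderboard: an example's ROUGE-L is credited only when its retrieval is fully correct (R-Prec of 1). We fine-tuned the pre-trained RT model together with c-REALM retrievals on the ELI5 dataset from KILT.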
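A minimal sketch of the two scoring pieces, under simplifying assumptions: each example has a single set of gold Wikipedia pages (KILT itself can list several provenance sets per example), and per-example ROUGE-L scores are already computed. The function names are hypothetical, not the KILT toolkit's API.

```python
def r_precision(retrieved, gold):
    """Fraction of the top-R retrieved pages that are gold, with R = len(gold)."""
    r = len(gold)
    return sum(1 for page in retrieved[:r] if page in gold) / r

def kilt_score(downstream_scores, r_precisions):
    """KILT-style aggregate: an example's downstream score (e.g. ROUGE-L)
    counts only when its retrieval is fully correct (R-Prec == 1)."""
    credited = [s if rp == 1.0 else 0.0 for s, rp in zip(downstream_scores, r_precisions)]
    return sum(credited) / len(credited)

rp = [r_precision(["A", "B", "C"], {"A", "B"}),   # 1.0
      r_precision(["A", "C", "B"], {"A", "B"})]   # 0.5
print(rp, kilt_score([0.30, 0.50], rp))           # [1.0, 0.5] 0.15
```

Because the second example's retrieval is only half right, its ROUGE-L contributes nothing, which is why KILT R-L scores are much lower than raw ROUGE-L.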

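Our submission tops the KILT leaderboard for long-form question answering on ELI5, with a combined KILT R-L score of 2.36. This improves on the previous top entry, BART + DPR (KILT R-L score of 1.9), while using a number of parameters similar to the other models on the leaderboard. In terms of text generation quality, our system improves ROUGE-L by 4.11, 5.78 and 9.14 points over T5, BART + DPR and RAG, respectively.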

Leaderboard

https://eval.ai/web/challenges/challenge-page/689/leaderboard/1908

Example generations from the RT model

Q1

Why are almost all boats white?

Boats are generally painted white for obvious reasons, as they are inexpensive and easy to clean. Ships are also often painted to provide a visible visual contrast to the sea. Boats also often have high visibility ports to allow for easy international trade.

In addition, many boats have an air-breathing function that makes them easier to spot and stay afloat, which is a good thing if you're a pirate or trying to get into some country's waters. Lastly, many ships are built in the last few decades, so they are not as exotic looking as they were thirty years ago.

And lastly, many ships are made out of materials that are easy to work with, and so many of them are painted white as a practical matter. There are many, many more reasons for why ships are white, but those are the ones I know of. I'm sure there are many other reasons, but those are the ones I know of.

Edit: I forgot to mention that the other answers about visibility and easy to see ports are also true. Ships are painted to appear to be larger, and more expensive. That makes them appear more impressive, and so they are more likely to be bought by people who want to show off their wealth and status.

Q2

What causes the trail behind jets at high altitude?

The same thing that causes the clouds you see at the start of a storm to form in the first place. The air pressure changes as the jet moves through the air, causing it to cool down and condense into water droplets. These droplets are in fact a lot bigger than the water droplets in the clouds, so when the plane passes through them, they can cause the air pressure to change, causing the water droplets to condense into clouds.

This is what you see when you look at a cloud formation from the ground. In addition to this, when the plane passes through the clouds, it also moves through the air, causing it to change its temperature, causing the water droplets to condense and form clouds.

These clouds are often quite large, and can block out the sun. This is what you see when you look at a contrail. Finally, the water vapor that forms at high altitudes is cooled by the air above, and as it cools, it condenses into clouds. This is what you see when you look at a cumulonimbus cloud. These clouds are even larger than the water droplets in the clouds, and they block out the sun.

Q3

How do shampoo and conditioner work?

The key ingredients in shampoo and conditioner are usually surfactants, they tend to attract oil from your skin. Shampoo will usually contain an anti-dandruff agent (dandruff is caused by fungi in your skin which cause dandruff). They also contain some moisturizing agents (moisturizers are things that stay in your skin longer and protect your scalp from cleaning),

And a lot of anti-dandruff (dandruff is a side effect of high doses of both surfactants and moisturizers) Conditioner will contain moisturizing agents, like a skin moisturizer. They tend to cause oils to be separated from your hair (e.g. wash your hair with conditioner instead of shampoo). They both have a foaming agent that makes them foam up (usually Hydrogen Peroxide). They also have a foaming agent that stops dirt from attaching to your hair. Some conditioner will also suppress the action of the shampoo (to help prevent dirt from attaching to your hair).

Q4

When glancing at a clock, why does the first second after glancing at it sometimes feel longer than the rest?

This is known as chronostasis. Your brain gets used to seeing the second hand move after the first, so it ignores the first one. If you were to look at the clock a lot, you'd see the second hand move a lot more than the first, and you'd be able to tell the difference between the second and the first.

The second hand will appear to move much slower than the first hand because your brain is used to seeing the second hand move a lot more than the first. If you were to go on a long trip, you'd get used to the second hand moving a lot more than the first, and you'd have no problem telling the difference between the second and the first hand.

It's kind of like how you don't notice your nose when you're driving, but if you look at it for a while, you start to notice it. Your brain is just getting used to seeing the nose move a lot more than the other, and it can't tell the difference between the two. It's kind of like that with your eyes.

Q5

Why do Britain and other English empire countries still bow to monarchs? What real purpose does the queen serve?

Depends what you define as "purpose". There is no specific purpose for the British monarch - rather, they are figureheads that serve a role. The role is to have a high degree of influence over a country, though the rules of the monarchy can somewhat vary from country to country, and the ceremony of "kissing the royal corpse" does not always involve kissing the monarch. Whether that's the only reason for the custom, or if it was the only reason,

I can't say, but that's the reasoning. When the U.K. was in full power, the monarch was a puppet of Parliament, and the powers of the monarch were transferred to the Prime Minister and thus to the Prime Minister's deputy, who then became the Prime Minister. Since then, Parliament has been able to vote on legislation that goes through the monarch,

although they may still act as the monarch's representative in negotiating treaties, which can have very very deep consequences. The Queen's role, as a representative of Britain, doesn't necessarily involve her formal approval of any of the laws or legislation that goes through Parliament, though.

Q6

What exactly is fire, in detail? How can light and heat come from something we can't really touch?

Fire is the chemical reaction of fuel (oxygen) with (what we call) impurities (ash, soot, oil etc). These impurities are created due to a number of things. If you were to study the chemical composition of a log (or a candle) and analyse it you would discover the chemical formula (A, C, and G) which is a simplified version of what actually happens. There is always fuel, and there is always the ash/soot. Wood is made up of mostly carbon and hydrogen and these are the fuels that create the fire. Hydrogen and oxygen are reactive, and they form water (H2O) and CO2 (CO2). If we split a log into smaller pieces, we start to see the process of combustion as it occurs. The carbon is burned, and with the explosion of CO, H2O and H2CO2, we get water and carbon dioxide (mostly). If we smash a log together, we start to see what happens when we also hit another log with another log...imagine smashing two sand castles together. We see how the sand castles and brick castles form and collapse at different rates...then it starts to burn. The smoke, ash and flames are the actual combustion of fuel.

Q7

What goes on in those tall tower buildings owned by major banks?

The actual buildings are not there for profit but for show. There are many real world historical buildings that have little to no effect other than being a symbol of an economic entity or symbol of an era or location, and nothing more. For example look at Sears, Sears didn't care what went on inside, it was all about the _appearance_ of its location, the prestige of the location, the facilities and so on. It didn't care about how long it took it to operate, it was about how much people would pay to go see it. Sears was a landmark as a cultural movement and other big companies followed suit, so if you want to see a building you've never seen before, you have to go see Sears, just like you have to see a Toyota Camry for Toyota Camry. They used to be all about building new factories, some of them if I recall, but now that they're bigger, that means that more factory jobs are coming to them. You've probably seen them in stores as stores where people buy and sell stuff, so there aren't that many places for them to come from. Instead, it's just for show, a symbol of rich people.

Hurdles to Progress in LFQA

However, while the RT system described here tops the public leaderboard, a detailed analysis of the model and the ELI5 dataset reveals some concerning trends.

Train/Valid Overlap

Many held-out questions are paraphrased in the training set. Best answer to similar train questions gets 27.4 ROUGE-L.

Lack of Grounding

Conditioning answer generation on random documents instead of relevant ones does not measurably impact its factual correctness. Longer outputs get higher ROUGE-L.

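We found little to no evidence that the model actually grounds its text generation in the retrieved documents. An RT model fine-tuned with random retrievals from Wikipedia (i.e., random retrieval + RT) performs nearly as well as the c-REALM + RT model (24.2 vs. 24.4 ROUGE-L). We also found significant overlap among the ELI5 training, validation and test sets (several questions are paraphrases of each other), which may remove the need for retrieval altogether. The KILT benchmark measures retrieval quality and generation quality separately, without checking whether the generated text actually makes use of the retrievals.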
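The paper detected paraphrased questions with a classifier trained on QQP (see the acknowledgments). The sketch below swaps in a much cruder technique, bag-of-words cosine similarity with a hypothetical threshold of 0.6, only to show the shape of a train/validation overlap check; the example questions are invented.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercased word counts; a deliberately crude representation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def overlapping(valid_questions, train_questions, threshold=0.6):
    """For each validation question, report its most similar training question
    when the similarity reaches the (hypothetical) threshold."""
    train_vecs = [bow(q) for q in train_questions]
    hits = []
    for vq in valid_questions:
        v = bow(vq)
        score, best = max((cosine(v, t), i) for i, t in enumerate(train_vecs))
        if score >= threshold:
            hits.append((vq, train_questions[best], round(score, 2)))
    return hits

train = ["Why are most boats painted white?", "How does a jet engine work?"]
valid = ["Why are almost all boats white?", "What is fire made of?"]
for hit in overlapping(valid, train):
    print(hit)
```

A learned paraphrase classifier also catches rewordings that share few words, which this lexical check would miss.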

Trivial baselines get higher ROUGE-L scores than RAG and BART + DPR

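Moreover, we found problems with using ROUGE-L to evaluate text generation quality: trivial, nonsensical baselines, such as a random training-set answer or copying the input, achieve relatively high ROUGE-L scores (even higher than BART + DPR and RAG).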
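To see why copying the input can score well, here is a minimal ROUGE-L sketch: the F1 of precision and recall computed from the longest common subsequence (LCS) of candidate and reference tokens. It assumes lowercased whitespace tokenization with no stemming, so its numbers will not match the official scorer exactly, and the question/reference pair is invented for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lowercased whitespace tokens (no stemming or stopword handling)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

question = "why do boats look white from far away"
reference = "boats look white from far away because white paint reflects most sunlight"
print(rouge_l_f1(question, reference))  # input copying: the question as the "answer"
print(rouge_l_f1("paint", reference))   # a short, relevant answer
```

Echoing the question shares a long in-order run of words with the reference, so the copy baseline earns an F1 of 0.6 here without adding any information, while the one-word answer scores about 0.15. Lexical overlap and length, not correctness, drive the metric.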

Conclusion

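We presented a system for long-form question answering based on Routing Transformers and REALM, which tops the KILT leaderboard on ELI5. However, a detailed analysis reveals several issues with the benchmark that prevent it from reflecting meaningful modeling progress. We hope the community will work together to solve these issues so that researchers can climb the right hills and make meaningful progress on this challenging but important task.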

Acknowledgments

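The Routing Transformer work was a team effort by Aurko Roy, Mohammad Saffar, Ashish Vaswani and David Grangier. The follow-up work on open-domain long-form question answering was a collaboration between Kalpesh Krishna, Aurko Roy and Mohit Iyyer. We thank Vidhisha Balachandran, Niki Parmar and Ashish Vaswani for several helpful comments, and the REALM team (Kenton Lee, Kelvin Guu, Ming-Wei Chang and Zora Tung) for help with their codebase and for several useful suggestions that helped us improve our experiments.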

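We are grateful to Tu Vu for help with the QQP classifier used to detect paraphrases in the ELI5 training and test sets. We thank Jules Gagnon-Marchand and Sewon Min for suggesting useful experiments on checking ROUGE-L bounds. Finally, we thank Shufan Wang, Andrew Drozdov, Nader Akoury and the other members of the UMass NLP group for helpful discussions and suggestions at various stages of the project.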

Editor: jq


Original title: Progress and Challenges in Open-Domain Long-Form Question Answering

Source: WeChat official account Tensorflowers (WeChat ID: tensorflowers). Please credit the source when reposting.
