
TinyML: Detecting Baby Cries with ChatGPT and Synthetic Data

2023-07-13

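Description

TinyML is a field of machine learning focused on bringing the power of AI to low-power devices. The technology is especially useful for applications that need real-time processing. Finding and collecting datasets remains a challenge in machine learning. Synthetic data, however, makes it possible to train ML models in a way that is both cost-effective and adaptable, reducing the need for large amounts of real-world data.

In this project, I will show you how to build a baby cry detection system by training a model on the Edge Impulse platform and deploying it to an edge device such as the Arduino Nicla Voice. By training the machine learning model on synthetic data, we can tell a baby's cry apart from background noise.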

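Here is a sneak peek of what is coming:

Collect a dataset for Edge Impulse, generated with AudioLDM text-to-audio and ChatGPT, to train the model.
Train the model with Edge Impulse.
Export the model for analysis in netron.app.
Deploy the model to the Arduino Nicla Voice.
Evaluate and test on live data using the Arduino IDE.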

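Baby Cry system deployment pipeline

The diagram shows the components and steps involved in deploying a machine learning model that detects two cases, a baby crying and background noise, starting from text prompts generated with ChatGPT.

Here is a step-by-step breakdown of the components in the pipeline diagram and how they interact:

ChatGPT: ChatGPT is the starting point of the pipeline. It generates text prompts for the two cases: baby crying and background noise.
Text-to-audio conversion: Once the text prompts are generated, we send them to a module that converts text to audio. This module creates audio files matching the prompts for both cases.
Model training: The generated audio files are uploaded to the Edge Impulse SaaS platform, a cloud-based platform that provides tools for developing, training, and deploying machine learning models for edge devices such as microcontrollers.
Model deployment: After training, the machine learning model is deployed to the Arduino Nicla Voice development board. This board is designed for building smart voice devices that process audio and run machine learning tasks.
Inference: Once deployed, the machine learning model processes live audio input from the microphone and detects whether the input is a baby crying or background noise.

Potentially, the output of the machine learning model can be used to trigger actions, such as turning on a light or sending a notification to a smartphone.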
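Triggering an action on every single positive inference would be noisy, so one common approach is to require several recent detections before reacting. The sketch below is only an illustration of that idea, not part of the original project: the function name, the threshold, and the window sizes are all assumptions, and the scores are made up.

```python
from collections import deque

def make_cry_trigger(threshold=0.8, required_hits=3, window=5):
    """Return a function that reports True once enough recent inferences look like crying."""
    recent = deque(maxlen=window)  # sliding window of recent yes/no decisions

    def update(cry_score):
        # Record whether this inference crossed the (assumed) confidence threshold.
        recent.append(cry_score >= threshold)
        return sum(recent) >= required_hits

    return update

trigger = make_cry_trigger()
scores = [0.2, 0.9, 0.95, 0.4, 0.85, 0.1]  # made-up "baby cry" probabilities
decisions = [trigger(s) for s in scores]
print(decisions)
```

With these values the trigger only fires once three of the last five scores are above the threshold, which is the point at which a light or notification would be switched on.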

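Arduino Nicla Voice development board overview

Arduino Nicla Voice is a development board created in collaboration with Syntiant. Using Syntiant's ultra-low-power deep learning processor, the board provides always-on speech, gesture, and motion recognition at the edge.

Arduino Nicla Voice development board

Thanks to its compact size, the Nicla Voice can be built into wearables, adding AI while keeping energy consumption to a minimum. With the Nicla Voice you can develop custom speech recognition models and run them on the board, so that it recognizes specific words or phrases by analyzing your voice.

Let's get started!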

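Generating text prompts with ChatGPT

Using ChatGPT to generate varied prompts simplifies the job of writing prompts for my machine learning model, which has two classes: baby crying and background noise. It saves the time and effort I would otherwise spend brainstorming and writing prompts by hand. It also yields a wider, more diverse set of prompts, which can improve the accuracy and effectiveness of the model.

Here are my text prompts for the baby crying scenario, generated with ChatGPT.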

prompts = [
"Baby Crying",
"Baby crying in bedroom",
"Baby crying loudly",
"Infant crying",
"Newborn crying",
"Crying baby",
"Upset baby",
"Distressed baby",
"Fussy baby",
"Weeping infant",
"Sobbing baby",
"Whimpering baby",
"Wailing baby",
"Bawling baby",
"Crying newborn",
"Tearful baby",
"Bawling infant",
"Mourning baby",
"Bellowing baby",
"Screaming baby",
"Howling baby",
"Squalling baby",
"Yowling baby",
"Crying baby in nursery",
"Wailing infant in bedroom",
"Whimpering baby in crib",
"Sobbing baby in bassinet",
"Crying baby in the dark",
"Upset baby in bed",
"Distressed baby in room",
"Fussy baby in cradle",
"Weeping infant in playpen",
"Sobbing baby in the corner",
"Whimpering baby in the closet",
"Wailing baby in the crib",
"Bawling baby in the nursery",
"Crying newborn in the bedroom",
"Tearful baby in the playroom",
"Bawling infant in the den",
"Mourning baby in the living room",
"Bellowing baby in the kitchen",
"Screaming baby in the bathroom",
"Howling baby in the hallway",
"Squalling baby in the dining room",
"Yowling baby in the family room",
"Crying baby in the middle of the night",
"Wailing infant in the early morning",
"Whimpering baby during naptime",
"Sobbing baby during mealtime",
"Crying baby during bathtime",
"Upset baby during diaper change",
"Distressed baby during playtime",
"Fussy baby during bedtime",
"Weeping infant during storytime",
"Sobbing baby during teething",
"Whimpering baby during vaccination",
"Wailing baby during check-up",
"Bawling baby during colic",
"Crying newborn during feeding",
"Tearful baby during immunization",
"Bawling infant during growth spurt",
"Mourning baby during illness",
"Bellowing baby during teething",
"Screaming baby during reflux",
"Howling baby during ear infection",
"Squalling baby during constipation",
"Yowling baby during sleep regression",
"Crying baby during travel",
"Wailing infant during car ride",
"Whimpering baby during flight",
"Sobbing baby during road trip",
"Crying baby during vacation",
"Upset baby during change of environment",
"Distressed baby during new experiences",
"Fussy baby during unfamiliar situations",
"Weeping infant during loud noises",
"Sobbing baby during separation anxiety",
"Whimpering baby during stranger danger",
"Wailing baby during socialization",
"Bawling baby during weaning",
"Crying newborn during swaddling",
"Tearful baby during bath",
"Bawling infant during burping",
"Mourning baby during pacifier weaning",
"Bellowing baby during crawling",
"Screaming baby during walking",
]

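In addition, a language model such as ChatGPT can help me come up with creative prompts that I might not have thought of myself.

These are the background noise prompts.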

prompts = [
"A hammer is hitting a wooden surface",
"A noise of nature",
"The sound of waves crashing on the shore",
"A thunderstorm in the distance",
"Traffic noise on a busy street",
"The hum of an air conditioning unit",
"Birds chirping in the morning",
"The sound of a train passing by",
"A group of people talking in a crowded room",
"The sound of raindrops hitting a tin roof",
"The buzz of a fluorescent light",
"The sound of footsteps on a wooden floor",
"The crackling of a campfire",
"The whirring of a ceiling fan",
"The sound of a basketball bouncing on concrete",
"A dog barking in the distance",
"The rustling of leaves in the wind",
"The buzzing of a bee or other insect",
"The sound of a church bell ringing",
"The roar of a waterfall",
"The tapping of a keyboard",
"The hiss of a steam engine",
"The clanging of pots and pans in a kitchen",
"The sound of a roaring fire in a fireplace",
"The hum of an electric generator",
"The sound of a lawnmower in the distance",
"The whistling of wind through a window crack",
"The clatter of dishes in a busy restaurant",
"The sound of a helicopter flying overhead",
"The tapping of rain on a metal roof",
"The gentle rustling of a book's pages turning",
"The creaking of a wooden chair",
"The sound of a pencil scratching on paper",
"The chirping of crickets at night",
"The crackling of a vinyl record playing",
"The hissing of an old radio",
"The sound of a pencil sharpener grinding",
"The gurgling of a coffee maker",
"The sound of a ticking clock",
"The roar of an airplane engine",
"The bubbling of a fish tank filter",
"The clanking of dishes being washed in a sink",
"The sound of a typewriter clacking",
"The roar of a lion in the wild",
"The whirring of a drone flying overhead",
"The beeping of a car horn in traffic",
"The sound of a door creaking open",
"The buzzing of a mosquito in the room",
"The sound of a blender mixing ingredients",
"The rumbling of a thunderstorm overhead",
"The tapping of a woodpecker on a tree trunk",
"The rustling of paper being shuffled",
"The sound of a busy office with people talking on the phone and typing on their keyboards",
"The sound of a construction site with heavy machinery and drilling",
"The sound of a dishwasher running in the kitchen",
"The chirping of birds in a forest",
"The sound of a police siren in the distance",
"The whistling of wind through tall grass",
"The sound of a cash register in a busy store",
"The buzzing of a fly or bee flying around",
"The sound of a bicycle bell ringing",
"The crackling of a fire in a fireplace"
]

That is all the text we need for generating the dataset!
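Both lists above are assigned to the same name, prompts, so if they ever live in one script the second would overwrite the first. A minimal sketch of one way to keep them apart is to key them by class label and flatten them into (label, prompt) pairs. The label names and the helper are assumptions made for illustration, and the lists here are shortened stand-ins.

```python
def build_prompt_table(prompts_by_label):
    """Flatten {label: [prompt, ...]} into unique (label, prompt) pairs, preserving order."""
    seen = set()
    table = []
    for label, prompts in prompts_by_label.items():
        for prompt in prompts:
            key = prompt.strip().lower()
            if key in seen:
                continue  # skip exact repeats (ignoring case and padding)
            seen.add(key)
            table.append((label, prompt.strip()))
    return table

# Shortened stand-ins for the two lists above; the label names are assumptions.
prompts_by_label = {
    "baby_cry": ["Baby Crying", "Infant crying", "baby crying "],
    "background_noise": ["A noise of nature", "The tapping of a keyboard"],
}
table = build_prompt_table(prompts_by_label)
for label, prompt in table:
    print(label, "|", prompt)
```

Keeping the label next to each prompt also makes it easier to name the generated audio files by class later on.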

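Installing AudioLDM: Text-to-Audio for dataset generation

To turn the text into audio files, the next step uses a text-to-audio generation tool called AudioLDM, developed by researchers at the University of Surrey and Imperial College London, UK. The tool uses latent diffusion models to generate high-quality audio from text. To run AudioLDM you need a standalone computer with a powerful CPU. A dedicated GPU is recommended but not mandatory. To try out AudioLDM first, you can test it online through Hugging Face.

We will now configure our Python environment. To manage virtual environments we will use virtualenv, which can be installed as follows: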

sudo pip3 install virtualenv virtualenvwrapper

For virtualenv to work, we need to add a few lines to the ~/.bashrc file. Open it:

nano ~/.bashrc

and add the following lines:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

To apply the changes, run the following command:

source ~/.bashrc

Now we can create a virtual environment with the mkvirtualenv command.

mkvirtualenv audioldm -p python3

Install PyTorch using pip.

pip3 install torch==2.0.0

Then install the audioldm package.

pip3 install audioldm

Then run the following command to generate audio files from the text prompts produced with ChatGPT. The script can be found in the GitHub code section below.

python3 generate.py

You should get output like the following:

genereated: A hammer is hitting a wooden surface
genereated: A noise of nature
genereated: The sound of waves crashing on the shore
genereated: A thunderstorm in the distance
genereated: Traffic noise on a busy street
genereated: The hum of an air conditioning unit
genereated: Birds chirping in the morning
genereated: The sound of a train passing by
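generate.py itself is not reproduced in this section. As a rough idea of what such a script could do, the sketch below builds one audioldm command-line call per prompt. This is not the author's script. The -t (text prompt) and -s (save directory) flags are assumptions based on my recollection of the AudioLDM README, so check audioldm -h before relying on them. The helper names and the output folder are hypothetical.

```python
import subprocess

def build_audioldm_command(prompt, out_dir):
    # Flags assumed from the AudioLDM README: -t = text prompt, -s = save directory.
    return ["audioldm", "-t", prompt, "-s", out_dir]

def generate_all(prompts, out_dir, dry_run=True):
    """Build one command per prompt; only execute them when dry_run is False."""
    commands = [build_audioldm_command(p, out_dir) for p in prompts]
    if not dry_run:
        for cmd in commands:
            # Requires the audioldm package (and its CLI) to be installed.
            subprocess.run(cmd, check=True)
    return commands

# Dry run: nothing is executed, the commands are only printed for inspection.
commands = generate_all(["A noise of nature", "Baby crying loudly"], "output/")
for cmd in commands:
    print(" ".join(cmd))
```

Passing the prompt as a separate list element, rather than building a shell string, avoids quoting problems with prompts that contain apostrophes.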

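Once the wav audio samples are collected, they can be fed to a neural network to start training a model that automatically detects whether a baby is crying or only background noise is present.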
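Before uploading, it can be worth checking that every generated file has the format the training pipeline expects. The sketch below uses only Python's standard wave module. The 16 kHz mono expectation is an assumption, so substitute whatever your impulse is configured for. Note that the wave module reads only PCM files and raises wave.Error on other encodings.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration

def needs_attention(path, expected_rate=16000):
    # expected_rate is an assumption; use the rate your impulse is configured for.
    rate, channels, _ = describe_wav(path)
    return rate != expected_rate or channels != 1
```

Calling needs_attention on each file in the output folder gives a quick list of samples to resample or convert to mono before upload.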

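Model training with the Edge Impulse platform

Edge Impulse is a web-based tool that helps us quickly and easily create AI models for a wide range of projects. A machine learning model can be built in a few simple steps, and all a user needs to build a custom classifier is a web browser.

Go to the Arduino Cloud platform, enter your credentials at login (or create an account), and start a new project.

Download the Google Speech Commands Dataset to get the "background noise" class data from it. The dataset can be downloaded as follows.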

wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
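Only the background-noise recordings are needed from this archive, which stores them in a folder named _background_noise_. As a sketch, Python's standard tarfile module can extract just that folder instead of unpacking every spoken-word class. The function names and destination path are illustrative.

```python
import tarfile

NOISE_DIR = "_background_noise_"  # folder name used inside the Speech Commands archive

def noise_members(archive):
    """Yield only the wav files stored under the background-noise folder."""
    for member in archive.getmembers():
        parts = member.name.split("/")
        if member.isfile() and NOISE_DIR in parts and member.name.endswith(".wav"):
            yield member

def extract_background_noise(tar_path, dest_dir):
    """Extract the background-noise wav files and return their archive paths."""
    with tarfile.open(tar_path, "r:gz") as archive:
        members = list(noise_members(archive))
        archive.extractall(dest_dir, members=members)
    return [m.name for m in members]
```

A call such as extract_background_noise("speech_commands_v0.02.tar.gz", "noise/") would leave only the noise recordings in the noise/ folder.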

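Upload the synthetic wav audio files together with the "background noise" class from the Google Speech Commands Dataset. In my case, I uploaded about 500 wav files. If needed, you can add more later: label the files, upload them under Data acquisition, and retrain the model.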
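Labeling hundreds of files by hand is tedious. As far as I know, the Edge Impulse uploader can infer a sample's label from the part of the file name before the first dot, so one option is to encode the class in the file name before uploading. The sketch below shows such a naming scheme. The helper, the label names, and the exact pattern are assumptions made for illustration.

```python
import re

def labeled_filename(label, prompt, index):
    """Build '<label>.<slug>_<index>.wav' so the text before the first dot is the label."""
    # Collapse anything that is not a lowercase letter or digit into underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{label}.{slug}_{index:03d}.wav"

name = labeled_filename("baby_cry", "Baby crying in bedroom", 1)
print(name)
```

Because the slug never contains a dot, the label is always the only text before the first dot in the file name.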

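Once all the classes are set up and you are happy with the dataset, it is time to train the model. In the left navigation menu, go to Create impulse.

Select Add a processing block and add Audio (Syntiant), which is well suited to boards based on the Syntiant NDP120. It converts the audio into time- and frequency-based features that help with classification. Then select Add a learning block and add Classification with two output classes.

Then navigate to Syntiant. Under Syntiant, keep the default parameters and click Save parameters.

Finally, click the Generate features button. You should then see the feature generation results.

Press the Start training button to train the model. This can take around 5-10 minutes, depending on the size of your dataset. If everything goes well, Edge Impulse shows the training results.

We got a validation accuracy of 90.7%. You should not expect 100% accuracy on your training dataset, since that usually indicates an overfitted model. As a rough rule of thumb, anything above 70% counts as good model performance. Increasing the number of training epochs may raise the accuracy further.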

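The .tflite file is our model. The final quantized (int8) model file is about 5 KB in size, with an accuracy close to 90%.

It is always interesting to look at a model's architecture and at the format and shape of its inputs and outputs. You can use a program such as Netron to view the neural network.

Click serving_default_x:0: we see that the input is of type int8 with size [1, 1600]. Now look at the output: we have 2 classes, so the output shape is [1, 2]. Quantization can reduce the model's accuracy, because going from a 32-bit floating-point to an 8-bit integer representation loses precision.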
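To make the precision loss concrete, here is a small sketch of the affine int8 scheme commonly used for quantized TensorFlow Lite models, where a real value x is stored as q = round(x / scale) + zero_point and recovered as scale * (q - zero_point). The scale and zero point below are illustrative values, not read from the actual model.

```python
def quantize(x, scale, zero_point):
    """Map a real value to int8 using q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original real value."""
    return scale * (q - zero_point)

scale, zero_point = 0.05, -3  # illustrative values, not taken from the real model
x = 1.234
q = quantize(x, scale, zero_point)
x_back = dequantize(q, scale, zero_point)
error = abs(x - x_back)
print(q, x_back, error)
```

For values inside the representable range, the round-trip error is at most half the scale, which is why a smaller scale (a narrower value range) preserves more precision.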

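Once the model is built, go to the Deployment section and deploy it to one of the supported edge devices. ML model deployment is the process of putting a trained and tested ML model into a production environment, such as an edge device, where it can serve its intended purpose.

Go to the Deployment tab in Edge Impulse and click your edge device's firmware type. Here, that is Arduino Nicla Voice.

You may see the following log messages: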

Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device.
Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)
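To put the parameter-memory figure in perspective, a quick calculation with the two numbers from the log shows how little of the device's parameter memory this model occupies.

```python
used_kb, total_kb = 1.375, 640.0  # values taken from the log output above
percent_used = 100 * used_kb / total_kb
print(f"{percent_used:.3f}% of parameter memory used")
```

That is roughly 0.2% of the available parameter memory, which leaves plenty of room for a larger model on the same device.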