Authors
Lea Schönherr, Maximilian Golla, Thorsten Eisenhofer, Jan Wiele, Dorothea Kolossa, Thorsten Holz
Publication date
2022/5/1
Journal
Computer Speech & Language
Volume
73
Publisher
Academic Press
Description
Voice assistants like Amazon’s Alexa, Google’s Assistant, Tencent’s Xiaowei, or Apple’s Siri have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word, such as “Alexa,” “Jiǔsì’èr líng,” or “Hey Siri,” before uploading the audio stream to the cloud for further processing. Previous work reported examples of inaccurate wake word detection, which can be tricked by similar-sounding words or phrases such as “cocaine noodles” instead of “OK Google.”
In this paper, we perform a comprehensive analysis of such accidental triggers, i.e., sounds that should not have triggered the voice assistant but did. More specifically, we automate the process of finding accidental triggers and measure their prevalence across 11 smart speakers from 8 different manufacturers using everyday …
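The abstract describes an automated measurement loop: candidate audio is played near a smart speaker and any activation not caused by the genuine wake word is recorded as an accidental trigger. Below is a minimal, hypothetical Python sketch of that idea; the helper callables play_clip and speaker_activated (e.g., detecting the device LED or an outgoing cloud upload) are assumptions for illustration, not the authors' implementation.

    from dataclasses import dataclass
    from typing import Callable, Iterable, List, Tuple

    @dataclass
    class Trigger:
        clip_id: str
        transcript: str

    def find_accidental_triggers(
        clips: Iterable[Tuple[str, str, str]],   # (clip_id, audio_path, transcript)
        play_clip: Callable[[str], None],        # plays the audio file near the device
        speaker_activated: Callable[[], bool],   # observes whether the speaker woke up
    ) -> List[Trigger]:
        """Return clips that activated the speaker although none contains the wake word."""
        triggers: List[Trigger] = []
        for clip_id, audio_path, transcript in clips:
            play_clip(audio_path)
            if speaker_activated():
                triggers.append(Trigger(clip_id, transcript))
        return triggers

In this sketch, prevalence across devices would be measured by running the same clip set against each of the 11 speakers and comparing the resulting trigger lists.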
Total citations
Citations per year, 2018–2024 (chart)
Scholar articles
L Schönherr, M Golla, T Eisenhofer, J Wiele, D Kolossa… - arXiv preprint arXiv:2008.00508, 2020
L Schönherr, M Golla, T Eisenhofer, J Wiele, D Kolossa… - Computer Speech & Language, 2022
L Schönherr, M Golla, T Eisenhofer, J Wiele, D Kolossa… - Exploring Accidental Triggers of Smart Speakers. ArXiv …, 2020