Robust Audio Adversarial Examples

Supervisor(s):	Karla Markert, Jan-Philipp Schulze
Status:	finished
Topic:	Others
Author:	Armin Ettenhofer
Submission:	2022-10-17
Type of Thesis:	Bachelorthesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching
Description:	The popularity of devices running automatic speech recognition (ASR) software keeps increasing. These AI systems interpret spoken words to execute voice commands or to produce automatic transcriptions. Because they are part of AI personal assistants such as Amazon Alexa or Apple Siri, which can often control smart home functions, their security is an important issue. However, the technology underlying many state-of-the-art ASR systems, end-to-end neural networks, is inherently vulnerable to adversarial examples: specifically crafted, small, and ideally imperceptible manipulations of a network's input that let an attacker arbitrarily control its output. In this work, we investigate the current possibilities of audio adversarial examples, building on previous work in both the image and the audio domain. During generation, we use room impulse responses created dynamically by a neural network to simulate a physical environment, which hardens our examples against the transformations they undergo in over-the-air attacks. Furthermore, we use a psychoacoustic model of auditory masking, whereby strong components at certain frequencies render quieter changes at adjacent frequencies completely imperceptible. As a result, the attack is hidden in frequencies a human listener cannot perceive, yet the ASR system transcribes a phrase clearly different from the original. Thanks to our improved attack mechanisms, the target phrase is transcribed correctly even in physical environments under volatile real-world conditions, while the perturbation is less perceptible to human listeners. This is further evaluated in a human study.
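The room-simulation step described above can be illustrated with a minimal sketch, assuming plain Python and no ASR model. The thesis creates room impulse responses (RIRs) with a neural network; here, as a hypothetical stand-in, each RIR is synthesized as exponentially decaying white noise. The perturbed audio is convolved with several such RIRs so that an attack loss can be averaged over many simulated rooms. The function names `synthetic_rir`, `convolve`, and `simulate_over_the_air` and all parameter values are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def synthetic_rir(length, decay, seed=None):
    """Hypothetical stand-in for a learned RIR generator:
    exponentially decaying Gaussian noise with a unit direct-path tap,
    normalized to unit energy."""
    rng = random.Random(seed)
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct sound path
    norm = math.sqrt(sum(h * h for h in rir))
    return [h / norm for h in rir]


def convolve(signal, rir):
    """Linear convolution of signal with rir, truncated to len(signal)."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(rir), n + 1)):
            acc += rir[k] * signal[n - k]
        out[n] = acc
    return out


def simulate_over_the_air(audio, delta, num_rooms=4, rir_length=64, seed=0):
    """Return one reverberant copy of (audio + delta) per simulated room.
    Decay rates are drawn at random (assumed range) to vary the rooms."""
    adversarial = [a + d for a, d in zip(audio, delta)]
    rng = random.Random(seed)
    batch = []
    for _ in range(num_rooms):
        decay = rng.uniform(0.01, 0.2)
        rir = synthetic_rir(rir_length, decay, seed=rng.randrange(2**32))
        batch.append(convolve(adversarial, rir))
    return batch
```

In an actual attack, each element of the returned batch would be fed to the ASR model and the loss toward the target phrase averaged over the batch (in the spirit of expectation over transformation), so that the optimized perturbation does not depend on one particular room.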
