E on request from the corresponding author. Conflicts of Interest: The authors declare no conflicts of interest.
applied sciences

Article

Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview

Xiaojiao Chen 1, Sheng Li 2 and Hao Huang 1,3,*

1 School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; [email protected]
2 National Institute of Information and Communications Technology, Kyoto 619-0288, Japan; [email protected]
3 Xinjiang Provincial Key Laboratory of Multi-Lingual Information Technology, Urumqi 830046, China
* Correspondence: [email protected]

Abstract: Voice Processing Systems (VPSes), now widely deployed, have become deeply involved in people's daily lives, helping to drive the car, unlock the smartphone, make online purchases, and so on. Unfortunately, recent research has shown that those systems based on deep neural networks are vulnerable to adversarial examples, which has drawn significant attention to VPS security. This review presents a detailed introduction to the background knowledge of adversarial attacks, including the generation of adversarial examples, psychoacoustic models, and evaluation indicators. Then we provide a concise introduction to defense methods against adversarial attacks. Finally, we propose a systematic classification of adversarial attacks and defense methods, with which we hope to give beginners in this field a better understanding of its classification and structure.

Keywords: adversarial attack; adversarial example; adversarial defense; speaker recognition; speech recognition

Citation: Chen, X.; Li, S.; Huang, H. Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview. Appl. Sci. 2021, 11, 8450. https://doi.org/10.3390/app11188450

Academic Editor: Yoshinobu Kajikawa
Received: 15 August 2021
Accepted: 8 September 2021
Published: 12 September 2021

1. Introduction

With the successful application of deep neural networks in the field of speech processing, automatic speech recognition systems (ASR) and automatic speaker recognition systems (SRS) have become ubiquitous in our lives, including personal voice assistants (VAs) (e.g., Apple Siri (https://www.apple.com/in/siri (accessed on 9 September 2021)), Amazon Alexa (https://developer.amazon.com/en-US/alexa (accessed on 9 September 2021)), Google Assistant (https://assistant.google.com/ (accessed on 9 September 2021)), iFLYTEK (http://www.iflytek.com/en/index.html (accessed on 9 September 2021))), voiceprint recognition systems on mobile phones, bank self-service voice systems, and forensic testing [1]. The application of these systems has brought great convenience to people's personal and public lives and, to a certain extent, enables people to access help more efficiently and conveniently. Recent research, however, has shown that neural network systems are vulnerable to adversarial attacks [2]. This threatens personal identity information and home security and leaves an opening for criminals. From a security perspective, the privacy of the public is at risk. Therefore, for the sake of public and personal safety, mastering the methods of attack and defense will enable us to prevent problems before they occur.

In response to the issues mentioned above, the concept of adversarial examples [2] was born. The original adversarial examples were applied to image recognition systems [3,4,6,7] then researc.