Improving user verification in human-robot interaction from audio or image inputs through sample quality assessment
Improving user verification in human-robot interaction from audio or image inputs through sample quality assessment (2021)
In this paper, we tackle the task of improving biometric verification in the context of Human-Robot Interaction (HRI). A robot that needs to identify a specific person in order to provide a service can do so either by image verification or, if lighting conditions are unfavourable, by voice verification. Our approach exploits a robot's ability to keep collecting data until it is sure of the person's identity. The key contribution is that we select, from both the image and audio signals, the parts of higher confidence. For images, we use a system that looks at each person's face and selects frames in which the confidence is high, while keeping those frames separated in time to avoid using very similar facial appearances. For audio, our approach segments the signal to find the parts that contain a person talking, avoiding those in which noise is present. Once the parts of interest are found, each input is described with an independent deep learning architecture that produces a descriptor for each kind of input (face/voice). We also present fusion methods that improve performance by combining the features from both face and voice; validation results are shown for each independent input and for the fusion methods. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Biometric verification | Audiovisual verification | Human robot interaction
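The fusion idea in the abstract above can be illustrated with a minimal score-level sketch: a probe face descriptor and a probe voice descriptor are each compared to enrolled references, and the per-modality similarity scores are combined into one decision. The descriptor dimensions, the weighting, and the threshold below are hypothetical; the paper's actual descriptors come from deep networks trained per modality.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D descriptors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_score(face_probe, face_ref, voice_probe, voice_ref, w_face=0.5):
    """Weighted average of the face and voice similarity scores."""
    s_face = cosine_similarity(face_probe, face_ref)
    s_voice = cosine_similarity(voice_probe, voice_ref)
    return w_face * s_face + (1.0 - w_face) * s_voice

def verify(score, threshold=0.7):
    """Accept the claimed identity if the fused score clears the threshold."""
    return score >= threshold
```

With matching face and voice descriptors both scores are high and the fused score accepts the claim; if either modality disagrees strongly, the fused score drops and the claim is rejected.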
A multimodal-Siamese Neural Network (mSNN) for person verification using signatures and EEG
A multimodal Siamese Neural Network (mSNN) for person verification using signatures and EEG (2021)
Signatures have long been considered one of the most accepted and practical means of user verification, despite being vulnerable to skilled forgers. In contrast, EEG signals have more recently been shown to be more difficult to replicate and to provide better biometric information in response to a known stimulus. In this paper, we propose combining these two biometric traits using a multimodal Siamese Neural Network (mSNN) for improved user verification. The proposed mSNN learns discriminative temporal and spatial features from the EEG signals using an EEG encoder and from the offline signatures using an image encoder. The features of the two encoders are fused into a common feature space for further processing. A Siamese network then employs a distance metric based on the similarity and dissimilarity of the input features to produce the verification result. The proposed model is evaluated on a dataset of 70 users comprising 1400 unique samples. The novel mSNN model achieves a 98.57% classification accuracy with a 99.29% True Positive Rate (TPR) and a False Acceptance Rate (FAR) of 2.14%, outperforming the current state of the art by 12.86% (in absolute terms). The proposed network architecture may also be applicable to the fusion of other neurological data sources to build robust biometric verification or diagnostic systems with limited data.
Keywords: User verification | Multimodal | EEG | Siamese Neural Network | LSTM | CNN
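The verification step of the mSNN described above can be sketched schematically: per-modality features are fused into a common feature space, and a distance between two fused vectors decides whether they belong to the same user. Everything here is illustrative, assuming plain concatenation for fusion and a Euclidean distance with a hypothetical threshold; the paper's encoders and learned metric are not reproduced.

```python
import numpy as np

def fuse(eeg_feat, sig_feat):
    """Concatenate per-modality feature vectors into a common feature space."""
    return np.concatenate([eeg_feat, sig_feat])

def siamese_distance(x1, x2):
    """Euclidean distance between two fused feature vectors."""
    return float(np.linalg.norm(x1 - x2))

def same_user(x1, x2, threshold=1.0):
    """Verification decision: a small distance means the pair matches."""
    return siamese_distance(x1, x2) < threshold
```

A genuine pair (two samples of the same user) yields a near-zero distance and is accepted, while a forged or mismatched pair produces a large distance and is rejected; in the actual mSNN the distance and threshold are learned from similar/dissimilar training pairs.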