Modified Convolutional Neural Network for Speaker Age and Gender Classification

Laxmi Kantham Durgam; Ravi Kumar Jatoth; Daniel Hládek; Stanislav Ondáš; Matúš Pleva; Jozef Juhár

Authors

Laxmi Kantham Durgam, Digital Signal Processing Lab, Department of Electronics and Communication Engineering, National Institute of Technology Warangal, 506004 Telangana, India & Speech Communications Lab, Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia
Ravi Kumar Jatoth, Digital Signal Processing Lab, Department of Electronics and Communication Engineering, National Institute of Technology Warangal, 506004 Telangana, India
Daniel Hládek, Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia
Stanislav Ondáš, Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia
Matúš Pleva, Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia
Jozef Juhár, Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia

Keywords:

Age and gender estimation, speaker recognition, MFCC, modified 1D-CNN, dimensionality reduction, random seed, cross-validation

Abstract

Identifying a person's age and gender from speech signal characteristics poses a significant challenge in personal identity recognition systems, particularly when security considerations are involved. In signal processing applications such as speaker recognition, biometric identification, human-machine interfaces (HMI), and telecommunications, estimating age and gender from voice is a crucial and demanding problem. Deep learning models have proven highly effective in several signal processing domains. In this paper, we propose a modified one-dimensional convolutional neural network (1D-CNN) that identifies a speaker's age and gender from Mel-frequency cepstral coefficient (MFCC) speech features. The proposed approach combines the modified 1D-CNN with dimensionality reduction of the speech feature set, random seeding, and cross-validation. For dimensionality reduction we apply principal component analysis (PCA) and independent component analysis (ICA), together with random seeds and several cross-validation configurations. Alongside the modified 1D-CNN, we test machine learning models such as support vector classification (SVC), decision trees (DT), and random forests (RF). We use the Children speech recording dataset, the Biometric Visions and Computing (BVC) dataset, and the Mozilla Common Voice speech dataset to estimate age and gender from speech. The models are evaluated and compared using metrics such as accuracy. The proposed 1D-CNN model shows promising performance compared with state-of-the-art (SOTA) approaches. The dimensionality reduction technique, the selection of speech features, and the seeding all have a significant impact on the performance of the proposed model.
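
The role of the random seed and of the cross-validation configuration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic 13-dimensional vectors standing in for MFCC features, the labels "female" and "male" and all function names are hypothetical, and a simple nearest-centroid classifier replaces the 1D-CNN, SVC, DT, and RF models. The sketch only shows how a fixed seed makes stratified k-fold splits reproducible and how the fold count can be varied.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)  # fixed seed -> identical folds on every run
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def nearest_centroid_fit(X, y):
    """Return the mean feature vector of each class (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], x))
    return min(centroids, key=sqdist)


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean test accuracy over k seeded, stratified folds."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def make_synthetic(n_per_class=50, dim=13, seed=0):
    """Synthetic stand-in for MFCC vectors: one Gaussian cluster per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("female", 1.0), ("male", -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


X, y = make_synthetic()
for k in (5, 10):  # two cross-validation configurations
    print(f"k={k}: mean accuracy = {cross_val_accuracy(X, y, k=k, seed=42):.3f}")
```

Repeating the run with a different `seed` changes the fold assignment and is one simple way to check how sensitive the reported accuracy is to the split.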
