Lightweight Deep Learning Framework for Speech Emotion Recognition Signal Processing


Authors: Subanila V

Abstract: Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction by enabling machines to understand and respond to human emotions. In this study, we propose a lightweight and efficient SER model that integrates Random Forest and Multi-layer Perceptron (MLP) classifiers within a VGGNet framework. Unlike traditional deep learning models that require extensive computational resources and hyperparameter tuning, our approach optimizes performance while significantly reducing complexity. We extracted Mel-Frequency Cepstral Coefficient (MFCC) features from three widely used speech emotion datasets—TESS, EMODB, and RAVDESS—covering six to eight distinct emotions including Sad, Angry, Happy, Surprise, Neutral, Disgust, Fear, and Calm. The proposed model achieved accuracy rates of 100%, 96%, and 86.25% on the TESS, EMODB, and RAVDESS datasets, respectively. These results indicate superior or comparable performance to state-of-the-art deep learning architectures such as InceptionV3, ResNet, MobileNetV2, and DenseNet, while maintaining lower computational demands. Our findings demonstrate that the hybrid lightweight model effectively balances resource efficiency and emotion recognition accuracy, making it well suited for deployment on resource-constrained devices without compromising performance.
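The abstract's feature-extraction step, computing MFCCs from a speech waveform, can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code: real SER pipelines typically call a library routine such as `librosa.feature.mfcc`, and the frame size, hop, filter count, and coefficient count below are assumed values, not parameters reported in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale, from 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Return an (n_frames, n_mfcc) array of MFCCs for a mono signal."""
    # 1. Slice the signal into overlapping, Hann-windowed frames.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Warp onto the mel scale and take logs.
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # 4. DCT-II to decorrelate; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ basis.T

# Example: one second of a 440 Hz tone at 16 kHz yields 61 frames of 13 coefficients.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = mfcc(tone)  # shape (61, 13)
```

In the pipeline the abstract describes, each utterance's frame-level MFCCs would be summarized (e.g. by averaging over time) into a fixed-length vector before being fed to the Random Forest and MLP classifiers.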
