Speech Emotion Recognition Using CNN

Authors: Pratiksha Sathe, Dr. Jasbir Kaur, Assistant Professor Suraj Kanal

Abstract: Speech Emotion Recognition (SER) is an active area of human-computer interaction that aims to identify and interpret human emotions from speech signals. Accurate recognition of emotion from speech has applications in mental health diagnostics, customer service, and adaptive learning systems. This paper applies Convolutional Neural Networks (CNNs) to SER, emphasizing their capacity for robust feature extraction and accurate classification. Speech signals are first converted into Log-Mel spectrograms, two-dimensional representations of the spectral and temporal properties of audio. Because a CNN treats the spectrogram as an image, it can learn local patterns along both the frequency and time axes, which makes it well suited to speech data. With this representation, the proposed model achieves high accuracy in recognizing a diverse range of emotions. The study demonstrates the practical application of CNNs to SER, highlights their advantages over traditional machine learning models, and evaluates their performance on the benchmark datasets RAVDESS and IEMOCAP. The results underscore the potential of CNN-based approaches to advance speech emotion recognition and to support more sophisticated and empathetic human-computer interaction systems.
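The Log-Mel conversion mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names and every parameter value (16 kHz sampling rate, 512-point FFT, hop of 160 samples, 40 Mel bands) are assumptions chosen for illustration, and the Mel scale uses the common 2595 log10(1 + f/700) formula.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the Mel scale give the filter edges.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a log-Mel spectrogram in dB, shape (n_mels, n_frames).

    All default values are illustrative assumptions, not the paper's settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    # Small constant avoids log(0) in silent bands.
    return 10.0 * np.log10(mel + 1e-10).T

# Usage: one second of a synthetic 440 Hz tone stands in for a speech clip.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (40, 97)
```

The resulting array has one row per Mel band and one column per time frame, so it can be passed to a CNN as a single-channel image.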

DOI: 10.61137/ijsret.vol.11.issue2.373
