Integrated Approach to Emotion Recognition Across Multiple Modalities

Authors: Dr. Kavitha C, Jananisri K, Monisha B T, Prathibha G, Shanmitha P, Niranjani T

Abstract: Multimodal emotion recognition is essential for advancing human-computer interaction and enabling applications such as mental health monitoring and social robotics. This study uses text, audio, and motion data from the IEMOCAP dataset to develop independent models that capture the distinct emotional cues carried by each modality. The audio model employs a hybrid architecture combining Convolutional Neural Networks (CNN), Multi-Head Attention, and Gated Recurrent Units (GRU), achieving 81% accuracy. The text model uses a CNN-based approach inspired by Temporal Convolutional Networks (TCN), achieving 94% accuracy. For motion data, a Spatio-Temporal Graph Convolutional Network (ST-GCN) achieves 63% accuracy. A score-level fusion strategy then integrates the three models, improving overall recognition performance. Evaluation with accuracy, precision, and recall shows that combining complementary information from diverse data types yields a more accurate and reliable emotion recognition system.
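
To make the audio pipeline concrete, the sketch below assembles a CNN, Multi-Head Attention, and GRU in PyTorch, matching the hybrid structure the abstract names. All layer sizes, the mel-spectrogram input shape, and the four-class label count are illustrative assumptions; the paper's exact architecture and hyperparameters are not given in the abstract.

```python
import torch
import torch.nn as nn

class AudioEmotionNet(nn.Module):
    """Sketch of a CNN + Multi-Head Attention + GRU hybrid (assumed sizes)."""
    def __init__(self, n_mels=40, n_classes=4, d_model=64):
        super().__init__()
        # 1-D convolution over the time axis of a mel-spectrogram input
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Self-attention lets each frame attend to the whole utterance
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # GRU summarizes the attended sequence into a single hidden state
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                    # x: (batch, n_mels, time)
        h = self.conv(x).transpose(1, 2)     # -> (batch, time//2, d_model)
        h, _ = self.attn(h, h, h)            # self-attention over frames
        _, h_n = self.gru(h)                 # h_n: (1, batch, d_model)
        return self.fc(h_n.squeeze(0))       # emotion-class logits

model = AudioEmotionNet()
logits = model(torch.randn(8, 40, 200))      # batch of 8 dummy spectrograms
print(logits.shape)                          # torch.Size([8, 4])
```

Score-level fusion can then be read as a weighted average of the per-modality class probabilities. A minimal sketch follows; the fusion weights and the emotion label set are assumptions for illustration, not values reported in the paper.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]   # assumed label set

def fuse_scores(audio_p, text_p, motion_p, weights=(0.35, 0.45, 0.20)):
    """Weighted average of per-modality class-probability vectors."""
    fused = np.average(np.stack([audio_p, text_p, motion_p]),
                       axis=0, weights=weights)
    return fused / fused.sum()                    # renormalize to sum to 1

# Each unimodal model contributes a softmax distribution over EMOTIONS.
audio  = np.array([0.10, 0.60, 0.20, 0.10])
text   = np.array([0.05, 0.80, 0.10, 0.05])
motion = np.array([0.25, 0.30, 0.25, 0.20])

fused = fuse_scores(audio, text, motion)
print(EMOTIONS[int(np.argmax(fused))])            # -> happy
```

Because fusion happens on output scores rather than internal features, each unimodal model can be trained and tuned independently and the weights adjusted to reflect each modality's reliability.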

DOI: 10.61137/ijsret.vol.11.issue1.112
