Real-Time Emotion Detection From Speech Using LSTM And MFCC


Authors: Subasree S, Dhanusree R S


Abstract: Speech Emotion Recognition (SER) is a key component of affective computing, with applications in mental health assessment, intelligent virtual assistants, human-robot interaction, and personalized customer service. This study introduces an SER framework built on a Long Short-Term Memory (LSTM) deep learning model, designed to identify emotional states embedded in vocal expressions. By capturing and analyzing the temporal dynamics of speech, the system distinguishes emotions such as happiness, sadness, anger, and neutrality. The architecture processes both uploaded and live-recorded audio inputs, using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction, one of the most widely used and reliable techniques for capturing perceptually relevant speech characteristics. A user-friendly interface, built with Streamlit, provides real-time interaction and feedback, making the system accessible to non-technical users. In addition, the solution incorporates the ability to detect emotional cues from animal sounds, expanding its scope beyond human applications. The project emphasizes performance, responsiveness, and practical integration into real-world applications, enhancing user engagement through visual output and immediate emotion recognition results.
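The MFCC pipeline the abstract refers to (framing, windowing, power spectrum, mel filterbank, log compression, and a discrete cosine transform) can be sketched in plain NumPy. This is an illustrative sketch only, not the authors' implementation; the parameter values (16 kHz sample rate, 512-sample FFT, 26 mel filters, 13 coefficients) are common defaults assumed here, and in practice a library such as librosa would typically be used instead.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Compute MFCC features from a mono waveform (illustrative sketch)."""
    # 1. Slice the signal into overlapping frames and apply a Hann window
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft) for s in starts])
    # 2. Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel-scale filterbank
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log-compress the mel energies (small epsilon avoids log(0))
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II decorrelates the log energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * n[None, :n_mfcc])
    return log_mel @ dct  # shape: (num_frames, n_mfcc)

# One second of synthetic tone stands in for a recorded utterance
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (num_frames, 13) — one MFCC vector per frame
```

A sequence of such per-frame MFCC vectors is exactly the kind of time-ordered input an LSTM consumes: each frame becomes one timestep, and the network's final state feeds a softmax over the emotion classes.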

