Authors: Vaishnavi Chevale, Dr. Santosh Gaikwad, Dr. A. A. Khan, Dr. R. S. Deshpande
Abstract: Contextual Emotion Recognition (CER) is crucial for human-computer interaction, requiring an understanding of emotions from both linguistic and visual cues. This paper explores the integration of Large Vision-Language Models (LVLMs) to improve CER accuracy. The proposed framework employs multimodal learning to capture contextual dependencies, reduce biases, and enhance generalization. Experimental results demonstrate superior performance in real-world scenarios, decreasing ambiguity and increasing robustness compared to traditional methods.
DOI: http://doi.org/