A Comparative Study on Additive Cross-Modal Attention Network (ACMA) for Depression Detection Based on Audio and Textual Features

Authors: Asif S Majeed, Evelyn Treasa Jaison, Fathima S, Arunlal M L, Dr. Jyothi R L, Swathi S

Abstract: This study introduces an approach for depression detection through an Additive Cross-Modal Attention Network (ACMA) that integrates audio and textual data to improve diagnostic accuracy without relying on self-report questionnaires. Traditional depression assessments often depend on patient-disclosed information, which may not always be accurate due to stigma or personal reluctance, leading to potential underdiagnosis. The ACMA model addresses these limitations by leveraging cross-modal attention mechanisms within a Bidirectional Long Short-Term Memory (BiLSTM) and Transformer model to capture and assign optimal weights to relevant features across audio and text modalities. This enables the model to effectively detect depressive symptoms by analyzing both linguistic and acoustic cues. The model is designed for both binary classification (depressed vs. non-depressed) and regression tasks to estimate depression severity, utilizing the DAIC-WOZ dataset for evaluation. ACMA demonstrates significant improvements over baseline models, achieving high accuracy, recall, and F1 scores. Additionally, the model’s adaptability across different datasets underscores its potential as a robust, non-intrusive tool for clinical applications in mental health diagnostics. This work advances the field of automated depression detection, providing a foundation for further research in cross-modal mental health assessment systems.
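The cross-modal weighting described in the abstract can be illustrated with a minimal sketch. It assumes that "additive" refers to additive (tanh-based) attention scoring, which the abstract does not confirm, and it attends from a single text vector over a short sequence of audio frame vectors. The function names, dimensions, and random parameters are hypothetical and are not taken from the paper; the BiLSTM and Transformer encoders that would produce these vectors are omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_cross_modal_attention(text_query, audio_frames, w_q, w_k, v):
    """Attend from one text vector over a sequence of audio frame vectors.

    score_i = v . tanh(W_q q + W_k k_i)  (additive scoring, assumed here)
    Returns (attention weights, weighted sum of the audio frames).
    """
    projected_query = matvec(w_q, text_query)
    scores = []
    for frame in audio_frames:
        projected_key = matvec(w_k, frame)
        hidden = [math.tanh(a + b) for a, b in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(audio_frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, audio_frames))
        for d in range(dim)
    ]
    return weights, context


def random_matrix(rng, rows, cols):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions and random parameters, purely illustrative (not from the paper).
rng = random.Random(0)
text_dim, audio_dim, hidden_dim, num_frames = 4, 3, 5, 6
text_query = [rng.uniform(-1.0, 1.0) for _ in range(text_dim)]
audio_frames = random_matrix(rng, num_frames, audio_dim)
w_q = random_matrix(rng, hidden_dim, text_dim)
w_k = random_matrix(rng, hidden_dim, audio_dim)
v = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]

weights, context = additive_cross_modal_attention(
    text_query, audio_frames, w_q, w_k, v
)
```

Here `weights` holds one non-negative weight per audio frame, summing to 1, and `context` is the corresponding weighted average of the frames. In a trained model the projection parameters would be learned, and the resulting context vector would presumably feed a classification or regression head.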

DOI: 10.61137/ijsret.vol.11.issue2.463
