Authors: Hanathika T, Gayatri K
Abstract: Diabetic retinopathy (DR) is a major cause of avoidable vision loss worldwide, and deep learning approaches have shown promising results on large-scale retinal image datasets such as EyePACS. However, many existing works emphasize overall accuracy or referable DR detection and pay less attention to model reliability, interpretability, and performance on noisy real-world data. To address these limitations, this study presents a Multi-Attention Residual Network (MARN), built on EfficientNet-B0, for simultaneous DR grading and referable DR classification using a resized version of the EyePACS Kaggle dataset. The proposed architecture adds a residual fully connected head with dropout regularization and is trained with class-balanced sampling and cross-entropy loss. The model is evaluated on a five-class DR grading task and on a clinically significant binary classification task (referable DR, defined as moderate or worse, versus non-referable DR). Experimental results on a subset of 6,081 images show that, compared with a strong EfficientNet-B0 baseline, MARN improves five-class validation accuracy from 0.4618 to 0.4881 and macro-F1 score from 0.4905 to 0.5195. For referable DR detection, the model achieves an accuracy of 0.780, with a sensitivity of 0.776 and a specificity of 0.783, a slight improvement in specificity with sensitivity preserved. Per-class analysis shows the largest gains in the Severe and Proliferative DR categories, with ROC-AUC scores of 0.865 and 0.915, respectively. In addition, Grad-CAM visualizations indicate that the model focuses on clinically relevant lesion regions, and t-SNE representations show improved clustering of advanced DR features.
Overall, the proposed MARN framework delivers consistent improvements in classification performance, more effective identification of vision-threatening DR, and enhanced interpretability, supporting its use as an explainable tool for clinical decision support rather than a purely black-box model.
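
To make the binary evaluation concrete, the following is a minimal sketch of how five-class grades can be collapsed into the referable versus non-referable task and how accuracy, sensitivity, and specificity follow from the resulting confusion counts. It assumes the usual EyePACS label encoding (0 = No DR, 1 = Mild, 2 = Moderate, 3 = Severe, 4 = Proliferative), so that "moderate or worse" corresponds to grade ≥ 2. The function names `to_referable` and `binary_metrics` and the toy labels are illustrative and are not taken from the study's code or data.

```python
from typing import Dict, List, Sequence

# Assumption: grades are encoded 0-4 and "moderate" corresponds to grade 2.
REFERABLE_THRESHOLD = 2


def to_referable(grades: Sequence[int], threshold: int = REFERABLE_THRESHOLD) -> List[int]:
    """Map five-class DR grades to binary labels (1 = referable, 0 = non-referable)."""
    return [1 if g >= threshold else 0 for g in grades]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of referable cases that are flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-referable cases that are cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    true_grades = [0, 1, 2, 3, 4, 0, 2, 1]
    pred_grades = [0, 2, 2, 1, 4, 0, 3, 0]
    metrics = binary_metrics(to_referable(true_grades), to_referable(pred_grades))
    print(metrics)
```

Note that a grading error within the same side of the threshold (for example, predicting Severe for a Moderate eye) does not count as an error in the binary task, which is one reason referable-DR accuracy can be much higher than five-class accuracy.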