Development Of An Explainable AI Model For PCOS Diagnosis Using Machine Learning Techniques

Authors: Mamta Bhardwaj

Abstract: Polycystic Ovary Syndrome (PCOS) is a multifactorial endocrine disorder affecting a significant proportion of women of reproductive age, often leading to metabolic, hormonal, and reproductive complications such as infertility, insulin resistance, and cardiovascular risk. Early and accurate diagnosis of PCOS remains a major clinical challenge because symptoms are heterogeneous, presentation varies across patients, and diagnosis relies on subjective criteria such as the Rotterdam criteria. In recent years, machine learning (ML) techniques have shown promise in improving diagnostic accuracy; however, their lack of interpretability has limited their adoption in real-world healthcare settings. This study proposes an Explainable Artificial Intelligence (XAI)-based risk prediction framework for PCOS diagnosis that combines machine learning algorithms with interpretability techniques to enhance clinical trust and usability. The proposed model uses a publicly available PCOS dataset comprising clinical, hormonal, and ultrasound features. A systematic preprocessing pipeline is implemented, including missing value imputation, feature scaling, and class imbalance handling using the Synthetic Minority Oversampling Technique (SMOTE). Feature selection methods such as correlation analysis and Recursive Feature Elimination (RFE) are applied to identify the most significant predictors of PCOS. Multiple machine learning models, including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), are evaluated. A stacking ensemble model is then developed to leverage the strengths of the individual classifiers and improve overall predictive performance. To address the critical challenge of model interpretability, explainability techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) are integrated into the framework. These methods provide both global and local explanations, enabling the identification of key features such as menstrual cycle irregularity, Body Mass Index (BMI), follicle count, and hormonal imbalance, which are consistent with established clinical knowledge.
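
The local explanations mentioned in the abstract rest on Shapley values, which split a single prediction into per-feature contributions. The sketch below is a minimal illustration of that idea, not the study's implementation and not the SHAP library's API: it computes exact Shapley values by enumerating feature subsets for a toy logistic risk model. The feature names echo the abstract, but the weights, bias, patient values, and baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy risk model. The weights and bias are made up for
# illustration only; they are NOT taken from the study or any dataset.
FEATURES = ["cycle_irregular", "bmi", "follicle_count"]
WEIGHTS = {"cycle_irregular": 1.2, "bmi": 0.08, "follicle_count": 0.15}
BIAS = -4.0


def risk(x):
    """Logistic risk score in (0, 1) for a dict of feature values."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = {
                    g: (x[g] if g in subset else baseline[g]) for g in FEATURES
                }
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical patient and reference ("baseline") profiles.
patient = {"cycle_irregular": 1, "bmi": 31.0, "follicle_count": 14}
baseline = {"cycle_irregular": 0, "bmi": 23.0, "follicle_count": 6}

phi = shapley_values(risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

By construction the contributions sum to the difference between the patient's risk score and the baseline score, which is the property that lets a clinician read them as an additive breakdown of one prediction. Practical tools approximate these values over a background dataset, since exact enumeration is infeasible beyond a few features.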

DOI: http://doi.org/10.5281/zenodo.20643863
