Evolutionary Dimensionality Reduction for Structured Heart-Disease Classification: Balancing Predictive Performance, Clinical Input Burden and Global Transparency

Authors: Research Scholar Rakesh Kumar Khillan, Associate Professor Dr. Abhinav Shukla

Abstract: Background: In clinical machine learning, feature selection is often framed as a route to higher accuracy. A smaller model can be valuable in its own right, however: it reduces the number of inputs clinicians must collect and makes the model's global behavior easier to inspect. This study examined the performance-compactness trade-off between a full-feature Random Forest and a Random Forest restricted to predictors chosen by a Genetic Algorithm (GA), applied to binary classification of recorded heart-disease status.

Data: A public structured dataset with 918 instances, 11 features and the binary target HeartDisease was used. The full-feature Random Forest used all 11 predictors. The wrapper GA used binary chromosomes, a population size of 20, 10 generations, tournament selection, two-point crossover, bit-flip mutation and a fitness function equal to 20-fold cross-validated Random Forest accuracy. The GA selected a subset of 7 predictors, and this subset was compared with the full-feature model using 10 repetitions of 20-fold stratified cross-validation. Accuracy, precision, sensitivity, F1-score, ROC-AUC, predictor count and cross-validated permutation importance were measured.

Results: The full-feature Random Forest obtained the higher repeated internal accuracy (87.11% ± 5.06%) and ROC-AUC (0.9285 ± 0.0396). The GA-selected model reduced the predictor set from 11 to 7 (a 36.36% reduction) and achieved an accuracy of 83.67% ± 5.45% and a ROC-AUC of 0.9075 ± 0.0439. The mean paired difference in accuracy was −3.44 percentage points, in favor of the full-feature model. Permuting ST_Slope produced the largest mean decrease in validation ROC-AUC, followed by ChestPainType and Oldpeak.

Conclusions: The evidence did not support the hypothesis that evolutionary feature selection improves predictive accuracy. Instead, the GA produced a compact model built from clinically recognizable predictors, at the cost of somewhat lower internal discrimination. The full-feature and compact configurations therefore serve different purposes: the former maximizes predictive performance, while the latter minimizes input burden and improves the global transparency of the predictors. Before any claims about clinical use, feature selection should be fully nested within the cross-validation and both models should be externally validated.
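The wrapper GA described in the Data paragraph can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation: the population size (20) and generation count (10) come from the abstract, while the tournament size (3), crossover probability (0.8), mutation rate (1/n per bit), memoization and best-so-far tracking are assumptions. The fitness is passed in as a callable; in the study it would be cross-validated Random Forest accuracy on the columns whose bit is 1, whereas the `toy_fitness` below is purely hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# One bit per candidate predictor: 1 = keep the column, 0 = drop it.
Chromosome = List[int]


def tournament_select(pop: List[Chromosome], fits: List[float],
                      k: int, rng: random.Random) -> Chromosome:
    """Copy the fittest of k individuals drawn without replacement."""
    contenders = rng.sample(range(len(pop)), k)
    winner = max(contenders, key=lambda i: fits[i])
    return pop[winner][:]


def two_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the segment between two cut points 0 < i < j < len(a)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip(c: Chromosome, p: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability p."""
    return [1 - g if rng.random() < p else g for g in c]


def ga_select(n_features: int, fitness: Callable[[Chromosome], float],
              pop_size: int = 20, generations: int = 10, k: int = 3,
              p_cx: float = 0.8, p_mut: Optional[float] = None,
              seed: int = 0) -> Tuple[Chromosome, float]:
    rng = random.Random(seed)
    if p_mut is None:
        p_mut = 1.0 / n_features  # assumed default: about one flip per child
    cache: Dict[Tuple[int, ...], float] = {}

    def fit(c: Chromosome) -> float:
        key = tuple(c)
        if key not in cache:  # wrapper fitness is expensive, so memoize it
            cache[key] = fitness(c) if any(c) else float("-inf")
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        fits = [fit(c) for c in pop]
        children: List[Chromosome] = []
        while len(children) < pop_size:
            p1 = tournament_select(pop, fits, k, rng)
            p2 = tournament_select(pop, fits, k, rng)
            if rng.random() < p_cx:
                p1, p2 = two_point_crossover(p1, p2, rng)
            children += [bit_flip(p1, p_mut, rng), bit_flip(p2, p_mut, rng)]
        pop = children[:pop_size]
        gen_best = max(pop, key=fit)
        if fit(gen_best) > fit(best):
            best = gen_best  # keep the best subset seen in any generation
    return best, fit(best)


if __name__ == "__main__":
    informative = {0, 2, 5}  # hypothetical "useful" predictor positions

    def toy_fitness(c: Chromosome) -> float:
        # Hypothetical stand-in for cross-validated Random Forest accuracy.
        return sum(c[i] for i in informative) - 0.1 * sum(c)

    mask, score = ga_select(11, toy_fitness, seed=42)
    print(mask, round(score, 3))
```

To mirror the study more closely, `fitness` could fit a scikit-learn `RandomForestClassifier` on the selected columns and return the mean of `cross_val_score` with a stratified 20-fold splitter. Because that score is computed on the same data later used for the comparison, the selection is not nested, which is why the Conclusions call for nested validation.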
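The comparison protocol (10 repetitions of 20-fold stratified cross-validation with a paired accuracy difference) relies on both models being scored on identical splits, so each fold yields one paired observation. The sketch below shows that pairing in plain Python. The `evaluate(train_idx, test_idx)` callback is hypothetical: it stands in for fitting the full-feature and GA-selected models on the training rows and returning their scores on the test rows. The abstract does not say how the reported ± values were aggregated; here the SD is taken over all fold-level scores, which is an assumption.

```python
import random
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def stratified_folds(y: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Split row indices into k folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return folds


def repeated_cv(y: Sequence[int], k: int, repeats: int,
                evaluate: Callable[[List[int], List[int]], Tuple[float, float]]
                ) -> Tuple[List[float], List[float]]:
    """Score both models on identical splits so fold-level scores are paired."""
    full: List[float] = []
    compact: List[float] = []
    for r in range(repeats):
        folds = stratified_folds(y, k, seed=r)  # a new shuffle per repetition
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            s_full, s_compact = evaluate(train_idx, test_idx)
            full.append(s_full)
            compact.append(s_compact)
    return full, compact


def paired_summary(full: List[float], compact: List[float]) -> Dict[str, float]:
    """Mean and SD of each model and of the paired difference (compact - full)."""
    diffs = [c - f for f, c in zip(full, compact)]
    return {
        "full_mean": statistics.mean(full), "full_sd": statistics.stdev(full),
        "compact_mean": statistics.mean(compact),
        "compact_sd": statistics.stdev(compact),
        "mean_diff": statistics.mean(diffs), "sd_diff": statistics.stdev(diffs),
    }
```

With `k=20` and `repeats=10` this produces 200 paired scores per model. A negative `mean_diff` favors the full-feature model, matching the sign convention of the reported −3.44 percentage points. scikit-learn's `RepeatedStratifiedKFold` provides equivalent splits if that library is already in use.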
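The importance ranking in the Results (ST_Slope, then ChestPainType, then Oldpeak) is expressed as the mean decrease in validation ROC-AUC after a predictor is permuted. A minimal sketch of that measure for a single validation fold follows; in a cross-validated version it would be repeated in every fold and averaged. The `predict` callable (returning a positive-class score per row) and the list-of-rows data layout are assumptions made to keep the example dependency-free, and ROC-AUC is computed here by direct pairwise comparison, which is adequate for fold sizes of roughly 46 rows.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance_auc(predict: Callable[[List[Row]], List[float]],
                               X: List[Row], y: Sequence[int],
                               n_repeats: int = 10,
                               seed: int = 0) -> List[float]:
    """Mean drop in validation ROC-AUC when one column is shuffled."""
    rng = random.Random(seed)
    baseline = roc_auc(y, predict(X))
    drops: List[float] = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += baseline - roc_auc(y, predict(X_perm))
        drops.append(total / n_repeats)
    return drops
```

If categorical predictors such as ChestPainType or ST_Slope are one-hot encoded before modeling, shuffling the raw column before encoding keeps the whole feature permuted as one unit. With scikit-learn, `sklearn.inspection.permutation_importance(model, X_val, y_val, scoring="roc_auc", n_repeats=..., random_state=...)` computes the same quantity on a fitted estimator.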

DOI: https://doi.org/10.5281/zenodo.21187688
