Authors: Anshika Singh, Sneha Chhabra, Rajat Takkar, Harshwardhan Singh Thakur
Abstract: The global burden of disease continues to rise, and early detection is essential for reducing both mortality and healthcare costs. This paper proposes a machine learning approach for predicting disease risk from a dataset of 4240 patient records, each described by 15 clinical and demographic attributes. Five classifiers were used to identify disease presence: Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and Naive Bayes. The models were evaluated using hold-out validation. Logistic Regression achieved the highest accuracy, approximately 84%, followed by Random Forest (~83.7%), SVM (~83.3%), and KNN (~82–83%). These results indicate the potential of such models for early disease detection and timely intervention. Integrated into clinical practice, they could help clinicians improve patient outcomes and reduce the global disease burden. Future work includes expanding the dataset and developing an accessible interface for real-time disease risk analysis.
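The hold-out evaluation described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. The abstract does not state the split ratio, preprocessing, or hyperparameters. The 80/20 split, the random seed, k=5, the Gaussian variant of Naive Bayes, and the synthetic two-feature data are therefore all assumptions, and `make_synthetic` is a hypothetical stand-in for the 4240-record dataset. Only two of the five classifiers (Naive Bayes and KNN) are implemented here. A majority-class baseline is included as a reference point, because accuracy alone can be hard to interpret when one class dominates.

```python
import math
import random
from collections import Counter


def make_synthetic(n=400, seed=0):
    """Hypothetical stand-in data: two Gaussian clusters, ~15% positive (assumed rate)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.15 else 0
        centre = 2.0 if label else 0.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


def holdout_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle record indices once with a fixed seed, then hold out test_fraction of them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def fit_gaussian_nb(X, y):
    """Per class: log prior, per-feature means, and per-feature variances (smoothed)."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[c] = (math.log(n / len(X)), means, variances)
    return model


def nb_predict(model, x):
    """Return the class with the largest Gaussian log posterior for record x."""
    def log_posterior(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training records nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X, y = make_synthetic()
X_tr, X_te, y_tr, y_te = holdout_split(X, y)

# Models are fitted on the training portion only; accuracy is measured on the held-out portion.
majority = Counter(y_tr).most_common(1)[0][0]
nb = fit_gaussian_nb(X_tr, y_tr)
results = {
    "Majority baseline": accuracy(y_te, [majority] * len(y_te)),
    "Naive Bayes": accuracy(y_te, [nb_predict(nb, x) for x in X_te]),
    "KNN (k=5)": accuracy(y_te, [knn_predict(X_tr, y_tr, x) for x in X_te]),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

In practice a library such as scikit-learn would normally supply the split and all five classifiers. The hand-written version above only makes each step of hold-out evaluation explicit: one fixed split, fitting on the training portion, and scoring on records the models never saw.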