Authors: Sneha Sankeshwari, Santosh Gaikwad, Arshiya Khan, R.S. Deshpande
Abstract: Lung cancer is one of the leading causes of cancer-related mortality worldwide, primarily because of delayed diagnosis and limited access to timely screening. Early detection is essential for improving survival, yet conventional diagnostic techniques such as CT scans, X-rays, and biopsies are often expensive, time-consuming, and unavailable in many healthcare settings. This study explores the potential of machine learning (ML) techniques for early and accurate lung cancer prediction from structured patient data, including age, smoking history, environmental exposures, and family medical history. Several ML models (Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines) are evaluated for their effectiveness in identifying high-risk individuals. Publicly available datasets, such as the UCI Lung Cancer Dataset, the SEER database, and PLCO trial data, are used for training and validation. The study also addresses key challenges in ML-based diagnosis, including data imbalance, feature selection, and model interpretability. Future research directions are highlighted, particularly the integration of multi-modal data and the deployment of interpretable AI solutions in clinical practice. The findings underscore the promise of ML in making lung cancer detection more accessible, efficient, and cost-effective, which could ultimately contribute to reduced mortality.
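The data-imbalance challenge mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the labels are synthetic (not drawn from the UCI, SEER, or PLCO data), the helper names `balanced_class_weights` and `sensitivity_specificity` are hypothetical, and the weighting heuristic (samples divided by the number of classes times the class count) is one common choice rather than necessarily the one used in the study. The sketch shows why plain accuracy can mislead when high-risk cases are rare.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted learner is
    penalized more for misclassifying them.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Synthetic labels (illustrative only): 90 low-risk (0), 10 high-risk (1).
y_true = [0] * 90 + [1] * 10
# A trivial "model" that always predicts low-risk.
y_pred = [0] * 100

weights = balanced_class_weights(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
balanced_accuracy = (sens + spec) / 2

print("class weights:", weights)
print("accuracy:", accuracy)
print("sensitivity:", sens, "specificity:", spec)
print("balanced accuracy:", balanced_accuracy)
```

In this toy setting the always-negative predictor reaches 90% accuracy while detecting none of the high-risk cases, which is why metrics such as sensitivity or balanced accuracy, together with class weighting or resampling, are typically considered for imbalanced screening data.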