Authors: Muneeswaran B, Shanmuga Eswari M
Abstract: This research presents a hybrid machine learning framework that combines density-based clustering with ensemble regression and logistic classification to improve the precision of student performance prediction. We apply DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to the StudentPerformanceFactors dataset to uncover latent student behavioural phenotypes, which are then used as engineered features for supervised learning models. An automated hyperparameter search evaluates candidate DBSCAN settings and selects the density parameters that maximise the silhouette score (eps=1.0, min_samples=5) without manual tuning. The resulting cluster assignments are supplied as an input feature to both a RandomForestRegressor that predicts test scores and a Logistic Regression model that classifies students into performance categories, so that the hybrid models draw on both explicit academic metrics and subtler behavioural patterns. Experimental validation shows statistically significant performance gains: on held-out test data, the hybrid RandomForest achieves an MSE of 4.45 and the hybrid Logistic Regression an accuracy of 82.3%. Feature importance analysis identifies Attendance (33.4%), Hours_Studied (23.9%), and Previous_Scores (9.8%) as the strongest predictors, and DBSCAN_Cluster also contributes discriminative power. Five-fold cross-validation confirms model robustness (CV-MSE = 4.88 ± 0.12). This study advances educational data mining by using unsupervised learning to improve supervised prediction, yielding interpretable student groupings that reveal density-based behavioural phenotypes associated with academic performance. The framework is suited to early-intervention systems, giving teachers actionable student profiles derived from routinely collected academic data.
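The silhouette-driven DBSCAN tuning and the use of cluster labels as an extra feature can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (which presumably relies on scikit-learn's `DBSCAN` and `silhouette_score`); the helper names `dbscan`, `silhouette`, `tune_dbscan`, and `add_cluster_feature`, the toy student data, and the parameter grid are all hypothetical. It also assumes that noise points (label -1) are excluded when computing the silhouette score, since the abstract does not say how noise was handled. The sketch stops at augmenting the feature rows; in the paper those rows would then feed the RandomForestRegressor and Logistic Regression models.

```python
import math
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"
    # eps-neighbourhoods; each point counts itself, as in scikit-learn.
    neighbours = [[j for j in range(n) if euclidean(points[i], points[j]) <= eps]
                  for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep growing
    return labels


def silhouette(points, labels):
    """Mean silhouette over non-noise points (assumption); None if not defined."""
    idx = [i for i, lab in enumerate(labels) if lab != -1]
    clusters = sorted({labels[i] for i in idx})
    if len(clusters) < 2 or len(clusters) >= len(idx):
        return None
    members = {c: [i for i in idx if labels[i] == c] for c in clusters}
    total = 0.0
    for i in idx:
        own = members[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(euclidean(points[i], points[j]) for j in members[c]) / len(members[c])
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(idx)


def tune_dbscan(points, eps_grid, min_samples_grid):
    """Grid search keeping the (eps, min_samples) pair with the best silhouette."""
    best = None
    for eps, min_samples in product(eps_grid, min_samples_grid):
        labels = dbscan(points, eps, min_samples)
        score = silhouette(points, labels)
        if score is not None and (best is None or score > best[0]):
            best = (score, eps, min_samples, labels)
    return best


def add_cluster_feature(rows, labels):
    """Append the cluster label as an extra column (a DBSCAN_Cluster-style feature)."""
    return [list(row) + [label] for row, label in zip(rows, labels)]


# Hypothetical, already standardised 2-D features for 11 students:
# two tight groups plus one outlier.
students = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8),
    (9.0, 0.0),
]
score, eps, min_samples, labels = tune_dbscan(
    students, eps_grid=[0.1, 0.5, 10.0], min_samples_grid=[3, 5])
augmented = add_cluster_feature(students, labels)
print(f"eps={eps}, min_samples={min_samples}, silhouette={score:.3f}")
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

In this toy grid, eps=0.1 leaves every point as noise and eps=10.0 merges everything into one cluster, so neither yields a defined silhouette score; only eps=0.5 separates the two groups and flags the outlier as noise. Ties are resolved in favour of the first configuration tried.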