
Depression among undergraduates is a growing global concern: between 2017 and 2021, prevalence in the 15-24 age group rose from 3.82% to 4.36% globally and from 3.76% to 4.51% in India, a trend exacerbated by the COVID-19 pandemic. Early detection is crucial, but self-reported questionnaires often lack objectivity. This study evaluates the efficacy of ensemble machine learning models for predicting early depression among undergraduates, using a dataset of 12,617 Indian students compiled from a Kaggle dataset and a 2025 survey and augmented to balance the depressed (44%) and not-depressed (56%) classes. Three heterogeneous ensemble models, Model Averaged Neural Network (MANN), LogitBoost, and Weighted Subspace Random Forest (WSRF), were applied to a feature set after multicollinearity was assessed using Tolerance and the Variance Inflation Factor (VIF) and feature redundancy was addressed using the Pearson Correlation Coefficient (PCC). The models' performance was evaluated using AUC-ROC, accuracy, precision, recall, F1 score, and Cohen's Kappa, with 10-fold cross-validation repeated three times for robustness. WSRF excelled, achieving an accuracy of 0.9829, a precision of 0.9910, a recall of 0.9782, and AUC values of 99.90% (training), 92.10% (testing), and 91.50% (validation), compared with MANN (testing AUC 93.20%) and LogitBoost (validation AUC 87.40%). WSRF's superior performance supports its use in identifying at-risk students, enabling timely mental health interventions. Future research should focus on cross-demographic generalizability, advanced methods such as deep learning, and real-time monitoring to enhance global mental health strategies.
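As an illustration of the threshold-based evaluation metrics named above, the following minimal Python sketch computes accuracy, precision, recall, F1 score, and Cohen's Kappa from binary labels. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study; AUC-ROC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and Cohen's Kappa for binary labels.

    Hypothetical helper for illustration only; not the study's code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts with `positive` as the "depressed" class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's Kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the marginal class frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels (1 = depressed, 0 = not depressed); made up for this example.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Cohen's Kappa is informative alongside accuracy here because it discounts the agreement that a classifier would reach by chance given the class proportions.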
Authors: Samprita De, Smritika Ghosh, Deepanjan Sen, Swarup Das, Arindam Sarkar
DOI: https://doi.org/10.1109/ciacon65473.2025.11189611
Publish Year: 2025