A Machine Learning-Based Early Warning System for Student Performance Prediction: System Development and Empirical Evaluation in Higher Education

Authors

DOI:

https://doi.org/10.51903/92j5wj58

Keywords:

early warning system, machine learning, Gradient Boosting, student at-risk, higher education

Abstract

Academic failure and student attrition remain structural challenges in higher education, yet the behavioral and academic data generated through digital learning environments are largely underexploited for early risk identification. This study presents the design, development, and empirical evaluation of a machine learning-based early warning system (EWS) for predicting student academic performance, deployed as a fully operational web-based application within the information technology infrastructure of a higher education institution in Indonesia. A dataset of 1,240 student records spanning academic years 2021–2024 was constructed by integrating static academic attributes extracted from the institutional Academic Information System (SIAKAD) with dynamic behavioral features derived from a Moodle-based Learning Management System (LMS), including weekly login frequency, assignment submission lead time, quiz attempt rate, and forum participation count. Four supervised classification algorithms, namely Logistic Regression, Random Forest, Gradient Boosting, and Long Short-Term Memory (LSTM), were trained and benchmarked under stratified 10-fold cross-validation with SMOTE-based class balancing. The Gradient Boosting classifier achieved superior performance across all evaluation metrics, attaining an accuracy of 89.1%, an F1-score of 0.850, and an AUC-ROC of 0.931. SHAP-based feature attribution confirmed that LMS-derived behavioral variables, particularly weekly login frequency (SHAP = 0.241) and assignment submission lead time (SHAP = 0.187), contributed substantively to prediction quality beyond static academic records alone. The deployed system was evaluated by 32 academic advisors using the System Usability Scale (SUS) following a four-week observation period, yielding a mean score of 78.4, indicative of above-average practitioner usability.
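The evaluation protocol summarized above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: it benchmarks a Gradient Boosting classifier under stratified 10-fold cross-validation on a synthetic imbalanced "at-risk" label, with stand-in features for the SIAKAD/LMS variables. The paper additionally applies SMOTE inside each training fold (e.g., via the imbalanced-learn library); that step is omitted here to keep the sketch dependency-free.

```python
# Illustrative sketch of the benchmark protocol described in the abstract.
# Synthetic data stands in for the real SIAKAD/LMS features (weekly login
# frequency, submission lead time, quiz attempt rate, GPA, ...).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score, roc_auc_score

# Imbalanced binary target: roughly 25% "at-risk" students, 1,240 records.
X, y = make_classification(n_samples=1240, n_features=8,
                           weights=[0.75, 0.25], random_state=42)

aucs, f1s = [], []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    # In the paper's protocol, SMOTE would resample (X[train_idx], y[train_idx])
    # here, before fitting, so the test fold stays untouched.
    clf = GradientBoostingClassifier(random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], proba))
    f1s.append(f1_score(y[test_idx], (proba >= 0.5).astype(int)))

print(f"mean AUC-ROC over 10 folds: {np.mean(aucs):.3f}")
print(f"mean F1-score over 10 folds: {np.mean(f1s):.3f}")
```

Applying SMOTE only within each training fold, as implied by the cross-validated design, avoids leaking synthetic minority samples into the held-out fold and keeps the reported AUC-ROC and F1-score honest estimates.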

References

[1] J. K. Rost, “Analyzing Student Success Outcome Variables in Higher Education Utilizing the Chi-Square Test of Independence,” International Journal of Higher Education, vol. 13, no. 2, p. 100, Apr. 2024, doi: 10.5430/ijhe.v13n2p100.

[2] H. Brdesee, W. Alsaggaf, N. Aljohani, and S. U. Hassan, “Predictive Model Using a Machine Learning Approach for Enhancing the Retention Rate of Students At-Risk,” International Journal on Semantic Web and Information Systems, vol. 18, no. 1, pp. 1–21, 2022, doi: 10.4018/IJSWIS.299859.

[3] K. Alalawi, R. Athauda, and R. Chiong, “An Extended Learning Analytics Framework Integrating Machine Learning and Pedagogical Approaches for Student Performance Prediction and Intervention,” International Journal of Artificial Intelligence in Education, vol. 35, no. 3, pp. 1239–1287, Sep. 2024, doi: 10.1007/s40593-024-00429-7.

[4] A. A. Eli, A. Rahman, and N. Kshetri, “D3S3real: Enhancing Student Success and Security Through Real-Time Data-Driven Decision Systems for Educational Intelligence,” Digital, vol. 5, no. 3, Sep. 2025, doi: 10.3390/digital5030042.

[5] C. J. Arizmendi et al., “Predicting student outcomes using digital logs of learning behaviors: Review, current standards, and suggestions for future work,” Behavior Research Methods, vol. 55, no. 6, pp. 3026–3054, Aug. 2022, doi: 10.3758/s13428-022-01939-9.

[6] A. Qazdar, O. Hasidi, S. Qassimi, and E. H. Abdelwahed, “Newly Proposed Student Performance Indicators Based on Learning Analytics for Continuous Monitoring in Learning Management Systems,” International Journal of Online and Biomedical Engineering (iJOE), vol. 19, no. 11, pp. 19–30, Aug. 2023, doi: 10.3991/ijoe.v19i11.39471.

[7] A. Al-Ameri, W. Al-Shammari, A. Castiglione, M. Nappi, C. Pero, and M. Umer, “Student Academic Success Prediction Using Learning Management Multimedia Data With Convoluted Features and Ensemble Model,” Journal of Data and Information Quality, vol. 17, no. 3, Sep. 2025, doi: 10.1145/3687268.

[10] F. Liao, S. Adelaine, M. Afshar, and B. W. Patterson, “Governance of Clinical AI applications to facilitate safe and equitable deployment in a large health system: Key elements and early successes,” Front. Digit. Health, vol. 4, p. 931439, Aug. 2022, doi: 10.3389/fdgth.2022.931439.

[11] M. Hyzy et al., “System Usability Scale Benchmarking for Digital Health Apps: Meta-analysis,” JMIR Mhealth Uhealth, vol. 10, no. 8, p. e37290, Aug. 2022, doi: 10.2196/37290.

[12] A. M. Deshmukh and R. Chalmeta, “Validation of system usability scale as a usability metric to evaluate voice user interfaces,” PeerJ Comput. Sci., vol. 10, p. e1918, Feb. 2024, doi: 10.7717/peerj-cs.1918.

[13] Z. Salekshahrezaee, J. L. Leevy, and T. M. Khoshgoftaar, “The effect of feature extraction and data sampling on credit card fraud detection,” Journal of Big Data, vol. 10, no. 1, art. no. 6, Jan. 2023, doi: 10.1186/s40537-023-00684-w.

[14] D. Elreedy et al., “A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning,” Machine Learning, vol. 113, no. 7, pp. 4903–4923, Jan. 2023, doi: 10.1007/s10994-022-06296-4.

[15] M. Imani, A. Beikmohammadi, and H. R. Arabnia, “Comprehensive Analysis of Random Forest and XGBoost Performance with SMOTE, ADASYN, and GNUS Under Varying Imbalance Levels,” Technologies, vol. 13, no. 3, Feb. 2025, doi: 10.3390/technologies13030088.

[16] P. Vlachogianni and N. Tselios, “Perceived usability evaluation of educational technology using the System Usability Scale (SUS): A systematic review,” Journal of Research on Technology in Education, vol. 54, no. 3, pp. 392–409, 2022, doi: 10.1080/15391523.2020.1867938.

[17] Y. Liu, S. Fan, S. Xu, A. Sajjanhar, S. Yeom, and Y. Wei, “Predicting Student Performance Using Clickstream Data and Machine Learning,” Education Sciences, vol. 13, no. 1, Dec. 2022, doi: 10.3390/educsci13010017.

[18] M. Fazil, A. Rísquez, and C. Halpin, “A Novel Deep Learning Model for Student Performance Prediction Using Engagement Data,” Journal of Learning Analytics, vol. 11, no. 2, pp. 23–41, May 2024, doi: 10.18608/jla.2024.7985.

[19] E. Kalita, H. El Aouifi, A. Kukkar, S. Hussain, T. Ali, and S. Gaftandzhieva, “LSTM-SHAP based academic performance prediction for disabled learners in virtual learning environments: a statistical analysis approach,” Social Network Analysis and Mining, vol. 15, no. 1, art. no. 65, Jun. 2025, doi: 10.1007/s13278-025-01484-1.

[20] S. A. Alwarthan, N. Aslam, and I. U. Khan, “Predicting Student Academic Performance at Higher Education Using Data Mining: A Systematic Review,” Applied Computational Intelligence and Soft Computing, vol. 2022, no. 1, p. 8924028, Jan. 2022, doi: 10.1155/2022/8924028.

[21] S. C. Matz, C. S. Bukow, H. Peters, C. Deacons, and C. Stachl, “Using machine learning to predict student retention from socio-demographic characteristics and app-based engagement metrics,” Scientific Reports, vol. 13, no. 1, art. no. 5705, Apr. 2023, doi: 10.1038/s41598-023-32484-w.

[22] F. A. Al-azazi and M. Ghurab, “ANN-LSTM: A deep learning model for early student performance prediction in MOOC,” Heliyon, vol. 9, no. 4, p. e15382, Apr. 2023, doi: 10.1016/j.heliyon.2023.e15382.

[23] B. Le, G. A. Lawrie, and J. T. H. Wang, “Student Self-perception on Digital Literacy in STEM Blended Learning Environments,” Journal of Science Education and Technology, vol. 31, no. 3, pp. 303–321, Feb. 2022, doi: 10.1007/s10956-022-09956-1.

[24] N. A. Mohindra et al., “Development of an electronic health record-integrated patient-reported outcome-based shared decision-making dashboard in oncology,” JAMIA Open, vol. 7, no. 3, Jul. 2024, doi: 10.1093/jamiaopen/ooae056.

[25] S. Almasi, K. Bahaadinbeigy, H. Ahmadi, S. Sohrabei, and R. Rabiei, “Usability Evaluation of Dashboards: A Systematic Literature Review of Tools,” Biomed Res. Int., vol. 2023, no. 1, p. 9990933, Jan. 2023, doi: 10.1155/2023/9990933.

Published

2026-04-10