A Comparative Evaluation of Machine Learning Models in Enterprise Information Systems with SHAP-Based Explainability Analysis
DOI:
https://doi.org/10.51903/7d0szq86
Keywords:
classification, data mining, decision tree, enterprise information systems, explainability, random forest, SHAP, SMOTE, support vector machine
Abstract
Enterprise Information Systems (EIS) generate large volumes of transactional data across functional domains, including Human Resource Management (HRM), Customer Relationship Management (CRM), Supply Chain Management (SCM), and Financial Management. Although Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) are among the most widely applied classification algorithms, no study has systematically benchmarked these three classifiers across multiple enterprise IS domains under consistent methodological conditions. This study presents a comparative evaluation of DT, RF, and SVM applied to four representative enterprise IS classification tasks: employee attrition, customer churn, supplier delay risk, and financial fraud detection. The experimental framework applies the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance endemic to enterprise data, and uses grid search cross-validation for hyperparameter optimization so that the classifiers are compared fairly. SHAP-based explainability analysis is then applied to the RF classifier in all four domains, linking algorithmic performance to interpretability for enterprise decision-makers. Results show that RF achieves the highest predictive performance in all four domains, while SVM demonstrates strong stability and DT retains advantages in interpretability and computational efficiency. The study culminates in a practitioner-oriented algorithm selection framework that guides enterprise IS stakeholders in choosing appropriate classifiers based on domain characteristics.
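To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the study's implementation: the function name `smote`, its parameters, and the toy data are assumptions made for illustration. In practice a library implementation would normally be used, and oversampling is usually applied to training data only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric feature vectors.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up data: 4 minority samples to be balanced against 10 majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10
new_points = smote(minority, n_new=majority_count - len(minority), k=3, seed=42)
balanced_minority = minority + new_points
print(len(balanced_minority))  # prints 10
```

Each synthetic point lies on the line segment between a real minority sample and one of its nearest minority neighbours, so the minority class is enlarged without duplicating existing records.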
License
Copyright (c) 2026 Azka Nahya Amanta, Lukman Santoso (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.



