Transformer Based Intelligent Virtual Assistant for Automated IT Helpdesk Resolution: A System Implementation and Comparative Evaluation Study
DOI: https://doi.org/10.51903/3467pq60
Keywords: BERT, DistilBERT, intelligent virtual assistant, IT service management, natural language processing, RoBERTa, ticket classification, transformer
Abstract
Enterprise IT service management (ITSM) environments face mounting operational pressure as the volume and complexity of support requests increasingly exceed the capacity of conventional human-operated helpdesk systems. This study addresses that challenge by designing, implementing, and evaluating a transformer-based intelligent virtual assistant for automated IT helpdesk resolution within an enterprise ITSM workflow. Three pre-trained transformer architectures (BERT, RoBERTa, and DistilBERT) were fine-tuned on a publicly available IT helpdesk ticket dataset; after preprocessing and removal of duplicate and incomplete records, 4,800 annotated instances were retained, spanning five intent categories: hardware failure, software malfunction, network connectivity, access management, and general inquiry. The system performs dual-task inference, combining intent classification with retrieval-based response generation, governed by a confidence-gated escalation mechanism that routes low-confidence predictions to human agents. RoBERTa achieved the highest classification accuracy at 93.6% with a weighted F1-score of 0.934, while DistilBERT reduced inference latency by 45.8% relative to RoBERTa, offering a computationally efficient alternative for latency-constrained deployments. At the system level, the RoBERTa configuration attained a ticket deflection rate of 92.8% under offline evaluation, confirming the operational viability of the proposed architecture for autonomous first-line incident resolution within the scope of the reported experimental setup. These findings give practitioners empirically grounded, multi-criteria guidance for transformer model selection in enterprise helpdesk deployments, and contribute a replicable integration architecture that bridges the gap, documented in prior literature, between isolated model evaluation and production-representative ITSM implementation.
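The confidence-gated escalation mechanism described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the intent labels are the paper's five categories, but the 0.85 threshold, the canned response table standing in for the retrieval-based response module, and the raw classifier logits are all assumptions introduced here, not values taken from the study.

```python
import math

# The five intent categories reported in the abstract.
INTENTS = ["hardware_failure", "software_malfunction",
           "network_connectivity", "access_management", "general_inquiry"]

# Hypothetical stand-in for the retrieval-based response module;
# the real system would retrieve a resolution from a knowledge base.
RESPONSES = {intent: f"[KB article for {intent}]" for intent in INTENTS}

def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_ticket(logits, threshold=0.85):
    """Confidence-gated escalation: auto-resolve only when the top
    softmax probability clears the (assumed) confidence threshold;
    otherwise hand the ticket to a human agent."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return {"action": "auto_resolve",
                "intent": INTENTS[best],
                "response": RESPONSES[INTENTS[best]],
                "confidence": probs[best]}
    return {"action": "escalate_to_human", "confidence": probs[best]}
```

With a sharply peaked logit vector such as `[8.0, 0.1, 0.0, 0.2, 0.1]` the gate auto-resolves; a near-uniform vector falls below the threshold and is escalated, which is the behaviour the abstract attributes to the low-confidence routing path.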
License
Copyright (c) 2026 Liza Putri Pagan, Maya Utami Dewi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
