Context-Aware Neural Code Refactoring for Legacy IT Infrastructure: A Semantic-Preserving Framework

DOI:

https://doi.org/10.51903/d4yrfv98

Keywords:

AI, Automated Refactoring, Legacy Code, Context Aware

Abstract

Legacy IT infrastructure relies heavily on aging monolithic systems, yet modernizing these codebases is often hindered by immense technical debt and the risk of breaking critical business logic. While recent large language models offer automated refactoring capabilities, their token-based processing frequently hallucinates syntax and alters execution semantics in complex inter-procedural environments. To address this, this study proposes a Context-Aware Neural Code Refactoring framework that explicitly integrates structural embeddings, derived from Abstract Syntax Trees and Data Flow Graphs, into the neural attention mechanism. Using a quantitative comparative design, the proposed multimodal model was evaluated against standard token-only and hybrid LLM+Static Analysis baselines on LegacyRefact-50, a newly curated dataset of complex Java and C++ enterprise repositories. The empirical results demonstrate that the context-aware framework substantially outperforms both baselines, achieving Syntactic Correctness of 94.2%, Semantic Preservation Ratio of 89.7%, and Execution Equivalence of 81.4%. By contrast, code produced by the token-only baseline passed only 42.1% of the original unit tests. These findings demonstrate that enforcing topological constraints significantly mitigates semantic drift and structural hallucinations during code generation. Ultimately, this multimodal integration establishes a rigorous foundation for safely deploying neural refactoring agents in automated enterprise pipelines, providing a scalable mechanism to counter software aging in Java and C++ object-oriented paradigms without jeopardizing core operational services.
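The abstract describes injecting AST and DFG structure directly into the attention mechanism. A minimal NumPy sketch of one plausible realization is shown below: structural edges add a fixed bonus to the attention logits, so structurally connected token pairs attend to each other more strongly. The function name, the additive-bias scheme, and the `edge_bonus` parameter are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def structure_aware_attention(x, edges, edge_bonus=1.0):
    """Single-head self-attention whose logits are biased toward
    token pairs connected in the AST/DFG edge set (hypothetical
    sketch of structure-conditioned attention).

    x:     (n, d) array of token embeddings
    edges: iterable of (i, j) index pairs from the structural graphs
    """
    n, d = x.shape
    # Vanilla scaled dot-product logits.
    logits = x @ x.T / np.sqrt(d)
    # Additive structural bias: connected pairs get a fixed bonus,
    # applied symmetrically.
    bias = np.zeros((n, n))
    for i, j in edges:
        bias[i, j] = edge_bonus
        bias[j, i] = edge_bonus
    logits = logits + bias
    # Numerically stable softmax over the key dimension.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    return attn @ x, attn
```

With identical embeddings, a pair linked by an AST or DFG edge receives strictly more attention mass than an unlinked pair, which is the topological constraint the abstract credits with reducing semantic drift.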


Published

2026-04-03