Abstract
In software development, ensuring software quality is essential. Software defect prediction (SDP) plays an important role in identifying software modules that are likely to contain defects. Several machine learning (ML)-based defect prediction models have been developed and implemented in recent years. Researchers have also applied network embedding to SDP, demonstrating the adaptability of Natural Language Processing techniques to defect prediction. This study reviews, investigates, and discusses the use of network embedding in SDP. We examined defect prediction articles from the past 15 years that use network embedding, most of which were published in notable conferences and software engineering journals. Each network embedding technique, its findings, and its particular role in SDP are described in detail. The reviewed papers are listed in order of publication along with a comparative assessment. We formulated three research questions that emphasize the significance of analyzing network representations, particularly network embedding, for identifying potential software defects. To the best of our knowledge, this review is the first to include a thorough analysis of both the transductive and inductive variants of network embedding, along with their potential in ML for predicting software defects. This article explores the open challenges in depth and proposes research directions to address them, with the aim of guiding future research by academics and practitioners in the field of SDP.