Intelligent Fault Detection and Self-Healing Architectures in Distributed Software Systems for Mission-Critical Applications
Self-healing and intelligent fault detection systems are vital frameworks for raising the dependability and resilience of distributed software systems in mission-critical applications. Using contemporary technologies such as predictive analytics, machine learning, and adaptive algorithms, these systems detect anomalies, actively evaluate system health, and repair errors autonomously. They apply techniques such as redundancy, failover mechanisms, and real-time diagnostics to keep operational costs low, maintain continuous service delivery, and minimize downtime. Systems with self-healing capability offer scalability and fault tolerance in dynamic and demanding environments, and sustain optimal performance across varying workloads. This work discusses intelligent fault management with reference to its main features, advantages, and techniques. The focus is on how these techniques satisfy the dependability standards of domains such as aviation, finance, and healthcare. It highlights how these systems can be restructured to enhance operational resilience and efficiency, thereby strengthening the dependability and autonomy of distributed systems.
Copyright (c) 2024 Gireesh Kambala

This work is licensed under a Creative Commons Attribution 4.0 International License.