A Comparative Analysis of Machine Learning Algorithms for Big Data Applications in Predictive Analytics

Nidadavolu Venkat Durga Sai Siva Vara Prasad Raju; Penmetsa Naveena Devi

doi:10.18535/ijsrm/v12i10.ec09

Abstract

As the volume and complexity of data continue to grow, predictive analytics has emerged as a vital tool for extracting actionable insights from big data, driving decision-making across various domains such as healthcare, finance, and e-commerce. However, selecting an appropriate machine learning algorithm for predictive analytics applications is challenging due to differences in algorithmic performance, computational requirements, and scalability, especially in the context of big data. This paper provides a comprehensive comparative analysis of popular machine learning algorithms utilized in predictive analytics, specifically focusing on their effectiveness and feasibility in big data environments.

The study categorizes algorithms based on learning types—supervised, unsupervised, and reinforcement learning—and evaluates their performance across multiple dimensions: prediction accuracy, computational efficiency, scalability, and suitability for real-time analytics. Through a detailed analysis of algorithms, including linear regression, decision trees, support vector machines, neural networks, and clustering techniques, we assess each method’s strengths and limitations in handling large datasets. Additionally, the study introduces a series of metrics, such as accuracy, F1-score, and training time, as benchmarks for assessing the algorithms’ predictive capabilities and computational viability.
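The three benchmark metrics named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' evaluation code. The function names (`accuracy`, `f1_score`, `timed_fit`), the toy labels, and the majority-class "model" are all hypothetical and chosen only to show how each metric is computed.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def timed_fit(fit: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run a training function and return (model, wall-clock seconds)."""
    start = time.perf_counter()
    model = fit(*args)
    return model, time.perf_counter() - start


def fit_majority(labels: Sequence[int]) -> int:
    """Hypothetical stand-in for training: 'learn' the most frequent label."""
    return max(set(labels), key=list(labels).count)


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
model, train_seconds = timed_fit(fit_majority, y_true)

print(f"accuracy={acc:.3f} f1={f1:.3f} train_time={train_seconds:.6f}s")
```

Reporting training time alongside accuracy and F1-score makes the trade-off between predictive quality and computational cost visible for each algorithm. On real data, a library implementation of these metrics would normally replace the hand-written versions.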

A hypothetical case study demonstrates the application of these algorithms on a sample big data set, providing insights into their real-world performance across different predictive analytics scenarios. Visual data representations, including comparative tables and performance graphs, offer a clearer perspective on the trade-offs among algorithm choices. The findings highlight that while certain algorithms like random forests and neural networks achieve higher accuracy in prediction tasks, they may also require substantial computational resources, posing limitations for real-time processing in big data applications.

This paper concludes with recommendations for selecting machine learning algorithms based on specific predictive analytics objectives, data characteristics, and processing requirements. Furthermore, it discusses the challenges associated with implementing these algorithms in big data contexts and explores potential advancements, such as the integration of deep learning and the use of distributed computing, as promising directions for enhancing predictive analytics performance in future applications.
