Efficient Customer Data Privacy Management in Hadoop Ecosystems: A Scalable Query Engine Approach
Ensuring customer data privacy in the Hadoop ecosystem poses significant challenges for large-scale data request processing. Traditional methods rely on resource-intensive table scans that are neither cost-effective nor scalable. This paper proposes a new architecture for customer data retrieval in Hadoop that substantially reduces computation overhead and cuts cost to as little as one-tenth of that of conventional methods. The architecture uses Bloom filters, bucketing, and predicate pushdown to optimize data elimination and fetching directly at the file level, avoiding the inefficiencies prevalent in traditional OLAP systems. Benchmarking results demonstrate scalability and effectiveness across several orders of magnitude of data volume, from terabytes to petabytes. The proposed methodology therefore supports compliance with data privacy regulations without compromising performance or cost efficiency, making it well suited to enterprise-grade big data platforms.
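To make the file-level elimination idea concrete, the following is a minimal, self-contained sketch of how bucketing, per-file Bloom filters, and a pushed-down equality predicate could combine to skip most files when looking up one customer. It is an illustration under stated assumptions, not the paper's engine: the names (DataFile, build_files, find_customer), the customer_id column, the bucket count, and the filter sizes are all hypothetical, and files are modeled as in-memory lists rather than HDFS files.

```python
import hashlib
from dataclasses import dataclass, field

# All constants below are hypothetical values chosen for illustration.
NUM_BUCKETS = 8     # number of hash buckets the table is split into
BLOOM_BITS = 1024   # size of each per-file Bloom filter, in bits
BLOOM_HASHES = 3    # number of hash functions per Bloom filter


def _hash(value: str, seed: int) -> int:
    """Deterministic hash (the built-in hash() is salted per process)."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_of(customer_id: str) -> int:
    """Bucketing: the key alone decides which bucket a row lives in."""
    return _hash(customer_id, seed=-1) % NUM_BUCKETS


class BloomFilter:
    """Bit-set Bloom filter: no false negatives, occasional false positives."""

    def __init__(self) -> None:
        self.bits = 0  # a Python int used as a bit set

    def add(self, value: str) -> None:
        for seed in range(BLOOM_HASHES):
            self.bits |= 1 << (_hash(value, seed) % BLOOM_BITS)

    def might_contain(self, value: str) -> bool:
        return all(
            (self.bits >> (_hash(value, seed) % BLOOM_BITS)) & 1
            for seed in range(BLOOM_HASHES)
        )


@dataclass
class DataFile:
    """Stand-in for one data file plus its file-level metadata."""
    name: str
    bucket: int
    rows: list = field(default_factory=list)
    bloom: BloomFilter = field(default_factory=BloomFilter)


def build_files(rows, rows_per_file=4):
    """Group rows by bucket, then split each bucket into small files."""
    by_bucket = {}
    for row in rows:
        by_bucket.setdefault(bucket_of(row["customer_id"]), []).append(row)
    files = []
    for bucket, bucket_rows in sorted(by_bucket.items()):
        for start in range(0, len(bucket_rows), rows_per_file):
            part = start // rows_per_file
            f = DataFile(name=f"bucket={bucket}/part-{part}", bucket=bucket)
            for row in bucket_rows[start:start + rows_per_file]:
                f.rows.append(row)
                f.bloom.add(row["customer_id"])
            files.append(f)
    return files


def find_customer(files, customer_id):
    """Return (matching rows, number of files actually read)."""
    target_bucket = bucket_of(customer_id)
    matches, files_read = [], 0
    for f in files:
        if f.bucket != target_bucket:
            continue  # eliminated by bucketing
        if not f.bloom.might_contain(customer_id):
            continue  # eliminated by the per-file Bloom filter
        files_read += 1
        # Pushed-down predicate: filter rows while reading the file.
        matches.extend(r for r in f.rows if r["customer_id"] == customer_id)
    return matches, files_read


if __name__ == "__main__":
    table = [{"customer_id": f"c{i}", "order": i} for i in range(200)]
    files = build_files(table)
    rows, files_read = find_customer(files, "c42")
    print(f"read {files_read} of {len(files)} files, found {rows}")
```

In this sketch the lookup never opens a file outside the customer's bucket, and within the bucket it opens only files whose Bloom filter reports a possible match. Because a Bloom filter can return false positives but never false negatives, a few extra files may be read, but no matching row is missed.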
Copyright (c) 2024 Sai Kiran Reddy Malikireddy

This work is licensed under a Creative Commons Attribution 4.0 International License.