HDFS with cache system – a paradigm for performance improvement
Online activity and the use of computing resources generate data at an enormous rate. Distributed systems are an efficient mechanism
for accessing and handling such large, widely spread volumes of data. One such system is the Hadoop Distributed File System (HDFS).
However, HDFS suffers from performance drawbacks, so there is a need to improve its performance. In this paper we present a new
paradigm for improving small-file processing in HDFS. The paradigm shift is to use cached information. Accessing data from a cache is
known to be much faster than accessing it from disk. Cache memory is used to store frequently accessed data so that it can be
processed much more quickly. This paper describes a system architecture that provides a cache system for HDFS; with it, we can avoid
unnecessary trips to the hard disk drive (HDD) to fetch data and thus avoid delay.
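
The idea of serving frequently accessed small files from memory rather than from disk can be illustrated with a minimal sketch. It is not the architecture proposed in the paper: the class name `SmallFileCache`, the `read_from_disk` callback, and the least-recently-used (LRU) eviction policy are all assumptions made for illustration, and the disk read is simulated rather than issued against a real HDFS cluster.

```python
from collections import OrderedDict


class SmallFileCache:
    """Hypothetical LRU cache placed in front of a slow file read."""

    def __init__(self, read_from_disk, capacity=128):
        self._read_from_disk = read_from_disk  # stand-in for an HDFS/disk read
        self._capacity = capacity              # maximum number of cached files
        self._entries = OrderedDict()          # path -> bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self._entries:
            self.hits += 1
            self._entries.move_to_end(path)    # mark as most recently used
            return self._entries[path]
        self.misses += 1
        data = self._read_from_disk(path)      # the only trip to disk
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data


def demo():
    disk_reads = []

    def fake_disk_read(path):
        # Assumed placeholder: records each simulated disk access.
        disk_reads.append(path)
        return ("contents of " + path).encode()

    cache = SmallFileCache(fake_disk_read, capacity=2)
    for path in ["/a.txt", "/b.txt", "/a.txt", "/c.txt", "/b.txt"]:
        cache.read(path)
    return cache.hits, cache.misses, disk_reads
```

In this sketch a repeated read of a file that is still cached never reaches `fake_disk_read`, which is the delay the abstract aims to avoid. A file that has been evicted must be fetched from disk again, so the benefit depends on how well the cache capacity matches the set of frequently accessed files.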