Hadoop Interview Questions Part-2

5) Describe the main difference between an HDFS Block and an InputSplit.

In simple terms, a block is the physical representation of data, while a split is the logical representation of the data present in a block. The split acts as an intermediary between the block and the mapper.

Suppose we have two blocks:

Block 1: ii nntteell
Block 2: Ii ppaatt

Now, considering the map, it will read the first block from ii until ll, but it does not know how to process the second block at the same time. This is where the InputSplit comes into play: it forms a logical group of Block 1 and Block 2 and treats them as a single unit. It then forms key-value pairs using the InputFormat and RecordReader and sends them to the map for further processing.

With InputSplit, if you have limited resources, you can increase the split size to limit the number of maps. For example, if a 640 MB file is stored as 10 blocks of 64 MB each and resources are limited, you can set the split size to 128 MB. This forms logical groups of 128 MB each, so only 5 map tasks are launched instead of 10. However, if the file is treated as not splittable (for example, when the InputFormat's isSplitable() method returns false), the whole file forms one InputSplit and is processed by a single map, which takes longer the larger the file is.

6) What is the distributed cache and what are its benefits?

The Distributed Cache in Hadoop is a service provided by the MapReduce framework to cache files that a job needs. Once a file is cached for a particular job, Hadoop copies it to the local disk of each data node where that job's map and reduce tasks are executing. The tasks can then easily access and read the cached file and load it into any collection (such as an array or a HashMap) in your code.
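As a rough illustration of that last step, reading a cached file and loading it into a HashMap, here is a minimal Java sketch. It assumes a tab-separated key/value lookup file; the class `CachedLookup`, its methods, and the file contents are hypothetical. To keep it runnable without Hadoop on the classpath, a temporary local file stands in for the copy the distributed cache would place on the node. In a real job the file is typically registered with `Job.addCacheFile(URI)`, and the loading code would run once per task, for example in `Mapper.setup()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class CachedLookup {

    // Reads a tab-separated "key<TAB>value" file into a HashMap.
    // Assumption: in a real job this runs once per task (e.g. from Mapper.setup())
    // against the local copy that the distributed cache placed on the node.
    static Map<String, String> loadLookup(Path cachedFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(cachedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    // Per-record work: enrich an input key using the in-memory table,
    // as a mapper's map() method might do after the table has been loaded.
    static String enrich(Map<String, String> lookup, String key) {
        return key + "\t" + lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the localized cache file.
        Path file = Files.createTempFile("lookup", ".txt");
        Files.write(file, "IN\tIndia\nUS\tUnited States\n".getBytes(StandardCharsets.UTF_8));

        Map<String, String> lookup = loadLookup(file);
        System.out.println(enrich(lookup, "IN")); // known key
        System.out.println(enrich(lookup, "FR")); // key missing from the lookup file
        Files.delete(file);
    }
}
```

Loading the table once per task, rather than once per record, is what makes this pattern cheap: each map() call then does an in-memory lookup instead of re-reading the file.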