Hadoop architecture, data processing

While learning the basics of Hadoop, I found an AWS training course that is helping me understand the concepts.

  • Map: the process that distributes the requested job across all the distributed blocks of data. What is sent to each block is a task that carries a reference to that block.
  • Reduce: an iterative process in which the output of every task is sent back and summarized or joined into a single place.
  • Replication factor = 3. This means that every data block is replicated 3 times, so if one copy of a block fails, another is still available.
  • Jobs are divided into tasks that are sent to the different blocks.
  • The default block size is 64 MB.
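The map and reduce steps above can be sketched in plain Python, with no Hadoop API involved. This is only an illustration of the flow: the `blocks` list stands in for distributed data blocks, and `map_task` / `reduce_task` are made-up names for the two phases.

```python
from collections import defaultdict
from itertools import chain

# Illustrative stand-in for a file split into distributed blocks.
blocks = [
    "hadoop stores data in blocks",
    "each block gets a map task",
    "reduce merges the map output",
]

def map_task(block):
    """Map phase: runs independently on one block, emitting (key, value) pairs."""
    return [(word, 1) for word in block.split()]

def reduce_task(pairs):
    """Reduce phase: gathers the output of all tasks and summarizes it in one place."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# One map task per block, then a single reduce over all the task outputs.
mapped = [map_task(b) for b in blocks]
result = reduce_task(chain.from_iterable(mapped))
print(result)
```

In a real cluster the map tasks run in parallel on the nodes that hold each block, and the framework shuffles their output to the reducers; here everything runs in one process just to show the data flow.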
