{"id":1500,"date":"2014-04-18T17:37:26","date_gmt":"2014-04-18T17:37:26","guid":{"rendered":"http:\/\/joapen.com\/blog\/?p=1500"},"modified":"2014-04-18T17:37:26","modified_gmt":"2014-04-18T17:37:26","slug":"before-hadoop-distributed-processing","status":"publish","type":"post","link":"http:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/","title":{"rendered":"Before Hadoop, distributed processing"},"content":{"rendered":"<p>I&#8217;m reading about Content Delivery Networks (CDNs) and I came across the Apache Hadoop project. I have been struck by the nature of the project, where it comes from and the whole toolkit that has grown up around it. It&#8217;s a massive amount of information, but fascinating to me.<\/p>\n<p>The Hadoop project comes from the need for more resources for a given goal than a single machine can offer. The solution has been to distribute both the data and the processing of that data. When you need to process a huge amount of data and a single computer offers only limited processing cycles, you use a group of computers working together to run the processing in less time.<\/p>\n<p>The major resources considered in a distributed processing system are processor time, memory, hard drive space and network bandwidth. For instance, server virtualization software detects idle CPU capacity on a rack of physical servers and parcels out virtual environments to use it.<\/p>\n<p>There are many challenges in distributed processing when it&#8217;s applied at large scale, and Hadoop faces them. It&#8217;s important to mention these challenges to understand (or admire) what the Apache Hadoop project does.<\/p>\n<ul>\n<li>An individual compute node may overheat, crash, experience hard drive failures, or run out of memory or disk space.<\/li>\n<li>The network can experience partial or total failure if switches and routers break down.
Network congestion can also delay data transfers.<\/li>\n<li>Multiple implementations or versions of client software may speak slightly different protocols from one another.<\/li>\n<li>If the input data set is several terabytes, a thousand or more machines would be required to hold it in RAM.<\/li>\n<li>Intermediate data sets generated while performing a large-scale computation can take several times more space than the original input data.<\/li>\n<li>Multiple machines must be kept synchronized.<\/li>\n<\/ul>\n<p><strong>In each of these cases, the distributed system should be able to recover from the component failure or transient error condition and continue to make progress<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m reading about Content Delivery Networks (CDNs) and I came across the Apache Hadoop project. I have been struck by the nature of the project, where it comes from and the whole toolkit that has grown up around it. It&#8217;s a massive amount of information, but fascinating to me. The Hadoop project comes from the need for more resources for &#8230; <a title=\"Before Hadoop, distributed processing\" class=\"read-more\" href=\"http:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/\" aria-label=\"Read more about Before Hadoop, distributed processing\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[65],"tags":[],"class_list":["post-1500","post","type-post","status-publish","format-standard","hentry","category-hadoop"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Before Hadoop, distributed processing -<\/title>\n<meta name=\"description\" content=\"I&#039;m reading about Content Delivery Network (CDN) and I found the Apache Hadoop project.
I have been shocked about the nature of the project, where this - joapen projects\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Before Hadoop, distributed processing -\" \/>\n<meta property=\"og:description\" content=\"I&#039;m reading about Content Delivery Network (CDN) and I found the Apache Hadoop project. I have been shocked about the nature of the project, where this - joapen projects\" \/>\n<meta property=\"og:url\" content=\"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/\" \/>\n<meta property=\"og:site_name\" content=\"joapen projects\" \/>\n<meta property=\"article:published_time\" content=\"2014-04-18T17:37:26+00:00\" \/>\n<meta name=\"author\" content=\"joapen\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"joapen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/\"},\"author\":{\"name\":\"joapen\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"headline\":\"Before Hadoop, distributed processing\",\"datePublished\":\"2014-04-18T17:37:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/\"},\"wordCount\":319,\"commentCount\":0,\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"articleSection\":[\"Hadoop\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/\",\"name\":\"Before Hadoop, distributed processing -\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\"},\"datePublished\":\"2014-04-18T17:37:26+00:00\",\"description\":\"I'm reading about Content Delivery Network (CDN) and I found the Apache Hadoop project. 
I have been shocked about the nature of the project, where this - joapen projects\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/18\\\/before-hadoop-distributed-processing\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Before Hadoop, distributed processing\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#website\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/\",\"name\":\"joapen projects\",\"description\":\"Just a place to write\",\"publisher\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\\\/\\\/joapen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\",\"name\":\"joapen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"width\":400,\"height\":400,\"caption\"
:\"joapen\"},\"logo\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\"},\"sameAs\":[\"http:\\\/\\\/www.joapen.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Before Hadoop, distributed processing -","description":"I'm reading about Content Delivery Network (CDN) and I found the Apache Hadoop project. I have been shocked about the nature of the project, where this - joapen projects","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/","og_locale":"en_US","og_type":"article","og_title":"Before Hadoop, distributed processing -","og_description":"I'm reading about Content Delivery Network (CDN) and I found the Apache Hadoop project. I have been shocked about the nature of the project, where this - joapen projects","og_url":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/","og_site_name":"joapen projects","article_published_time":"2014-04-18T17:37:26+00:00","author":"joapen","twitter_misc":{"Written by":"joapen","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/#article","isPartOf":{"@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/"},"author":{"name":"joapen","@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"headline":"Before Hadoop, distributed processing","datePublished":"2014-04-18T17:37:26+00:00","mainEntityOfPage":{"@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/"},"wordCount":319,"commentCount":0,"publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"articleSection":["Hadoop"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/","url":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/","name":"Before Hadoop, distributed processing -","isPartOf":{"@id":"http:\/\/joapen.com\/blog\/#website"},"datePublished":"2014-04-18T17:37:26+00:00","description":"I'm reading about Content Delivery Network (CDN) and I found the Apache Hadoop project. 
I have been shocked about the nature of the project, where this - joapen projects","breadcrumb":{"@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/joapen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Before Hadoop, distributed processing"}]},{"@type":"WebSite","@id":"http:\/\/joapen.com\/blog\/#website","url":"http:\/\/joapen.com\/blog\/","name":"joapen projects","description":"Just a place to write","publisher":{"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/joapen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"http:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217","name":"joapen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","width":400,"height":400,"caption":"joapen"},"logo":{"@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg"},"sameAs":["http:\/\/www.joapen.com"]}]}},"_links":{"self":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1500","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/joapen.com\/blog
\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/comments?post=1500"}],"version-history":[{"count":2,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1500\/revisions"}],"predecessor-version":[{"id":1502,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1500\/revisions\/1502"}],"wp:attachment":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/media?parent=1500"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/categories?post=1500"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/tags?post=1500"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}