{"id":1488,"date":"2014-04-20T18:27:27","date_gmt":"2014-04-20T18:27:27","guid":{"rendered":"http:\/\/joapen.com\/blog\/?p=1488"},"modified":"2015-08-12T21:20:59","modified_gmt":"2015-08-12T21:20:59","slug":"apache-hadoop","status":"publish","type":"post","link":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/","title":{"rendered":"Apache Hadoop"},"content":{"rendered":"<p><a href=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-1506\" src=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg\" alt=\"hadoop-logo\" width=\"300\" height=\"71\" \/><\/a>Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.<\/p>\n<p>All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework. They cover all these <a href=\"http:\/\/joapen.com\/blog\/2014\/04\/18\/before-hadoop-distributed-processing\/\">challenges<\/a> and other ones.<\/p>\n<ul>\n<li>Hadoop is <strong><span style=\"text-decoration: underline;\">scalable<\/span><\/strong>, it linearly adds more nodes to\u00a0the cluster to handle larger data.<\/li>\n<li>Hadoop is <strong><span style=\"text-decoration: underline;\">accessible<\/span><\/strong>, it runs on large clusters of commodity machines or on cloud\u00a0computing services such as Amazon&#8217;s Elastic Compute Cloud (EC2). If you are rich and can purchase an IBM Big Blue, then good for you, but the rest of the mortals, we have to utilize inexpensive servers that both store and process the data.<\/li>\n<li>Hadoop is <strong><span style=\"text-decoration: underline;\">robust<\/span><\/strong>, : It is architected\u00a0with the assumption of frequent hardware failure, so it has been implemented to handle these type of failures.<\/li>\n<li>Hadoop is <strong><span style=\"text-decoration: underline;\">simple<\/span><\/strong>, it allows users to quickly write efficient parallel code.\u00a0Hadoop&#8217;s accessibility and simplicity give it an edge over writing and running large\u00a0distributed programs. Please do not mix with &#8220;easy&#8221;, it requires intelligence and knowledge.<\/li>\n<li>Hadopp is <strong><span style=\"text-decoration: underline;\">versatile<\/span><\/strong>, it understands the &#8220;big data&#8221; challenge where today we do not know the data we will be required to analyze by tomorrow. Hadoop\u2019s breakthrough, businesses, organizations, data and is able to analyze data that was recently considered useless.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework. They &#8230; <a title=\"Apache Hadoop\" class=\"read-more\" href=\"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/\" aria-label=\"Read more about Apache Hadoop\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[65,99],"tags":[],"class_list":["post-1488","post","type-post","status-publish","format-standard","hentry","category-hadoop","category-open-source"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Hadoop -<\/title>\n<meta name=\"description\" content=\"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Hadoop -\" \/>\n<meta property=\"og:description\" content=\"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects\" \/>\n<meta property=\"og:url\" content=\"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/\" \/>\n<meta property=\"og:site_name\" content=\"joapen projects\" \/>\n<meta property=\"article:published_time\" content=\"2014-04-20T18:27:27+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-08-12T21:20:59+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg\" \/>\n<meta name=\"author\" content=\"joapen\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"joapen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#article\",\"isPartOf\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/\"},\"author\":{\"name\":\"joapen\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"headline\":\"Apache Hadoop\",\"datePublished\":\"2014-04-20T18:27:27+00:00\",\"dateModified\":\"2015-08-12T21:20:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/\"},\"wordCount\":242,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"image\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2014\\\/04\\\/hadoop-logo.jpg\",\"articleSection\":[\"Hadoop\",\"Open Source\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/\",\"name\":\"Apache Hadoop -\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#primaryimage\"},\"image\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2014\\\/04\\\/hadoop-logo.jpg\",\"datePublished\":\"2014-04-20T18:27:27+00:00\",\"dateModified\":\"2015-08-12T21:20:59+00:00\",\"description\":\"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects\",\"breadcrumb\":{\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#primaryimage\",\"url\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2014\\\/04\\\/hadoop-logo.jpg\",\"contentUrl\":\"http:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2014\\\/04\\\/hadoop-logo.jpg\",\"width\":300,\"height\":71},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\\\/\\\/joapen.com\\\/blog\\\/2014\\\/04\\\/20\\\/apache-hadoop\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/joapen.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache Hadoop\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/\",\"name\":\"joapen projects\",\"description\":\"Just a place to write\",\"publisher\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/joapen.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/#\\\/schema\\\/person\\\/23919df2312175fe9c4609203595b217\",\"name\":\"joapen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"url\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"contentUrl\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\",\"width\":400,\"height\":400,\"caption\":\"joapen\"},\"logo\":{\"@id\":\"https:\\\/\\\/joapen.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/04\\\/joapen-mini.jpeg\"},\"sameAs\":[\"http:\\\/\\\/www.joapen.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Hadoop -","description":"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/","og_locale":"en_US","og_type":"article","og_title":"Apache Hadoop -","og_description":"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects","og_url":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/","og_site_name":"joapen projects","article_published_time":"2014-04-20T18:27:27+00:00","article_modified_time":"2015-08-12T21:20:59+00:00","og_image":[{"url":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg","type":"","width":"","height":""}],"author":"joapen","twitter_misc":{"Written by":"joapen","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#article","isPartOf":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/"},"author":{"name":"joapen","@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"headline":"Apache Hadoop","datePublished":"2014-04-20T18:27:27+00:00","dateModified":"2015-08-12T21:20:59+00:00","mainEntityOfPage":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/"},"wordCount":242,"commentCount":0,"publisher":{"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"image":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#primaryimage"},"thumbnailUrl":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg","articleSection":["Hadoop","Open Source"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#respond"]}]},{"@type":"WebPage","@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/","url":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/","name":"Apache Hadoop -","isPartOf":{"@id":"https:\/\/joapen.com\/blog\/#website"},"primaryImageOfPage":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#primaryimage"},"image":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#primaryimage"},"thumbnailUrl":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg","datePublished":"2014-04-20T18:27:27+00:00","dateModified":"2015-08-12T21:20:59+00:00","description":"Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. All the modules - joapen projects","breadcrumb":{"@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#primaryimage","url":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg","contentUrl":"http:\/\/joapen.com\/blog\/wp-content\/uploads\/2014\/04\/hadoop-logo.jpg","width":300,"height":71},{"@type":"BreadcrumbList","@id":"http:\/\/joapen.com\/blog\/2014\/04\/20\/apache-hadoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/joapen.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Apache Hadoop"}]},{"@type":"WebSite","@id":"https:\/\/joapen.com\/blog\/#website","url":"https:\/\/joapen.com\/blog\/","name":"joapen projects","description":"Just a place to write","publisher":{"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/joapen.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/joapen.com\/blog\/#\/schema\/person\/23919df2312175fe9c4609203595b217","name":"joapen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","url":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","contentUrl":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg","width":400,"height":400,"caption":"joapen"},"logo":{"@id":"https:\/\/joapen.com\/blog\/wp-content\/uploads\/2021\/04\/joapen-mini.jpeg"},"sameAs":["http:\/\/www.joapen.com"]}]}},"_links":{"self":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1488","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/comments?post=1488"}],"version-history":[{"count":3,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1488\/revisions"}],"predecessor-version":[{"id":1507,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/posts\/1488\/revisions\/1507"}],"wp:attachment":[{"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/media?parent=1488"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/categories?post=1488"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/joapen.com\/blog\/wp-json\/wp\/v2\/tags?post=1488"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}