Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware.
All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or of entire racks) are common, and should therefore be handled automatically in software by the framework. The following characteristics summarize how Hadoop addresses these challenges:
- Hadoop is scalable: more nodes can be added to the cluster to handle larger data volumes, and capacity grows roughly linearly with cluster size.
- Hadoop is accessible: it runs on large clusters of commodity machines or on cloud computing services such as Amazon’s Elastic Compute Cloud (EC2). If you can afford an IBM mainframe, good for you; the rest of us have to rely on inexpensive servers that both store and process the data.
- Hadoop is robust: it is architected with the assumption that hardware failures are frequent, and it is implemented to detect and handle these failures gracefully.
- Hadoop is simple: it allows users to quickly write efficient parallel code (see the sketch after this list). Hadoop’s accessibility and simplicity give it an edge over hand-writing and running large distributed programs. Do not confuse “simple” with “easy”, though; using it well still requires intelligence and knowledge.
- Hadoop is versatile: it embraces the “big data” reality that we often do not know today which data we will be required to analyze tomorrow. Hadoop’s breakthrough is that it lets businesses and organizations extract value from data that was until recently considered useless.
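
To give a sense of what “simple” means in practice, here is a minimal sketch of the classic word-count job written against Hadoop’s Java MapReduce API (it closely follows the standard tutorial example; the class name `WordCount` and the input/output paths are illustrative, not taken from the text above). The programmer writes only a map function and a reduce function; the framework takes care of splitting the input, scheduling tasks across the cluster, shuffling intermediate data, and re-running tasks on failed nodes.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word; the framework has already
  // grouped all values for the same key, wherever they were produced.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, this can be submitted with `hadoop jar wordcount.jar WordCount <input> <output>`, and the same code runs unchanged whether the cluster has one node or a thousand.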