Apache Hadoop is a Java-based, open-source programming framework for storing and processing very large data sets in a distributed environment. It is maintained by the Apache Software Foundation. Because Hadoop deals with vast amounts of data, it is closely associated with the phrase “big data”. The Hadoop architecture consists of two main components: MapReduce and HDFS (Hadoop Distributed File System). MapReduce is the Java-based processing layer that divides a job into tasks and schedules them; in classic Hadoop, its per-node worker daemon is called the TaskTracker. HDFS is responsible for data storage; its per-node storage daemon is called the DataNode.
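To make the processing model concrete, here is a minimal, single-machine sketch in plain Java of the word-count logic that MapReduce distributes across a cluster. The class and method names are illustrative only, not part of the Hadoop API; a real Hadoop job would instead subclass `Mapper` and `Reducer` and run the map, shuffle, and reduce phases in parallel on many nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual sketch of the MapReduce word-count flow (no Hadoop dependency):
// the map step emits (word, 1) pairs, the framework groups pairs by key
// (the "shuffle"), and the reduce step sums the counts per word.
public class WordCountSketch {

    // Map step: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce: group the emitted pairs by word and sum the counts.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores big data",
                                     "hadoop processes big data");
        System.out.println(wordCount(lines));
    }
}
```

On a cluster, the only parts a developer writes are the map and reduce functions; Hadoop handles partitioning the input, the shuffle, and fault tolerance.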
Some of the complementary projects commonly used with Hadoop are HBase, Pig, Hive, Sqoop, and ZooKeeper, which simplify and speed up working with big data. The Hadoop framework is used by large enterprises such as Yahoo and IBM; Hadoop's design was itself inspired by Google's papers on MapReduce and the Google File System. The operating systems supported by Hadoop are Linux and Windows.