

Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 1

 

In the last series, we introduced ‘Big Data’ as a phenomenon that has captured global interest in recent years and offers a gold mine of opportunities for businesses that can leverage it effectively. Now we want to turn our attention to two of the major toolsets that help companies capitalize on the immense amounts of structured and unstructured data in today’s digital universe: Hadoop and NoSQL.

 

The Origins and Growth of Hadoop

 


If you’ve heard anything about Big Data, chances are good that you’ve also heard some buzz around a platform called Hadoop. Hadoop was created in 2005 by Doug Cutting and Mike Cafarella; Cutting, who went on to lead the project at Yahoo, named the tool after his son’s toy elephant. Hadoop grew out of efforts by Google, Yahoo, and other companies to index web pages faster and work around the growing data bottleneck. In 2008, Yahoo announced that a 10,000-core Hadoop cluster was powering its production search index, and since then there has been no looking back. We’re now at the point where, as one publication put it, Hadoop is “the focal point of an immense big data movement.”

A fast-growing ecosystem of commercial vendors has emerged in recent years, with companies like Cloudera, Hortonworks, and MapR developing customizable, accessible out-of-the-box solutions for scaling up Hadoop. In 2012, the research firm IDC estimated the Hadoop market at $77M, with projected annual growth of roughly 60% to $813M by 2016.

 

How Does Hadoop Work?

In a nutshell, Hadoop is an open source data processing framework for storing and analyzing massive amounts of unstructured data across many servers. Unlike a centrally located data warehouse, Hadoop works on an entirely different principle, leveraging distributed computing and parallel processing to divide data and work across many nodes (servers). The Hadoop infrastructure is built around two core components: the Hadoop Distributed File System (HDFS) and a programming model called MapReduce. As the Apache site defines it, “Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.”
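To make the MapReduce model a little more concrete, here is the classic word-count job written against Hadoop’s Java MapReduce API, adapted from the example in the Apache Hadoop tutorial. It is a minimal sketch: it assumes the input text files already live in HDFS, that the job is packaged as a jar and submitted to a cluster (or run locally), and that the input and output paths are passed in as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each mapper reads one split of the input (typically an HDFS block)
  // and emits a (word, 1) pair for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: all counts for the same word are routed to one reducer
  // and summed, no matter which node produced them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // pre-aggregate on each node to cut shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS directory of text files
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // results are written back to HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The map step runs in parallel on whichever nodes hold the input blocks and emits (word, 1) pairs; the framework then shuffles all pairs for a given word to a single reducer, which sums them. That split between an embarrassingly parallel map phase and an aggregating reduce phase is what lets Hadoop scale out across thousands of commodity machines.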

 

Please join us tomorrow as we continue our discussion on Hadoop.

 

 

 

 


About Ralph Eck

Ralph is an international businessman with a wealth of experience in developing telecommunications, data transmission, CATV, and internet companies. His experience and expertise uniquely position him to analyze, evaluate, and critique technology and how it fits into a business’s operational needs while supporting its success.
