In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world's biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo's homepage and finds long-lost friends on Facebook.
It has made handling and analyzing the unprecedented volumes of data churned out by the Internet not only possible but interesting.
"It's a breakthrough," said Mark Seager, head of advanced computing at the Lawrence Livermore National Laboratory in California. "I think this type of technology will solve a whole new class of problems and open new services."
Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They've founded a start-up called Cloudera, based in Burlingame, California, that will try to bring Hadoop's smarts to far-flung industries like genomics, retailing and finance.
The core concepts behind the software were nurtured at Google.
By 2003, Google found it more difficult than ever to ingest and index the entire Internet on a regular basis. Adding to these woes, Google lacked a relatively easy-to-use means of analyzing its vast stores of information to figure out the quality of search results and how people behaved across its numerous online services.
With such issues in mind, a pair of Google engineers invented something called MapReduce that -- when paired with the intricate file management technology Google used to index and catalogue the Web -- turned into a savior for the company.
The MapReduce technology makes it possible to break large sets of data into little chunks, spread that information across thousands of computers, ask the computers questions, and receive cohesive answers. Google rewrote its entire search index system to take advantage of MapReduce's...
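For readers who want to see the idea in miniature, the following Python sketch mimics the steps described above on a single machine: split the data into chunks, process each chunk independently, then combine the partial results into one answer. It is an illustration only, not Google's or Hadoop's actual code. The function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_groups`, `word_count`) and the word-counting task are assumptions chosen for the example.

```python
from collections import defaultdict


def split_into_chunks(records, chunk_size):
    """Break a large list of records into small chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    """Hypothetical driver: on a real cluster each chunk would be
    handled by a different computer; here a plain loop stands in."""
    chunks = split_into_chunks(lines, chunk_size)
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    sample = ["the web is big", "the index is bigger", "big data"]
    print(word_count(sample))
```

Because each chunk is mapped without reference to any other, the work can in principle be spread across as many machines as there are chunks; only the grouping and reducing steps need to bring the partial results back together.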