Introduction to Hadoop

Eng. Niccolò Becchi, founder of the Wikido events portal, will hold a technical seminar on Apache Hadoop on Monday, June 17, 2013 at the Media Integration and Communication Center.

Data mining representation

Abstract:

  • what is it and how it came about;
  • who uses it and what for;
  • Map-Reduce: splitting an application into many small pieces;
  • and, above all, when it can make sense to use it even if you do not work at Facebook.

Hadoop is a tool that allows you to run scalable applications on clusters of tens, hundreds or even thousands of servers. It is currently used by Facebook, Yahoo, Last.fm and many other organizations that need to work on gigabytes or even petabytes of data.

At the core of the framework is the Map-Reduce paradigm. Developed internally at Google on top of its distributed filesystem, it was created to meet the company's need for parallel processing of large amounts of data. Hadoop is the open source counterpart of Google's software, which anyone can use to process data on their own servers or, possibly, on the Amazon cloud (burning some credit card credit!).

During the meeting you will take your first steps with the Map-Reduce paradigm, in which many (but not all) algorithms can be rewritten; a small taste follows below. We will also look at some tools that increase productivity in Map-Reduce application development.
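To give a flavour of those first steps, here is a minimal sketch of the classic word-count job, written against the Hadoop 1.1 Java API (org.apache.hadoop.mapreduce). The class name and the two input/output path arguments are illustrative, not part of the seminar material:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in the input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum all the counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count"); // Hadoop 1.x constructor
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, a job like this could be launched with something like "bin/hadoop jar wordcount.jar WordCount input/ output/", where the input and output paths (here hypothetical) live on the distributed filesystem.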

Material: participants are advised to bring a PC with a Java development environment installed (JDK >= 1.6) and the Hadoop package, downloadable from http://hadoop.apache.org/releases.html (choose the current stable 1.1.X release).
