Guus Bosman

software engineering director


Data-Intensive Text Processing with MapReduce

It's beautiful to see a real paradigm change happening. I remember how much I enjoyed programming in functional languages in college, and how cool it is to be able to look at problems from a different viewpoint. What Google and others have achieved with MapReduce is a similar change in the way of looking at problems.

MapReduce is the name of Google's underlying programming model for processing huge data sets. Since Google introduced it, other companies have followed suit. I didn't know much about this field, and this book is a great introduction. It provides a good description of the foundations, and I love that it describes practical uses. Examples the authors give include machine translation, Google's PageRank, and shortest paths in a graph.

Actually in use

What I like about MapReduce is that it provides an abstraction for distributed computing that is actually being used and is successful. The book shows the scaling characteristics of an example algorithm (the "stripes" approach to computing word co-occurrence) on Hadoop: an R^2 of 0.997! That means a straight line fits the measurements almost perfectly, so the scaling is very close to linear as you add extra machines.
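To make the stripes idea a bit more concrete, here is a minimal single-process sketch in Python. It is not the book's code and it does not use Hadoop. The function names (`map_stripes`, `reduce_stripes`, `run_job`) and the `window` parameter are assumptions made up for this illustration. The mapper emits, for each word, a "stripe" (a small table counting its neighbors), and the reducer sums all stripes that share the same word.

```python
from collections import Counter, defaultdict


def map_stripes(sentence, window=2):
    """Mapper (hypothetical name): emit (word, stripe) pairs.

    A stripe is a Counter of the neighbors seen within `window`
    positions of the word in this sentence.
    """
    words = sentence.lower().split()
    for i, word in enumerate(words):
        stripe = Counter()
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                stripe[words[j]] += 1
        yield word, stripe


def reduce_stripes(word, stripes):
    """Reducer (hypothetical name): element-wise sum of one word's stripes."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)
    return word, total


def run_job(sentences, window=2):
    """Single-process stand-in for the shuffle: group by key, then reduce."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for word, stripe in map_stripes(sentence, window):
            grouped[word].append(stripe)
    return dict(reduce_stripes(w, s) for w, s in grouped.items())


cooccurrence = run_job(["the quick brown fox", "the lazy brown dog"], window=1)
```

In a real cluster the grouping step in `run_job` is what the framework does between the map and reduce phases. Here it is just a dictionary, which is enough to show the shape of the computation.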

Want to read more

This is one of those books that makes you want to read more. For example, since reading this book I've looked into terms such as Zipfian, Brewer's CAP Theorem and Heaps' Law. I still need to learn more about Expectation Maximization and Hidden Markov Models, harking back to some fundamental mathematics I had in college.

I want to read more about machine translation now, perhaps Koehn's book. And I definitely want to read the Google article "The Unreasonable Effectiveness of Data".

This is an excellent book, which provides a very readable introduction to the algorithms and real-world implementations.

ISBN: 9781608453429
Language: English for work
Author: Jimmy Lin, Chris Dyer
