April 26, 2011

Ch4. MapReduce Algorithm Design

Chapter 4 of Data-Intensive Text Processing with MapReduce introduces two efficient algorithms, pairs and stripes. It shows how to use them to construct a co-occurrence matrix and how to use that matrix to compute conditional probabilities. The authors also compare the time complexity of the two algorithms: stripes is more efficient than pairs, but pairs is easier to implement.
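
To make the difference between the two algorithms concrete, here is a minimal single-process sketch in Python. It is not the book's code and does not use Hadoop: the map and reduce steps are collapsed into plain loops, the function names (`neighbors`, `pairs_counts`, `stripes_counts`, `conditional_probability`) are hypothetical, and "co-occurrence" is assumed to mean "appears in the same input line". Other window definitions are possible.

```python
from collections import Counter, defaultdict


def neighbors(tokens, i):
    """All other tokens in the same line (an assumed co-occurrence window)."""
    return tokens[:i] + tokens[i + 1:]


def pairs_counts(lines):
    """Pairs: the map step emits ((w, u), 1); the reduce step sums per pair."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for i, w in enumerate(tokens):
            for u in neighbors(tokens, i):
                counts[(w, u)] += 1  # one tiny record per co-occurring pair
    return counts


def stripes_counts(lines):
    """Stripes: map emits (w, {u: count}); reduce merges stripes element-wise."""
    stripes = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for i, w in enumerate(tokens):
            # One record per word, carrying a whole row of the matrix.
            stripes[w].update(Counter(neighbors(tokens, i)))
    return stripes


def conditional_probability(stripes, w, u):
    """P(u | w) = count(w, u) / sum of count(w, u') over all u'."""
    stripe = stripes.get(w, {})
    total = sum(stripe.values())
    return stripe.get(u, 0) / total if total else 0.0


lines = ["a b c", "a b"]  # toy input, made up for illustration
pairs = pairs_counts(lines)
stripes = stripes_counts(lines)
print(pairs[("a", "b")], stripes["a"]["b"])
print(conditional_probability(stripes, "a", "b"))
```

Both functions produce the same counts. The sketch also hints at the trade-off: pairs emits many small key-value records and is simple to write, while stripes emits fewer, larger records, and each stripe already holds the full row needed to normalize counts into conditional probabilities.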