This post has been imported from my previous blog. I did my best to parse XML properly, but it might have some errors.
If you find one, send a Pull Request.
Another great article from Google: Percolator. The whitepaper describes a new indexing engine that no longer relies on massive MapReduce jobs to compute web page indexes. As always, Google used a few existing tools, like BigTable, and built an index updater which, unlike MapReduce, updates small chunks of the index repository, drastically reducing :-) the time between a page being crawled and being indexed. Worth reading, even if just for the transaction scheme implemented on top of the non-transactional BigTable.
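
To make the batch vs. incremental difference concrete, here is a toy sketch in Python. It is not Percolator's code or API: the in-memory inverted index and the names `build_index` and `update_index` are my own assumptions, used only to illustrate the idea of touching small chunks of the index instead of recomputing all of it.

```python
from collections import defaultdict


def build_index(pages):
    """Batch style: rebuild the whole inverted index from every page."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def update_index(index, pages, url, new_text):
    """Incremental style: touch only the entries affected by one page.

    Hypothetical helper, not taken from the paper.
    """
    old_words = set(pages.get(url, "").lower().split())
    new_words = set(new_text.lower().split())
    # Remove the page from words it no longer contains.
    for word in old_words - new_words:
        index[word].discard(url)
        if not index[word]:
            del index[word]
    # Add the page under words that are new for it.
    for word in new_words - old_words:
        index[word].add(url)
    pages[url] = new_text
```

After a page is recrawled, `update_index` only visits the words that changed for that one page, while `build_index` has to walk through every page again. The real system also has to keep such updates consistent across many machines, which is where the transaction scheme comes in.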