Hpricot is a very flexible HTML parser for Ruby, based on Tanaka
Akira's HTree and John Resig's jQuery, but with the scanner rewritten
in C using Ragel.

Homepage:
https://github.com/hpricot/hpricot/wiki
