YAPC::EU::2005

Using Perl to Gather Information from the Web

By Tom Hukins
Date: Friday, 2 September 2005 15:40
Duration: 20 minutes
Language:


Whilst some Web site owners have opened up their information through REST or SOAP interfaces, many have not. Screen scraping remains the only viable way to gather information from such Web sites.

My talk will explore how Perl, WWW::Mechanize and XPath can make gathering information from such sites easier and more robust, even when working with badly formed HTML. I will compare the XPath approach with the more common tokenising technique offered by HTML::Parser.
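
As a rough illustration of that comparison (a sketch, not material from the talk itself), the snippet below pulls the same values out of a sloppy HTML fragment twice: once with an XPath expression and once by walking tokens. The markup, the "prices" table and the variable names are invented for the example. HTML::TreeBuilder::XPath and HTML::TokeParser (a token interface built on HTML::Parser) are assumed to be installed from CPAN; the talk may well use different modules.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # forgiving HTML parser with XPath queries
use HTML::TokeParser;           # token stream built on top of HTML::Parser

# Hypothetical, deliberately sloppy markup: <td> and <tr> are never closed.
# In a real scraper this string might instead come from WWW::Mechanize:
#   my $mech = WWW::Mechanize->new; $mech->get($url); my $html = $mech->content;
my $html = <<'HTML';
<html><body>
<table id="prices">
<tr><td class="name">Widget<td class="price">9.99
<tr><td class="name">Gadget<td class="price">24.50
</table>
</body></html>
HTML

# XPath approach: say where the data lives in the document tree.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;
my @xpath_prices =
    $tree->findvalues('//table[@id="prices"]//td[@class="price"]');
for (@xpath_prices) { s/^\s+//; s/\s+$//; }   # trim stray whitespace
$tree->delete;                                # free the parse tree

# Tokenising approach: step through start tags and inspect each one.
# Limiting this to the "prices" table would need extra state tracking.
my $parser = HTML::TokeParser->new(\$html);
my @token_prices;
while (my $tag = $parser->get_tag('td')) {
    my $class = $tag->[1]{class};             # attribute hash of the start tag
    next unless defined $class && $class eq 'price';
    push @token_prices, $parser->get_trimmed_text;
}

print "XPath:  @xpath_prices\n";
print "Tokens: @token_prices\n";
```

The XPath version states the location of the data in one expression, whereas the token version has to remember where it is in the document as it reads, which is the trade-off the comparison is about.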

I will also discuss other tools that help developers gather information from sites lacking public interfaces and how to use these tools to write simple, flexible Perl code.

