Using Perl to Gather Information from the Web

By Tom Hukins
Date: Friday, 2 September 2005, 15:40
Duration: 20 minutes
Language:

Whilst some Web site owners have opened up their information through either REST or SOAP interfaces, many have not. For sites without such interfaces, screen scraping remains the only viable way to gather information.

My talk will explore how Perl, WWW::Mechanize and XPath can make gathering information from such sites easier and more robust, even when working with badly formed HTML. I will compare the XPath approach with the more common tokenising technique of HTML::Parser.

I will also discuss other tools that help developers gather information from sites lacking public interfaces and how to use these tools to write simple, flexible Perl code.
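
As a rough illustration of the comparison described above, the sketch below extracts the same links from a badly formed page twice: once with an XPath expression and once by walking tokens. It is not taken from the talk. The module choices (HTML::TreeBuilder::XPath for the XPath side, and HTML::TokeParser, the token interface built on HTML::Parser, for the tokenising side), the sample markup and the variable names are all assumptions made for the example. The page is an inline string so the sketch runs offline; in practice the content would typically come from WWW::Mechanize.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # assumed choice for XPath over HTML
use HTML::TokeParser;           # token interface built on HTML::Parser

# Hypothetical page with deliberately sloppy markup: unclosed <li> and <p>.
my $html = <<'HTML';
<html><body>
<ul id="talks">
  <li><a href="/talk/1">First talk</a>
  <li><a href="/talk/2">Second talk</a>
</ul>
<p>Footer <a href="/about">About</a>
</body></html>
HTML

# XPath approach: build a tree, then state what is wanted in one expression.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;
my @xpath_links = map { [ $_->attr('href'), $_->as_trimmed_text ] }
    $tree->findnodes('//ul[@id="talks"]/li/a');
$tree->delete;

# Tokenising approach: walk the token stream and track state by hand.
my $parser = HTML::TokeParser->new(\$html);
my @token_links;
my $in_list = 0;
while (my $token = $parser->get_token) {
    my ($type, $tag, $attr) = @$token;
    if ($type eq 'S' && $tag eq 'ul' && ($attr->{id} // '') eq 'talks') {
        $in_list = 1;                       # entered the list of interest
    }
    elsif ($type eq 'E' && $tag eq 'ul') {
        $in_list = 0;                       # left the list
    }
    elsif ($type eq 'S' && $tag eq 'a' && $in_list) {
        # Read the link text up to the closing </a>.
        push @token_links, [ $attr->{href}, $parser->get_trimmed_text('/a') ];
    }
}

# Print what each approach found.
print "XPath:  $_->[0] => $_->[1]\n" for @xpath_links;
print "Tokens: $_->[0] => $_->[1]\n" for @token_links;
```

In this sketch the XPath version names the wanted nodes declaratively, while the token version has to carry its own state flag to know whether it is inside the list, which is the kind of bookkeeping that tends to grow as pages get more complex.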