HTML::LinkExtor
Section: User Contributed Perl Documentation (3)
Updated: perl v5.6.0

NAME
HTML::LinkExtor - Extract links from an HTML document

SYNOPSIS
    require HTML::LinkExtor;
    $p = HTML::LinkExtor->new(\&cb, "http://www.perl.org/");
    sub cb {
        my($tag, %links) = @_;
        print "$tag @{[%links]}\n";
    }
    $p->parse_file("index.html");

DESCRIPTION
HTML::LinkExtor is an HTML parser that extracts links from an HTML document. It is a subclass of HTML::Parser, which means that the document should be given to the parser by calling the $p->parse() or $p->parse_file() methods.
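
As a minimal sketch of the $p->parse() route, the following feeds a document to the parser in pieces instead of reading it from a file. The sample markup, the base URL http://www.example.com/docs/ and the names collect and @found are assumptions made up for illustration; they do not come from the module. $p->eof is the HTML::Parser method that signals the end of the document.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup, split into two chunks to mimic data
# arriving piecemeal (for example, from a network read).
my @chunks = (
    '<html><body><a href="about.html">About</a>',
    '<img src="/logo.png"></body></html>',
);

my @found;

# The callback receives the tag name followed by attribute/URL pairs.
sub collect {
    my ($tag, %links) = @_;
    for my $attr (sort keys %links) {
        push @found, "$tag $attr $links{$attr}";
    }
}

# The base URL is an assumption chosen for illustration; with a base
# given, relative links are reported as absolute URLs.
my $p = HTML::LinkExtor->new(\&collect, "http://www.example.com/docs/");

$p->parse($_) for @chunks;   # feed the document piece by piece
$p->eof;                     # signal the end of the document

print "$_\n" for @found;
```

Collecting the results in an array rather than printing inside the callback keeps the callback small and leaves the caller free to filter or sort the links afterwards.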
EXAMPLE
This is an example showing how you can extract links from a document received using LWP:
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;

    $url = "http://www.perl.org/";  # for instance
    $ua = LWP::UserAgent->new;

    # Set up a callback that collects image links
    my @imgs = ();
    sub callback {
        my($tag, %attr) = @_;
        return if $tag ne 'img';  # we only look closer at <img ...>
        push(@imgs, values %attr);
    }

    # Make the parser.  Unfortunately, we don't know the base yet
    # (it might be different from $url)
    $p = HTML::LinkExtor->new(\&callback);

    # Request document and parse it as it arrives
    $res = $ua->request(HTTP::Request->new(GET => $url),
                        sub {$p->parse($_[0])});

    # Expand all image URLs to absolute ones
    my $base = $res->base;
    @imgs = map { $_ = url($_, $base)->abs; } @imgs;

    # Print them out
    print join("\n", @imgs), "\n";

SEE ALSO
the HTML::Parser manpage, the HTML::Tagset manpage, the LWP manpage, the URI::URL manpage

COPYRIGHT
Copyright 1996-2001 Gisle Aas.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This document was created by man2html, using the manual pages. Time: 18:05:21 GMT, April 26, 2024