Extract links using LibXML

August 22, 2011, 1:18 pm

Here’s a small example of using Perl’s XML::LibXML to extract links from an HTML page.

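#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape_utf8);
use XML::LibXML;

# Titles may contain non-ASCII characters, so print them as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

my $url = "http://fantasyfilmfest.com/pages/filme.html";
my $fw = "http://www.freshwap.com/index.php?do=search&subaction=search&full_search=1&catlist[]=5&titleonly=3&story=";

# Parser options: recover from malformed HTML and keep libxml2 quiet about it.
my %opts = (
    suppress_errors => 1,
    recover         => 1,
);

# Download a page with LWP::Simple and parse it; returns undef on failure.
# (libxml2's own HTTP fetching is not available in all builds.)
sub fetch_dom {
    my ($address) = @_;
    my $html = get($address);
    return unless defined $html;
    return eval { XML::LibXML->load_html(string => $html, %opts) };
}

my $dom = fetch_dom($url) or die "Could not fetch or parse $url\n";
my $root = $dom->getDocumentElement;

foreach my $node ($root->findnodes("//div[\@class='FilmREITER']")) {
    # The film title is the text of the link inside the div.
    my $title = $node->findvalue('a');
    # Move the article to the front: "Foo, The" becomes "the Foo".
    $title =~ s/(.*), the/the $1/i;
    print $title . "\n";

    # Search for the title; escape it so spaces survive in the query string.
    my $results = fetch_dom($fw . uri_escape_utf8($title));
    if (defined $results) {
        foreach my $link ($results->getDocumentElement->findnodes("//div[\@class='title']/a")) {
            print $link->getAttribute('href') . "\n";
        }
    }
    print "--------------\n";
}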
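
To try the link extraction without touching the network, the same XPath query can be run against an HTML string. The following is only a minimal sketch: the sample markup and the example.com URLs are invented for illustration, and it assumes an XML::LibXML version that provides load_html.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup, invented for this sketch.
my $html = <<'HTML';
<html><body>
<div class="title"><a href="http://example.com/one">One</a></div>
<div class="title"><a href="http://example.com/two">Two</a></div>
<div class="other"><a href="http://example.com/skip">Skip</a></div>
<p>unclosed paragraph
</body></html>
HTML

# Parse from a string; recover lets the parser tolerate sloppy markup.
my $dom = XML::LibXML->load_html(
    string          => $html,
    recover         => 1,
    suppress_errors => 1,
);

# Collect the href of every link directly inside a div with class "title".
my @links = map { $_->getAttribute('href') }
    $dom->findnodes("//div[\@class='title']/a");

print "$_\n" for @links;
```

Only the two links inside the "title" divs are collected; the link in the "other" div does not match the XPath expression and is skipped.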
