Re: URL-fixing with callback routines for spider.pl

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu May 19 2005 - 18:17:12 GMT
On Thu, May 19, 2005 at 06:00:08PM +0200, koszalekopalek wrote:
> 	if ($url =~ s{/\(A\(.*?\)\)}{(__bogus__)}) {
> 		if ($bogus_visited{$url}) {
> 			dbg ("BOGUS (duplicate): $url\n");
> 			return 0;
> 		} else {
> 			dbg ("BOGUS (new): $url\n");
> 			$bogus_visited{$url} = 1;
> 		};
> 	};
> 	return 1;
> };

All of that looks fine.  A common way to write that is:

   if ( $bogus_visited{ $url }++ ) {
       return 0;    # seen before: skip the duplicate
   } else {
       warn "BOGUS (new): $url\n";
       return 1;
   }

but your way is fine.  You can also do:

    $server->{counts}{"BOGUS Duplicates"}++;

and the spider will print that count out in the summary at the end of
the run.
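
Putting those pieces together, here is a rough sketch of how the whole check might look. It is only an illustration: the name check_bogus_url is made up, and it assumes the URL arrives as a plain string with the $server hash as the second argument. Your real callback may receive a URI object instead, so adjust accordingly.

```perl
use strict;
use warnings;

my %bogus_visited;    # normalized URL => number of times seen

# Hypothetical helper (name and argument order are assumptions).
# Returns 1 if the URL should be spidered, 0 if it should be skipped.
sub check_bogus_url {
    my ( $url, $server ) = @_;

    # Replace the session token, e.g. "/(A(abc123))", with a fixed marker
    # so that URLs differing only in the token compare equal.
    if ( $url =~ s{/\(A\(.*?\)\)}{(__bogus__)} ) {

        # Post-increment yields the old count: false the first time only.
        if ( $bogus_visited{$url}++ ) {
            $server->{counts}{"BOGUS Duplicates"}++;
            return 0;
        }
    }
    return 1;
}

# Stand-in for the spider's per-server hash, just for this demo.
my %server;
for my $u ( '/(A(abc))/page.aspx', '/(A(xyz))/page.aspx', '/plain.html' ) {
    printf "%s => %d\n", $u, check_bogus_url( $u, \%server );
}
```

The first and third URLs come back as 1 and the second as 0, since it normalizes to the same string as the first. Only the duplicate bumps the "BOGUS Duplicates" count.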



-- 
Bill Moseley
moseley@hank.org

Received on Thu May 19 11:17:15 2005