

Re: ReplaceRules not working as advertised

From: Colin Kuskie <ckuskie(at)not-real.sterlink.net>
Date: Mon Apr 22 2002 - 23:27:11 GMT
---------- Original Message ----------------------------------
From: Bill Moseley <moseley@hank.org>
Reply-To: moseley@hank.org
Date: Mon, 22 Apr 2002 12:01:57 -0700 (PDT)

>At 11:42 AM 04/22/02 -0700, Colin Kuskie wrote:
>>I found that I was getting "duplicate" results when indexing:
>>
>>1000 http://www.sunsetpres.org/Men/ "Sunset Presbyterian Men's Ministry Page" 29670
>>1000 http://www.sunsetpres.org/Men/index.html "Sunset Presbyterian Men's Ministry Page" 29670
>
>Two different URLs.

Yes, but both point to the same information, since index.html is a very common default page.

>Yes, perhaps not the best wording.
>
>You can change the name of the path stored in the index with
>ReplaceRules, but it doesn't affect what is sent to swish for indexing.
>That's before indexing, not before spidering a URL.
>
>In other words, think of it as a pipe:
>
>   spider | swish
>
>spider is just passing files to swish, and swish can't tell spider
>anything.
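A rough model of the pipe described above may help show why the duplicates survive. This is a hypothetical Python sketch, not swish-e's code: the function names are made up, and `replace_rule` is only an illustrative stand-in for a ReplaceRules entry that drops a trailing index.html. The point is that the rewrite happens after the spider has already fetched both URLs.

```python
import re

# Hypothetical model of "spider | swish": the spider decides what is fetched,
# and swish only rewrites the path it stores for each document it receives.

def spider(urls):
    """Yield (url, content) pairs; the content is faked for illustration."""
    for url in urls:
        yield url, "Sunset Presbyterian Men's Ministry Page"

def replace_rule(path):
    """Illustrative stand-in for a ReplaceRules entry: drop a trailing index.html."""
    return re.sub(r"index\.html$", "", path)

def swish_index(docs):
    """Store one record per document received; the rule changes only the stored path."""
    return [(replace_rule(url), content) for url, content in docs]

urls = [
    "http://www.sunsetpres.org/Men/",
    "http://www.sunsetpres.org/Men/index.html",
]
index = swish_index(spider(urls))
for path, title in index:
    print(path, title)
```

Both records end up with the same stored path, but there are still two of them, because nothing on the swish side can stop the spider from fetching the second URL.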

So at exactly what point do the ReplaceRules take effect?  If they
were implemented before swish-e invoked the swishspider, then the
system should work as described.
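If the rewriting has to happen before a URL is fetched, it belongs on the spider side of the pipe. The following is a minimal Python sketch of that idea, assuming a spider that normalizes each URL and skips any it has already queued. `normalize` and `crawl_order` are hypothetical helpers for illustration; they are not part of swishspider, spider.pl, or swish-e.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Hypothetical normalizer: treat a trailing index.html as its directory URL."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Lowercase the host and drop any #fragment; keep the query string.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))

def crawl_order(urls):
    """Return the URLs a spider would actually fetch after de-duplication."""
    seen = set()
    to_fetch = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            to_fetch.append(key)
    return to_fetch

print(crawl_order([
    "http://www.sunsetpres.org/Men/",
    "http://www.sunsetpres.org/Men/index.html",
]))
```

Because the check runs before fetching, only one copy of the page is ever handed to swish. The trade-off is that the spider has to assume index.html really is the server's default page, which is a site-specific guess.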

>-S prog with spider.pl is a lot more flexible, and probably faster,
>too, since it avoids compiling a Perl program for every URL.

I'll look at spider.pl, and I'll try to use Randal's pslinky program
to do the downloading for me, just to kick it up another notch.

Colin
Received on Mon Apr 22 23:27:16 2002