Re: SWISH-E index limits

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Apr 22 2002 - 18:08:29 GMT
At 10:49 AM 04/22/02 -0700, Gerald Klaas wrote:

>And then possibilities of case insensitivity if the host is MS-based
>http://www.sacto.com/Index.htm
>http://www.sacto.com/INDEX.htm
>http://www.sacto.com/INDEX.HTM

And for that I believe in spider.pl's test_url callback you could even do

     $uri->path( lc $uri->path );

to "normalize" all the paths.
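
Here is a minimal standalone sketch of that idea, not the actual spider.pl code. It assumes the URI module is installed (it provides the path method used above) and that the callback receives a URI object and returns true to keep the URL. The %seen hash and the loop are only there to show the effect. Note this is only safe when the server really is case-insensitive; on a case-sensitive host, lowercasing could point at a different file or a missing one.

```perl
use strict;
use warnings;
use URI;

# Callback in the style of spider.pl's test_url: takes a URI object,
# lowercases its path in place, and returns true so the URL is kept.
sub test_url {
    my ($uri, $server) = @_;
    $uri->path( lc $uri->path );
    return 1;
}

# Illustration only: feed the three variants through the callback and
# collect the distinct URLs that come out.
my %seen;
for my $url (qw(
    http://www.sacto.com/Index.htm
    http://www.sacto.com/INDEX.htm
    http://www.sacto.com/INDEX.HTM
)) {
    my $uri = URI->new($url);
    $seen{ $uri->as_string }++ if test_url($uri);
}

print "$_\n" for sort keys %seen;
```

All three variants collapse to the single URL http://www.sacto.com/index.htm, so the page is only fetched and indexed once.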

And if you have the same page at different URLs (e.g. using symlinks to
point to the same file from different locations) you can use the MD5 option
of spider.pl, which takes a fingerprint of each page.  That should help
with the common situation of two links:

   http://example.com/dir/  and http://example.com/dir/index.html
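
To make the fingerprint idea concrete, here is a rough sketch of duplicate detection by content digest. It illustrates the technique and is not spider.pl's own implementation. The %pages hash stands in for fetched documents and its contents are made up; Digest::MD5 ships with Perl. In spider.pl itself I believe the setting is use_md5 in the server config, but check the docs for your version.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical fetched content, keyed by URL (illustration only).
my %pages = (
    'http://example.com/dir/'           => "<html>hello</html>",
    'http://example.com/dir/index.html' => "<html>hello</html>",
    'http://example.com/other.html'     => "<html>other</html>",
);

my (%seen, @indexed, @skipped);
for my $url (sort keys %pages) {
    # Fingerprint the page body; identical content gives the same digest.
    my $digest = md5_hex( $pages{$url} );
    if ( exists $seen{$digest} ) {
        push @skipped, $url;    # same content already seen at another URL
        next;
    }
    $seen{$digest} = $url;
    push @indexed, $url;
}

print "index: $_\n" for @indexed;
print "skip:  $_\n" for @skipped;
```

The two /dir/ URLs have the same digest, so whichever is seen second is skipped. The catch is that the page still has to be fetched before it can be fingerprinted.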




>> Another option, which would be fast, would be to run another web
>> server/virtual host on a different port, and change the document root.
>> 
>Interesting.  Then you'd use the ReplaceRules directive to
>rewrite the URL as it goes into the index? 

Yep. 
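
Something along these lines in the swish-e config file, as a sketch. The port and host names here are made up for illustration; use whatever your second virtual host and public site really are.

```
# Rewrite the spidered (private) URL prefix to the public one
# as paths are stored in the index.
ReplaceRules replace "http://localhost:8080/" "http://www.example.com/"
```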


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Apr 22 18:09:59 2002