Re: RE: LWP,HTTP and HTML modules

From: Mark Gaulin <gaulin(at)>
Date: Tue Jan 19 1999 - 21:30:56 GMT
The argument goes like this:
When you are indexing someone else's web site, you cannot change the
URLs as they appear on its pages or the MIME types configured on the
other server.  The only things you do control are the extensions (and
other options) listed in the swish config file.  Also, 
presumably if you are indexing someone else's web site then you
cannot use the file system method.
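
To make the point concrete, here is a rough sketch (in Python, purely for
illustration; it is not swish code) of what "controlling the extensions"
amounts to when crawling a remote site: the crawler sees only URLs, so the
one filter fully under your control is an extension list of your own. The
names INDEX_EXTENSIONS and should_index are hypothetical, and the extension
list is an assumed example, not anything taken from a real config file.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list, standing in for whatever the indexer's
# config file would list. These values are assumptions for illustration.
INDEX_EXTENSIONS = {".html", ".htm", ".txt"}


def url_extension(url):
    """Return the lowercased extension of the URL's path ('' if none)."""
    path = urlparse(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lower()


def should_index(url, extensions=INDEX_EXTENSIONS):
    """Decide from the URL alone whether a crawled page gets indexed."""
    return url_extension(url) in extensions
```

A URL with no extension at all (a directory URL, say) is skipped by this
filter, which is exactly the weakness of relying on extensions for a site
you do not control.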

I would think that the only reason to index using the HTTP method is
if the server is remote (so filesystem access is not available) or if the
site has a lot of dynamic content.  I suppose it might just be simpler
to configure an indexing job with HTTP, letting the web crawler find everything
on its own, but it will always be slower.

I think that makes sense...


At 12:45 PM 1/19/99 -0800, Ron Klatchko wrote:
>At 12:14 PM 1/19/99 -0800, Mark Gaulin wrote:
>>The reason you and I can use file extensions to index files
>>is because *we* control those extensions... we know what
>>they mean and they are by definition not dynamically
>I find this a slightly curious statement.  If you control the extensions,
>then I assume you control the web site.  If what you are looking for is to
>make the indexing more efficient, wouldn't it make more sense to use the
>file system methods which will always blow away the HTTP method is indexing
>          Ron Klatchko - Manager, Advanced Technology Group           
>           UCSF Library and Center for Knowledge Management           
