Re: swishspider only indexing file names

From: Ron Samuel Klatchko <rsk(at)not-real.brightmail.com>
Date: Thu Aug 10 2000 - 21:22:49 GMT
Ben Caldwell wrote:
> This may be related to the thread "HTTP spidering - zero results" that
> bounced around on the list in June, but wasn't sure if a resolution was
> ever reached.
> 
> When attempting to index via HTTP, I seem to only be getting as many unique
> words as there are files that I attempt to index. Have pasted a sample of
> the results I'm getting below. In this case, I'm only trying to index the
> first page of the site, but if I set the MaxDepth variable higher than 1, I
> only end up with as many unique words as swishspider attempts to index.

I just tried that and everything worked fine (I created an index with
227 words).

One interesting thing to try would be to fetch the page source with
another program (lynx -source is an easy way), save it to a file, and
index that file, changing only the parts of the config file that control
the HTTP code.  If the results are identical, then the misconfiguration
is in the indexing engine and not the retrieval engine.
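
If lynx is not handy, the fetch step could be sketched in Python along
these lines.  This is only a rough illustration, not part of swish-e:
the names save_source and unique_words, the script name, and the
example URL are all made up here.  The word count it prints is a crude
sanity check (it splits on anything that is not a letter or digit, and
it also counts text inside script and style blocks), so it will not
match the indexer's own count exactly.

```python
import re
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of a page, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def unique_words(html):
    """Return the set of lowercase ASCII words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return set(re.findall(r"[a-z0-9]+", text))


def save_source(url, path):
    """Fetch url (roughly what `lynx -source` does) and write the raw bytes to path."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    with open(path, "wb") as out:
        out.write(body)
    return body


if __name__ == "__main__":
    # Hypothetical usage: python fetch_page.py http://www.example.com/ page.html
    url, path = sys.argv[1], sys.argv[2]
    body = save_source(url, path)
    words = unique_words(body.decode("utf-8", errors="replace"))
    print(f"saved {len(body)} bytes to {path}; about {len(words)} unique words")
```

If the saved file clearly holds a couple of hundred distinct words but
indexing it still yields only one word per file, that points at the
indexing side of the config rather than at the spider.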

moo
------------------------------------------------------------
        Ron Samuel Klatchko - Senior Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Thu Aug 10 17:26:33 2000