swishspider only indexing file names

From: Ben Caldwell <caldwell(at)>
Date: Thu Aug 10 2000 - 20:46:49 GMT
More questions about swishspider...

This may be related to the thread "HTTP spidering - zero results" that 
bounced around on the list in June, but I wasn't sure whether a resolution 
was ever reached.

When attempting to index via HTTP, I seem to get only as many unique words 
as there are files that I attempt to index. I have pasted a sample of the 
results below. In this case, I'm only trying to index the first page of the 
site, but if I set the MaxDepth variable higher than 1, I still end up with 
only as many unique words as there are files that swishspider attempts to 
index.

Indexing Data Source: "HTTP-Crawler"
retrieving (0)...
  (1 words)
Skipping  Too deep.
Skipping  Too deep.

Removing very common words...
no words removed.
Writing main index...
Computing hash table ...
Writing header ...
Writing index entries ...
Writing stopwords ...
1 unique word indexed.
Writing file index...
Writing file list ...
Writing file offsets ...
Writing MetaNames ...
Writing offsets (2)...
1 file indexed.
Running time: 6 seconds.
Indexing done!
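
To make the symptom easier to spot across runs, here is a rough Python sketch that pulls the two totals out of output like the above and flags the case where they match. The function names are hypothetical, and treating "unique words equals files indexed" as a sign that only file names were indexed is an assumption based on what I'm seeing, not something the swish-e output states.

```python
import re


def summarize(log: str) -> dict:
    """Extract the unique-word and file totals from indexing output."""
    words = re.search(r"(\d+) unique words? indexed", log)
    files = re.search(r"(\d+) files? indexed", log)
    return {
        "unique_words": int(words.group(1)) if words else None,
        "files": int(files.group(1)) if files else None,
    }


def looks_like_names_only(log: str) -> bool:
    """Heuristic (assumption): equal totals suggest one word per file."""
    totals = summarize(log)
    return (
        totals["unique_words"] is not None
        and totals["unique_words"] == totals["files"]
    )


# Totals copied from the run pasted above.
sample = """Writing stopwords ...
1 unique word indexed.
Writing file index...
1 file indexed.
Running time: 6 seconds.
"""

if __name__ == "__main__":
    print(summarize(sample))
    print(looks_like_names_only(sample))
```

A healthy run should report far more unique words than files, so the check would come back false there.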

Any ideas?


Ben Caldwell - Web/Information Specialist
Trace Research & Development Center
email: caldwell(at) |
Tel: 608.265.2064 | Fax: 608.262.8848 | TTY: 608.263.5408
Received on Thu Aug 10 16:50:34 2000