Re: indexing really long urls - possible?

From: <moseley(at)not-real.hank.org>
Date: Tue May 13 2003 - 16:51:56 GMT
On Tue, May 13, 2003 at 08:12:00AM -0700, Jody Cleveland wrote:
> Hello,
> 
> Well, I've got someone who wants me to index:
> \\vision\www\keetra\wip\digitization\picbooks\current\pdfs\
> 
> Which is on our test windows 2000 server. I run swish-e on a redhat 8 server
> and spider that location. When I do that, I get this message:
> 
> ./spider.pl: Reading parameters from
> '/var/www/cgi-bin/search/vision/spider/visionSpiderConfig.pl'
> 
>  -- Starting to spider:
> http://199.242.176.180/www/keetra/wip/digitization/picbooks/current/pdfs/ --
> 
> Summary for:
> http://199.242.176.180/www/keetra/wip/digitization/picbooks/current/pdfs/
> Skipped: 1  (1.0/sec)
> Indexing Data Source: "External-Program"
> Indexing "stdin"
> 
> Removing very common words...
> no words removed.
> Writing main index...
> err: No unique words indexed!
> .
> 
> So, since that didn't work, I had her copy all her files to
> http://199.242.176.180/picbooks and that works fine. Is swish-e only happy
> with one subdirectory, or is there a configuration somewhere I need to
> change?

Sorry, I don't really follow your question.  

To find out why the spider did not pass a document to swish-e, set
SPIDER_DEBUG=skipped in the environment when you run the indexer:

   SPIDER_DEBUG=skipped  swish-e -S prog ....

The spider will then report the reason for each URL it skipped.
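
For anyone unfamiliar with the syntax: the VAR=value prefix on that command line is ordinary Bourne-shell behavior, not a swish-e option. It sets the variable for that one command only, and any program the command starts (here, the spider) inherits it. A minimal sketch, using a throwaway `sh -c` child as a stand-in for swish-e (the stand-in is an assumption for illustration only):

```shell
#!/bin/sh
# The prefix form exports SPIDER_DEBUG to this one child process only.
# 'sh -c' is a stand-in for "swish-e -S prog ...."; it just echoes the value.
SPIDER_DEBUG=skipped sh -c 'echo "child sees: $SPIDER_DEBUG"'

# Back in the calling shell the variable is untouched, so later runs
# are not affected unless you add the prefix again.
echo "parent sees: ${SPIDER_DEBUG:-<unset>}"
```

To keep the setting for a whole session instead, `export SPIDER_DEBUG=skipped` once and run the indexer as usual.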


-- 
Bill Moseley
moseley@hank.org
Received on Tue May 13 16:56:24 2003