Thomas Nyman scribbled on 6/5/05 8:24 AM:
> Sorry about the encrypted mail, my mistake. Which file contains
> the necessary parameters used when spidering? I found the following
> on the site:
>
> # my %ccenter = (
>
> # email => 'Lance.Perry(at)not-real.ourdomain.com',
> # base_url => 'http://our.domain.com/ccenter/',
> # delay_sec => '0',
> # max_depth => '1',
> # credentials => 'username:password'
>
> # );
>
> # @servers = ( \%ccenter );
>
> The question is: where should this go?
>
It goes in a config file, which by default is named SwishSpiderConfig.pl. You
can name it anything you want (e.g., myconfig), as long as you pass that name
on the command line. Be sure to take out the leading # signs; those "comment
out" the lines in a Perl script.
To spider and index in one step, call it like this (for example):
$ spider.pl myconfig | swish-e -S prog -i stdin
To test the config first, just run:
$ spider.pl myconfig
which prints the fetched documents to stdout.
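
For reference, here is roughly what the snippet quoted above looks like once the leading # signs are removed. This is only a sketch: the values are the placeholders from the web site, "myconfig" is just the example file name used above, and the trailing "1;" is my assumption about what the config loader expects (a true value at the end of the file). You would swap in your own base_url and the real username:password for the htpasswd-protected directory.

```perl
# myconfig -- example file name only; SwishSpiderConfig.pl is the default.
# All values below are the placeholders from the quoted snippet.
my %ccenter = (
    email       => 'Lance.Perry(at)not-real.ourdomain.com',
    base_url    => 'http://our.domain.com/ccenter/',   # replace with your own URL
    delay_sec   => '0',
    max_depth   => '1',
    credentials => 'username:password',                # replace with the real login
);

# The spider reads the list of servers to crawl from @servers.
@servers = ( \%ccenter );

# Assumption: end with a true value so the file loads cleanly.
1;
```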
>
>
>
> 5 jun 2005 kl. 13.54 skrev Thomas Nyman:
>
>
>>Hi
>>
>>I have created a conf file that contains
>>
>>IndexDir http://192.168.1.2/archive/
>>
>>I wish to index all files found in the "archive" on the remote
>>machine. The remote machine uses htpasswd to control access, so one needs a
>>password to surf to the machine.
>>
>>When running swish I receive the following messages
>>
>>Indexing Data Source: "HTTP-Crawler"
>>Indexing "http://192.168.1.2/archive/"
>>Removing very common words...
>>no words removed.
>>Writing main index...
>>err: No unique words indexed!
>>
>>It seems that it's not indexing any documents.
>>
>>I have not made any particular changes to any other file than my conf
>>file.
>>
>>I can successfully index on the same machine that swish is
>>installed on.
>>
>>I'm guessing I'm missing something here but I'm not sure what. I
>>would appreciate any pointers. If someone wants me to send additional
>>info I will.
>>
>>Thanks
>>
>>Thomas
>>
>>
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Sun Jun 5 06:52:37 2005