Re: Error on win2k when spidering

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jan 16 2002 - 19:40:34 GMT
Ok, I only have Win98, and I'm using the newest version of the Win32 2.1-dev.

Obviously, permission errors would be an issue on Win2k where they are not
on Win98.  But, unless you change the default, temporary files should be
written to the current directory (or where specified by environment
variable settings).

I have a config file called "s":

  IndexDir e:\\perl\\bin\\perl.exe
  SwishProgParameters prog-bin/spider.pl default http://swish-e.org

And the command to run is:

  swish-e -c s -S prog -v 9

-c s says to use the "s" config file.
-S prog says to use the "prog" input source

In the config file, IndexDir is the program to run, which in this case
is perl.  I'd be interested to know if Win2k could run the perl program
directly using the #! (shebang) line at the top of the spider.pl program.

  IndexDir e:\\perl\\bin\\perl.exe

Swish uses the backslash as an escape character, so to get a literal
backslash you must write two.  And (at least on Win98) you have to use
backslashes instead of forward slashes in the path, because swish uses
popen() to run the program, and that passes through the shell.
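
To make the escaping rule concrete, here is a minimal Python sketch.  It is
not swish-e's actual parser, and the function name unescape_config_value is
made up for this example.  It only assumes what is described above: a
backslash makes the next character literal, so a doubled backslash comes out
as a single one.

```python
def unescape_config_value(raw: str) -> str:
    """Collapse backslash escapes: a backslash makes the next character literal."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            # Escape character: emit the following character as-is.
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw string, so Python keeps both backslashes exactly as typed in the config.
print(unescape_config_value(r"e:\\perl\\bin\\perl.exe"))
```

With the IndexDir value from the config file, this prints the path with
single backslashes, which is the form the shell needs.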

  SwishProgParameters prog-bin/spider.pl default http://swish-e.org

That is the command line passed to the IndexDir program (perl in this
case).  Together this is basically:

   e:\perl\bin\perl.exe prog-bin/spider.pl default http://swish-e.org

which tells perl to run spider.pl, and spider.pl gets two parameters,
"default" telling it to use default settings for spidering, and the URL of
what to spider.
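
As a rough illustration of how the two directives combine, the Python sketch
below reads the "s" config text and joins IndexDir with SwishProgParameters.
This is an assumption about the overall effect only, not swish-e's real code;
parse_directives and build_prog_command are hypothetical names.

```python
# The "s" config file from above, kept verbatim in a raw string.
CONFIG_TEXT = r"""
IndexDir e:\\perl\\bin\\perl.exe
SwishProgParameters prog-bin/spider.pl default http://swish-e.org
"""


def parse_directives(text: str) -> dict:
    """Split each non-empty line into a directive name and its value."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        # Collapse each doubled backslash into a single literal backslash.
        directives[name] = value.strip().replace("\\\\", "\\")
    return directives


def build_prog_command(directives: dict) -> str:
    """The program to run, followed by the parameters passed to it."""
    return f"{directives['IndexDir']} {directives['SwishProgParameters']}"


print(build_prog_command(parse_directives(CONFIG_TEXT)))
```

The printed line should match the combined command shown above.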

Here's what I get:

E:\Program Files\SWISH-E>swish-e -c s -S prog -v 9
Indexing Data Source: "External-Program"
Indexing "e:\perl\bin\perl.exe"
prog-bin/spider.pl: Reading parameters from 'default'
No such signal: SIGHUP at prog-bin/spider.pl line 64.
No such signal: SIGHUP at prog-bin/spider.pl line 64.
http://swish-e.org - Using DEFAULT (HTML) parser -  (323 words)
http://swish-e.org/download.html - Using DEFAULT (HTML) parser -  (76 words)
http://swish-e.org/2.2/docs/CHANGES.html - Using DEFAULT (HTML) parser -  (2532 words)
http://swish-e.org/2.2/docs/index.html - Using DEFAULT (HTML) parser -  (242 words)

The warning:

   No such signal: SIGHUP at prog-bin/spider.pl line 64.

is because Win98 doesn't have signals.  On unix you can send a SIGHUP to
the spider process, which tells the spider to abort spidering and let
swish index just the documents spidered so far.
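
For comparison, here is how a script can avoid that kind of warning by
checking for the signal before installing a handler.  This is a Python
analogue, not the code in spider.pl (which is Perl); the names
install_abort_handler and request_abort are invented for the sketch.

```python
import signal

abort_requested = False


def request_abort(signum, frame):
    """Signal handler: remember that the crawl should stop early."""
    global abort_requested
    abort_requested = True


def install_abort_handler() -> bool:
    """Install the SIGHUP handler only where the platform defines SIGHUP."""
    sighup = getattr(signal, "SIGHUP", None)
    if sighup is None:
        # Windows builds of Python do not define SIGHUP.
        return False
    signal.signal(sighup, request_abort)
    return True


print("SIGHUP handler installed:", install_abort_handler())
```

A crawl loop would then check abort_requested between fetches and stop
cleanly, leaving the documents fetched so far to be indexed.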

Now, how do you do that with -S http?  (deprecated method, as far as I'm
concerned ;)

First, I don't think swishspider.pl made it into the Win32 binary.  It's
not on my machine, at least.   David, can you check?

Here's the way I ran it:

E:\Program Files\SWISH-E>cat h
Delay 0


E:\Program Files\SWISH-E>swish-e -S http -c h -i http://swish-e.org -v 3
Indexing Data Source: "HTTP-Crawler"
Indexing "http://swish-e.org"
retrieving http://swish-e.org (0)...
 - Using DEFAULT (HTML) parser -  (323 words)
Skipping http://sourceforge.net/cvs/?group_id=15097:  Wrong method or server.
Skipping http://swish-e.org/download.html:  Already indexed.
Skipping http://www.fsf.org/copyleft/gpl.html:  Wrong method or server.
Skipping http://swish-e.org/Discussion/:  Already indexed.
retrieving http://swish-e.org/download.html (1)...
 - Using DEFAULT (HTML) parser -  (76 words)
Skipping ftp://sunsite.berkeley.edu/pub/swish-e/:  Wrong method or server.
Skipping http://www.fsf.org/copyleft/gpl.html:  Wrong method or server.
Skipping http://swish-e.org/Discussion/:  Already indexed.
retrieving http://swish-e.org/Discussion (1)...
Skipping http://swish-e.org/Discussion/:  Already indexed.
retrieving http://swish-e.org/2.2/docs/CHANGES.html (1)...
 - Using DEFAULT (HTML) parser -  (2532 words)
Skipping http://swish-e.org:  Already indexed.
Skipping http://swish-e.org/2.2/docs/index.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/SWISH-CONFIG.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/SWISH-CONFIG.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/SWISH-CONFIG.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/SWISH-CONFIG.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/CHANGES.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/INSTALL.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/index.html:  Already indexed.
Skipping http://swish-e.org/2.2/docs/SWISH-CONFIG.html:  Already indexed.
Skipping http://www.fsf.org/copyleft/gpl.html:  Wrong method or server.
Skipping http://swish-e.org/Discussion/:  Already indexed.
retrieving http://swish-e.org/2.2/docs/index.html (1)...

I'm not sure how much I trust all that.  It kind of looks like STDERR and
STDOUT messages are mixed, so they are out of order.
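
One way to check whether the ordering is just an interleaving effect is to
capture the two streams separately.  The Python sketch below does this for a
stand-in child process that writes one line to each stream; the helper name
run_capturing_streams is hypothetical, and the swish-e command itself is not
run here.

```python
import subprocess
import sys


def run_capturing_streams(argv):
    """Run a command and return its stdout and stderr as separate strings."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout, result.stderr


# Stand-in child: one line on stdout, one on stderr.
child = [
    sys.executable,
    "-c",
    "import sys; print('indexed'); print('skipping', file=sys.stderr)",
]
out, err = run_capturing_streams(child)
print("STDOUT:", out.strip())
print("STDERR:", err.strip())
```

With the streams kept apart, each one is in its own order, so you can tell
which messages came from where.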



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Wed Jan 16 19:41:44 2002