On Wed, Feb 02, 2005 at 04:04:36PM -0500, Juan Carlos Avila / MTBASE wrote:
> Hi Bill,
>
> Yes, I'm monitoring my web server's log and while swish-e shows status
> 500, my web server shows 200.
>
> I do not understand what you mean by "running the spider with:
> SPIDER_DEBUG=...." -- I'm quite new to swish-e... sorry.
Sorry, I'm used to setting environment vars at the command line.
I spent last weekend working on windows XP and aged about five years.
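For anyone who hasn't done this before: in a Bourne-style shell, writing NAME=value in front of a command sets that variable for that one command only. A minimal sketch of the mechanism follows. DEMO_DEBUG is a made-up name for illustration, not a variable the spider reads.

```shell
# DEMO_DEBUG is a hypothetical variable name, used only to show the mechanism.
# The child process started on this line sees the variable:
DEMO_DEBUG=headers sh -c 'echo "child sees: $DEMO_DEBUG"'

# The calling shell does not keep it afterwards:
echo "parent sees: ${DEMO_DEBUG:-<unset>}"
```

With the spider, the same pattern would put SPIDER_DEBUG=.... in front of the perl command. As far as I know cmd.exe on Windows does not accept this form, so the config file route is the easier one there.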
Create config.txt:
@servers = ( {
    base_url  => 'http://your_server/casos/VerCasoIdx?caso_numero=6896',
    debug     => 'headers, url, skipped',   # show headers, URLs, skipped files
    max_files => 1,                         # stop after one document
    email     => 'you@yourmail.whatever',   # sent in the From: request header
} );
Then run the spider directly -- I don't know where it's installed on
your machine, but this is what I would do:
perl /usr/local/lib/swish-e/spider.pl config.txt > output
which generates this:
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'
-- Starting to spider: http://localhost/index.html --
Request for 'http://localhost/index.html' aborted because: 'dead at /usr/local/lib/swish-e/spider.pl line 688.'
Summary for: http://localhost/index.html
Connection: Close: 1 (1.0/sec)
Skipped: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
moseley@bumby:~$ perl /usr/local/lib/swish-e/spider.pl config.txt > output
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'config.txt'
-- Starting to spider: http://localhost/index.html --
vvvvvvvvvvvvvvvv HEADERS for http://localhost/index.html vvvvvvvvvvvvvvvvvvvvv
---- Request ------
GET http://localhost/index.html
Accept-Encoding: gzip; deflate
From: me@inalid.com
User-Agent: swish-e spider 2.2 http://swish-e.org/
---- Response ---
Status: 200 OK
Connection: close
Date: Wed, 02 Feb 2005 21:16:49 GMT
Accept-Ranges: bytes
ETag: "1c0140c-100f-3ffc3496"
Server: Apache/1.3.33 (Debian GNU/Linux) PHP/4.3.9-2 mod_ssl/2.8.22 OpenSSL/0.9.7d mod_perl/1.29
Content-Length: 4111
Content-Type: text/html; charset=iso-8859-1
Content-Type: text/html; charset=iso-8859-1
Last-Modified: Wed, 07 Jan 2004 16:32:22 GMT
Client-Date: Wed, 02 Feb 2005 21:16:49 GMT
Client-Peer: 127.0.0.1:80
Client-Response-Num: 1
Title: Welcome to Your New Home Page!
X-Meta-Author: johnie@debian.org (Johnie Ingram)
X-Meta-Description: The initial installation of Debian/GNU Apache.
X-Meta-GENERATOR: Mozilla/4.05 [en] (X11; I; Linux 2.3.99-pre3 i686) [Netscape]
^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +Fetched 0 Cnt: 1 GET http://localhost/index.html 200 OK text/html 4111 parent: depth:0
sleeping 5 seconds
/usr/local/lib/swish-e/spider.pl: Max files Reached
Summary for: http://localhost/index.html
Connection: Close: 2 (0.4/sec)
Off-site links: 10 (2.0/sec)
Total Bytes: 4,111 (822.2/sec)
Total Docs: 1 (0.2/sec)
Unique URLs: 2 (0.4/sec)
So that shows you exactly what the server is sending back. If that
says 500 and your logs say 200 then maybe:
1) you are looking at the wrong logs
2) your web server is telling you a lie
3) spider.pl or LWP::UserAgent/LWP::RobotUA is generating a 500
but I can't think of why it would do that....
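To compare the spider's view with the web server log, it can help to pull just the status line out of the debug output. This is a rough sketch, not part of swish-e: spider_status is a hypothetical helper, and the log file here is a stand-in built from the header dump above so that the snippet runs on its own. In the run above the debug lines showed up on the terminal even with "> output", so they would presumably need to be captured separately (for example with 2>) before grepping.

```shell
# Hypothetical helper: print the HTTP status line(s) found in a saved
# spider debug log (lines of the form "Status: 200 OK").
spider_status() {
    grep '^Status:' "$1"
}

# Stand-in log copied from the header dump above; a real run would
# produce this file by capturing the spider's debug output.
log=$(mktemp)
cat > "$log" <<'EOF'
---- Response ---
Status: 200 OK
Connection: close
EOF

spider_status "$log"
```

If that line says 500 while the access log entry for the same request says 200, the possibilities listed above are where I would start.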
--
Bill Moseley
moseley@hank.org
Received on Wed Feb 2 13:18:53 2005