HTTP spidering - zero results

From: Angel Parn <angel(at)not-real.mv.parnu.ee>
Date: Mon Jun 12 2000 - 10:04:59 GMT
Hi Everyone!

I have a problem getting HTTP-method indexing to work, while the FS
method works fine. The symptoms look similar to an old case in the
swish-e archive:
http://sunsite.berkeley.edu/SWISH-E/archive/0901.html

When I run the following script:
---------
cd /home/web/search
/home/bin/swish-e \
    -S http \
    -i 'http://www.mysite.com/index.php3?date=2000/06/10' \
    -f /home/web/day.swe \
    -c /home/web/search/pp.cfg
---------

the following output is produced:
---------
Indexing Data Source: "HTTP-Crawler"
retrieving http://www.mysite.com/index.php3?date=2000/06/10 (0)...
Removing very common words... no words removed.
Writing main index... no unique words indexed.
Writing file index... no files indexed.
Running time: 21 seconds.
Indexing done!
---------

The file /home/web/2000/06/10/day.swe is created, but it contains
no keywords.

When I found the thread http://sunsite.berkeley.edu/SWISH-E/archive/0901.html
in the archives, I thought that Perl needed reconfiguring. I should say
that I am not the owner of the server and cannot configure the server
software. But, to my surprise, I found that when I run the helper script:

/home/web/search/swishspider.pl ./ss 'http://www.mysite.com/index.php3?date=2000/06/10'

I get the files ss.response, ss.links and so on, with status code 200.
So the helper works, but I can't understand why I cannot index this page
through swish-e (-S http). Maybe my config file is not correct
(I've double- and triple-checked it, but who knows):

Here are the HTTP-method options that are enabled in my config:
---------
# DIRECTIVES for HTTP METHOD ONLY
MaxDepth 2
Delay 20
TmpDir /home/tmp
SpiderDirectory /home/web/search
---------
The other parameters (IndexDir, IndexFile) are given on the command line.

TmpDir has permissions 777, and for debugging I set the permissions
of /home/web/search to 777 as well.
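With the directory permissions already wide open, a next step might be to check the helper itself. This is a sketch under stated assumptions: it assumes that swish-e's HTTP method runs a helper named `swishspider` (no `.pl` suffix) from SpiderDirectory, which may not match every swish-e version, so compare it with your swish-e documentation. The function `check_spider_setup` is hypothetical; it reports whether the helper exists, is executable, and names an interpreter on its `#!` line that actually exists on this server, and whether TmpDir is writable by the current user.

```shell
#!/bin/sh
# Hypothetical sanity check for the HTTP-method setup in pp.cfg.
# ASSUMPTION: swish-e runs "$SpiderDirectory/swishspider" (no .pl suffix)
# and writes its temporary files into TmpDir; the helper name may differ
# between swish-e versions.
check_spider_setup() {
    spider_dir=$1   # value of SpiderDirectory in pp.cfg
    tmp_dir=$2      # value of TmpDir in pp.cfg
    helper="$spider_dir/swishspider"
    status=0

    if [ ! -f "$helper" ]; then
        echo "missing: $helper"
        status=1
    else
        [ -x "$helper" ] || { echo "not executable: $helper"; status=1; }
        # Extract the interpreter path from the #! line, e.g. /usr/bin/perl.
        interp=$(head -n 1 "$helper" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p')
        if [ -z "$interp" ]; then
            echo "no #! line in $helper"
            status=1
        elif [ ! -x "$interp" ]; then
            echo "interpreter not found: $interp"
            status=1
        fi
    fi

    # TmpDir must be a directory the current user can write to.
    if [ ! -d "$tmp_dir" ] || [ ! -w "$tmp_dir" ]; then
        echo "not a writable directory: $tmp_dir"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}
```

Run it as the same user that runs swish-e, for example `check_spider_setup /home/web/search /home/tmp`; it prints `ok` or one line per problem found.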

Could this still be a Perl problem? The helper script works.
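Because the helper works when run by hand, it may help to look at exactly what it returned. The hypothetical `summarize_spider_output` function assumes (this is not confirmed in this post) that `PREFIX.response` holds the HTTP status code on line 1 and the Content-Type on line 2, and that `PREFIX.links` lists one URL per line. It also flags a non-`text/*` Content-Type, on the unverified assumption that the crawler may skip such pages, which seems worth checking for a `.php3` page.

```shell
#!/bin/sh
# Hypothetical summary of the files written by the spider helper.
# ASSUMPTION: PREFIX.response = status code (line 1), Content-Type (line 2);
# PREFIX.links = one URL per line.
summarize_spider_output() {
    prefix=$1
    if [ ! -f "$prefix.response" ]; then
        echo "no $prefix.response file"
        return 1
    fi
    code=$(sed -n '1p' "$prefix.response")
    ctype=$(sed -n '2p' "$prefix.response")
    links=0
    # tr strips the padding some wc implementations print.
    [ -f "$prefix.links" ] && links=$(wc -l < "$prefix.links" | tr -d ' ')
    echo "status=$code type=$ctype links=$links"
    # ASSUMPTION: only text/* responses are likely to be indexed.
    case $ctype in
        text/*) ;;
        *) echo "warning: non-text Content-Type may not be indexed" ;;
    esac
}
```

For the manual run above, calling `summarize_spider_output ./ss` from the directory where the helper was run would print the status, type and link count on one line.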

Sorry for the long post, but I hope someone with more experience than
me can help.


Desperately waiting for hints,
Angel Parn
angel@mv.parnu.ee
Received on Mon Jun 12 06:07:41 2000