Fwd: swish fails to close file handles /pipes with prog method

From: Khalid Shukri <khalid(at)not-real.einblick.de>
Date: Tue Jul 23 2002 - 11:14:11 GMT
I have a rather weird problem with swish-e:
I'm trying to index a lot of sites (about 45000) over a DSL line, using the prog
method with the spider.pl included in the Windows binary distribution, but I
want at most 3 pages from each site. I first tried this on an old PII with 64 MB
RAM running Windows 2000. It started well (although slowly), but became slower
and slower while progressing through the 45000 URLs, and, what is more, after
some time it started reporting "Skipped" for nearly every URL. I thought this
might be a problem with insufficient memory, swapping etc. I then divided the
whole list into chunks of 1000, which I indexed separately. This worked
reasonably well, although still slowly. Then I got my brand new P4 with 2 GB
RAM and 1 GHz CPU .-) on which I installed Debian. I then tried to search my
old indexes from the Windows machine, but swish-e always crashed on certain
search words. (This is the second problem: either the index files of the
Windows version are different from those of the Linux version, or there's a bug
in the Linux version.) I then indexed again, and on my new supercomputer the
same thing happened as on the old Windows machine. I put an "open (LOG,file);
print LOG something; close LOG;" in the test_url callback routine of the spider
to find out what's happening, but at a certain point the program stopped
writing anything to the file, reporting "Can't write to closed file handle". I
then tried again to do the indexing in chunks of 1000, but this time started
all 45 processes in parallel. After some time, I tried to open one of the log
files to see what's happening, but got the error: "Too many open files".
So, my idea is the following: swish-e seems to open a file handle (or a pipe to
the spider?) each time it moves on to the next URL, but fails to close it
properly afterwards.
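
One way to check this idea is sketched below in Perl. It is only an
illustration under stated assumptions, not code from swish-e or spider.pl:
the helper names log_line and open_fd_count and the log file name are made up
for the example, and the descriptor count relies on the Linux /proc
filesystem. The logging helper checks the result of open, which the quoted
"open (LOG,file); ..." snippet does not do. If open fails because the process
has run out of descriptors, an unchecked print would then hit a closed handle,
which may be what the "closed file handle" message reflects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of spider.pl): append one line to a log
# file and report the reason if any step fails.
sub log_line {
    my ($path, $message) = @_;

    # Checked open on a lexical handle. If the process is out of
    # descriptors, $! carries the system error text at this point.
    my $log;
    unless (open($log, '>>', $path)) {
        warn "cannot open $path: $!\n";
        return 0;
    }
    print {$log} "$message\n";
    unless (close($log)) {
        warn "cannot close $path: $!\n";
        return 0;
    }
    return 1;
}

# Hypothetical helper: count the descriptors open in this process.
# Assumption: Linux, where /proc/<pid>/fd has one entry per descriptor.
# Returns -1 if that directory cannot be read. The directory handle used
# for the listing is itself included in the count.
sub open_fd_count {
    my $pid = $$;
    opendir(my $dir, "/proc/$pid/fd") or return -1;
    my @entries = grep { !/^\.\.?$/ } readdir($dir);
    closedir($dir);
    return scalar @entries;
}

# Demo: log one line and compare the descriptor count before and after.
my $before = open_fd_count();
log_line('spider-debug.log', "descriptors open: $before");
my $after = open_fd_count();
print "descriptors before: $before, after: $after\n";
```

Calling something like log_line from the test_url callback for every URL
would show whether the count keeps growing as the spider moves through the
list. A steadily rising number would support the leak theory, while a flat
one would point elsewhere.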
Any help/suggestions available?
Thanks in advance
Khalid
Received on Tue Jul 23 11:18:05 2002