From: Zaheed Haque <zaheed(at)>
Date: Mon Jul 27 1998 - 20:19:40 GMT

I am new and still learning to operate Swish and Wget, so here we go.

I use Wget to collect pages from about 50 Web sites (all of them
universities), and then I use Swish to index them. I have two problems:


1. Disk space is limited: Wget fills up my disk, and I have no room left
for the indexing run or for the index itself.

2. After the indexing process is done I delete the collected files, so
when I update I have to do the whole thing again from the start, which
is a pain!

Well, the solution is more disk space of course, but I don't have any
money :-)

What I wonder is whether either of these is possible:

1. I run Wget and Swish in sequence, where:

a. Wget gets a file from the external site and saves it to a temp
directory

b. Swish starts indexing from the temp directory

c. Wget/Swish deletes the temp file

d. Swish fixes up the relative links

e. A stamp/MD5/mark is put on the index, so that when I update the index
it does not add old documents which I already indexed last week.
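
A rough sketch of steps a, b, c and e, in case it helps make the idea
concrete. It is not a feature of Wget or Swish: the function names, the
JSON manifest file and the idea of passing in the two commands are all
assumptions made for illustration. Each site is fetched into its own temp
directory, files whose MD5 is already recorded are deleted before
indexing, and the temp directory is removed before the next site starts,
so only one site is on disk at a time. Step d (relative links) and
merging the per-run indexes are not covered.

```python
import hashlib
import json
import shutil
import subprocess
import tempfile
from pathlib import Path


def drop_unchanged(root, manifest):
    """Delete files whose MD5 is already in the manifest.

    Returns the files that are new or changed, and records their
    digests in the manifest (step e).
    """
    kept = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if manifest.get(key) == digest:
            path.unlink()          # already indexed last time
        else:
            manifest[key] = digest
            kept.append(path)
    return kept


def process_sites(sites, fetch_cmd, index_cmd, manifest_path):
    """Fetch, index and clean up one site at a time.

    fetch_cmd(site, tmp) and index_cmd(site, tmp) are hypothetical
    callbacks that return the argument list for the external tool.
    """
    manifest_path = Path(manifest_path)
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    for site in sites:
        tmp = tempfile.mkdtemp(prefix="mirror-")
        try:
            subprocess.run(fetch_cmd(site, tmp), check=True)      # step a
            if drop_unchanged(tmp, manifest):                     # step e
                subprocess.run(index_cmd(site, tmp), check=True)  # step b
        finally:
            shutil.rmtree(tmp, ignore_errors=True)                # step c
        # Saved only after the site was indexed without error.
        manifest_path.write_text(json.dumps(manifest))


# Example callbacks (assumed command lines, check them against your
# own Wget and Swish versions before use):
#   fetch = lambda site, tmp: ["wget", "-r", "-np", "-P", tmp, site]
#   index = lambda site, tmp: ["swish", "-c", "swish.conf", "-i", tmp]
#   process_sites(["http://www.example.edu/"], fetch, index, "seen.json")
```

The manifest is keyed by the path below the temp directory, which with a
recursive Wget run normally starts with the host name, so entries from
different sites should not collide.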


2. Swish uses some protocol to do the crawling and the indexing at the
same time.

What should I do? Any help is welcome. Thanks for your help.

Zaheed Haque
Received on Mon Jul 27 13:26:58 1998