Re: [SWISH-E:115] Re: Indexing off-site html

From: Jim Winstead <jimw(at)>
Date: Fri Jan 09 1998 - 21:50:30 GMT
It indeed isn't difficult at all. One place to steal a handy fopen()
wrapper that handles URLs from is PHP3, in the functions/fsock.c file.
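
For anyone wondering what such a wrapper has to do first: before it can
open a socket it has to pull the host, port and path out of the URL. What
follows is only a rough sketch of that step, not the PHP3 code. The
function name parse_http_url and its signature are made up for
illustration, and it assumes plain http:// URLs with no user:password@
part.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from PHP3 or swish.
 * Splits "http://host[:port][/path]" into its parts.
 * Returns 0 on success, -1 if the string is not a usable http URL
 * or a part does not fit in the caller's buffer. */
int parse_http_url(const char *url, char *host, size_t hostlen,
                   int *port, char *path, size_t pathlen)
{
    const char *p, *host_end, *slash, *colon, *rest;
    size_t n;

    if (strncmp(url, "http://", 7) != 0)
        return -1;                      /* leave local files to fopen() */
    p = url + 7;

    slash = strchr(p, '/');
    host_end = slash ? slash : p + strlen(p);

    /* An optional ":port" sits between the host and the first '/'. */
    *port = 80;
    colon = memchr(p, ':', (size_t)(host_end - p));
    if (colon != NULL) {
        char *endp;
        long v = strtol(colon + 1, &endp, 10);
        if (endp == colon + 1 || endp != host_end || v < 1 || v > 65535)
            return -1;
        *port = (int)v;
        host_end = colon;
    }

    n = (size_t)(host_end - p);
    if (n == 0 || n >= hostlen)
        return -1;
    memcpy(host, p, n);
    host[n] = '\0';

    /* A URL with no path means the server root. */
    rest = slash ? slash : "/";
    if (strlen(rest) >= pathlen)
        return -1;
    strcpy(path, rest);
    return 0;
}
```

A wrapper could call this first and fall back to a plain fopen() whenever
it returns -1, so local file names keep working unchanged.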

Combined with some code that reads the files to index from another file,
you don't even really need to build the spidering intelligence into
swish. You just need a spider-like tool that can spit out all of the
URLs that you should then index.
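
The "read the files to index from another file" part could look roughly
like the sketch below. It assumes the list holds one URL per line, with
blank lines and lines starting with '#' ignored. The names url_handler,
for_each_url and print_url are invented for this example and are not
swish functions; a real handler would fetch and index each URL instead
of printing it.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: swish itself has no such hook. */
typedef int (*url_handler)(const char *url, void *ctx);

/* Strip leading and trailing whitespace in place and return a pointer
 * to the first non-blank character. */
char *trim(char *s)
{
    char *end;

    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Read one URL per line from `in`, skipping blank lines and '#' comments.
 * Calls `handler` (if non-NULL) for each URL and stops early if the
 * handler returns nonzero. Returns the number of URLs seen.
 * Lines longer than the buffer are not handled specially here. */
int for_each_url(FILE *in, url_handler handler, void *ctx)
{
    char line[2048];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        char *url = trim(line);

        if (*url == '\0' || *url == '#')
            continue;
        count++;
        if (handler != NULL && handler(url, ctx) != 0)
            break;
    }
    return count;
}

/* Example handler: only reports what would be fetched and indexed. */
int print_url(const char *url, void *ctx)
{
    (void)ctx;
    printf("would index: %s\n", url);
    return 0;
}
```

Calling for_each_url(stdin, print_url, NULL) lets the spider-like tool's
output be piped straight in, which keeps the spider and the indexer as
two separate programs.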

Some day (hopefully sooner rather than later) I'll polish up all my
swish modifications and make them available.


On Jan 09, Ron Klatchko wrote:
> Jerry Kuntz wrote:
> > Maybe I'm being obtuse today, but technically what limits SWISH-E
> > from indexing HTML documents on other servers? Or can it be done?
> Just the fact that no one has implemented it.  It shouldn't be too
> difficult.  In the file index.c, you'll find the functions indexadir and
> indexafile.  These functions and the functions they call assume that
> files are local.  You would need to add code so that remote files can be
> read.  You would also need to add some spider code so that it knows how
> to follow links and when to stop following them.
> I've thought about the same idea for a while.  I've just been waiting
> until I have some free time on my hands to implement it.
