Re: Selective indexing of sections (was: How to

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jun 11 2002 - 14:05:36 GMT
At 08:35 AM 06/11/02 -0500, Jody Cleveland wrote:
>>    http://swish-e.org/archive/3392.html
>
>On this page, you mention to add filter_content..... to the config file
>Where does the sub split_page go? I tried putting that in the config file
>also, but when I tried to index I got a lot of errors.

Yes, it's not very clear.  It goes in the spider's config file, not in
swish-e's config file.  A bit confusing.

Swish-e has the -S prog method of input, where an external program can
feed documents to swish.  The external program can be anything: something
that fetches documents from a MySQL database, or from remote web servers
(e.g. spider.pl that's in the prog-bin directory).  These external programs
are also useful for manipulating the data before it gets sent to swish --
such as converting pdf to html, or extracting just the content you might
want indexed.
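
To make the -S prog idea concrete, here is a minimal sketch of such an
external program in Python.  It is not the spider.pl code, only an
illustration: it keeps the part of a page between two marker comments and
prints it to stdout as header lines, a blank line, then the body.  The
header names (Path-Name, Content-Length, Last-Mtime) are given as I
understand the -S prog format, so check them against the docs for your
version.  The function names, the marker comments and the sample path are
all made up for the example.

```python
import sys
import time


def extract_section(html, start="<!-- content -->", end="<!-- /content -->"):
    """Return only the text between two marker comments.

    The markers are hypothetical; use whatever delimits the content on
    your own pages. Falls back to the whole document if either marker
    is missing.
    """
    begin = html.find(start)
    if begin == -1:
        return html
    begin += len(start)
    stop = html.find(end, begin)
    if stop == -1:
        return html
    return html[begin:stop]


def format_document(path, content, mtime=None):
    """Build one document record: header lines, a blank line, then the body."""
    body = content.encode("utf-8")
    headers = [
        f"Path-Name: {path}",
        f"Content-Length: {len(body)}",  # length in bytes, not characters
    ]
    if mtime is not None:
        headers.append(f"Last-Mtime: {int(mtime)}")
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body


if __name__ == "__main__":
    # A made-up page; a real program would read files or fetch URLs here.
    page = "<html><body>nav<!-- content -->Hello<!-- /content -->footer</body></html>"
    record = format_document("/docs/hello.html", extract_section(page), time.time())
    sys.stdout.buffer.write(record)
```

Saved as, say, feed.py and made executable, a program like this would
normally be run by swish-e itself with something along the lines of
"swish-e -S prog -i ./feed.py".  Treat that invocation as an assumption
and adjust it for your setup.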

The spider.pl program is complicated enough that it has its own
configuration file.  That's where that code goes, as it's code that affects
the way the spider works.  But the code could be used in another program if
you are not spidering (it could be added to prog-bin/DirTree.pl, for example).

I'll send you a current copy of the code by separate email.  The archive
kind of trashes the formatting.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Jun 11 14:10:25 2002