RE: Q: Swish-E foreign language character support

From: Bill Moseley <moseley(at)>
Date: Sun Feb 04 2001 - 14:32:06 GMT

At 03:20 AM 02/04/01 -0800, Mark wrote:
>Great! A new version! :) My only question is, though, is it downwards
>compatible? And when I say downwards compatible, I do not mean for the
>index-files to be of the same format. But I only mean, are the scripts that
>I am using now to create the indices still able to generate the same indices
>with the new version?

Yes.  You should be able to use the same scripts -- if you are using a
configuration file, then swish-e -c config.conf will still work the same
(except your indexing may be quite a bit faster).

I'm not sure how many people are using 2.0, as there hasn't been much
feedback about it (I assume that's a good sign) -- so any comments you
have would be great.

>I specifically ask, because of the Spider. I regret to say, but I do not
>think the 1.3 Spider script in Perl is all that good. I was kinda hoping
>improvements have been made in that area too.

What specific problems are you having with the current spider?
What improvements would you like to see?  Is there any way you can avoid
using the spider method for indexing?

Version 2.2 is just about beta ready.  I've been advocating for the release
after 2.2 to remove the spider code from within swish and instead have two
file source methods: 1) the current file system method, and 2) a general
purpose file source where swish calls an external program and that program
feeds documents to swish.  (Yes, swish-e would still support spidering.)

The advantage is that you could build any type of spider you like, and you
wouldn't be forking and recompiling the Perl swishspider for every
document.  It also means you could index anything (from any source) you
like as long as you can feed swish one of its native file formats
(currently HTML, XML, and text).
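
To make the idea concrete, here is a rough sketch in Python of what such
an external feeder program might look like.  The framing is an assumption
made purely for illustration (a Path-Name header, a Content-Length header,
a blank line, then the raw document bytes); nothing above settles the
actual interface.  The names frame_document and feed are hypothetical.

```python
import sys
from typing import BinaryIO, Iterable, Tuple


def frame_document(path: str, body: bytes) -> bytes:
    """Wrap one document as headers, a blank line, then the raw body.

    The header names are assumed for this sketch. Content-Length counts
    bytes (not characters) so the reader knows exactly how much to consume.
    """
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body


def feed(docs: Iterable[Tuple[str, bytes]], out: BinaryIO) -> None:
    """Write every (path, body) pair to the output stream, back to back."""
    for path, body in docs:
        out.write(frame_document(path, body))
    out.flush()


if __name__ == "__main__":
    # Hypothetical sample documents; a real feeder would fetch these from
    # a web crawl, a database, a mail archive, or any other source.
    sample = [
        ("http://example.com/index.html", b"<html><body>Hello</body></html>"),
        ("notes/readme.txt", b"Plain text works too.\n"),
    ]
    feed(sample, sys.stdout.buffer)
```

Because the feeder is a single long-running process, the cost of starting
it is paid once per indexing run rather than once per document, and the
indexer only needs to read a stream of framed documents from it.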

Bill Moseley
