Re: swish-e future

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jul 27 2000 - 16:23:15 GMT
Ok, I'll throw in my wish list:

At 04:49 AM 07/18/00 -0700, jmruiz@boe.es wrote:
>What do you think about future versions of swish-e?

>1. Perl and php modules for searching index files

I'd really like this.  I'm running swish from mod_perl, and the fork/exec to
swish really takes away the benefits of mod_perl.  I guess what would be
best is a threaded swish server, as that might use fewer resources than
embedding swish in each web server process, and would probably be easier to
set up and use.  But maybe making a Perl XS interface to the search routines
would be a good first step.


>2. Add Files to the index file

I'm not sure I'm doing the right thing, but currently I have two indexes.
One is reindexed once a week, and the other as files are added during the
week.  Then I specify both indexes when searching.  This works ok, as there
aren't that many new files added each week.

What would help me is a swish.conf option to index only files newer than
some date, or newer than some (index) file.  That way, I could have an
incremental.conf file that says something like

   INDEXNEWERTHAN weekly.index

And then swish would only index files newer than the file weekly.index.
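
Until something like that exists, the same selection can be done outside swish. Here is a rough sketch in Python of the "newer than a reference file" test, comparing modification times. The function name and the paths are made up for illustration, and nothing here is part of swish itself.

```python
import os


def files_newer_than(root, reference_path):
    """Yield paths under root that were modified after reference_path."""
    # The reference file's mtime is the cutoff, e.g. the last weekly index.
    cutoff = os.path.getmtime(reference_path)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Strictly newer: files as old as the index are already indexed.
            if os.path.getmtime(path) > cutoff:
                yield path
```

The resulting list could then be handed to the indexer as the set of files for the incremental index, for example `list(files_newer_than("docs", "weekly.index"))`.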

>3. Delete Files from the index file

Could this be as simple as maintaining a list of files to delete, and
search.c could throw away any results for any files in that list?
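
To make the idea concrete, here is a minimal sketch in Python rather than in search.c. It assumes results arrive as (rank, path) pairs and that the delete list is a plain set of paths. Both are assumptions for illustration, not how swish stores anything.

```python
def drop_deleted(results, deleted):
    """Return the results whose path is not in the deleted set.

    results: iterable of (rank, path) tuples, in ranked order.
    deleted: set of paths that should no longer be reported.
    """
    # Set membership keeps the check cheap even for a long delete list.
    return [(rank, path) for rank, path in results if path not in deleted]
```

The index still contains the deleted files, so the list has to be kept until the next full reindex, at which point it can be emptied.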



>10. Option to retrieve documents with words highlighted
>in some way.

Should that be a function of swish?  Swish would have to read the source
files to be able to do any highlighting, of course, and this becomes tricky
when swish only knows the stems of the indexed words.


I'd like to see a synonym/thesaurus feature built into swish.  Currently I
use a text file that lists synonyms, but it's not very efficient since I
have to read the file and build a lookup hash (and perhaps stem all the
words) for each search.  For example, currently searching for doctor will
find documents with "doctor", but it will also display a note: "Doctors -
see Physicians".  

It would be faster to have the list of synonyms indexed by swish and
optionally have swish search the related search terms automatically.
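
As a sketch of what the automatic expansion might look like, here is some Python that builds the lookup table once and then expands the search terms. The function names and the synonym-group format are hypothetical, and stemming is left out to keep it short.

```python
def build_synonym_table(groups):
    """Map each word to the other words in its synonym group."""
    table = {}
    for group in groups:
        words = {w.lower() for w in group}
        for word in words:
            table.setdefault(word, set()).update(words - {word})
    return table


def expand_terms(terms, table):
    """Return the terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        # The original term comes first, then its synonyms in sorted order.
        for word in [key, *sorted(table.get(key, ()))]:
            if word not in expanded:
                expanded.append(word)
    return expanded
```

With the table built from `[["doctor", "physician"]]`, a search for "doctor" would be expanded to both words, and the terms could then be ORed together in the query. The point of having swish hold the table is that it would be built once at index time, not on every search.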


Swish and ht://dig are often offered as similar search engines.  ht://dig
seems to be more popular, but I'm not sure why.  Maybe it's ht://dig's
standard search script that people like?  Are there other features of
ht://dig that make it attractive that might be good to add to swish?



Bill Moseley
mailto:moseley@hank.org
Received on Thu Jul 27 12:26:51 2000