RE: New alpha version swish-e-2.1.4

From: <Rainer.Scherg(at)>
Date: Mon Nov 13 2000 - 15:45:53 GMT

I was a "little" occupied with project work over the last few months,
but now I am trying to take a breather and check the new swish-e version.

I found the following problems:

1. -------------------

When trying the IndexContents option I figured out that
the config parsing routine seems to have trouble with \t characters.

The bug is in string.c, in char *getword(line, skiplen):

>>        while (((int)*line)==' ' || ((int)*line)=='\t') line++;
>>        for (i = 0; *line && ((inquotes) ? (*line != '\"') : (!(*line==' '))); line++) {

The first part checks for \t, but the second part does not.
IMO this should look like:

>>        while (isspace(*line)) line++;
>>        for (i = 0; *line && ((inquotes) ? (*line != '\"') : (!isspace(*line))); line++)
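
To illustrate the idea, here is a minimal, self-contained sketch of a tokenizer that treats any whitespace as a word delimiter. It is not the real getword() from string.c: the name next_word, its signature and the output buffer handling are assumptions made for this example only. The cast to unsigned char is there because passing a negative char value to isspace() is undefined behaviour.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (NOT the swish-e getword()): copies the next
 * whitespace-delimited or double-quoted word from `line` into `out`
 * and returns a pointer just past that word. */
static const char *next_word(const char *line, char *out, size_t outsize)
{
    size_t i = 0;
    int inquotes = 0;

    if (outsize == 0)
        return line;

    /* Skip leading whitespace of any kind: spaces, tabs, newlines. */
    while (isspace((unsigned char)*line))
        line++;

    /* A leading double quote starts a quoted word. */
    if (*line == '"') {
        inquotes = 1;
        line++;
    }

    /* Copy until the closing quote, or until any whitespace when unquoted.
     * Characters that do not fit into `out` are skipped, not copied. */
    while (*line && (inquotes ? (*line != '"')
                              : !isspace((unsigned char)*line))) {
        if (i + 1 < outsize)
            out[i++] = *line;
        line++;
    }
    out[i] = '\0';

    /* Step over the closing quote, if there is one. */
    if (inquotes && *line == '"')
        line++;

    return line;
}
```

With this approach a config line that separates its fields with a tab is split the same way as one that uses spaces.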

2. -----------------------

Filtering doesn't work anymore (filter progs are working fine).
The index result log shows a broken pipe.

I don't see the problem yet, can anyone crosscheck this please?

3. ------------------------


 Right now I need a short description of each found page on the result page
 (say, the first 200 chars or 40 words of the indexed text).

 With file-system indexing you can of course display parts of the files.
 But this doesn't work with HTTP spidering, and it is also not the best
 approach when the links point to PDF or DOC files.

 What are the chances to get the following functionality:

       StoreDescription  <number>  # 0 = None  >1 Char or word count to
       DescriptionFile    <path>   # separate file/db to store this
                                   # (could use Jose's compression patches...)

   An implementation could look as follows:

      descr_open_db (...)
      descr_start (path_url_of_index_file)    # new description
      descr_add_str (...)                     # build desc string
      descr_end (...)                         # saves the description for this file
      descr_close_db (...)
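
 As a rough sketch of the "build desc string" step only, the following C fragment accumulates indexed text up to a character limit. It reuses the names descr_start and descr_add_str from the outline above, but the signatures, the descr_buf type and the fixed 256-byte buffer are assumptions for illustration. Nothing like this exists in swish-e yet, and the database part (open, end, close) is left out.

```c
#include <stddef.h>

/* Hypothetical in-memory description builder; not part of swish-e. */
typedef struct {
    char   text[256];  /* accumulated description, always NUL-terminated */
    size_t len;        /* number of characters currently stored */
    size_t limit;      /* maximum characters to keep (the StoreDescription value) */
} descr_buf;

/* Begin a new description that keeps at most `limit` characters.
 * The limit is clamped so the buffer always has room for the NUL. */
static void descr_start(descr_buf *d, size_t limit)
{
    if (limit > sizeof d->text - 1)
        limit = sizeof d->text - 1;
    d->limit = limit;
    d->len = 0;
    d->text[0] = '\0';
}

/* Append a chunk of indexed text. Chunks are separated by one space,
 * and everything beyond the limit is silently dropped. */
static void descr_add_str(descr_buf *d, const char *s)
{
    if (*s && d->len > 0 && d->len < d->limit)
        d->text[d->len++] = ' ';
    while (*s && d->len < d->limit)
        d->text[d->len++] = *s++;
    d->text[d->len] = '\0';
}
```

 A word-count limit (the "40 words" variant) would need a counter for word boundaries instead of the plain character check.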

 Question is how to return the results.

 Are there any requests for this functionality or am I the only one?


-----Original Message-----
From: []
Sent: Friday, October 13, 2000 11:19 AM
To: Multiple recipients of list
Subject: [SWISH-E] New alpha version swish-e-2.1.4

Hi all, I'm back

There is a new alpha (non stable) version of swish-e:

It fixes several bugs of previous 2.1.X plus:

- These are the new options in config file:
   DefaultContentType  [HTML|TXT|XML]    (*)
   Indexcontents [HTML|TXT|XML] .fileext1 .fileext2 (*)
   BumpPositionCounterCharacters string

(*) Only for FS at this moment

- First of all, fix your reported bugs

- Let the words "and", "or"  and  "not" be in a phrase (reported by Bill).
also need to be applied to 2.0.x).

- Add the ability of returning the header info for each index file to the C
(equivalent to option -x).

- Built in C html spider to make things much easier. This can avoid perl.

- Make stemmer.c thread safe

- Make index file smaller. Properties can make your index files really big.
deflate compression scheme will make it smaller but the well known zlib's 
library deflate compression format does not allow direct access... I  have
big index files because the properties are using a lot of space (55 % of the
size of the file). Consider that if you put all your info from your
documents in 
the index file as properties, you do not need to access the file to get the
This can be a good idea to distribute all your data in, for example, a
without including the files themselves.

- What about the new soundex.c posted earlier? Should I add it?

- Option -k (will return all words of an index file starting with...). E.g.:
"swish-e -k ac -f index.file" will return all indexed words starting with ac.

If I missed something, let me know.

BTW, I will be at APACHECON Europe, in
London, during the next week.


Received on Mon Nov 13 15:47:35 2000