Re: new swish-e 2.X version

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 02 2000 - 00:19:45 GMT
At 09:36 AM 09/28/00 -0700, jmruiz@boe.es wrote:
>- Last unstable version: swish-e-2.1.0.tar.gz.

The 2.1.0 swish-e binary hangs when trying to index.  It doesn't get very
far before eating all the CPU time:

S   PID  PPID %CPU %MEM  RSS   VSZ COMMAND
R 30925 30924 96.9  1.0 1376  2232 swish-LII -c swish.cfg

A gdb backtrace doesn't help much.  (Any tricks for getting better debugging
info?)  I've included the gdb backtrace and the config file below.

>3. BumpPositionCounterCharacters option. See user.config in conf 

Is there a way to say, for example, a character will bump if at the end of
a word, but not in the middle of a word?

   The location was www.swish.com. 
                       ^         ^
                   no bump      bump
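
To make the rule being asked about concrete, here is a minimal sketch in C. It is not swish-e code: bumps_at() and count_bumps() are hypothetical names, and "end of a word" is assumed to mean "followed by whitespace or the end of the text". A bump character inside a word, like the dots in www.swish.com, is left alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of swish-e.
 * Returns 1 if text[i] is one of bump_chars AND ends a word, which is
 * assumed here to mean it is followed by whitespace or the end of the
 * string.  The caller must keep i <= strlen(text). */
int bumps_at(const char *text, size_t i, const char *bump_chars)
{
    assert(text != NULL && bump_chars != NULL);

    unsigned char c = (unsigned char)text[i];
    /* Check for '\0' first: strchr() would match the terminator. */
    if (c == '\0' || strchr(bump_chars, c) == NULL)
        return 0;                       /* not a bump character at all */

    unsigned char next = (unsigned char)text[i + 1];
    return next == '\0' || isspace(next);
}

/* Counts how many position bumps the text would trigger under that rule. */
int count_bumps(const char *text, const char *bump_chars)
{
    int bumps = 0;
    for (size_t i = 0; text[i] != '\0'; i++)
        bumps += bumps_at(text, i, bump_chars);
    return bumps;
}
```

With "." as the only bump character, the two dots inside www.swish.com are skipped and only the sentence-ending dot counts.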

>4. New option -x to write more info ("extended header") of the index 
>files while searching.

Can WordCharacters, the Ignore* settings and all the other index-related
settings be displayed per index?  And how hard would it be to print an extra
field after the file name to indicate which index file the result was found in?

I'd like to take each result and pass it to my highlighting routines along
with a pointer to the headers from the index file that contained that
result.  So I'd like to be able to use -x to get the full headers for each
index file.
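
A rough sketch of what that could look like on the highlighting side, in C. Everything here is hypothetical (the struct names, the fields, and word_length() are made up for illustration, not taken from swish-e). The idea is just that each result carries a pointer to the header of the index it came from, so the highlighter can split words using that index's own WordCharacters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: per-index settings, as an extended header might report them. */
struct index_header {
    const char *index_file;   /* which index file these settings belong to */
    const char *word_chars;   /* that index's WordCharacters setting */
};

/* Hypothetical: one search hit, tagged with the index that produced it. */
struct search_result {
    const char *path;                   /* file name of the matching document */
    const struct index_header *header;  /* headers of the originating index */
};

/* Length of the word starting at s, judged by the WordCharacters of the
 * index that produced this result.  Returns 0 if s does not start a word. */
size_t word_length(const struct search_result *r, const char *s)
{
    assert(r != NULL && r->header != NULL && s != NULL);
    return strspn(s, r->header->word_chars);
}
```

Two indexes built with different WordCharacters would then highlight the same text differently: with a comma in WordCharacters, "1,000" is one five-character word, and without digits or a comma it is not a word at all.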

And the total results for all index files is GREAT!

I wish my C skills were better, Jose, as I'd like to help in the development.

Here's the gdb backtrace:

Indexing Data Source: "File-System"
Indexing ../docs..

Program received signal SIGINT, Interrupt.
0x4009d201 in strchr () from /lib/libc.so.6
(gdb) bt
#0  0x4009d201 in strchr () from /lib/libc.so.6
#1  0x806d780 in ?? ()
#2  0x805d8d2 in printfile ()
#3  0x805d9b5 in printfiles ()
#4  0x805d34c in indexadir ()
#5  0x805dbf6 in fs_indexpath ()
#6  0x804c846 in indexpath ()
#7  0x804a07e in main ()
#8  0x40055313 in __libc_start_main (main=0x80492d0 <main>, argc=3,
argv=0xbffff8b4, init=0x8048db8 <_init>, 
    fini=0x8060028 <_fini>, rtld_fini=0x4000ac70 <_dl_fini>,
stack_end=0xbffff8ac) at ../sysdeps/generic/libc-start.c:90



> cat ../readonly/swish.cfg
## This file is for indexing the main database

## Location of files
IndexDir ../docs
IndexFile ../docs/index.swish-e


# Live records are .htm
IndexOnly .htm

MetaNames SUBJECT TITLE DESCRIPTION URLS IDENTIFIER KEYWORDS CREATOR
CATEGORY AUTHOR PUBLISHER


IndexReport 1


# 1.3 features
PropertyNames CATEGORY SUBJECT
IgnoreTotalWordCountWhenRanking yes
UseStemming yes

# Added and, http, of, to, www as these were automatically removed in old swish
IgnoreWords a an the and http of to www

# Allow comma so can index numbers
WordCharacters
abcdefghijklmnopqrstuvwxyz,0123456789

BeginCharacters
abcdefghijklmnopqrstuvwxyz,0123456789

EndCharacters
abcdefghijklmnopqrstuvwxyz,0123456789


# These have to be something or else the defaults are used
IgnoreLastChar  ,
IgnoreFirstChar ,



Bill Moseley
mailto:moseley@hank.org
Received on Mon Oct 2 00:20:32 2000