
Re: Segmentation fault while indexing with "StoreDescription"

From: SANCHEZ Juanjo <jjsanchez(at)>
Date: Wed Apr 21 2004 - 19:26:34 GMT
Hi, everyone

I hope this helps some of you find the cause of the fault.
Here is my experience indexing a large number of documents with the
Windows version of Swish-e 2.4.2.

I use the Windows version of swish-e 2.4.2 and get an error that seems to be
related to the faults you mention. (It has been happening since version 2.4.0.)

On Windows, I get a small error window stating:
    "Invalid operation. The program will be terminated" (with only an "Accept" button).
Swish-e then terminates without giving any more information about the fault.

I build indexes from thousands of documents.
I feed swish-e plain text documents from an external program I wrote myself,
piping that program's standard output into swish-e.

It does not seem to be a problem linked to a particular document. When the
fault appears near a document, I try to index only that document and
everything works fine. If I remove that document from the original list
and index again, the fault occurs at a different time and at a different
place in the list. There is no apparent reason why the error appears
earlier or later.

It looks more like a "random" error related to memory handling.

Whether or not I use the swish-e "save memory" option (-e), the fault
appears after a while (sometimes after 5 minutes, sometimes after 45 minutes
of indexing).
With -e the error seems to appear later, but it still appears eventually.
I have seen a maximum of 376 temporary files in the temp directory (TMPDIR, TMP
or TEMP) used by swish-e.
Perhaps allowing swish-e to use more temp files would help?

I have successfully indexed a set of about 23,500 documents containing a
total of 254,300 words, which produced an 80 MB index file.

When I try to index a set of "big" documents (i.e., average size > 250,000
characters), I get the unexplained swish-e error.

I suspect the error occurs when swish-e reads a LARGE AMOUNT of data
through standard input, that is, when it receives data generated by an
external program whose standard output is redirected into swish-e.
A typical invocation looks like this:

    MyProgram.exe | swish-e -e -S prog -i stdin -c index.conf  -f  Index.idx
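
For readers unfamiliar with `-S prog`: as I understand the swish-e documentation, a program feeding swish-e this way writes each document as a short block of headers (at least `Path-Name` and `Content-Length`), a blank line, and then exactly `Content-Length` bytes of content. The Python sketch below shows what such a feeder might emit. It is not MyProgram.exe. The document names, contents, and UTF-8 encoding are hypothetical stand-ins, and only the two required headers are written.

```python
import sys


def format_record(path: str, text: str) -> bytes:
    """Frame one document for swish-e's "prog" input method (as I read the docs):
    header lines, a blank line, then exactly Content-Length bytes of content."""
    body = text.encode("utf-8")  # assumption: documents are sent as UTF-8
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # byte count, not character count
        "\n"
    ).encode("utf-8")
    return header + body


def main() -> None:
    # Hypothetical documents standing in for whatever MyProgram.exe produces.
    docs = {"doc1.txt": "first document", "doc2.txt": "second document"}
    # Write raw bytes, bypassing the text layer, so Python does not translate
    # "\n" into "\r\n" on Windows and break the Content-Length counts.
    out = sys.stdout.buffer
    for path, text in docs.items():
        out.write(format_record(path, text))
    out.flush()


if __name__ == "__main__":
    main()
```

It would be used in place of MyProgram.exe in the pipeline above, e.g. `python feeder.py | swish-e -S prog -i stdin -c index.conf -f Index.idx` (the script name is hypothetical). The point of counting bytes of the encoded body is that a multi-byte character would otherwise make the declared length too short.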

Suspecting a problem in the redirection pipe when feeding large amounts of
data, or something related to the speed of the redirection, I tried another
approach. First, I wrote MyProgram's output to one big file:

    MyProgram.exe >  WholeData.txt

and then fed the contents of that file to swish-e:

    Type WholeData | swish-e -e -S prog -i stdin -c index.conf  -f

(Either with or without -e, the fault still occurs, at different points.)
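
Since the stream is now captured in a file, one check worth making before blaming swish-e is that every record in it is framed consistently, i.e. each `Content-Length` really matches the bytes that follow. One way a mismatch could creep in on Windows is newline translation: if MyProgram writes to stdout in text mode, each `\n` becomes `\r\n`, so a length computed before output would undercount. The sketch below is a hypothetical checker, assuming the same `Path-Name`/`Content-Length` framing described for `-S prog`; it walks the file and stops at the first record that does not add up.

```python
import re
import sys

# A header block ends at the first blank line; CRLF is accepted so that a
# newline-translated capture can still be parsed far enough to be diagnosed.
BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def check_stream(data: bytes) -> list[tuple[str, int]]:
    """Return (Path-Name, Content-Length) for each record, or raise ValueError
    at the first record whose headers or declared length don't fit the data."""
    records = []
    pos = 0
    while pos < len(data):
        # Skip stray newlines between records (this can hide a trailing-newline
        # miscount, but a mid-body miscount still derails the next header).
        if data[pos:pos + 1] in (b"\r", b"\n"):
            pos += 1
            continue
        m = BLANK_LINE.search(data, pos)
        if m is None:
            raise ValueError(f"byte {pos}: header block has no terminating blank line")
        headers = {}
        for line in data[pos:m.start()].splitlines():
            name, sep, value = line.partition(b":")
            if not sep:
                raise ValueError(f"byte {pos}: not a header line: {line[:40]!r}")
            headers[name.strip().lower()] = value.strip()
        if b"path-name" not in headers or b"content-length" not in headers:
            raise ValueError(f"byte {pos}: missing Path-Name or Content-Length")
        length = int(headers[b"content-length"])
        body_start = m.end()
        if body_start + length > len(data):
            raise ValueError(f"byte {pos}: Content-Length {length} runs past end of data")
        records.append((headers[b"path-name"].decode("utf-8", "replace"), length))
        pos = body_start + length  # next record should start right here
    return records


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # binary mode: count bytes exactly
        recs = check_stream(f.read())
    print(f"{len(recs)} records, {sum(n for _, n in recs)} content bytes")
```

Usage would be something like `python check_stream.py WholeData.txt` (script name hypothetical). It reads the whole capture into memory, which is fine for a one-off check but would need a streaming reader for very large files. A clean result would at least rule out malformed input as the cause.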

I understand that indexing entirely in memory can be faster, but it will
always be limited by the installed RAM, whereas using temporary files raises
that limit. I still suspect that the 376 temp files I saw are the current
maximum number of temp files allowed, which could produce the "ghost error"
we are seeing.
Remember, everything I describe relates to the Windows version of swish-e,
but I suspect the error on other platforms has the same cause.

Thanks to everyone.
Juan-Jose Sanchez

----- Original Message ----- 
From: "Bill Moseley" <>
To: "Multiple recipients of list" <>
Sent: Tuesday, April 20, 2004 3:10 PM
Subject: [SWISH-E] Re: Segmentation fault while indexing

> Would it be possible to index under gdb and then try and get a
> backtrace?  If we are lucky that might show the problem.
> The other standard suggestion is try and see if there is a small set of
> documents that will demonstrate the problem.
> -- 
> Bill Moseley
Received on Wed Apr 21 12:26:34 2004