Segmentation fault when processing large file

From: Steve Harris <S.W.Harris(at)not-real.ecs.soton.ac.uk>
Date: Tue Mar 16 2004 - 18:04:49 GMT
This is with swish-e v2.4.1

The console output is:

-----------------------------------------------------
$ swish-e -v3 -c swish/swish.config
Parsing config file 'swish/swish.config'
Indexing Data Source: "File-System"
Indexing "/raid/swh/lit_index/"

Checking dir "/raid/swh/lit_index"...
  segv.lit - Using TXT parser -  (9602809 words)
Segmentation fault (core dumped)
-----------------------------------------------------

The config file is:
-----------------------------------------------------
IndexFile /raid/swh/lit_index/swish.index
IndexDir /raid/swh/lit_index/
IndexOnly .lit
IndexContents TXT .lit
FollowSymLinks no
-----------------------------------------------------

All I get from a backtrace is:

#0  0x400e04a9 in compress3 (num=2139062143, 
    buffer=0x487ab00f
"\202�p\001��|\203�+\201\234�dBC��k\201\216�c��(\237�s�\005\235�x\002\201\221-\004\203B\002\212h\004�W\002�w\002�z\002\204:\004\205g\005\234+\003�N\002\216p\002\210\032\002�\006\003\201h\001�r\003�")
at compress.c:140
140             _s[_i++] = _r & 127;
#1  0x7f7f7f7f in ?? ()
Cannot access memory at address 0x7f7f7f7f
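
One detail in the backtrace may be worth checking: written in hex, the num argument in frame #0 appears to be the same value as the bogus address in frame #1. The snippet below is only an arithmetic check on the two values printed by the debugger. What the match implies about the cause is a guess (for example, memory being overwritten with a repeated 0x7f byte), not something the backtrace confirms.

```python
# Values copied from the backtrace above.
num = 2139062143         # 'num' argument shown in frame #0
frame1_addr = 0x7f7f7f7f  # address shown in frame #1

# Print num in hex so it can be compared with the frame #1 address.
print(hex(num))

# True if the two values are identical.
print(num == frame1_addr)

# True if every one of the four bytes is 0x7f (decimal 127).
print(all(b == 0x7F for b in num.to_bytes(4, "big")))
```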

The file it's processing is quite large:
$ wc /raid/swh/lit_index/segv.lit 
5065943 9424230 50550321 /raid/swh/lit_index/segv.lit

and contains some 8-bit characters, but if I run it through sort | uniq it
doesn't cause problems. It's a fairly simple file, with one phrase per line;
the longest line is 255 characters.
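
For anyone who wants to confirm those properties on their own copy of the file, here is a minimal sketch. The function name line_stats is made up for illustration, and it measures line length in bytes, which is an assumption about what "255 characters" means here.

```python
def line_stats(path):
    """Return (line_count, longest_line_in_bytes, lines_with_8bit_bytes)."""
    lines = 0
    longest = 0
    high_bit_lines = 0
    # Open in binary mode so no character decoding is attempted.
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")  # drop the line terminator
            lines += 1
            longest = max(longest, len(line))
            # A byte above 0x7F has its eighth bit set.
            if any(b > 0x7F for b in line):
                high_bit_lines += 1
    return lines, longest, high_bit_lines

# Example call (path taken from the wc command above):
# print(line_stats("/raid/swh/lit_index/segv.lit"))
```

The file is read line by line, so the whole 50 MB never has to be held in memory at once.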

There are a few thousand similar files in the directory tree that parse
fine, but this is by far the largest. It doesn't appear to matter at what
position it appears in the parse order.

I've made the file available at http://triplestore.aktors.org/~swh/segv.lit
in case anyone wants to test it.

- Steve
Received on Tue Mar 16 10:04:51 2004