Re: Question on indexing time

From: Rick McGowan <rick(at)not-real.unicode.org>
Date: Thu Aug 02 2001 - 23:31:44 GMT

Hi Bill --

> I'd be interested to see if the file really does have an embedded null.

Yes, the ".txt" file in question has a BUNCH of embedded nulls in it, for  
starters... ;-)
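
For anyone wanting to confirm embedded nulls in a file independently of the indexer, a minimal sketch in C follows. It is not swish-e's own check; the names count_nul_bytes and count_nul_in_file are hypothetical and exist only for this illustration. It simply reads the file in binary mode and counts 0x00 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of swish-e): count NUL (0x00) bytes
 * in a memory buffer of the given length. */
size_t count_nul_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    /* A NULL buffer is only acceptable when there is nothing to scan. */
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0')
            count++;
    }
    return count;
}

/* Hypothetical helper (not part of swish-e): count NUL bytes in a file.
 * Returns -1 if the file cannot be opened or a read error occurs. */
long count_nul_in_file(const char *path)
{
    unsigned char chunk[4096];
    size_t n;
    long total = 0;

    /* Binary mode, so no byte is translated or dropped on any platform. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += (long)count_nul_bytes(chunk, n);

    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return total;
}
```

Calling count_nul_in_file() on a suspect ".txt" file and getting a result greater than zero would confirm that the file contains embedded nulls; a result of -1 means the file could not be read.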

> We should probably track down the segfault.  Do you know how to get a
> backtrace with gdb?

Yup.  The backtrace is appended below.  I put the config file back to having  
"IgnoreLimit 50 1000" set, and it's indexing ".txt" and ".html" files only.  
(It found two possible embedded nulls and warned.)  It appears to have  
finished indexing, then crashed...

Cheers,
	Rick

-----------

(gdb) run -i /home/httpd/htdocs/mail-arch/unicode-ml -c /home/httpd/cgi-bin/swish-filters/swish.config -S fs -f /home/httpd/cgi-bin/uml-index.swish -v 1
Starting program: /usr/local/bin/swish-e -i /home/httpd/htdocs/mail-arch/unicode-ml -c /home/httpd/cgi-bin/swish-filters/swish.config -S fs -f /home/httpd/cgi-bin/uml-index.swish -v 1
[...etc...]
Removing very common words...
Warning: This proccess can take some time. For a faster one, use IgnoreWords instead of IgnoreLimit

Program received signal SIGSEGV, Segmentation fault.
__libc_free (mem=0x401816bc) at malloc.c:3005
3005    malloc.c: No such file or directory.
(gdb) bt
#0  __libc_free (mem=0x401816bc) at malloc.c:3005
#1  0x8050567 in efree (ptr=0x401816bc) at mem.c:79
#2  0x804c629 in removestops (sw=0x80833f0) at index.c:1242
#3  0x804a4b5 in cmd_index (sw=0x80833f0, params=0x80970f0) at swish.c:1087
#4  0x8049397 in main (argc=11, argv=0xbffffa14) at swish.c:173
Received on Thu Aug 2 23:32:35 2001