On Thursday, 30.11.2006, at 06:32 -0800, Bill Moseley wrote:
> On Thu, Nov 30, 2006 at 03:47:45AM -0800, Uwe Helm wrote:
> > As you can see in the filter.config below, swish-e scans a directory and
> > indexes all files in it. For testing, the directory contains a small PDF
> > and the src/index.c file.
> > If the directive is "IndexOnly c", indexing works fine. But when
> > indexing the PDF, it crashes, no matter which program (_pdf2html.pl or
> > pdftotext) is used. Running either program from the command line works
> > fine.
>
> I would normally suggest that you first convert the PDF to text and *then*
> index it, to see whether the problem happens when converting the file or
> when indexing it. But...
>
I'm not the user of this system. I was just told, "This is swish-e, and it's broken."
I hardly understand what the person who uses this machine is trying to do
here.
> > I did a gdb backtrace; the problem seems related to memory allocation
> > or something similar. I hope you can figure this out better than I can.
> >
> > (gdb) bt
> > #0 0xfedc2c8c in _free_unlocked () from /usr/lib/libc.so.1
> > #1 0xfedc2c44 in free () from /usr/lib/libc.so.1
> > #2 0x00026438 in filterCallCmdOptParam2 ()
>
I expanded my gdb knowledge today and found the 'where' command, here is
the line number:
Program received signal SIGSEGV, Segmentation fault.
0xfedc2c8c in _free_unlocked () from /usr/lib/libc.so.1
(gdb) where
#0 0xfedc2c8c in _free_unlocked () from /usr/lib/libc.so.1
#1 0xfedc2c44 in free () from /usr/lib/libc.so.1
#2 0x00026438 in filterCallCmdOptParam2 (str=0x13ae0d "", param=112 'p', fprop=0x2afa38) at filter.c:442
I also took a shot in the dark and tried setting
#define PointerAlignmentSize 4 or sizeof(long) in src/mem.c
because this is a 32-bit Solaris system and I had read about a bus error on IRIX that this change fixed.
It did not help here.
> That filterCallCmdOptParam2() has been removed in the current
> development version. So, try downloading a daily snapshot (or cvs
> checkout if you prefer) and see if that makes the problem go away.
I tried the 11-30 snapshot, and it works. It prints out some really weird
things, but it works. I will forward your suggestions to the actual user, thank you.
Would you consider the CVS snapshots fairly stable, so that I can at least
tell him "it shouldn't crash"?
> [It would be interesting to know exactly why that segfaulted there, but
> your backtrace doesn't show line numbers -- and trying the newer
> version of swish-e is probably faster than debugging with gdb.]
>
>
> > root@wrkLTGsun001:~$ cat /usr/local/bin/filter.config
> > IndexReport 4
> > IndexOnly pdf
> > # IndexOnly c
> > IndexDir /export/home/root/blah
> > #FileFilter .pdf /usr/local/bin/pdftotext
> > FileFilter .pdf /usr/local/bin/share/doc/swish-e/examples/filter-bin/_pdf2html.pl
> > # IndexContents XML .pdf
>
> Note that using the perl script _pdf2html.pl will be slow when used
> with FileFilter. If that's an issue (you have lots of PDF files) then
> consider using a different indexing mode. Namely, use -S prog with
> DirTree.pl or Spider.pl which will keep Perl in memory and filter the
> documents before they are passed to swish-e.
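For the archive, here is a rough, untested sketch of what such a -S prog setup might look like. The file name prog.config and the location of DirTree.pl are assumptions (the script's location depends on the installation); the document directory is the one from filter.config above.

```
# prog.config (hypothetical file name)
# With -S prog, IndexDir names the program to run instead of a directory.
# The path to DirTree.pl below is an assumption; adjust it to the local install.
IndexDir /usr/local/lib/swish-e/DirTree.pl
# Arguments handed to that program: the directory to walk.
SwishProgParameters /export/home/root/blah
IndexReport 4
```

This would be run as "swish-e -S prog -c prog.config". Whether DirTree.pl actually converts the PDFs depends on the Perl filter modules shipped with the installed version, so that part would need checking.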
Best regards, Uwe
Received on Thu Nov 30 07:20:16 2006