Re: Segmentation fault while indexing with "StoreDescription"

From: Dietmar Rabich <do.not.send(at)not-real.gmx.de>
Date: Tue Apr 20 2004 - 12:20:24 GMT

Hi Jose,

We are using libxml (and so HTML, XML and so on). In the next few days we
will try libxml2 (HTML2, XML2).
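
For reference, a minimal sketch of how the configuration quoted further down might look once switched to the HTML2 parser. It assumes the Swish-E binary was built with libxml2 support, which this thread does not confirm. Only the parser names change; the paths, the index file name and the elided remainder are taken as-is from the quoted config.

```
# Sketch only: assumes Swish-E was built with libxml2 support.
IndexDir ../../path1 ../../path2
IndexOnly .html
IndexReport 3
IndexFile ./test.swish-e
# HTML2 selects the libxml2-based HTML parser instead of HTML.
IndexContents HTML2 .html
DefaultContents HTML2
StoreDescription HTML2 <body> 2000
...
```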

Thank you for your hint!

BTW: Our platform is Linux 2.4.9-e.3smp on i686.

cu Dietmar.

> I see. Anyway, I think that the number of files is not a problem. I
> reindex about 500,000 XML files every night using the XML2 parser (I am
> using the current CVS version).
> Have you tried HTML2 instead of HTML?
> Which is your platform (Linux, Solaris...)?

> Dietmar Rabich wrote:
> ...
> >it is a little bit difficult ... There are 3 paths with html files.
> >
> >path1: 3 html files
> >path2: 38,278 html files
> >path3: 64,168 html files
> >
> >The whole directory is 5,565,272 kByte - unzipped. And some of the
> >documents are confidential.
> >
> >Here is an extract of the old header file:
> > ...
> >
> >I think there are far too many files for Swish-E 2.4.2. We've tried
> >2.4.1 too, but with the same result: segmentation fault.
> > ...
> >>Hi Dietmar,
> >>
> >>(I cannot contact you directly because of your email address)
> >>If possible, can you gzip "path1" and "path2" and make them
> >>available to me so I can try them?
> >>
> >>cu
> >>Jose
> >>
> >>Dietmar Rabich wrote:
> >>
> >>>Some more information:
> >>>
> >>>In many other cases Swish-E crashes too. In each case there are many
> >>>documents to be indexed. Here is an example:
> >>>
> >>>...
> >>>Writing index entries ...
> >>> Writing word text:  20%Segmentation fault
> >>>
> >>>cu Dietmar.
> >>>
> >>>>I have a problem while indexing HTML files. I have updated Swish-E from
> >>>>version 2.2.3 to 2.4.2. Indexing with the old version works fine. Now I
> >>>>get a "segmentation fault" message.
> >>>>
> >>>>The config file is simple:
> >>>>
> >>>>IndexDir ../../path1 ../../path2
> >>>>IndexOnly .html
> >>>>IndexReport 3
> >>>>IndexFile ./test.swish-e
> >>>>IndexContents HTML .html
> >>>>DefaultContents HTML
> >>>>StoreDescription HTML <body> 2000
> >>>>...

Received on Tue Apr 20 05:20:24 2004