A workaround that has worked for me with very large sets of files is to index part of the files as a separate index and then specify multiple index files when searching. Swish-e is fast enough that this doesn't seem to have had much of a negative impact.
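To make the workaround concrete, here is a rough sketch that builds one indexing command per directory and then a single search command that lists all the partial indexes after -f. The config name swish.conf, the part*.swish-e index names and the search word are made up for illustration. It also assumes that -i and -f given on the command line take precedence over IndexDir and IndexFile in the config file. The script only prints the commands; it does not run swish-e.

```python
import shlex

# Hypothetical layout: one partial index per source directory.
PARTS = {
    "../../path1": "./part1.swish-e",
    "../../path2": "./part2.swish-e",
    "../../path3": "./part3.swish-e",
}


def index_command(config, source_dir, index_file):
    """Build the swish-e command that indexes one directory into its own index."""
    # Assumption: -i and -f override IndexDir and IndexFile from the config.
    return ["swish-e", "-c", config, "-i", source_dir, "-f", index_file]


def search_command(query, index_files):
    """Build one swish-e search command that reads several index files."""
    return ["swish-e", "-w", query, "-f", *index_files]


def as_shell(cmd):
    """Render an argument list as a copy-and-paste shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    for source_dir, index_file in PARTS.items():
        print(as_shell(index_command("swish.conf", source_dir, index_file)))
    print(as_shell(search_command("example", list(PARTS.values()))))
```

Each partial index is built in a separate run, so a crash on one large directory does not affect the others, and the search side only needs the extra file names after -f.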
From: Dietmar Rabich [mailto:email@example.com]
Sent: Tuesday, April 20, 2004 3:11 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Segmentation fault while indexing
it is a little bit difficult ... There are 3 paths with html files.
path1: 3 html files
path2: 38,278 html files
path3: 64,168 html files
The whole directory is 5,565,272 kByte - unzipped. And some of the documents
Here is an extract of the old header file:
# Swish-e format: 2.2.3
# Name: ...
# Saved as: index-html.swish-e
# Counts: 501968 words, 105898 files
# Indexed on: 2004-04-16 05:09:12 CEST
# Description: ...
# Pointer: (no pointer)
# Maintained by: ...
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Indexing Mode: None
# IgnoreTotalWordCountWhenRanking: 1
# WordCharacters: ... (not changed)
# MinWordLimit: 2
# MaxWordLimit: 80
# BeginCharacters: ... (not changed)
# EndCharacters: ... (not changed)
I think there are far too many files for Swish-E 2.4.2. We've tried 2.4.1
too, with the same result: segmentation fault.
> Hi Dietmar,
> (I cannot contact you directly because of your email address)
> If possible, can you gzip "path1" and "path2" and make them
> available to me so that I can try them?
> Dietmar Rabich wrote:
> >Some more information:
> >Swish-E crashes in many other cases too. In each case there are many
> >documents to be indexed. Here is an example:
> >Removing very common words...
> >no words removed.
> >Writing main index...
> >Sorting words ...
> >Sorting 170,500 words alphabetically
> >Writing header ...
> >Writing index entries ...
> > Writing word text: 20%Segmentation fault
> >cu Dietmar.
> >>I've just run into a problem while indexing HTML files. I have updated
> >>Swish-E from version 2.2.3 to 2.4.2. Indexing with the old version works
> >>fine. Now I get a message "segmentation fault".
> >>The config file is simple:
> >>IndexDir ../../path1 ../../path2
> >>IndexOnly .html
> >>IndexReport 3
> >>IndexFile ./test.swish-e
> >>IndexContents HTML .html
> >>DefaultContents HTML
> >>StoreDescription HTML <body> 2000
Received on Tue Apr 20 04:31:41 2004