Re: Using too much memory, or with -e failing out with too many f

From: Bill Moseley <moseley(at)>
Date: Thu Dec 09 2004 - 18:31:35 GMT
On Thu, Dec 09, 2004 at 08:29:49AM -0800, Stewart, John wrote:
> Well, I've had to revert to not indexing .pdf files, and that seems to work
> all right. Does anyone have any suggestion on limiting memory usage indexing
> pdf files, or getting the -e flag to work?

If the problem is specific to indexing PDF files (and not just the
number or size of your docs), then it's not swish-e itself that's
eating the memory.

Hmm, looking back at your config:

    IndexOnly .html .txt .pdf .htm .doc
    NoContents .gif .jpg
    # Don't do these pdf's - crashing
    FileRules pathname contains marketing/competition
    # Index the main internal section
    #IndexDir /www/internal
    ReplaceRules remove /www/internal
    # Index the internal_web section on titan
    #IndexDir /home/groups/internal_web
    ReplaceRules remove /home/groups/internal_web
    # Index the manuals section
    IndexDir /home/manuals
    ReplaceRules remove /home

How are you converting PDF to a format that swish-e can parse?  Swish-e
can parse text, HTML, and XML.  All the config above does is tell
swish-e to index files that end in .html, .txt, .pdf, .htm, and .doc,
but swish-e doesn't know how to index .pdf or .doc without using a
filter of some type.
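
As a rough sketch of what a filter setup might look like, assuming
pdftotext (from xpdf) and catdoc are installed and on your PATH (neither
program appears in your config, so treat both as placeholders for
whatever converters you actually have):

    # Run each .pdf through pdftotext; %p is the path of the file
    # being indexed, and the trailing "-" sends the text to stdout
    FileFilter .pdf pdftotext "'%p' -"
    # Assumed converter for Word files; adjust the options to taste
    FileFilter .doc catdoc "'%p'"
    # The filter output is plain text, so parse it with the text parser
    IndexContents TXT* .pdf .doc

With something like this, the memory used during conversion belongs to
the external filter program rather than to swish-e, so a single bad PDF
can be tracked down by running the filter on it by hand.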

(Also NoContents .gif .jpg has no effect -- they are not included in
your "IndexOnly" list of extensions to index.)
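
If the intent was to index images by file name only, a minimal sketch
would be to list those extensions in IndexOnly as well, so NoContents
has something to act on (this assumes you do want the images in the
index at all):

    # .gif and .jpg must be selected for indexing before NoContents applies
    IndexOnly .html .txt .pdf .htm .doc .gif .jpg
    # Index only the path name for these, not the file contents
    NoContents .gif .jpg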

Bill Moseley

