cygwin: email archive indexing problem

From: <lanz+usenet(at)>
Date: Fri Nov 23 2001 - 09:54:01 GMT
I have several problems and questions concerning swish-e:

- How do you index an email archive with swish-e? Each file in the
  directory is an email message, I think in mbox format (nnml spool
  under Gnus). Are there pre-configured filters or other tools to get
  the subject and other mail headers as properties? How do I instruct
  swish-e not to index embedded mail attachments?

- With -S fs turned on, swish-e does not seem to accept NoContents,
  FileRules or FileMatch (a cygwin problem?). swish-e seems to scan
  index.swish-e.temp and index.swish-e.prop.temp; or what else does
  the "Warning: Substitute possible embedded null character(s) in file
  index.swish-e" (and index.swish-e.temp, index.swish-e.prop,
  index.swish-e.prop.temp) mean? I have set "NoContents .swish-e .temp
  .prop" in my config file.

- Even with option -e I often run out of memory: "err: Ran out of
  memory(could not allocate NNNN more bytes)!", even with IgnoreWords
  instead of IgnoreLimit. Is swish-e not made for scanning some
  2000 email messages in a directory (some 2'000'000 words)? I have a
  reasonable PC with 128MB RAM and free disk space.

- In WordCharacters I define an extended international character set
  (only accented letters). Would it help to solve the memory problems
  if I reduced this set of characters? I am not sure what exactly
  happens when this extended character set is combined with
  TranslateCharacters set to :ascii7:. Is configuring both

- I tried different (development) versions of swish-e. On some
  versions I also get a COALESCE_BUFFER_MAX_SIZE error, but
  increasing the value in config.h (do not change this!) does not
  help. Any ideas?

In short: How do I configure a "small" swish-e to index my (huge?) mail
message archive with plenty of accented characters in the mail bodies?

Any help would be greatly appreciated!
