RE: Re: DOC-Properties

From: Rainer Scherg <Rainer.Scherg(at)not-real.rexroth.de>
Date: Tue May 18 1999 - 10:46:37 GMT

Hello Mark!

-----Original Message-----
From:	Mark Gaulin
Sent:	Monday, May 17, 1999 6:44 PM
To:	Rainer Scherg
Cc:	'swish-e@sunsite.berkeley.edu'
Subject:	RE: [SWISH-E] Re: DOC-Properties

Hi Rainer
[...]

Have you tried the latest version of swish-e (1.3.2)? There was a memory
leak bug in freeing regular expression memory that would appear even if
you were not using regexes in your config file. It was fixed recently
(maybe even in 1.3.1?). This affected searching primarily but may also
have affected indexing.

-- I've used swish 1.3.2 (from sunsite.berkeley.edu) enhanced with
   the filter option to index pdf's and other docs...


I recompiled my copy of swish (1.3.0) with SUPPORT_DOC_PROPERTIES
undefined and indexed a small set of documents and compared memory usage.
There was a tiny difference in memory used (less than 0.2%).  (I am
running on NT and there are some C runtime library functions that made it
easy to see how much memory was being used. You may have similar
functions available to you on your platform.)

-- I'm using Sun Solaris 2.6, with the top tool to roughly measure the
   amount of memory used and the program's "performance"/impact.

Note: I needed to add a "#ifdef SUPPORT_DOC_PROPERTIES" / "#endif" block
around some code in the function addMetaMergeList() in merge.c to get it
to compile without SUPPORT_DOC_PROPERTIES. You would have to do the same
to get a clean compile with the latest version of swish-e.

-- Yep, I've reported this as bug #12 in the swish-e bug-db, when trying
   to compile swish without the DOC_PROP support.

Q: Did you comment out/delete the "#define SUPPORT_DOC_PROPERTIES 1" or
did you change it to "#define SUPPORT_DOC_PROPERTIES 0" in your config.h
file? Since "#ifdef" is used as the test in the code you would need to
remove/comment out the #define line to really get rid of the doc property
code.  Perhaps the code should be changed to use #if instead of #ifdef.

-- As stated: the DOC_PROP feature is disabled (commented out).
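
The "#ifdef" versus "#if" point raised in the question above can be shown
with a small sketch. This is not code from swish-e: the function names are
hypothetical and the #define merely stands in for the config.h line. It
only illustrates that "#ifdef" tests whether the macro exists, while "#if"
tests its value.

```c
/* Stand-in for the config.h setting; the value 0 is the case in question. */
#define SUPPORT_DOC_PROPERTIES 0

/* Hypothetical helper: returns 1 if the #ifdef-guarded branch was compiled. */
int enabled_by_ifdef(void)
{
#ifdef SUPPORT_DOC_PROPERTIES
    return 1;   /* compiled in: the macro is defined, its value is ignored */
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if the #if-guarded branch was compiled. */
int enabled_by_if(void)
{
#if SUPPORT_DOC_PROPERTIES
    return 1;
#else
    return 0;   /* compiled in: the macro's value is 0 */
#endif
}
```

With the macro defined as 0, enabled_by_ifdef() returns 1 and
enabled_by_if() returns 0. An undefined identifier evaluates to 0 inside
"#if", so the "#if" form would also cope with the line being commented out.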

I have two "bottom line" responses at this point:
1. So far I have not found a memory-leak type bug associated with
Document Properties. It doesn't mean it's not there but since you are not
even using that feature most of the code related to Doc Props is not even
called, so it makes it even more unlikely to be the direct cause of
excessive memory use during indexing.

-- What was curious is not the huge amount of memory usage, but the
   difference between the index file sizes (@200MB to @30MB).

2. As we all know, swish-e uses RAM to store all indexing information,
which causes it to use more memory than it would if the speed/memory usage
tradeoff were weighed differently.  I could see doing a first-order
approach to using less RAM without affecting indexing speed too much just
by storing file names, titles (and document properties) in a temp file
during indexing.  Storing the word lists and the index itself on disk
during indexing would be more complex and would slow things down
noticeably but for really huge sets of files this might be needed to keep
RAM usage down.
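
The first-order approach in point 2 might look roughly like the sketch
below. It is an assumption about how such a store could work, not swish-e
code: NameStore and the store_* functions are hypothetical names. Each
file name is appended to a temp file and only its offset is kept in RAM.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical spill store: strings live in a temp file, not in RAM. */
typedef struct {
    FILE *fp;
} NameStore;

/* Opens the backing temp file; returns 0 on success, -1 on failure. */
int store_open(NameStore *s)
{
    s->fp = tmpfile();
    return s->fp ? 0 : -1;
}

/* Appends name (with its terminating NUL); returns its offset or -1. */
long store_put(NameStore *s, const char *name)
{
    size_t len = strlen(name) + 1;
    long off;

    if (fseek(s->fp, 0L, SEEK_END) != 0)
        return -1;
    off = ftell(s->fp);
    if (off < 0 || fwrite(name, 1, len, s->fp) != len)
        return -1;
    return off;
}

/* Reads the string stored at off into buf; returns 0 on success. */
int store_get(NameStore *s, long off, char *buf, size_t bufsize)
{
    size_t i = 0;
    int c;

    if (bufsize == 0 || fseek(s->fp, off, SEEK_SET) != 0)
        return -1;
    while ((c = getc(s->fp)) != EOF && c != '\0') {
        if (i + 1 >= bufsize)
            return -1;          /* buffer too small for this entry */
        buf[i++] = (char)c;
    }
    if (c == EOF)
        return -1;              /* hit end of file before the NUL */
    buf[i] = '\0';
    return 0;
}

/* Closes the store; the temp file is removed automatically. */
void store_close(NameStore *s)
{
    if (s->fp)
        fclose(s->fp);
    s->fp = NULL;
}
```

The indexer would then keep one long per document instead of the full
string, and call store_get() only when writing the final index. The price
is a seek and a read per lookup.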

Sorry I could not be of more help.  I will continue to work on this as
more information comes in.

Mark

-- Thanks a lot for your help.
   At this point I can live without the DOC_PROP support.
   But if I have a little more time to spare, I will include some debug
   code to see what's happening.


Rainer


[lots of stupid outlook quote stuff deleted]
Received on Tue May 18 03:45:05 1999