RE: Re: Document properties - code sample

From: Rainer Scherg <Rainer.Scherg(at)>
Date: Wed Sep 08 1999 - 11:27:22 GMT
Mhh, IMO a plain ASCII index file is not possible: you have to store
special characters (e.g. German umlauts) from TITLE tags...

This means you have to take care of converting these special
characters, or use a standard character set like ISO xxxx.
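
A minimal sketch of the problem, in Python. The title text is a made-up
example, and ISO-8859-1 is only an assumption standing in for the
"ISO xxxx" above; the helper name fits_ascii is hypothetical too.

```python
# Hypothetical TITLE-tag text containing German umlauts (U-umlaut, u-umlaut).
title = "\u00dcber die Pr\u00fcfung"


def fits_ascii(text):
    """Return True if text can be stored unchanged in a 7-bit ASCII file."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Assumption: ISO-8859-1 as the index character set. It stores each of
# these characters in exactly one byte and round-trips without loss.
latin1_bytes = title.encode("iso-8859-1")
restored = latin1_bytes.decode("iso-8859-1")
```

Here fits_ascii(title) is False, while the ISO-8859-1 bytes decode back
to the original title, so the index would need either a conversion step
or an agreed 8-bit character set.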

Also our 200 MB index file for our IntraNet server might grow to

What would make sense, IMO, is a tool to import or export index files...

-- rainer

-----Original Message-----
From:	Einar Indridason
Sent:	Wednesday, September 08, 1999 1:01 PM
To:	Multiple recipients of list
Subject:	[SWISH-E] Re:  Document properties - code sample

> The index file is packed to be as small as possible, presumably to 
> larger number of documents with smaller files and to get some kind of
> performance advantage. I'm not sure that it has much of an effect on
> performance but I could imagine it since read-ahead caching comes into
> play. That has to be balanced with the extra CPU work to decompress 
> Hopefully someone did a benchmark on it way back (when?) when the
> compression part was introduced.
> As far as an ASCII index goes, that would be slower because of all of the
> numerical values that would have to be converted back from text strings to
> numbers on each and every search.  Also, the index uses lots of offsets
> (file position info) to refer to objects within the file and a pure ASCII
> index would be too tempting to just "tweak" by hand, corrupting the
> offsets.

Would the speed difference be great enough to affect normal usage of

Binary index file:
	the "correct" architecture can handle the data faster
	more complex
	needs specific tools to deal with it
	a small corruption can corrupt the whole file
	the "wrong" architecture needs to convert the file anyway

Ascii index file:
	readable with other tools
	(and fixable if there is the required "know-how")
	portable between all architectures, as all architectures need
	to convert the file to an internal form anyway.
	it might be more tempting to "fix" the index file by hand
	somewhat slower.
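
To make the two lists concrete, here is a rough sketch in Python. The
record layout (word id, file offset, hit count) is purely hypothetical
and is not the actual SWISH-E index format; it only illustrates the
byte-order and text-conversion points.

```python
import struct

# Hypothetical index record: (word id, file offset, hit count).
record = (42, 1048576, 7)

# Binary form with a fixed little-endian layout: 3 unsigned 32-bit ints.
binary = struct.pack("<3I", *record)

# Reading the same bytes with the opposite byte order yields wrong
# numbers, which is why the "wrong" architecture has to convert.
misread = struct.unpack(">3I", binary)

# Reading them with the byte order they were written in is a plain copy.
reread = struct.unpack("<3I", binary)

# ASCII form: one space-separated line. Every field has to go through
# int() on each read, on every architecture alike.
ascii_line = " ".join(str(n) for n in record) + "\n"
parsed = tuple(int(field) for field in ascii_line.split())
```

Both reread and parsed give back the original record, while misread
does not. The ASCII line stays readable in any editor; the cost is the
int() conversion on each read.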

Today I would definitely go for an ASCII file.

But that is just my opinion.


Received on Wed Sep 8 04:22:26 1999