From: Yann Stettler <stettler(at)>
Date: Sat Jan 23 1999 - 12:14:11 GMT
David Norris wrote:

> Caching isn't worth the extra time and resources, in my mind, since a given
> set of results may only be used once. 

Not if you are "paging" the results to display URLs 20 at a time.
If someone searches for a document and it isn't listed in the first 20,
he will quickly ask for the next 20, and so on... That's why caching
the results would be faster/better than doing a full search each time.

> file from disk again.  Caching to RAM isn't an option in many scripting
> languages.  And, you could easily run out of RAM limiting the use of the
> cache.

Assuming a search returns 5000 results and each has a length of
80 characters, you would need about 390KB to cache it. But if you display
20 results per page, it's probably not really necessary to cache
the results for all 250 pages... So let's say that you only cache
the 25 pages "around" the current one; that means that you only
need 39KB per cached search... I guess that we can cache quite a
few of them before it causes a problem... or before you need to
swap to disk.
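
The arithmetic above can be checked with a few lines of Python. The
80-byte and 20-per-page figures are the ones assumed in this message;
the function names and the way the 25-page window is centred on the
current page are just one possible choice, for illustration.

```python
RESULT_SIZE = 80   # assumed bytes per cached result
PAGE_SIZE = 20     # results per page

def cache_size_kb(num_results, result_size=RESULT_SIZE):
    """Size in KB (1 KB = 1024 bytes) needed to cache num_results results."""
    return num_results * result_size / 1024

def page_window(current_page, total_pages, window=25):
    """Half-open range (first, last) of pages to keep around current_page."""
    half = window // 2
    # Centre the window, then clamp it so it stays inside [0, total_pages).
    first = max(0, min(current_page - half, total_pages - window))
    last = min(total_pages, first + window)
    return first, last

total_pages = 5000 // PAGE_SIZE          # 250 pages
print(cache_size_kb(5000))               # 390.625 (all 5000 results)
print(cache_size_kb(25 * PAGE_SIZE))     # 39.0625 (25 pages of 20 results)
print(page_window(100, total_pages))     # (88, 113)
```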

Sure, you can't cache everything, and sometimes it won't be
useful to cache because the person won't ask for more results...
but you won't lose anything by doing it...

> Reading a file from my Ultra-Wide
> SCSI disks shouldn't be much, if any, faster than SWISH-E reading the index
> file from disk again.

Even if you swap part of the cache to disk, it would be faster
because you can "jump" to the right position in the file
to get only the data you want, without re-reading the whole
index and without having to process it: I am ready to bet
that most of the time spent by Swish-e is in pattern matching...
and especially for regexps...
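
As a rough sketch of the "jump" idea, assuming the cached results are
stored as fixed-width 80-byte records (my assumption for this example,
not how Swish-e stores anything), one page can be read with a single
seek. The function names and file layout are hypothetical.

```python
RECORD_SIZE = 80   # assumed fixed width of one cached result, in bytes
PAGE_SIZE = 20     # results per page

def write_cache(path, results):
    """Write each result as a space-padded, fixed-width record."""
    with open(path, "wb") as f:
        for r in results:
            # Longer entries are truncated; fine for plain ASCII URLs.
            data = r.encode("utf-8")[:RECORD_SIZE]
            f.write(data.ljust(RECORD_SIZE, b" "))

def read_page(path, page):
    """Seek straight to the requested page and read only its records."""
    with open(path, "rb") as f:
        f.seek(page * PAGE_SIZE * RECORD_SIZE)
        chunk = f.read(PAGE_SIZE * RECORD_SIZE)
    return [
        chunk[i:i + RECORD_SIZE].rstrip(b" ").decode("utf-8", errors="replace")
        for i in range(0, len(chunk), RECORD_SIZE)
    ]
```

Reading page N touches only 1600 bytes of the cache file, however many
results the search returned, and no pattern matching is redone.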

Yann Stettler

