Re: Search on metanames - internals and speed

From: Brett Paden <paden(at)not-real.multiply.com>
Date: Thu Mar 31 2005 - 18:58:46 GMT

On Thursday, Mar 31, 2005, Bill Moseley wrote:
> On Thu, Mar 31, 2005 at 10:20:20AM -0800, Brett Paden wrote:
> > Does anyone know how metaname searches are done on a swish index?
> 
> Somewhat.
> 
> > I have a largish swish index (around 1 gig) that performs quite well 
> > except when doing searches that contain multiple metanames.  For 
> > example:
> > 
> > -w '(america OR clinton) AND (owner_id=xxx OR owner_id=yyyy OR owner_id=zzz OR ...)'
> 
> Every search is a metaname search, so it has nothing to do with that.
> 
> > But with anywhere from 10 to 100 metaname=key strung together with ORs.
> 
> My guess is that a search with 100 ORed queries would take somewhere near
> 100 times longer than a single one.
> 
> Each individual word is a query to the index that builds up a list of
> results.  Then that result is either ORed or ANDed with the existing
> list to make a new list.  When all done the entire list is sorted and
> then results returned.  Sounds rather linear, doesn't it?  Now, that's
> not taking into consideration what the OS might be doing to buffer the
> disk reads.
>
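
To check that I follow, here is a rough Python sketch of the evaluation as 
described above: one lookup per term, each result ORed or ANDed into a 
running list, and one sort at the end. The index contents and the helper 
names (TOY_INDEX, lookup, or_terms) are made up for illustration and are 
not swish-e internals.

```python
# Toy model only: term -> set of document ids. The real swish-e index
# layout is not assumed here.
TOY_INDEX = {
    "america": {1, 2, 5},
    "clinton": {2, 3},
    "owner_id=xxx": {1, 9},
    "owner_id=yyyy": {3},
    "owner_id=zzz": {5, 7},
}

def lookup(term):
    """One index lookup per term; unknown terms match nothing."""
    return TOY_INDEX.get(term, set())

def or_terms(terms):
    """OR each term's hits into a running set, counting the lookups."""
    hits = set()
    lookups = 0
    for term in terms:
        hits |= lookup(term)
        lookups += 1
    return hits, lookups

words, n_words = or_terms(["america", "clinton"])
owners, n_owners = or_terms(["owner_id=xxx", "owner_id=yyyy", "owner_id=zzz"])

results = sorted(words & owners)    # AND the two lists, then sort once
total_lookups = n_words + n_owners  # grows linearly with the number of terms
print(results, total_lookups)
```

In this model the lookup count is simply the number of terms, which would 
match the "100 ORs, roughly 100 times longer" estimate.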

Is the index arranged something like:

word | in_metaname1 | in_metaname2  |  ... |  <doc_list> 

So queries that use metanames look up the entries by word, but only 
return them if that entry is also marked as having been indexed under a 
certain metaname?
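
In Python terms, the layout I am guessing at would behave roughly like the 
sketch below. GUESSED_INDEX and search() are hypothetical names, the data 
is invented, and "swishdefault" is, as far as I know, the name swish-e 
gives its default metaname. None of this is the documented index format.

```python
# Hypothetical layout: word -> {metaname: set of doc ids}.
# This mirrors the guess above, not the actual swish-e file format.
GUESSED_INDEX = {
    "america": {"swishdefault": {1, 2, 5}, "title": {2}},
    "xxx": {"owner_id": {1, 9}},
}

def search(metaname, word):
    """Find the entry by word, then keep only the hits under metaname."""
    entry = GUESSED_INDEX.get(word, {})
    return entry.get(metaname, set())

print(search("owner_id", "xxx"))      # docs where "xxx" was indexed as owner_id
print(search("owner_id", "america"))  # empty: word exists, but not under owner_id
```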

So, theoretically, if I stored my owner_id metaname as a property, 
then asked swish to return all the results that matched 'america or 
clinton' along with the owner_id property, and pushed them into some sort 
of hash keyed by the owner_id property ... I might be able to improve 
performance (or at least reduce the 'slow' part of the swish query).
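
The client-side step I have in mind would look something like this. The 
(doc, owner_id) pairs are invented stand-ins for whatever swish returns 
for 'america or clinton' with the owner_id property attached; no real 
swish-e call is shown.

```python
from collections import defaultdict

# Made-up results: (document, owner_id property) in rank order.
results = [("doc1", "xxx"), ("doc2", "aaa"), ("doc3", "yyyy"), ("doc4", "xxx")]
wanted = ["xxx", "yyyy", "zzz"]

# Build the hash keyed by owner_id in a single pass over the results.
by_owner = defaultdict(list)
for doc, owner_id in results:
    by_owner[owner_id].append(doc)

# Each wanted owner_id is now one hash lookup instead of one more OR term.
matches = [doc for owner in wanted for doc in by_owner.get(owner, [])]
print(matches)
```

One catch: grouping by owner loses the original rank order across owners, 
so the matches would need re-sorting if rank matters.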
  
Are property lookups expensive? Say my query returns 10,000 results and, 
as I iterate through them, one out of 10 contains an owner_id I am 
interested in. To get ten "real" results I'll have to pull 100 full 
property/document lookups off the disk.  Bad?
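
To put numbers on that, here is a small simulation under the same 
assumption (exactly one result in ten has a wanted owner_id). 
fetch_owner_id() is a hypothetical stand-in for one property read; it 
only counts calls and says nothing about real disk cost.

```python
import itertools

def fetch_owner_id(doc_id):
    """Stand-in for one property read from disk (hypothetical)."""
    fetch_owner_id.calls += 1
    return "xxx" if doc_id % 10 == 0 else "other"

fetch_owner_id.calls = 0

def matching(doc_ids, wanted):
    """Lazily yield documents whose owner_id is wanted."""
    for doc_id in doc_ids:
        if fetch_owner_id(doc_id) in wanted:
            yield doc_id

# Stop as soon as ten matches are found instead of scanning all 10,000.
first_ten = list(itertools.islice(matching(range(1, 10001), {"xxx"}), 10))
print(len(first_ten), fetch_owner_id.calls)
```

Stopping early keeps it at 100 property reads for the first page of ten 
matches, rather than 10,000 for the whole result set.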

> > Also, I've noticed that repeating the query speeds results 
> > dramatically.
> 
> You are running the swish-e binary or the C/Perl interface?

Both.  The behavior I described above, however, has to do with
command line tests using the binary.
 
> > I assume that swish stores some portion of the index in 
> > memory as slightly modifying the query slows result time.
> 
> Thank your OS for that.

Thank you, Red Hat.  I think. :-)

> 
> > Is there a way to force swish to store the entire index in memory
> > before any queries are done?
> 
> Like a RAM disk?  I think I'd trust the OS to buffer the best.
> Using the C or Perl interface (SWISH::API) will help keep buffers in
> memory since the index remains open between requests.  Also saves the
> overhead of forking (minor) and opening and parsing the header each
> request.
> 
> 
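
If relying on the OS buffer cache is the way to go, one thing I could try 
is reading the index files once before the first query so the OS has a 
chance to cache them. A minimal sketch follows; warm_page_cache() is my 
own helper, not part of swish-e, and whether the pages stay cached depends 
on how much free memory the box has.

```python
def warm_page_cache(path, chunk_size=1 << 20):
    """Read a file start to finish and discard the data.

    Returns the number of bytes read. The OS may keep the pages cached
    afterwards, but nothing here guarantees that.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

# Example (the path is an assumption, substitute your own index files):
# warm_page_cache("/path/to/index.swish-e")
```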
> -- 
> Bill Moseley
> moseley@hank.org
> 
> Unsubscribe from or help with the swish-e list: 
>    http://swish-e.org/Discussion/
> 
> Help with Swish-e:
>    http://swish-e.org/current/docs
>    swish-e@sunsite.berkeley.edu
Received on Thu Mar 31 10:58:46 2005