On Thursday, Mar 31, 2005, Bill Moseley wrote:
> On Thu, Mar 31, 2005 at 10:20:20AM -0800, Brett Paden wrote:
> > Does anyone know how metaname searches are done on a swish index?
>
> Somewhat.
>
> > I have a largish swish index (around 1 gig) that performs quite well
> > except when doing searches that contain multiple metanames. For
> > example:
> >
> > -w '(america OR clinton) AND (owner_id=xxx OR owner_id=yyyy OR owner_id=zzz OR ...)'
>
> Every search is a metaname search, so it has nothing to do with that.
>
> > In practice there are anywhere from 10 to 100 metaname=key terms strung together with ORs.
>
> My guess is that a query with 100 ORed terms would take somewhere near 100
> times longer than a single-term one.
>
> Each individual word is a query to the index that builds up a list of
> results. Then that result is either ORed or ANDed with the existing
> list to make a new list. When all done the entire list is sorted and
> then results returned. Sounds rather linear, doesn't it? Now, that's
> not taking into consideration what the OS might be doing to buffer the
> disk reads.
>
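A toy model of the evaluation Bill describes, assuming a small in-memory dict stands in for the on-disk index (the data and the helper names `lookup`, `any_of`, and `search` are hypothetical, not swish-e internals). Each term costs one index read, each result list is merged into the running list, and sorting happens once at the end, so the number of reads grows linearly with the number of ORed owner_id terms.

```python
# Toy inverted index: (metaname, word) -> set of matching document ids.
# Hypothetical data; swish-e's real on-disk format is not modeled here.
INDEX = {
    ("swishdefault", "america"): {1, 2, 5, 8},
    ("swishdefault", "clinton"): {2, 3, 8},
    ("owner_id", "xxx"): {2, 4},
    ("owner_id", "yyyy"): {8, 9},
    ("owner_id", "zzz"): {5},
}


def lookup(metaname, word, stats):
    """Read one term's result list from the index, counting the read."""
    stats["lookups"] += 1
    return INDEX.get((metaname, word), set())


def any_of(metaname, words, stats):
    """OR the result lists of several terms together, one term at a time."""
    result = set()
    for word in words:
        result |= lookup(metaname, word, stats)
    return result


def search(text_words, owner_ids):
    """(w1 OR w2 ...) AND (owner_id=a OR owner_id=b ...), sorted at the end."""
    stats = {"lookups": 0}
    text_hits = any_of("swishdefault", text_words, stats)
    owner_hits = any_of("owner_id", owner_ids, stats)
    # AND the two lists, then sort once (by id here; swish-e sorts by rank).
    return sorted(text_hits & owner_hits), stats["lookups"]


print(search(["america", "clinton"], ["xxx", "yyyy", "zzz"]))  # ([2, 5, 8], 5)
```

With 2 text words and 100 owner_id terms the same query would make 102 reads, which is the linear growth Bill is guessing at (OS disk caching aside).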
Is the index arranged something like:
word | in_metaname1 | in_metaname2 | ... | <doc_list>
So queries that use metanames look up the entries by word, but only
return them if that entry is also marked as having been indexed under a
certain metaname?
So, theoretically, if I stored my owner_id metaname as a property,
then asked swish to return all the results that matched 'america or
clinton' along with the owner_id property, pushed them into some sort
of hash keyed by the owner_id property ... I might be able to improve
performance (or at least reduce the 'slow' part of the swish query).
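A minimal sketch of the property-based approach, assuming the text-only query ('america or clinton') has already returned a list of result ids and that each result's owner_id property can be read from a plain dict (`owner_of`, `group_by_owner`, and `results_for` are hypothetical names). Results are pushed into a hash keyed by owner_id, and only the wanted owners are pulled back out, so the 10 to 100 owner_id terms never touch the index.

```python
from collections import defaultdict


def group_by_owner(results, owner_of):
    """Bucket result ids by their owner_id property (one property read each)."""
    groups = defaultdict(list)
    for doc_id in results:
        groups[owner_of[doc_id]].append(doc_id)
    return groups


def results_for(groups, wanted_owners):
    """Collect the results belonging to any of the wanted owners."""
    out = []
    for owner in wanted_owners:
        out.extend(groups.get(owner, []))
    return sorted(out)


# Hypothetical results of the text-only query and their owner_id properties.
owner_of = {1: "aaa", 2: "xxx", 3: "bbb", 5: "zzz", 8: "yyyy"}
groups = group_by_owner([1, 2, 3, 5, 8], owner_of)
print(results_for(groups, ["xxx", "yyyy", "zzz"]))  # [2, 5, 8]
```

The trade-off is that every result of the broad text query needs its property read, which is exactly the cost asked about next.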
Are property lookups expensive? Say my query returns 10,000 results, and
as I iterate through them only one in 10 contains an owner_id I am
interested in; to get ten "real" results I'll have to pull 100 full
property/document lookups off the disk. Bad?
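The arithmetic behind the question: if a fraction p of results have a wanted owner_id, collecting k of them takes about k / p property reads, so k = 10 and p = 0.1 gives about 100 reads. A small sketch that counts reads while scanning, assuming (hypothetically) that matches are spread evenly through the result list; if they are rare or bunched at the end, the scan can read all 10,000.

```python
def first_k_matches(results, is_wanted, k):
    """Scan results in order, reading one property per result, until k match."""
    found, reads = [], 0
    for doc_id in results:
        reads += 1  # one property/document lookup per result examined
        if is_wanted(doc_id):
            found.append(doc_id)
            if len(found) == k:
                break
    return found, reads


# 10,000 results, every 10th one has an owner_id of interest.
found, reads = first_k_matches(range(1, 10001), lambda d: d % 10 == 0, 10)
print(reads)  # 100
```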
> > Also, I've noticed that repeating the query speeds up results
> > dramatically.
>
> Are you running the swish-e binary or the C/Perl interface?
Both. The behavior I described above, however, has to do with
command line tests using the binary.
> > I assume that swish keeps some portion of the index in
> > memory, since slightly modifying the query makes it slow again.
>
> Thank your OS for that.
Thank you Redhat. I think. :-)
>
> > Is there a way to force swish to store the entire index in memory
> > before any queries are done?
>
> Like a RAM disk? I think I'd trust the OS to buffer the best.
> Using the C or Perl interface (SWISH::API) will help keep buffers in
> memory since the index remains open between requests. Also saves the
> overhead of forking (minor) and of opening and parsing the index header
> on each request.
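A toy comparison of the two usage patterns Bill contrasts, assuming a hypothetical `ToyIndex` class whose constructor stands in for opening the index and parsing its header (this is not the SWISH::API interface). Running the binary pays that cost per request; a long-running process that keeps the handle open pays it once. OS-level buffering of disk reads is not modeled.

```python
class ToyIndex:
    """Stand-in for an index handle; construction simulates open + header parse."""

    opens = 0  # how many times an index has been opened

    def __init__(self, path):
        ToyIndex.opens += 1
        self.path = path
        self.header = {"path": path}  # pretend header parse

    def query(self, words):
        return f"results for {words!r} from {self.path}"


def per_request(path, queries):
    # Like running the swish-e binary: open and parse the index every time.
    return [ToyIndex(path).query(q) for q in queries]


def persistent(path, queries):
    # Like a long-running C/Perl API process: open once, reuse the handle.
    idx = ToyIndex(path)
    return [idx.query(q) for q in queries]
```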
>
>
> --
> Bill Moseley
> moseley@hank.org
Received on Thu Mar 31 10:58:46 2005