Re: searching XML documents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Apr 01 2004 - 20:18:38 GMT
> >>Does the -L option help at all? It's listed as experimental, but the 
> >>docs suggest that a range of dates is the intended function.
> > 
> > 
> > Thank you very much - it worked even if it's not as fast as all the 
> > other searches we've tested. But I suppose that you have to do something
> > like a "full table scan".
> 
> glad that worked.


It works by using the pre-sorted tables that swish creates at indexing
time and uses for sorting results.  The tables are indexed by file
number and the value of each entry is an integer that gives the sort
order for the file.  -L works like this:

1) The table is inverted -- sorted by the sort order value.

2) Entries within the range are flagged as such.  This is done with
   two binary searches -- one for the low bound and one for the high.

3) The table is inverted again -- sorted/indexed back by file number.

4) The table is consulted during searching to determine if a file should
   be excluded from the result set.
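
The four steps above can be sketched roughly as follows.  This is only
an illustration in Python, not swish-e's actual C code: the function
names are made up, and it assumes the low and high bounds are already
expressed as sort-order integers (the real code presumably has to
compare property values).

```python
from bisect import bisect_left, bisect_right


def range_flags(sort_order, low, high):
    """Return one boolean per file number: True if the file is in range.

    sort_order[file_number] is the integer sort position of that file,
    as in the pre-sorted table described above.  low and high are
    assumed (for this sketch) to be sort positions as well.
    """
    # 1) Invert the table: order the entries by sort value.
    by_rank = sorted((rank, filenum) for filenum, rank in enumerate(sort_order))
    ranks = [rank for rank, _ in by_rank]

    # 2) Two binary searches, one per bound.
    first = bisect_left(ranks, low)
    last = bisect_right(ranks, high)

    # 3) Invert again: index the in-range flags by file number.
    flags = [False] * len(sort_order)
    for _, filenum in by_rank[first:last]:
        flags[filenum] = True
    return flags


def filter_results(result_filenums, flags):
    """4) Consult the table while searching to drop out-of-range files."""
    return [f for f in result_filenums if flags[f]]
```

The two sorts over the whole table are where the cost comes from, since
they touch every file in the index regardless of how few files match.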

I classified it as experimental because I don't feel it's a very good
design -- it's not very scalable and can be slow for very large indexes.
It's really slow if the property doesn't have a pre-sorted index
(i.e. disabled in the config file).

> > Example:
> > - records are book titles
> > - subrecords contain information about books of a specific library
> >   e.g. signature, location, field of research (chemistry, physics, 
> >   comp. science etc.)
> > Now I want to find books using title keywords or author names etc. 
> > for a specific field of research at a specific location.

As Peter says, this sounds a little like you want a database, not a
search engine.

But you can do something like that with swish-e:

   (title=($keywords) OR author=($names)) AND field=physics AND location=LOC

You would just need to make sure that "field" and "location" contain
text that makes it easy to select on.
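
If the query is assembled by a script, a small helper along these lines
might do.  It is just a sketch in Python: build_query is a made-up name,
and it assumes the user input has already been cleaned of parentheses
and other characters that are special in a swish-e query.

```python
def build_query(keywords, names, field, location):
    """Assemble the query string shown above from its four parts.

    No escaping is done here; the caller is assumed to pass values
    that are already safe to drop into a query.
    """
    return (
        f"(title=({keywords}) OR author=({names})) "
        f"AND field={field} AND location={location}"
    )
```

For example, build_query("quantum mechanics", "feynman", "physics", "LOC")
gives the same shape of query as the one above.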

Still, if you are asking too many questions about relationships between
records, then think about using an RDBMS instead.

-- 
Bill Moseley
moseley@hank.org
Received on Thu Apr 1 12:18:38 2004