Re: [swish-e] Efficiency of PropertyNames

From: Peter Karman <peter(at)>
Date: Fri Dec 05 2008 - 03:25:47 GMT
Greg Saylor wrote on 12/4/08 6:56 PM:
> Hello,
> My apologies if this has been asked before, I looked but could not find
> the answer.
> What I am wondering is how efficient is Swish-E with respect to
> PropertyNames?  By efficient, I mean in terms of disk utilization, search
> performance, and indexing performance.
> As an example, if I have a wide range of categories of indexes (I'll use
> books and chainsaws for this example) with each category of items in its
> own separate index file - is it efficient to configure each index with the
> same list of PropertyNames?
> For example, maybe "books" would have an ISBN PropertyName, something that
> clearly chainsaws would not have.
> Another angle on it: a chainsaw could have a manufacturer and model, whereas
> the book might have publisher and author.  I can imagine using one
> attribute that could mean the manufacturer or publisher, depending on the
> category of item, and another to mean author or model.  Is adding all 4
> attributes just as efficient as adding 2 and overloading them?
> Last (although I presume the answer to this will be clear based on the
> answers to the above), if a given index has a large number of
> PropertyNames that are not set (say 95%), does this have an effect on the
> overall efficiency of the index or is it better to just leave such
> properties out of the index altogether?

It would largely depend on (a) the size of your .prop file and (b) how much
compression you are using. I would expect those to be the biggest variables.

If you had a modest index (under 1 million documents), I wouldn't expect to see a
noticeable difference if I declared a bunch of PropertyNames and then didn't store
values for them. I would guess you'd see a little extra disk usage. I tend to make my
indexes as specific as possible to the collection they represent. So I would
have 'teeth_count' and 'rpms' in my chainsaw index and 'isbn' and 'author' in my
book index. But that's just for my own sanity, not any kind of performance decision.
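A minimal sketch of that per-collection approach, assuming two separate swish-e config files. The file names, index names, and the choice of which fields are also MetaNames are hypothetical illustrations, not anything from the original thread:

```
# chainsaws.config (hypothetical file name)
# Only properties that make sense for chainsaws are declared here.
IndexFile ./chainsaws.index
MetaNames manufacturer model
PropertyNames manufacturer model teeth_count rpms

# books.config (hypothetical file name)
# A separate index, with its own book-specific property list.
IndexFile ./books.index
MetaNames publisher author
PropertyNames publisher author isbn
```

Each index would then be built on its own, e.g. `swish-e -c chainsaws.config` and `swish-e -c books.config`, so neither index carries property declarations it never fills in.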

Peter Karman  .  .  peter(at)
