Re: Applicability of Swish-E... Thoughts?

From: Net Virtual Mailing Lists <mailinglists(at)not-real.net-virtual.com>
Date: Mon Jul 11 2005 - 10:05:20 GMT
>hi,
>
>if I understand it, Greg would like to have something as browsable index 
>of categories (at least, something that summarizes categories)
>a   15
>a.b 20
>a.c 25
>
>I was trying to do something similar, look for example here (testing)
>http://www.knihovnabbb.cz/cgi-bin/regcat/regcat.cgi?
>metaname=au&si=0&si=1&browse_term=a&submit=Search%21
>
>it is an external script, that simply counts the number of occurrences 
>for later browsing/searching
>
>
>I think this information can be collected from swish-e index too, 
>something like dumping metadata out of the index and then counting it
>
>however, we would need an ability to dump only certain parts of index, 
>sounds that normal?
>
>roman


I think you understand what I am after here. :)

Except in the example you gave it would be:

a   45
a.b 20
a.c 25

That is, every upper-level category's count is the sum of the counts of
the categories beneath it.
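
To make the roll-up concrete, here is a minimal sketch in Python. It is not anything Swish-E provides: the function name roll_up and the dotted-name convention are assumptions for illustration, and it assumes documents are filed only under the lowest-level categories (a.b and a.c here).

```python
from collections import defaultdict

def roll_up(direct_counts, sep="."):
    """Add each category's own count to itself and to every ancestor.

    direct_counts maps a dotted category name (hypothetical convention,
    e.g. "a.b") to the number of documents filed directly under it.
    """
    totals = defaultdict(int)
    for category, count in direct_counts.items():
        parts = category.split(sep)
        # "a.b" contributes to "a" and to "a.b"
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += count
    return dict(totals)

totals = roll_up({"a.b": 20, "a.c": 25})
for name in sorted(totals):
    print(name, totals[name])
```

With the numbers from the example, "a" comes out as 45, "a.b" as 20 and "a.c" as 25.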

For a bit of theoretical thought on this:

Imagine if I indexed 1 million files which fall into 200 categories.  Now
imagine if a search result across all 1 million documents returns 100,000
of them.  For the main page I want to display, based on that result,
simply a count of how many documents fall into each category.   This
would require having a script iterate through a loop 100,000 times, when
it seems as if this could be handled *very* efficiently inside a search
engine, especially with the way Swish-E seems to have been designed (e.g.
property values).  It strikes me that Swish-E is spending extra work to
give me all these results and then I'm spending extra work in an external
script to process the results.  Theoretically speaking am I completely
wrong here?  If not, how hard would it be to do this and could it be
added to a TODO list somewhere?  If I am wrong, sorry for beating this
dead horse.
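
For reference, the external loop I am describing amounts to something like the sketch below. It is only an illustration under assumptions: the result list is a plain list of dicts, and the property name "category" and the function name category_summary are made up here, not part of any Swish-E API.

```python
from collections import Counter

def category_summary(results, prop="category"):
    """Count how many hits fall into each category.

    results is assumed to be an iterable of dicts, one per hit, each
    carrying its category under the (hypothetical) key given by prop.
    This is one pass over every hit, which is the cost in question.
    """
    return Counter(hit[prop] for hit in results)

# Tiny stand-in for a real result set of 100,000 hits
results = [
    {"category": "a.b"},
    {"category": "a.c"},
    {"category": "a.b"},
]
summary = category_summary(results)
for name, count in sorted(summary.items()):
    print(name, count)
```

The work grows with the number of hits rather than the number of categories, which is why it seems better done inside the engine.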

As for the results page I would add to the search query whichever
category has currently been selected, reducing the number of returned
results to a much smaller number.

I have written a script to do this and while the performance is adequate,
it is no better than querying against Postgres directly.  I pick up some
performance when executing a query inside a specific category, but I've
not seen any improvement in the "summary" query when compared against
Postgres.

I am sorry, I went to the URL you have listed above, but I just can't
tell what it is I am looking at (probably a language thing).. :)

- Greg
Received on Mon Jul 11 03:05:22 2005