>hi,
>
>if I understand it, Greg would like to have something as browsable index
>of categories (at least, something that summarizes categories)
>a 15
>a.b 20
>a.c 25
>
>I was trying to do something similar, look for example here (testing)
>http://www.knihovnabbb.cz/cgi-bin/regcat/regcat.cgi?
>metaname=au&si=0&si=1&browse_term=a&submit=Search%21
>
>it is an external script, that simply counts the number of occurrences
>for later browsing/searching
>
>
>I think this information can be collected from swish-e index too,
>something like dumping metadata out of the index and then counting it
>
>however, we would need an ability to dump only certain parts of index,
>sounds that normal?
>
>roman
I think you understand what I am after here. :)
Except in the example you gave it would be:
a 45
a.b 20
a.c 25
i.e. every upper-level category's count is the sum of its child categories' counts.
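
To make the roll-up concrete, here is a minimal Python sketch. It assumes each document is filed under a single dot-separated category path such as `a.b`; the function name `rollup_counts` and the input dictionary are hypothetical, chosen only for illustration.

```python
from collections import Counter


def rollup_counts(leaf_counts, sep="."):
    """Credit each category's count to itself and to every ancestor."""
    totals = Counter()
    for category, count in leaf_counts.items():
        parts = category.split(sep)
        # "a.b" contributes to "a" and to "a.b".
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += count
    return dict(totals)


totals = rollup_counts({"a.b": 20, "a.c": 25})
for name in sorted(totals):
    print(name, totals[name])
# prints:
# a 45
# a.b 20
# a.c 25
```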
For a bit of theoretical thought on this:
Imagine if I indexed 1 million files which fall into 200 categories. Now
imagine if a search result across all 1 million documents returns 100,000
of them. For the main page I want to display, based on that result,
simply a count of how many documents fall into each category. This
would require having a script iterate through a loop 100,000 times, when
it seems as if this could be handled *very* efficiently inside a search
engine, especially with the way Swish-E seems to have been designed (e.g.
property values). It strikes me that Swish-E is spending extra work to
give me all these results and then I'm spending extra work in an external
script to process the results. Theoretically speaking am I completely
wrong here? If not, how hard would it be to do this, and could it be
added to a TODO list somewhere? If I am wrong, sorry for beating this
dead horse.
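
For reference, the external counting pass described above amounts to something like the following minimal Python sketch. The shape of the results (an iterable of dictionaries) and the `category` property name are assumptions made for illustration, not part of Swish-E's API.

```python
from collections import Counter


def summarize_categories(results, prop="category"):
    """Tally how many search hits fall into each category.

    `results` is assumed to be an iterable of dicts, one per hit,
    each carrying the category as a property value.
    """
    counts = Counter()
    for hit in results:  # one iteration per returned document
        counts[hit[prop]] += 1
    return counts


# Hypothetical result set standing in for real search output.
hits = [
    {"path": "doc1", "category": "a.b"},
    {"path": "doc2", "category": "a.c"},
    {"path": "doc3", "category": "a.b"},
]
summary = summarize_categories(hits)
for name in sorted(summary):
    print(name, summary[name])
```

The loop runs once per hit, so its cost grows with the size of the result set (100,000 iterations in the scenario above) rather than with the number of categories.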
As for the results page I would add to the search query whichever
category has currently been selected, reducing the number of returned
results to a much smaller number.
I have written a script to do this and while the performance is adequate,
it is no better than querying against Postgres directly. I pick up some
performance when executing a query inside a specific category, but I've
not seen any improvement in the "summary" query when compared against
Postgres.
I am sorry, I went to the URL you listed above, but I just can't
tell what it is I am looking at (probably a language thing). :)
- Greg
Received on Mon Jul 11 03:05:22 2005