On Thu, Nov 18, 2004 at 12:35:12PM -0800, Masoud Pirnazar wrote:
> i think you have the correct picture, but here's another attempt:
>
> A:(a bunch of documents, say 500,000 docs) | B:(initial pre-filtering,
> qualifying say 40,000 docs) | C:(index the 40,000 qualified docs) |
> D:(allow users to search the 40,000 qualified docs)
>
> (using the pipe sign | here to indicate the flow of data/different stages of
> processing)
>
> the end user specifies the criteria in steps B and D. it would be easier
> for the end user to use the same query syntax in both steps. at step B, it
> filters out a lot of unwanted documents. at step D, they are searching
> using other criteria, so the query changes.
>
> a typical application: from the 500,000 docs, i want to extract only the
> 40,000 docs that mention some kind of sport activity, then put those in the
> "sports collection" and allow end users to search the sports collection
> using whatever (unrelated) queries they want to use.

Nice old databases did this (BRS was one): you run a query and get back a
set of records, then you can run further queries against that set.

In swish you would index the entire thing and then do:

    -w some query AND type=sports
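
To make the idea concrete, here is a rough sketch in Python rather than Swish-e itself. It assumes each document is tagged with a "type" value at index time (in Swish-e that would presumably mean defining "type" as a metaname in the config), so the step B filter becomes one more AND clause at search time. The sample documents and the build_index and search helpers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents: (id, type metadata, body text).
DOCS = [
    (1, "sports", "local team wins the football final"),
    (2, "finance", "football club shares fall after final results"),
    (3, "sports", "marathon runners train through the winter"),
]

def build_index(docs):
    """Index every document once: body words and the 'type' field separately."""
    text_index = defaultdict(set)   # word -> ids of docs containing it
    type_index = defaultdict(set)   # type value -> ids of docs with that type
    for doc_id, doc_type, body in docs:
        type_index[doc_type].add(doc_id)
        for word in body.lower().split():
            text_index[word].add(doc_id)
    return text_index, type_index

def search(text_index, type_index, words, doc_type=None):
    """AND all query words together, then AND with the type restriction if given."""
    result = None
    for word in words:
        hits = text_index.get(word.lower(), set())
        result = hits if result is None else result & hits
    if result is None:
        result = set()
    if doc_type is not None:
        result = result & type_index.get(doc_type, set())
    return sorted(result)

text_index, type_index = build_index(DOCS)
print(search(text_index, type_index, ["football", "final"]))            # whole index
print(search(text_index, type_index, ["football", "final"], "sports"))  # sports only
```

The point of the design is that nothing is thrown away at index time: the "sports collection" is just the full index plus a fixed type restriction, so the same query syntax covers both the filtering and the searching.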
--
Bill Moseley
moseley@hank.org
Received on Thu Nov 18 12:44:56 2004