Re: Using Swish's Query Parser

From: Peter Karman <karpet(at)not-real.peknet.com>
Date: Thu Nov 18 2004 - 04:52:29 GMT
You don't say how you plan to "extract the text" of your potential
document, or how you will "run a swish-format query" on the text.

I wouldn't expect it to be very efficient, but you might just index
each doc with swish-e and then search that temp index. swish-e is
just as fast as anything else at "extracting text" and running the
query. Then you could simply delete the index, or reuse the same tmp
index name for each new doc so that every pass overwrites the last.

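Example Perl off the top of my head:

my $query   = 'foo bar';    # the swish-format filter query
my %include = ();           # docs that matched the query
# @listofdocs holds the candidate file paths, defined elsewhere
for my $doc (@listofdocs) {
	indexdoc( $doc );             # build the tmp index for this one doc
	if ( searchdoc( $query ) ) {  # true if the doc matched the query
		$include{$doc}++;
	}
}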

Here indexdoc() and searchdoc() are functions you write yourself: one
creates your tmp index and the other searches it. You might define a
special index name to use in your code, then remove it at the end.
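
For what it's worth, here is one rough way indexdoc() and searchdoc() could be filled in by shelling out to the swish-e binary. Treat it as a sketch under assumptions, not tested against any particular swish-e version: the index name filter-tmp.index and the helper has_hits() are made up for illustration, swish-e is assumed to be on your PATH, and the check assumes that each hit is printed as a line starting with a numeric rank while "no results" messages are not.

```perl
use strict;
use warnings;

# Hypothetical scratch index name; any writable path would do.
my $TMP_INDEX = 'filter-tmp.index';

# Build a one-document index, overwriting the previous temp index.
# Assumes the swish-e binary is on PATH. Returns true on success.
sub indexdoc {
    my ($doc) = @_;
    # -i: file to index, -f: index file to write, -v 0: quiet
    return system( 'swish-e', '-i', $doc, '-f', $TMP_INDEX, '-v', '0' ) == 0;
}

# Decide whether search output contains at least one hit.
# Assumption: result lines begin with a numeric rank; header lines
# ("# ...") and messages such as "err: no results" do not.
sub has_hits {
    my ($output) = @_;
    my @hits = grep { /^\d+\s/ } split /\n/, $output;
    return scalar @hits;
}

# Run the query against the temp index and report whether it matched.
sub searchdoc {
    my ($query) = @_;
    # List-form pipe open bypasses the shell, so the query needs no quoting.
    open( my $fh, '-|', 'swish-e', '-f', $TMP_INDEX, '-w', $query )
        or die "cannot run swish-e: $!";
    my $output = do { local $/; <$fh> };    # slurp all output at once
    close $fh;
    return has_hits( defined $output ? $output : '' );
}
```

When the loop is done, something like unlink( $TMP_INDEX, "$TMP_INDEX.prop" ) should clean up, assuming your swish-e version writes a companion .prop file next to the index. Keeping the output check in its own small function means you can adjust it to whatever your version actually prints without touching the rest.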

Masoud Pirnazar wrote on 11/17/04 9:56 PM:

> I have used Swish to index and search document collections, and now want to
> "filter" documents before indexing using the same query syntax, i.e.
> 
> Given a document, I will extract its text and want to run a swish-format
> query on the text to see if it matches the query criteria; if it does, I
> will add it to my collection.
> 
> The simplest method is to add everything to a collection and do a swish
> search on the collection, but I'm looking for a more efficient method,
> especially if the hit percentage is small.
> 
> Can anyone suggest anything?
> I looked at the parse_swish_query and tokenize_query_string functions, but
> it gets too complicated quickly.
> 
> Thanks in advance for any ideas and comments.

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Nov 17 20:52:30 2004