Re: Using Swish's Query Parser

From: Peter Karman <karpet(at)>
Date: Thu Nov 18 2004 - 19:38:40 GMT

Masoud Pirnazar wrote on 11/18/2004 09:27 AM:
> the "run a swish-format query" part is the part that's undefined right
> now--i'd like to keep the same syntax as swish, since once the documents
> pass through the initial query filter, they will be added to a collection,
> and the user will use the swish syntax to search this collection later.
> the simplest plan is:  extract and index everything, run the filter query,
> extract just the qualifying documents again and add them to the main
> collection.  this is a little disk-space expensive (timewise, it will
> probably be ok).

What I'm not clear on is why you need to form a swish-e query that you 
don't intend to use with swish-e.

It sounds like you're doing two indexing passes: once for everything, 
then another for docs that match a certain query.


swish-e -i /path/to/docs
swish-e -w myquery -x '<swishdocpath>\n' > list

and then use that list to make another index.

It would be nice if the IndexDir config option would take a file as an 
argument, because then you could do:

IndexDir list


swish-e -c config
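
Until something like that exists, one possible workaround is to generate the config from the list. This is only a sketch: make_config is a hypothetical helper, not part of swish-e, and it assumes that IndexDir may name individual files, that it may be repeated, and that the paths contain no whitespace. It also skips the "#" header lines, the closing "." line and any "err:" line that swish-e writes around the results, since the list above is captured without -H0.

```shell
# Hypothetical helper (not part of swish-e): turn a file of document paths
# into a swish-e config with one IndexDir line per path.
# Usage: make_config LIST CONFIG INDEX
make_config() {
    list=$1
    config=$2
    index=$3
    {
        printf 'IndexFile %s\n' "$index"
        # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
        while IFS= read -r path || [ -n "$path" ]; do
            case $path in
                # Skip blank lines and the non-result lines swish-e prints
                # (assumed: "#" headers, "." end marker, "err:" messages).
                ''|'#'*|'.'|'err:'*) continue ;;
            esac
            printf 'IndexDir %s\n' "$path"
        done < "$list"
    } > "$config"
}

# Example run (needs swish-e installed; file names are placeholders):
#   swish-e -w myquery -x '<swishdocpath>\n' > list
#   make_config list config main.index
#   swish-e -c config
```

The generated config holds nothing but IndexFile and IndexDir lines, so any other directives you rely on would have to be appended to it by hand.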

> ideally, if there were routines in swish such as
> CompileQuery(strQuery) returns CompiledQuery
> TextMatchesQuery(strText, CompiledQuery) returns true/false (or info
> on the matches)
> then i wouldn't have to re-create a parser for the query syntax.

The SWISH::API ParsedWords() function will return the query as swish 
parsed it. Is that what you need?

> (by the way, i'm not sure if i should respond to your email address or send
> my response to the listproc again.  can you let me know?)

the list. that way Q&A are searchable.

> thanks
> -----Original Message-----
> From: Peter Karman []
> Sent: Wednesday, November 17, 2004 11:47 PM
> To:
> Cc: Multiple recipients of list
> Subject: Re: [SWISH-E] Using Swish's Query Parser
> you don't say how you plan to "extract the text" of your potential
> document, or how you will "run a swish-format query" on the text.
> It wouldn't be very efficient, I wouldn't think, but you might just
> index the doc with swish-e and then search that temp index. swish-e is
> just as fast as anything else at "extracting text" and running the
> query. then you could simply delete the index (or repeat for each new
> doc, effectively using the same tmp index name).
> example perl off the top of my head:
>
> my $query = 'foo bar';
> my %include = ();
> for my $doc (@listofdocs) {
> 	indexdoc( $doc );
> 	if ( searchdoc( $query ) ) {
> 		$include{$doc}++;
> 	}
> }
>
> where indexdoc() and searchdoc() are functions that create your tmp
> index and then search it. you might define a special index name to use
> in your code, then remove it at end.
> Masoud Pirnazar wrote on 11/17/04 9:56 PM:
>>I have used Swish to index and search document collections, and now want to
>>"filter" documents before indexing using the same query syntax, i.e.
>>Given a document, I will extract its text and want to run a swish-format
>>query on the text to see if it matches the query criteria; if it does, I
>>will add it to my collection.
>>The simplest method is to add everything to a collection and do a swish
>>search on the collection, but I'm looking for a more efficient method,
>>especially if the hit percentage is small.
>>Can anyone suggest anything?
>>I looked at the parse_swish_query and tokenize_query_string functions, but
>>it gets too complicated quickly.
>>Thanks in advance for any ideas and comments.
> --
> Peter Karman  .  .  peter(at)

  Peter Karman 651.208.6116
Received on Thu Nov 18 11:38:43 2004