Masoud Pirnazar wrote on 11/18/2004 09:27 AM:
> the "run a swish-format query" part is the part that's undefined right
> now--i'd like to keep the same syntax as swish, since once the documents
> pass through the initial query filter, they will be added to a collection,
> and the user will use the swish syntax to search this collection later.
>
> the simplest plan is: extract and index everything, run the filter query,
> extract just the qualifying documents again and add them to the main
> collection. this is a little disk-space expensive (timewise, it will
> probably be ok).
>
What I'm not clear on is why you need to form a swish-e query that you
don't intend to use with swish-e.
It sounds like you're doing two indexing passes: once for everything,
then another for docs that match a certain query.
i.e.
swish-e -i /path/to/docs
swish-e -w myquery -x '<swishdocpath>\n' > list
and then use that list to make another index.
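A minimal sketch of the list-building step, assuming the search output can carry "#" header lines, an "err: no results" line and a closing "." line alongside the paths. The clean_hits name is made up for illustration; it is not part of swish-e:

```shell
# clean_hits: hypothetical helper, not part of swish-e.
# Reads swish-e search output on stdin and keeps only the result lines.
# Assumption: anything starting with "#" is a header, "err:" marks an
# error such as "err: no results", and a lone "." ends the output.
clean_hits() {
    sed -e '/^#/d' -e '/^err:/d' -e '/^\.$/d' -e '/^$/d'
}
```

It would sit in the pipeline as: swish-e -w myquery -x '<swishdocpath>\n' | clean_hits > list. A document path that itself begins with "#" or "err:" would be dropped, so this only suits absolute paths like /path/to/docs.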
It would be nice if the IndexDir config option would take a file as an
argument, because then you could do:
config:
IndexDir list
and
swish-e -c config
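Until something like that exists, one possible workaround is to turn the list into a config with one IndexDir line per file. This is only a sketch: list_to_config is a made-up name, and it assumes swish-e accepts repeated IndexDir directives and double-quoted arguments in the config file:

```shell
# list_to_config: hypothetical helper, not part of swish-e.
# $1 is a file holding one document path per line.
# Prints one IndexDir directive per non-empty line; each path is wrapped
# in double quotes so that paths containing spaces stay in one argument
# (assumption: the config parser honors double quotes).
# Paths that themselves contain a double quote are not handled.
list_to_config() {
    sed -e '/^$/d' -e 's/.*/IndexDir "&"/' "$1"
}
```

Usage would be: list_to_config list > config, followed by swish-e -c config as above.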
> ideally, if there were routines in swish such as
> CompileQuery(strQuery) return CompiledQuery
> TextMatchesQuery(strText, CompiledQuery) returns true/false (or info about
> the matches)
>
> then i wouldn't have to re-create a parser for the query syntax.
>
The SWISH::API ParsedWords() function will return the query as swish
parsed it. Is that what you need?
http://www.swish-e.org/current/docs/API.html
> (by the way, i'm not sure if i should respond to your email address or send
> my response to the listproc again. can you let me know?)
>
to the list, please. that way the Q&A stays searchable.
> thanks
>
> -----Original Message-----
> From: Peter Karman [mailto:karpet@peknet.com]
> Sent: Wednesday, November 17, 2004 11:47 PM
> To: amp834@rqinc.com
> Cc: Multiple recipients of list
> Subject: Re: [SWISH-E] Using Swish's Query Parser
>
>
> you don't say how you plan to "extract the text" of your potential
> document, or how you will "run a swish-format query" on the text.
>
> It wouldn't be very efficient, I wouldn't think, but you might just
> index the doc with swish-e and then search that temp index. swish-e is
> just as fast as anything else at "extracting text" and running the
> query. then you could simply delete the index (or repeat for each new
> doc, effectively using the same tmp index name).
>
> example perl off the top of my head:
>
> my $query = 'foo bar';
> my %include = ();
> for my $doc (@listofdocs) {
>     indexdoc( $doc );
>     if ( searchdoc( $query ) ) {
>         $include{$doc}++;
>     }
> }
>
> where indexdoc() and searchdoc() are functions that create your tmp
> index and then search it. you might define a special index name to use
> in your code, then remove it at end.
>
> Masoud Pirnazar wrote on 11/17/04 9:56 PM:
>
>
>>I have used Swish to index and search document collections, and now want to
>>"filter" documents before indexing using the same query syntax, i.e.
>>
>>Given a document, I will extract its text and want to run a swish-format
>>query on the text to see if it matches the query criteria; if it does, I
>>will add it to my collection.
>>
>>The simplest method is to add everything to a collection and do a swish
>>search on the collection, but I'm looking for a more efficient method,
>>especially if the hit percentage is small.
>>
>>Can anyone suggest anything?
>>I looked at the parse_swish_query and tokenize_query_string functions, but
>>it gets too complicated quickly.
>>
>>Thanks in advance for any ideas and comments.
>
>
> --
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Peter Karman peter@peknet.com 651.208.6116
Received on Thu Nov 18 11:38:43 2004