Re: wildcard bug in 2.4.2?

From: Bill Moseley <moseley(at)>
Date: Thu Jul 22 2004 - 21:24:33 GMT

On Thu, Jul 22, 2004 at 01:49:40PM -0700, Kees Cook wrote:
> On Thu, Jul 22, 2004 at 01:36:11PM -0700, Bill Moseley wrote:
> > > swish-e -w 'from=*' -f /data1/index_swish-e
> > > # SWISH format: 2.4.2
> > > # Search words: from=*
> > > # Removed stopwords: 
> > > err: Wildcard not allowed within a word
> > 
> > Believe the error message.
> > 
> > Wild cards are at the end of words.
> 
> Okay, my bad.  I just read through a bunch of posts to the mailing list 
> too, and I see that only a trailing wildcard is currently recognized.
> Is there anything I can do to the search parser code, to make this happen,
> no matter how brute-force?

No.  The way the wild card index works is just like the index in the
back of your textbooks -- arranged alphabetically.  Imagine someone
asking you to look up in that index all words that end in "ing" --
the words are not organized that way.
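
To make the analogy concrete, here is a rough Python sketch.  It is not
Swish-e's code; the word list and both function names are made up for
illustration.  With a sorted word list, a trailing wildcard is a binary
search plus a short walk, while a leading wildcard has nothing to jump to
and must look at every word.

```python
import bisect

def prefix_matches(sorted_words, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then walk forward until the words stop sharing the prefix."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order guarantees no later word can match
        matches.append(word)
    return matches

def suffix_matches(sorted_words, suffix):
    """Leading wildcard (*suffix): alphabetical order does not help,
    so every word has to be examined."""
    return [word for word in sorted_words if word.endswith(suffix)]

# Hypothetical word list standing in for the index's word table.
words = sorted(["ring", "run", "running", "sing", "singer", "walk", "walking"])

print(prefix_matches(words, "run"))  # touches only the neighbouring entries
print(suffix_matches(words, "ing"))  # touches all seven entries
```

With a handful of words the difference is invisible, but over 140G of
text the full scan is the whole problem.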

> I can't afford to regenerate indexes with
> reversed strings right now.  It took 6 days to generate the indexes: 64
> indexes for about 140G worth of text.  :)

You are more patient than I.  Is 6 days acceptable?  That's not your
average web site's worth of pages to search.

> And, additionally, is there a way to test for the _existence_ of a meta 
> field, no matter what the contents?  For example, "from=*" would only hit 
> when a "from" meta was there?  (This will let me distinguish between email 
> and non-email in my indexes.)

Not that I can think of.  Swish-e searches for words.  It does this
without thinking about meta names (or structure or word position).
Then once it finds that word it walks through a list of data saying
what meta ID it's associated with and picks only the ones that match
the meta you are asking to search.
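
As a simplified sketch of that lookup order (the layout, the meta IDs and
the function names below are assumptions for illustration, not Swish-e's
actual on-disk structures): the index is keyed by word, and the meta ID is
only checked afterwards, on each entry found for that word.

```python
# Hypothetical meta name -> meta ID table.
META_IDS = {"swishdefault": 1, "from": 2, "subject": 3}

# Hypothetical index: word -> list of (file, meta_id) entries.
index = {
    "kees": [("mail1.txt", 2), ("notes.txt", 1)],
    "wildcard": [("mail1.txt", 3), ("mail2.txt", 1)],
}

def search(index, meta_name, word):
    """Find the word first, then keep only entries with the wanted meta ID."""
    meta_id = META_IDS[meta_name]
    return [path for path, mid in index.get(word, []) if mid == meta_id]

def files_with_meta(index, meta_name):
    """'Does this meta exist?' has no word to look up, so the only way
    to answer it from this layout is to walk every word's entries."""
    meta_id = META_IDS[meta_name]
    return {path for entries in index.values()
            for path, mid in entries if mid == meta_id}

print(search(index, "from", "kees"))
print(files_with_meta(index, "from"))
```

Under this layout nothing is keyed by meta name alone, which is why a
query like "from=*" has no starting point.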

If you really need that feature (and don't want to use another
database) then maybe create a separate index.  You could use one of
the -T options to figure out what files have what metanames, but you
would have to look at a lot of data in your case.

Bill Moseley

Received on Thu Jul 22 14:24:48 2004