Re: searching for file names

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Sep 12 2006 - 01:59:07 GMT
Gertjan Hofman scribbled on 9/11/06 6:14 PM:
> Dear Swish user,
> 
> When I am searching for a file of a given name using
> swishdocpath=<phrase> only those files are returned
> that have <phrase> separated from the rest of the file
> name by either a space or underscore (and possibly
> other characters). In other words: if phrase = 22045
> then swish-e will return
> 
> myfile_22045_a.dat    but not
> 
> hello2204501.dat
> 
> Assuming of course that both were parsed and stored in
> the database.
> 
> Is there a simple way to change this behaviour (and
> return both)? I read through the options once again
> but didn't spot anything. I have a sneaking suspicion
> that this question has a trivial answer.
> 

as Shakespeare put it, "what's in a word?"

Swish-e won't match partial words unless you use the * wildcard (and in 2.4.4, 
the ? wildcard too). In addition, Swish-e never matches leading wildcards, i.e., 
you can search for foo* but not *oo.
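
To make the wildcard rule concrete, here is a small Python sketch. It is not Swish-e code: matches() is a hypothetical helper that only mimics the behaviour described above (whole-word match, or prefix match with a trailing *). It ignores the ? wildcard, and it models a leading wildcard as "no match", which is an assumption; Swish-e itself may report an error instead.

```python
def matches(query, word):
    """Mimic whole-word matching with an optional trailing * wildcard."""
    if query.startswith("*"):
        # Leading wildcards are not supported; this sketch reports no match.
        return False
    if query.endswith("*"):
        # Trailing wildcard: the word only has to start with the prefix.
        return word.startswith(query[:-1])
    # No wildcard: the query must equal the whole word.
    return query == word


print(matches("foo", "foobar"))   # plain query against a longer word
print(matches("foo*", "foobar"))  # trailing wildcard
print(matches("*oo", "foo"))      # leading wildcard
```

Only the second call succeeds in this model, which is why a partial word in the middle or at the end of a longer word cannot be reached.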

So your file names are parsed just like your other text: the way you've defined 
your WordCharacters (and their *Characters cousins) determines how a string of text 
is broken up into "words".

So by default, _ is not a word character. Nor is . (dot). Neither is any kind of 
whitespace.

Thus:

  myfile_22045_a.dat

gets parsed into:

  myfile 22045 a dat

while:

  hello2204501.dat

gets parsed into:

  hello2204501 dat

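so a search for "22045" will return the first one but not the second: 22045 is a 
'word' in its own right according to how the first string was parsed, but in the 
second it is only a fragment in the middle of the word hello2204501. A trailing 
wildcard (22045*) won't help either, because the word doesn't start with 22045.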
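
The splitting above can be approximated in a few lines of Python. This is only a sketch, not how Swish-e is implemented: it assumes the word characters are just ASCII letters and digits (your actual WordCharacters setting may differ), and split_words() is a made-up name.

```python
import re

# Assumption: only ASCII letters and digits count as word characters.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def split_words(text):
    """Break text into 'words' at every character that is not a word character."""
    return WORD_RE.findall(text)


for name in ("myfile_22045_a.dat", "hello2204501.dat"):
    words = split_words(name)
    # A plain search term has to equal one of the words exactly.
    print(name, "->", words, "| has word 22045:", "22045" in words)
```

Changing the character class in WORD_RE is a quick way to see how a different set of word characters would split the same file names before touching a real config.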

You'll have to read the SWISH-CONFIG documentation and play with the 
WordCharacters settings to arrive at a suitable solution for what you're trying 
to do.

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Mon Sep 11 18:59:09 2006