Gertjan Hofman scribbled on 9/11/06 6:14 PM:
> Dear Swish user,
>
> When I am searching for a file of a given name using
> swishdocpath=<phrase> only those files are returned
> that have <phrase> separated from the rest of the file
> name by either a space or underscore (and possibly
> other characters). In other words: if phrase = 22045
> then swish-e will return
>
> myfile_22045_a.dat but not
>
> hello2204501.dat
>
> Assuming of course that both were parsed and stored in
> the database.
>
> Is there a simple way to change this behaviour (and
> return both)? I read through the options once again
> but didn't spot anything. I have a sneaking suspicion
> that this question has a trivial answer.
>
As Shakespeare put it, "what's in a word?"
Swish-e won't match partial words unless you use the * wildcard (and in 2.4.4,
the ? wildcard too). In addition, Swish-e never matches leading wildcards, i.e.,
you can search for foo* but not *oo.
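To illustrate the wildcard rule, here is a rough Python sketch of how a single search term compares against a single indexed word. It is only a model of the behaviour described above, not Swish-e's code: the function name word_matches is made up, the ? wildcard is left out for brevity, and raising an error on a leading wildcard is my own choice for the sketch.

```python
def word_matches(pattern, word):
    """Model of matching one search term against one indexed word.

    Hypothetical helper: handles only a trailing '*'; '?' is not modelled.
    """
    if pattern.startswith("*"):
        # Leading wildcards are never matched; the sketch simply refuses them.
        raise ValueError("leading wildcards are not supported")
    if pattern.endswith("*"):
        # Trailing wildcard: the word must begin with the text before the '*'.
        return word.startswith(pattern[:-1])
    # No wildcard: the term must equal the whole word.
    return pattern == word


print(word_matches("foo", "foobar"))   # whole-word comparison only
print(word_matches("foo*", "foobar"))  # prefix comparison
```

Without the wildcard, "foo" does not match "foobar"; with it, the prefix comparison succeeds.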
So your file names are parsed just like your other text. How you've defined
WordCharacters (and its *Characters cousins) determines how a string of text
is broken up into "words".
By default, _ is not a word character. Neither is . (dot), nor any kind of
whitespace.
Thus:
myfile_22045_a.dat
gets parsed into:
myfile 22045 a dat
while:
hello2204501.dat
gets parsed into:
hello2204501 dat
so a search for "22045"
will return the first one but not the second: 22045 is a whole 'word' in the
first string, but only part of the word hello2204501 in the second.
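If it helps to see the splitting spelled out, here is a small Python sketch of it. This is an approximation, not Swish-e's actual parser: I'm assuming the word characters are just ASCII letters and digits, and the names WORD_CHARS, split_words and matches are invented for the example.

```python
import string

# Assumption: only ASCII letters and digits count as word characters,
# so underscore, dot and whitespace all end a word.
WORD_CHARS = set(string.ascii_lowercase + string.digits)


def split_words(text):
    """Split text into lowercase 'words' made only of WORD_CHARS."""
    words, current = [], []
    for ch in text.lower():
        if ch in WORD_CHARS:
            current.append(ch)
        elif current:
            # A non-word character closes the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def matches(query, path):
    """True if query is exactly one of the words parsed from path."""
    return query.lower() in split_words(path)


for name in ("myfile_22045_a.dat", "hello2204501.dat"):
    print(name, split_words(name), matches("22045", name))
```

The first name splits into myfile, 22045, a and dat, so 22045 is found; the second splits into hello2204501 and dat, so it is not.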
You'll have to read the SWISH-CONFIG documentation and play with the
WordCharacters settings to arrive at a suitable solution for what you're trying
to do.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Mon Sep 11 18:59:09 2006