See Bill's comments.
And here's what I do in a similar situation.
We have publication numbers like S-1234-10 and 004-1234-10 and
007-12340-10 and HW-1234-10. The main four digits '1234' are the ones
that really matter, since the numbering scheme has changed over the
years. I want a user to be able to see the book 'S-1234-10' whether she
searches for 'S-1234' or '1234' or 'S-1234-10' -- she should get the
same result. My experience has been that trying to educate users to use
the wildcard * is a futile exercise.
One initial approach I took was to use a -S prog filter to add the
different variations to a 'pubnum' metatag. That had certain drawbacks,
though I can't remember at the moment what they were. Now, I have a
little 'fuzzy' function in my search script that examines the user query
and alters it to conform to how the docs are really indexed. So whether a
user enters 'S-1234', 'S-1234-10', or '1234', the query is manipulated
before being handed to the actual swish-e search. If the
user enters '1234', the manipulation looks something like:
S-1234* or 004-1234* or 007-1234* or 1234
This can, of course, be a little surprising for the user if she wanted
only docs that exactly matched '1234'. But typically, too many results
are better than too few.
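Something like the following Perl sketch would do that expansion. It is only an illustration, not the actual function from the search script: the name `fuzzy_query` and the `@PREFIXES` list are hypothetical, and it assumes the core number is the first run of four digits in whatever the user typed.

```perl
use strict;
use warnings;

# Hypothetical list of prefixes known to occur in the indexed
# publication numbers; a real script would keep this in sync with the index.
my @PREFIXES = ('S-', '004-', '007-');

# fuzzy_query (hypothetical name): expand one user-entered publication
# number into a swish-e query that covers every known prefix variant of
# its four-digit core. Terms with no four-digit run pass through unchanged.
sub fuzzy_query {
    my ($term) = @_;
    my ($core) = $term =~ /(\d{4})/;    # first run of four digits
    return $term unless defined $core;  # nothing to expand
    my @variants = map { $_ . $core . '*' } @PREFIXES;
    return join ' or ', @variants, $core;
}

for my $q ('1234', 'S-1234', 'S-1234-10') {
    print "$q => ", fuzzy_query($q), "\n";
}
```

Each of the three inputs expands to `S-1234* or 004-1234* or 007-1234* or 1234`. The cost of doing this at query time is that the prefix list has to be maintained by hand, but nothing needs to be reindexed when it changes.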
Stemming is likely NOT what you want: stemming algorithms reduce words to
their root forms (e.g. 'searching' to 'search'), and from what I know of
them, none would know what to do with your part numbers.
Instead, you might want to add some kind of regexp to your search script
that does:
my $before = 'xyz-2708';
my ($after) = ($before =~ m/(\d{4})/);   # the four-digit core, here '2708'
and then pass:
$before or $after
which would be in this case:
xyz-2708 or 2708
to your swish-e query. Of course, if your part numbers are more
complicated or varied than just 4 digits (which the regexp above
roughly matches), then you'd have to get more clever.
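As one example of "more clever": if the numeric portion is not always four digits, the regexp can anchor on the trailing run of digits instead. This is a sketch under the assumption, true of the examples in the question below, that the number always comes last with no suffix such as '-10'; `part_query` is a hypothetical name.

```perl
use strict;
use warnings;

# part_query (hypothetical name): assumes a part number is an optional
# alphanumeric prefix followed by a run of digits at the very end,
# as in 2708, XY-2708, XY2708 and 3P-2708.
sub part_query {
    my ($before) = @_;
    my ($after) = $before =~ /(\d+)\z/;    # trailing digits, any length
    # Nothing to add if there is no trailing number or no prefix at all.
    return $before if !defined $after || $after eq $before;
    return "$before or $after";
}

print part_query($_), "\n" for qw(2708 XY-2708 XY2708 3P-2708);
```

This prints `2708`, `XY-2708 or 2708`, `XY2708 or 2708` and `3P-2708 or 2708`. Returning the bare term when there is no prefix avoids sending a redundant `2708 or 2708` to swish-e.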
MITCHELL TEIXEIRA wrote on 7/19/04 1:53 PM:
> Hello to the list - I am new to SWISH-E and need a little hint/help with
> making the search functionality better on my web site. Customers on my site
> can order by part number, which may contain an alphanumeric prefix. If they
> search using the numeric portion of the part number only, then the correct
> results are displayed by SWISH-E, but if they add the alpha prefix, no hits
> are generated by SWISH-E.
>
> Example:
> part number 2708 can be referenced simply by "2708" or "XY-2708", "XY2708",
> "3P-2708", etc. Searching with 2708 as the search term works, but searching
> with the prefix blows it.
>
> We have too many alpha prefixes to try to index them all with the numeric
> portion. I'd like a little help/hint on how to configure my SWISH-E or how
> to improve my indexing. Reviewing the docs, I suspect what I want
> has something to do with stemming, but I get the idea that may only work
> with words?
>
> Many thanks -
> MitchellT
--
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Mon Jul 19 20:56:37 2004