Hi All,
I am the developer for a site that uses swish-e to search its catalogue,
and the client has just sent me the following request:
"For the FAQ we just need some general search score info rather than anything
specific."
Apart from saying "The most relevant results should have a higher score",
I don't know exactly what to say.
The data being searched is large and has a lot of meta fields defined, so
how will this affect the score? If required I can post sample data as well
as config files.
I know this use is probably not what swish was designed for, but it does the
job well. The only feature I'm missing is searching for a range of values. I
know it can be done with -L, but that only applies to properties and
therefore allows one value per file, which is not enough for my requirements:
I would like to order the results by a field that can occur more than once,
e.g. release dates. Anyway, before I get even more off topic :)
For reference here is a typical query along with swish output...
User searches for:
* artist: "Black Sabbath"
* include compilation recordings in artist search.
* track: iron man
* format: CD
* order: Search relevance (highest » lowest)
The following command gets run:
/usr/local/bin/swish-e -H 9 -d\\t -w '( ( recording.artist.main=( black
sabbath ) OR recording.track.artist.main=( black sabbath ) OR
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) OR
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) ) AND
recording.track.title=(iron man) AND recording.media.available.group=( -cd-
) AND recording.available=( yes ) AND recording.chanel=(musicmaster) )'
-s swishrank desc recording.title asc recording.artist.main asc -b 0 -m 3000
-f /usr/home/wb/Web/Work/red-phase3/_server/data/swish/data.index
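For anyone who wants to reproduce this, here is a minimal sketch of how a -w
query like the one above could be assembled from the form values. It is
illustrative only: build_query and its parameters are hypothetical names, the
MD5 digest is passed in rather than computed because how it is derived is not
shown here, and mapping the "include compilation recordings" option to the
recording.track.artist.* terms is an assumption.

```python
def build_query(artist, artist_md5, track, media_group, channel,
                include_compilations=True):
    """Assemble a swish-e -w query string from search-form values.

    Hypothetical helper: the metanames are copied from the command above,
    everything else is an assumption made for illustration.
    """
    artist_terms = [f"recording.artist.main=( {artist} )"]
    md5_terms = [f"recording.artist.main.md5=({artist_md5})"]
    if include_compilations:
        # Assumption: compilations are matched through the per-track artist.
        artist_terms.append(f"recording.track.artist.main=( {artist} )")
        md5_terms.append(f"recording.track.artist.main.md5=({artist_md5})")
    artist_clause = " OR ".join(artist_terms + md5_terms)
    return (
        f"( ( {artist_clause} ) AND "
        f"recording.track.title=({track}) AND "
        f"recording.media.available.group=( {media_group} ) AND "
        f"recording.available=( yes ) AND "
        f"recording.chanel=({channel}) )"
    )


if __name__ == "__main__":
    print(build_query("black sabbath",
                      "b1dd10efa6a2761536d12edc20edeca9",
                      "iron man", "-cd-", "musicmaster"))
```

With the values from the example search this gives the same query string as
in the command above, joined onto one line.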
And I get the following output...
# SWISH format: 2.2.2
# Search words: ( ( recording.artist.main=( black sabbath ) OR
recording.track.artist.main=( black sabbath ) OR
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) OR
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) ) AND
recording.track.title=(iron man) AND recording.media.available.group=( -cd-
) AND recording.available=( yes ) AND recording.chanel=(musicmaster) )
#
# Index File: /usr/home/wb/Web/Work/red-phase3/_server/data/swish/data.index
# Swish-e format: 2.2.2
#
# Name: searchRED Data File
# Saved as: data.index
# Counts: 3260219 words, 560604 files
# Indexed on: 2003-08-19 17:32:10 BST
# Description: This is an index of the searchRED data.
# Pointer: http://www.searchred.co.uk/
# Maintained by: William Bailey
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Indexing Mode: None
# IgnoreTotalWordCountWhenRanking: 1
# WordCharacters:
#&'-/0123456789;abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# MinWordLimit: 1
# MaxWordLimit: 40
# BeginCharacters:
&'(-0123456789;abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# EndCharacters:
'),-.0123456789;abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# IgnoreFirstChar:
# IgnoreLastChar:
# StopWords:
# BuzzWords:
# Search Words: ( ( recording.artist.main=( black sabbath ) OR
recording.track.artist.main=( black sabbath ) OR
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) OR
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9) ) AND
recording.track.title=(iron man) AND recording.media.available.group=( -cd-
) AND recording.available=( yes ) AND recording.chanel=(musicmaster) )
# Parsed Words: ( ( recording.artist.main = ( black sabbath ) or
recording.track.artist.main = ( black sabbath ) or recording.artist.main.md5
= ( b1dd10efa6a2761536d12edc20edeca9 ) or recording.track.artist.main.md5 = (
b1dd10efa6a2761536d12edc20edeca9 ) ) and recording.track.title = ( iron man )
and recording.media.available.group = ( -cd- ) and recording.available = (
yes ) and recording.chanel = ( musicmaster ) )
#
# Number of hits: 14
# Search time: 0.322 seconds
# Run time: 0.336 seconds
1000 MM/000/423/195.xml 423195 44209
988 MM/000/267/295.xml 267295 31681
980 MM/000/012/875.xml 12875 26547
972 MM/000/374/094.xml 374094 21523
954 MM/000/326/899.xml 326899 12316
953 MM/000/012/853.xml 12853 14668
949 MM/000/012/890.xml 12890 20204
944 MM/000/012/867.xml 12867 15696
939 MM/000/385/532.xml 385532 8886
928 MM/000/012/876.xml 12876 21115
264 MM/000/221/749.xml 221749 14080
264 MM/000/302/828.xml 302828 11119
264 MM/000/219/742.xml 219742 11725
264 MM/000/374/832.xml 374832 8322
.
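To make the columns in the result lines explicit, here is a minimal sketch of
how they could be parsed. It assumes four fields per hit (rank, path, title,
size) separated by tabs or, as they appear in this mail, by whitespace.
parse_swish_results is a hypothetical helper, not part of swish-e, and the
sample is three hits copied from the output above.

```python
def parse_swish_results(output):
    """Turn swish-e result lines into dicts, skipping headers and the final '.'."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == ".":
            continue
        # Prefer tabs (titles may contain spaces); fall back to whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        # Anything that is not "rank path title size" is ignored, which also
        # drops header lines that were wrapped without a leading '#'.
        if len(fields) != 4 or not fields[0].isdigit() or not fields[3].isdigit():
            continue
        rank, path, title, size = fields
        hits.append({"rank": int(rank), "path": path,
                     "title": title, "size": int(size)})
    return hits


SAMPLE = """# Number of hits: 14
1000\tMM/000/423/195.xml\t423195\t44209
988\tMM/000/267/295.xml\t267295\t31681
264\tMM/000/221/749.xml\t221749\t14080
.
"""

if __name__ == "__main__":
    for hit in parse_swish_results(SAMPLE):
        print(hit["rank"], hit["path"])
```

The first field is the score my client is asking about.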
Thanks for any insight that anybody can provide.
--
Regards,
William Bailey.
Pro-Net Internet Services Ltd.
http://www.pro-net.co.uk/
Received on Thu Sep 4 09:16:56 2003