koszalekopalek scribbled on 6/9/05 4:08 AM:
> I guess this is the define in swish.h:
>
> #define RANK_BIAS_RANGE 10 /* max/min range ( -10 -> 10, with zero
> being no bias ) */
>
correct.
> To 'spam' the tags I 'multiplied' the strings by x99. I assume this has
> an effect similar to setting the bias, right?
>
> tuningcfg = (
> 'foo ' x 99, 'http://localhost/a.htm',
> 'bar ' x 99, 'http://localhost/b.htm',
> );
yes, I think so.
> Anyway -- I think what I am doing is becoming a hack on top of a hack.
> Let's change it into a feature request:-)
>
> The whole point is that I think it is useful to be able to manually
> assign urls to selected keywords. (Remember that Google demo I mentioned
> in my first post?) The keyword/url pairs could be read from a plain
> text file. The location of that file could be specified in the
> configuration hash for spider.pl. This is easy. Now, once I index a URL
> and I know that some 'keywords' are assigned to it, how do I tweak the
> ranking? I thought that automatically inserted meta tags were a good
> idea but maybe there is a better way?
>
Your method of assigning keywords to URLs seems fine. Swish-e is an indexer, not
a search engine, so putting the feature you're describing directly into the
Swish-e code seems "out of range" for Swish-e's intent.
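
The plain-text keyword/URL file could be turned into meta tags before
indexing, outside of Swish-e itself. A minimal sketch in shell follows. The
file name keywords.txt, its tab-separated layout, and the keyword_meta helper
are assumptions made for illustration, not part of Swish-e or spider.pl; the
'keyword' metaname is taken from the description quoted above.

```shell
#!/bin/sh
# Hypothetical mapping file: one "keyword<TAB>url" pair per line.
printf 'foo\thttp://localhost/a.htm\n'  > keywords.txt
printf 'bar\thttp://localhost/b.htm\n' >> keywords.txt
printf 'baz\thttp://localhost/a.htm\n' >> keywords.txt

# keyword_meta URL FILE
# Print a <meta> tag holding every keyword assigned to URL in FILE.
# Prints nothing when the URL has no keywords assigned.
keyword_meta() {
    awk -F '\t' -v url="$1" '
        $2 == url { words = words (words == "" ? "" : " ") $1 }
        END {
            if (words != "")
                printf "<meta name=\"keyword\" content=\"%s\" />\n", words
        }
    ' "$2"
}

keyword_meta http://localhost/a.htm keywords.txt
```

The printed tag would still have to be placed in the page's <head> by
whatever filter hands documents to the indexer; that step is not shown.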
Are you including your 'keyword' metaname in the search?
Here's a test I just did. Notice that when I don't specify the biased metaname
explicitly in the query, swish-e searches only the swishdefault metaname.
I explicitly use swishdefault= here for demonstration.
Both files contain the words 'foo' and 'bar' three times each, but with the
locations swapped: file1.html has 'foo' in mymeta and 'bar' in the body
(swishdefault), and file2.html has the reverse. I index and search twice: once
with a MetaNamesRank bias (using a config file) and once without (no config).
Notice how with the bias on, the difference in rank scores is significant
(1000 vs. 367); with no bias, the ranks are identical (frequency is equal,
metaname is equal).
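
The two-pass procedure can be sketched as a script. This is only an outline
under assumptions: the make_page helper is hypothetical, the config file holds
just the one line shown in the transcript (some setups may also need the
metaname declared with MetaNames), and the swish-e calls are skipped when the
binary is not installed. The transcript that follows is the actual run.

```shell
#!/bin/sh
# make_page FILE META_WORD BODY_WORD (hypothetical helper)
# Write a page with META_WORD three times in mymeta and BODY_WORD three
# times in the body.
make_page() {
    cat > "$1" <<EOF
<html>
<head>
<meta name="mymeta" content="$2 $2 $2" />
<title>page one</title>
</head>
<body>
$3 $3 $3
</body>
</html>
EOF
}

make_page file1.html foo bar
make_page file2.html bar foo

# Config carrying the rank bias, as shown in the transcript.
echo 'MetaNamesRank 10 mymeta' > c

# Run the comparison only where swish-e is available.
if command -v swish-e >/dev/null 2>&1; then
    # Pass 1: index with the bias, then search both metanames.
    swish-e -c c -i file1.html file2.html
    swish-e -w 'swishdefault=bar or mymeta=bar'
    # Pass 2: index with no config, then search the default metaname only.
    swish-e -i file1.html file2.html
    swish-e -w 'swishdefault=bar'
fi
```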
karpet@cartermac 45% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.005 seconds
# Run time: 0.033 seconds
1000 file2.html "page one" 126
.
karpet@cartermac 46% swish-e -w swishdefault=foo or mymeta=foo
# SWISH format: 2.5.4
# Search words: swishdefault=foo or mymeta=foo
# Removed stopwords:
# Number of hits: 2
# Search time: 0.006 seconds
# Run time: 0.029 seconds
1000 file1.html "page one" 126
367 file2.html "page one" 126
.
karpet@cartermac 47% swish-e -w swishdefault=bar or mymeta=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar or mymeta=bar
# Removed stopwords:
# Number of hits: 2
# Search time: 0.005 seconds
# Run time: 0.032 seconds
1000 file2.html "page one" 126
367 file1.html "page one" 126
.
karpet@cartermac 48% cat file1.html
<html>
<head>
<meta name="mymeta" content="foo foo foo" />
<title>page one</title>
</head>
<body>
bar bar bar
</body>
</html>
karpet@cartermac 49% cat file2.html
<html>
<head>
<meta name="mymeta" content="bar bar bar" />
<title>page one</title>
</head>
<body>
foo foo foo
</body>
</html>
karpet@cartermac 50% cat c
MetaNamesRank 10 mymeta
karpet@cartermac 51% swish-e -i file*.html
..
Indexing done!
karpet@cartermac 52% swish-e -w swishdefault=bar or mymeta=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar or mymeta=bar
# Removed stopwords:
err: Unknown metaname: 'mymeta'
.
karpet@cartermac 53% swish-e -w swishdefault=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar
# Removed stopwords:
# Number of hits: 2
# Search time: 0.004 seconds
# Run time: 0.035 seconds
1000 file2.html "page one" 126
1000 file1.html "page one" 126
.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Jun 9 05:41:10 2005