Re: MetaNamesRank & exe build for Windows

From: intervolved none <intervolved(at)not-real.yahoo.com>
Date: Tue Sep 09 2003 - 23:30:46 GMT
Thanks for the information. It made testing very simple. I took your test files and ran my tests on Windows 2000.

I did not get the same results.


D:\joe\SWISH-E>

D:\joe\SWISH-E>type 1.html

<html>

<head><title>Title</title>

</head>

<body>

body

testword

</body>

</html>

D:\joe\SWISH-E>type 2.html

<html>

<head><title>Title</title>

<meta name="foo" content="testword">

</head>

<body>

body

</body>

</html>

D:\joe\SWISH-E>type c

Metanames foo

D:\joe\SWISH-E>swish-e -c c -i 1.html 2.html -v0

D:\joe\SWISH-E>swish-e -w testword -H0

1000 1.html "Title" 87 <---- not the same.....

D:\joe\SWISH-E>type 3.html

<html>

<head><title>Title</title>

<meta name="bar" content="testword">

</head>

<body>

body

</body>

</html>

 

D:\joe\SWISH-E>swish-e -c c -i 2.html 3.html -v0

D:\joe\SWISH-E>swish-e -w bar=testword -H0

err: Unknown metaname: 'bar' <-- ok did not have bar in the config file "c"...

.

D:\joe\SWISH-E>swish-e -w foo=testword -H0

1000 2.html "Title" 115

D:\joe\SWISH-E>type c

Metanames foo bar

MetaNamesRank 10 foo

D:\joe\SWISH-E>swish-e -w bar=testword -H0

err: Unknown metaname: 'bar' <--- added bar but still did not find it....

.

D:\joe\SWISH-E>swish-e -w foo=testword -H0

1000 2.html "Title" 115 <--- ok did not include 3.html....

D:\joe\SWISH-E>swish-e -h

usage:

swish [-e] [-i dir file ... ] [-S system] [-c file] [-f file] [-l] [-v (num)]

swish -w word1 word2 ... [-f file1 file2 ...] \

[-P phrase_delimiter] [-p prop1 ...] [-s sortprop1 [asc|desc] ...] \

[-m num] [-t str] [-d delim] [-H (num)] [-x output_format]

swish -k (char|*) [-f file1 file2 ...]

swish -M index1 index2 ... outputfile

swish -N /path/to/compare/file

swish -V

options: defaults are in brackets

-S : specify which indexing system to use.

Valid options are:

"fs" - index local files in your File System

"http" - index web site files using a web crawler

"prog" - index files supplied by an external program

The default value is: "fs"

-i : create an index from the specified files

-w : search for words "word1 word2 ..."

-t : tags to search in - specify as a string

"HBthec" - in Head|Body|title|header|emphasized|comments

-f : index file to create or file(s) to search from [index.swish-e]

-c : configuration file(s) to use for indexing

-v : indexing verbosity level (0 to 3) [-v 1]

-T : Trace options ('-T help' for info

-l : follow symbolic links when indexing

-b : begin results at this number

-m : the maximum number of results to return [defaults to all results]

-M : merges index files

-N : index only files with a modification date newer than path supplied

-p : include these document properties in the output "prop1 prop2 ..."

-s : sort by these document properties in the output "prop1 prop2 ..."

-d : next param is delimiter.

-P : next param is Phrase delimiter.

-V : prints the current version

-e : "Economic Mode": The index proccess uses less RAM.

-x : "Extended Output Format": Specify the output format.

-H : "Result Header Output": verbosity (0 to 9) [1].

-k : Print words starting with a given char.

-E : Append errors to file specified, or stderr if file not specified.

version: 2.4.0-pr1 <-- version....

docs: http://swish-e.org

Scripts and Modules at: (libexecdir) = D:\joe\SWISH-E\lib\swish-e

D:\joe\SWISH-E>

 


Bill Moseley <moseley@hank.org> wrote:
On Mon, Sep 08, 2003 at 04:42:48PM -0700, intervolved none wrote:
> I am trying to get the MetaNamesRank working. I installed the windows
> 2.4... version, added the line "MetaNamesRank 10 keywords" to my
> config file, reindexed the site, and looked to see if there was any
> change in the indexing. There was not. It is like it ignored the
> configuration file setting. Am I supposed to make any more
> configuration changes for it to pick up the meta tag line from the
> html document?
> 
> I am trying to get the meta tags in my html page to give more weight
> than the actual text... example : > help work" name=keywords>

meta tags do have more weight:

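moseley@bumby:~$ cat 1.html 2.html
<html>
<head><title>Title</title>
</head>
<body>
body
testword
</body>
</html>
<html>
<head><title>Title</title>
<meta name="foo" content="testword">
</head>
<body>
body
</body>
</html>
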
moseley@bumby:~$ cat c
Metanames foo

moseley@bumby:~$ swish-e -c c -i 1.html 2.html -v0

moseley@bumby:~$ swish-e -w testword or foo=testword -H0
1000 2.html "Title" 107
431 1.html "Title" 79
.

Now try with two metanames:

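moseley@bumby:~$ cat 3.html
<html>
<head><title>Title</title>
<meta name="bar" content="testword">
</head>
<body>
body
</body>
</html>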

See they have the same value here:

moseley@bumby:~$ swish-e -c c -i 2.html 3.html -v0
moseley@bumby:~$ swish-e -w bar=testword or foo=testword -H0
1000 3.html "Title" 107
1000 2.html "Title" 107
.

Now try changing the rank based on MetaNamesRank:

moseley@bumby:~$ cat c
Metanames foo bar
MetaNamesRank 10 foo

moseley@bumby:~$ swish-e -c c -i 2.html 3.html -v0

moseley@bumby:~$ swish-e -w bar=testword or foo=testword -H0
1000 2.html "Title" 107
592 3.html "Title" 107
.

> The meta tag line in the html should be weighted 10 times more than
> the words on the page, correct?

No, not really. Sure is a lot easier when you can build from source. 


Here's the calculation for each word:

for(i = 0; i < freq; i++)
    rank += sw->structure_map[ GET_STRUCTURE(posdata[i]) ] + meta_bias;

Rank is the sum of each word's rank, where each word's rank is the
meta_bias plus its "structure" value, which is based on its position.
Then the log is taken of that number.
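
To make the loop above concrete, here is a minimal, self-contained sketch of the summing step. It is not the swish-e source: the function name raw_word_rank, the flat 256-entry table standing in for sw->structure_map, and the weights used in the usage note are all assumptions for illustration. The log step mentioned above is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sw->structure_map: one weight per structure byte. */
#define STRUCTURE_SLOTS 256

/* Sum (structure weight + meta_bias) over all freq occurrences of a word.
 * The message says a log of this sum is taken afterwards; that step and
 * any later scaling are not shown here. */
static int raw_word_rank(const int structure_map[STRUCTURE_SLOTS],
                         const unsigned char *structures,
                         size_t freq, int meta_bias)
{
    int rank = 0;
    assert(structure_map != NULL);
    assert(structures != NULL || freq == 0);
    for (size_t i = 0; i < freq; i++)
        rank += structure_map[structures[i]] + meta_bias;
    return rank;
}
```

With an assumed weight of 5 for structure 0x85 and two occurrences, a bias of 0 gives 10 and a bias of 10 gives 30. In this sketch the bias is added once per occurrence; it does not multiply the rank.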

moseley@bumby:~$ swish-e -c c -i 2.html 3.html -T indexed_words -v0
Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:foo(10)] 'testword' Pos:5 Stuct:0x85 ( META HEAD FILE )
Adding:[1:swishdefault(1)] 'body' Pos:8 Stuct:0x9 ( BODY FILE )
Adding:[2:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[2:bar(11)] 'testword' Pos:5 Stuct:0x85 ( META HEAD FILE )
Adding:[2:swishdefault(1)] 'body' Pos:8 Stuct:0x9 ( BODY FILE )

So "testword" has a structure of 0x85, meaning it's in a file (duh),
it's in the <head> section, and it's also in a <meta> tag. is not
used.
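
The structure value in the trace reads as a bitmask. The sketch below decodes it using bit values inferred only from the three trace values shown (0x7, 0x85, 0x9). The constant names and the helper describe_structure are hypothetical and are not taken from the swish-e headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bit values inferred from the trace output, not from the swish-e source. */
enum {
    ST_FILE  = 0x01,
    ST_TITLE = 0x02,
    ST_HEAD  = 0x04,
    ST_BODY  = 0x08,
    ST_META  = 0x80
};

/* Write the names of the set bits into buf, separated by spaces. */
static const char *describe_structure(unsigned structure, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { ST_META, "META" }, { ST_HEAD, "HEAD" }, { ST_TITLE, "TITLE" },
        { ST_BODY, "BODY" }, { ST_FILE, "FILE" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(structure & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* stop if the buffer is too small */
        used += (size_t)n;
    }
    return buf;
}
```

Called with 0x85 this yields "META HEAD FILE", and with 0x9 it yields "BODY FILE", matching the labels in the trace lines.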

Then in config.h:

#define RANK_TITLE 7 // 


*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Tue Sep 9 23:31:14 2003