Thank you for the reply, Bill.
>
> Well, you have provided great detail, but I'm not seeing the same
> thing. It's odd.
>
It certainly is odd. I changed my config and doc file to exactly what
you have, yet my index output differs from yours in a few places.
> $ cat c
> ParserWarnLevel 0
> MetaNames id type
> PropertyNames id type
>
> $ cat doc.html
> <h1>hello</h1>
> <id>1</id>
> <name>hi</name>
> <type>product</type>
>
> $ swish-e -c c -i doc.html -T indexed_words properties -v0
> Adding:[1:swishdefault(1)] 'hello' Pos:5 Stuct:0x29 ( HEADING BODY FILE )
> Adding:[1:id(10)] '1' Pos:8 Stuct:0x89 ( META BODY FILE )
> Adding:[1:swishdefault(1)] 'hi' Pos:12 Stuct:0x9 ( BODY FILE )
> Adding:[1:type(11)] 'product' Pos:14 Stuct:0x89 ( META BODY FILE )
> swishdocpath: 6 ( 8) S: "doc.html"
> swishdocsize: 8 ( 4) N: "64"
> swishlastmodified: 9 ( 4) D: "2007-02-08 22:18:43 PST"
> id:12 ( 1) S: "1"
> type:13 ( 7) S: "product"
[matt@test swish-test]$ cat c
ParserWarnLevel 0
MetaNames id type
PropertyNames id type
[matt@test swish-test]$ cat doc.html
<h1>hello</h1>
<id>1</id>
<name>hi</name>
<type>product</type>
[matt@test swish-test]$ /opt/swish-e-2.4.5/bin/swish-e -c c -i doc.html -T indexed_words properties -v0
Adding:[1:swishdefault(1)] 'hello' Pos:1 Stuct:0x21 ( HEADING FILE )
Adding:[1:swishdefault(1)] '1' Pos:2 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'hi' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'product' Pos:4 Stuct:0x1 ( FILE )
swishdocpath: 6 ( 8) S: "doc.html"
swishdocsize: 8 ( 4) N: "64"
swishlastmodified: 9 ( 4) D: "2007-02-09 23:59:35 EST"
[matt@test swish-test]$
One thing I'm noticing is that the first word to get indexed is flagged
HEADING FILE, whereas in your output it's HEADING BODY FILE. By putting
<body> tags around the HTML I can get it to say that, but I still can't
get the <id> tag or the <type> tag to index as META BODY FILE like yours.
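
To make the comparison concrete, here is a minimal Python sketch that decodes the Stuct values from the two runs above and shows which flags my build is missing. The bit values are inferred only from the five hex values printed in this thread (0x29, 0x89, 0x9, 0x21, 0x1), not taken from the swish-e source, so other bits may well exist. The names `decode` and `missing` are made up for illustration.

```python
# Bit-to-name mapping inferred ONLY from the Stuct values shown in this
# thread; it is an assumption, not the definition used by swish-e itself.
FLAGS = [(0x80, "META"), (0x20, "HEADING"), (0x08, "BODY"), (0x01, "FILE")]
KNOWN_MASK = sum(bit for bit, _ in FLAGS)


def decode(struct):
    """Return the flag names set in a Stuct value, highest bit first."""
    names = [name for bit, name in FLAGS if struct & bit]
    leftover = struct & ~KNOWN_MASK
    if leftover:
        # Bits we could not infer from the thread are reported, not guessed.
        names.append("UNKNOWN(0x%x)" % leftover)
    return names


def missing(expected, actual):
    """Return the flags present in `expected` but absent from `actual`."""
    return decode(expected & ~actual)


if __name__ == "__main__":
    # (word, Stuct in Bill's output, Stuct in my output)
    pairs = [("hello", 0x29, 0x21), ("1", 0x89, 0x01),
             ("hi", 0x09, 0x01), ("product", 0x89, 0x01)]
    for word, bill, matt in pairs:
        print(word, "is missing:", " ".join(missing(bill, matt)))
```

Under that assumed mapping, every word in my run lacks BODY, and '1' and 'product' also lack META.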
I'm not doing anything on the configure line except for the --prefix
switch (./configure --prefix=/opt/swish-e-2.4.5). Is it likely that
running swish-e from an alternate directory like this is breaking it
somehow (library reference?)? I really don't want to mess with the
current production setup that's working, so ideally I'd like to be able
to use this alternate directory. If you think I need to try it without
the prefix, let me know and I'll see if I can get hold of another machine.
To help me track it down further, would you say this is an indexing
issue or an HTML parsing issue? I read something in an archived mail
about the differences between HTML and HTML2. Any help is much
appreciated; if I could narrow down the area of code to search through,
that would be fantastic.
That's all I can really think of. As I said, any help is much appreciated.
Thank you for your time (oh, and for providing such an awesome tool :)
Matt.
Received on Fri Feb 9 09:14:12 2007