Re: [swish-e] PropertyNames not being indexed

From: Matt Paine <matt(at)not-real.mattsoftware.com>
Date: Fri Feb 09 2007 - 14:14:40 GMT
Thank you for the reply, Bill.



> 
> Well, you have provided great detail, but I'm not seeing the same
> thing.  It's odd.
> 

It certainly is odd. I changed my config and doc file to exactly what
you have, yet my index output still differs in a few places:


> $ cat c 
> ParserWarnLevel 0
> MetaNames id type
> PropertyNames id type
> 
> $ cat doc.html
> <h1>hello</h1>
> <id>1</id>
> <name>hi</name>
> <type>product</type>
> 
> $ swish-e -c c -i doc.html -T indexed_words properties -v0
>     Adding:[1:swishdefault(1)]   'hello'   Pos:5  Stuct:0x29 ( HEADING BODY FILE )
>     Adding:[1:id(10)]   '1'   Pos:8  Stuct:0x89 ( META BODY FILE )
>     Adding:[1:swishdefault(1)]   'hi'   Pos:12  Stuct:0x9 ( BODY FILE )
>     Adding:[1:type(11)]   'product'   Pos:14  Stuct:0x89 ( META BODY FILE )
>           swishdocpath: 6 (  8) S: "doc.html"
>           swishdocsize: 8 (  4) N: "64"
>      swishlastmodified: 9 (  4) D: "2007-02-08 22:18:43 PST"
>                     id:12 (  1) S: "1"
>                   type:13 (  7) S: "product"


[matt@test swish-test]$ cat c
ParserWarnLevel 0
MetaNames id type
PropertyNames id type

[matt@test swish-test]$ cat doc.html
<h1>hello</h1>
<id>1</id>
<name>hi</name>
<type>product</type>

[matt@test swish-test]$ /opt/swish-e-2.4.5/bin/swish-e -c c -i doc.html -T indexed_words properties -v0
     Adding:[1:swishdefault(1)]   'hello'   Pos:1  Stuct:0x21 ( HEADING FILE )
     Adding:[1:swishdefault(1)]   '1'   Pos:2  Stuct:0x1 ( FILE )
     Adding:[1:swishdefault(1)]   'hi'   Pos:3  Stuct:0x1 ( FILE )
     Adding:[1:swishdefault(1)]   'product'   Pos:4  Stuct:0x1 ( FILE )
           swishdocpath: 6 (  8) S: "doc.html"
           swishdocsize: 8 (  4) N: "64"
      swishlastmodified: 9 (  4) D: "2007-02-09 23:59:35 EST"
[matt@test swish-test]$





One thing I'm noticing is that the first word indexed is tagged HEADING
FILE, whereas in your output it's HEADING BODY FILE. By putting <body>
tags around the HTML I can get it to say that, but I still can't get the
<id> tag or the <type> tag to index as META BODY FILE like yours.
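
To make the comparison between the two runs less error-prone, the
"Adding:" debug lines can be reduced to (metaname, flags) pairs per word
and diffed. This is only a minimal Python sketch, assuming the debug
line format is exactly what the two transcripts above show; the function
names parse_adding_lines and diff_runs are hypothetical helpers, not
part of swish-e.

```python
import re

# Matches debug lines such as:
#   Adding:[1:id(10)]   '1'   Pos:8  Stuct:0x89 ( META BODY FILE )
# The flag list may be wrapped across lines by a mail client, so
# whitespace (including newlines) is allowed inside the parentheses.
ADDING_RE = re.compile(
    r"Adding:\[(\d+):(\w+)\((\d+)\)\]\s+'([^']*)'\s+"
    r"Pos:(\d+)\s+Stuct:(0x[0-9a-fA-F]+)\s+\(\s*([A-Z\s]*?)\s*\)"
)


def parse_adding_lines(text):
    """Return {word: (metaname, flags)} for each 'Adding:' line in text."""
    result = {}
    for m in ADDING_RE.finditer(text):
        _filenum, metaname, _metaid, word, _pos, _struct, flags = m.groups()
        result[word] = (metaname, tuple(flags.split()))
    return result


def diff_runs(expected, actual):
    """Return words whose metaname or flags differ between two runs."""
    a = parse_adding_lines(expected)
    b = parse_adding_lines(actual)
    return {w: (a[w], b.get(w)) for w in a if a[w] != b.get(w)}
```

Feeding it the two transcripts above (as strings) would list every word
along with the metaname and structure flags each run assigned to it.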

I'm not doing anything on the configure line except for the --prefix
switch (./configure --prefix=/opt/swish-e-2.4.5). Is it likely that
running swish-e from an alternate directory like this is breaking it
somehow (a library reference?)? I really don't want to mess with the
current production setup that's working, so ideally I'd like to be able
to use this alternate directory. If you think I need to try it without
the prefix, let me know and I'll see if I can get hold of another machine.


To help me track it down further, do you think this is an indexing
issue or an HTML parsing issue? I read something in an archived mail
about the differences between HTML and HTML2. Any help is much
appreciated; if I could narrow down the area of code to search through,
that would be fantastic.


That's all I can really think of. As I said, any help is much appreciated.
Thank you for your time (oh, and for providing such an awesome tool :)



Matt.


_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Fri Feb 9 09:14:12 2007