On Thu, Oct 02, 2003 at 03:04:10PM -0700, Peter Karman wrote:
> I have an HTML document that contains this markup:
>
>
> <tt CLASS="literal">
> -h
> [
> <span CLASS="optional">
> no
> </span>
> ]
> aggress
> </tt>
>
>
> And I would like a search for the following phrase to find that doc:
>
> "-h [no]aggress"
I guess this is a bug.
moseley@bumby:~$ cat t.xml
<xml>
<tt CLASS="literal">
-h
[
<span CLASS="optional">
no
</span>
]
aggress
</tt>
</xml>
Here's the output with the XML parser:
moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
Adding:[1:swishdefault(1)] 'h' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'no' Pos:5 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'aggress' Pos:7 Stuct:0x1 ( FILE )
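The thing to note is that the word position gets bumped at each tag
(3, 5, 7), so the words are no longer adjacent, which is presumably why
the phrase search doesn't match.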
If I use the XML2 parser I get:
moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
Adding:[1:swishdefault(1)] 'h' Pos:7 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'no' Pos:8 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'aggress' Pos:9 Stuct:0x1 ( FILE )
moseley@bumby:~$ swish-e -w '"-h [no] aggress"' -H0
1000 t.xml "t.xml" 93
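To illustrate why the positions matter, here is a rough Python sketch of a phrase match that requires consecutive word positions. This is only a model, on the assumption that phrase search works on adjacent positions; it is not swish-e's actual code. The function name phrase_matches and the dictionaries are made up for the example, and the position numbers are copied from the two indexing runs above.

```python
# Hypothetical model of phrase matching on word positions.
# Assumption: a phrase matches only when its words sit at consecutive positions.

def phrase_matches(positions, phrase):
    """Return True if the words of `phrase` occur at consecutive positions."""
    first, *rest = phrase
    for start in positions.get(first, []):
        # Each following word must be exactly one position after the previous one.
        if all(start + offset in positions.get(word, [])
               for offset, word in enumerate(rest, 1)):
            return True
    return False

# Positions as reported by -T indexed_words in the two runs.
xml_parser = {"h": [3], "no": [5], "aggress": [7]}
xml2_parser = {"h": [7], "no": [8], "aggress": [9]}

phrase = ["h", "no", "aggress"]
print(phrase_matches(xml_parser, phrase))   # gaps between words
print(phrase_matches(xml2_parser, phrase))  # consecutive positions
```

With the positions from the XML parser the gaps of two keep the phrase from matching in this model; with the XML2 positions the words are adjacent and it matches.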
Can you use libxml2?
--
Bill Moseley
moseley@hank.org
Received on Fri Oct 3 00:05:52 2003