parsing question

From: Peter Karman <karman(at)not-real.cray.com>
Date: Thu Oct 02 2003 - 22:05:10 GMT

I have an HTML document that contains this markup:


<tt CLASS="literal">
-h
  [
<span CLASS="optional">
no
</span>
]
aggress
</tt>


And I would like a search for the following phrase to find that doc:

"-h [no]aggress"

But it doesn't.

I am using SWISH-E 2.2.3 with the expat parser (not libxml2).

This got me wondering: does text that is broken up by tags get
indexed (or, put another way, is it findable)? A search for:

[no]aggress

does not match either. I have other docs where the markup is like this:

<tt CLASS="LITERAL">-h [no]aggress</tt>

and those docs ARE indexed and show up in a search.

If this IS indeed the default behaviour (I have a very minimal config 
file), is there a way to fix it with a config setting? Or should I be 
stripping out "trivial" HTML markup as part of a -S filter or something?
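
To make the "strip the trivial markup" idea concrete, here is a minimal sketch in Python of what such a pre-filter might do. It does not use any SWISH-E API and does not model how SWISH-E splits words; the names TextOnly and strip_markup are made up for illustration. It only shows what plain text is left once the tags are dropped and whitespace is collapsed:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: keep character data, discard every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join the text pieces, then collapse every run of whitespace
    # (including newlines) to a single space.
    return " ".join("".join(parser.chunks).split())


# The markup that is NOT found by the phrase search.
broken = """<tt CLASS="literal">
-h
  [
<span CLASS="optional">
no
</span>
]
aggress
</tt>"""

# The markup that IS found.
intact = '<tt CLASS="LITERAL">-h [no]aggress</tt>'

print(strip_markup(broken))  # -h [ no ] aggress
print(strip_markup(intact))  # -h [no]aggress
```

If the sketch is right, removing the tags alone would not make the two documents identical: the line breaks around the tags survive as spaces, so the first document reads "-h [ no ] aggress". A filter would presumably also have to decide what to do with that whitespace.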

Thanks for any suggestions.

pek

-- 
Peter Karman - Software Publications Programmer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Thu Oct 2 22:05:13 2003