Re: [swish-e] Having trouble trying to ignore invalid tags in HTML docs

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Mar 02 2007 - 00:05:47 GMT
On Thu, Mar 01, 2007 at 02:58:15PM -0800, Kathleen Vignos wrote:
> I've tried the following in the config file (swish.conf), with IgnoreWords
> by itself, then IgnoreMetaTags by itself, then added UndefinedMetaTags.  I
> get the exact same results/errors each time.  I also tried commenting out
> "DefaultContents HTML*" and also got the same results/errors (shown at the
> bottom of this message).
> 
> # Tell swish-e what to index
> IndexDir /usr/local/apache/htdocs/documents/
> 
> # Only index HTML files
> IndexOnly .htm .html
> 
> # Use the HTML parser
> DefaultContents HTML*
> 
> # Ignore words list
> IgnoreWords /usr/local/apache/swish-e-2.4.5/ignorewords.txt
> 
> # Ignore certain tags
> IgnoreMetaTags DOCUMENT FILENAME DESCRIPTION SEQUENCE
> UndefinedMetaTags ignore
> 
> I continue to get the following error messages:
> 
> 
> /usr/local/apache/htdocs/documents/doc.htm:1: error: Tag document invalid
> <DOCUMENT>
>          ^

Ya, that's an error from libxml2's HTML parser, not from swish-e itself.
IgnoreMetaTags only controls how swish-e indexes those tags; it doesn't
disable libxml2's warnings. For that you need to change ParserWarnLevel:

    ParserWarnLevel 1
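
For reference, here is a sketch of the tail of your swish.conf with that
line added. I'm assuming directive order in the file doesn't matter, and
the value 1 is simply the one suggested above. As I understand it, this
only changes what libxml2 reports; it doesn't change how <DOCUMENT> and
the other unknown tags are parsed.

    # Ignore certain tags
    IgnoreMetaTags DOCUMENT FILENAME DESCRIPTION SEQUENCE
    UndefinedMetaTags ignore

    # Control how much libxml2 reports while parsing (value suggested above)
    ParserWarnLevel 1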
-- 
Bill Moseley
moseley@hank.org

Help with Swish-e:
   http://swish-e.org/current/docs

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 1 19:02:44 2007