Re: swish-e-2.1-dev-25-2002-01-09

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Jan 11 2002 - 15:51:47 GMT
At 07:23 AM 01/11/02 -0800, W. Addy Majewski wrote:
>When indexing with "StoreDescription HTML <description> xxx", <script> 
>tag content within the first xxx characters of the document is included 
>and printed in the results page (documents with no meta description 
>only). Scripts within the xxx chars are also searchable. A bug?

Likely.  Are you using the HTML or HTML2 parser?

I see in html.c this comment:

//$$$$ Todo: remove tag and content of scripts, css, java, embeddedobjects, comments, etc

I'm a bit surprised that you can't block it with <!-- --> comments.  And
also that it's not being blocked by the IgnoreMetaTags directive.

Let me look at the code.

In the meantime, if you are not using HTML2 (libxml2) you might download
that so you can build swish with it.  It will be a better parser than the
built-in parser.  Plus, I'll probably only be able to fix the HTML2 parser.
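
For illustration only, here is a rough sketch of the filtering that Todo
comment describes: drop the content of <script> and <style> elements (and
comments) before cutting the description down to xxx characters.  This is
not swish-e code and not how html.c or the HTML2 parser works; it uses
Python's standard html.parser module, and the names SKIPPED_TAGS,
DescriptionExtractor and describe are made up for the example.

```python
from html.parser import HTMLParser

# Assumed list of elements whose content should never reach the description.
SKIPPED_TAGS = {"script", "style"}


class DescriptionExtractor(HTMLParser):
    """Collect visible text, skipping SKIPPED_TAGS content and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments kept for the description

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    # handle_comment is deliberately not overridden: the base class
    # implementation does nothing, so comment text is dropped.

    def text(self):
        # Join fragments and collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())


def describe(html, max_chars):
    """Return the first max_chars characters of the visible text."""
    parser = DescriptionExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

The point of the sketch is the ordering: filter first, truncate second, so
that script text in the first xxx characters cannot crowd out the real
content.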



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Jan 11 15:52:27 2002