Re: Ignoring tags of a certain class.

From: Bill Moseley <moseley(at)>
Date: Thu Sep 04 2003 - 17:07:19 GMT

I took a quick look at parser.c and just tried this:

RCS file: /cvsroot/swishe/swish-e/src/parser.c,v
retrieving revision 1.48
diff -u -r1.48 parser.c
--- parser.c    4 Sep 2003 04:02:40 -0000       1.48
+++ parser.c    4 Sep 2003 16:58:46 -0000
@@ -659,7 +659,7 @@
     /* Index the content of attributes */
-    if ( !parse_data->parsing_html && attr )
+    if (  attr )
         int class_found = 0;

That is the start of the code that handles XMLClassAttributes.  So if
you make that change, you may be able to do what you want when indexing
HTML files.
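
To make the effect of that one-line change concrete, here is a minimal
standalone sketch. It is not the actual parser.c code: the struct and
both function names are hypothetical, and only the parsing_html flag and
the attr pointer are taken from the diff above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the parser state; only the parsing_html
 * field name comes from the diff. */
struct parse_state {
    int parsing_html;   /* non-zero when the input is parsed as HTML */
};

/* Original guard: attributes are examined only for non-HTML input. */
static int check_attrs_original(const struct parse_state *st, const char **attr)
{
    return !st->parsing_html && attr != NULL;
}

/* Patched guard: attributes are examined whenever the tag has any,
 * regardless of whether the input is HTML or XML. */
static int check_attrs_patched(const struct parse_state *st, const char **attr)
{
    (void)st;   /* the HTML flag is no longer consulted */
    return attr != NULL;
}
```

With the original guard, an HTML tag that carries attributes never
reaches the class-attribute handling; with the patched guard it does,
which also means every attributed HTML tag pays for that extra check.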

I'm not sure why I limited it to non-HTML files.  I might have worried
about HTML's extensive use of attributes on tags, so I didn't want to
check every attribute for a metaname.  Also, XMLClassAttributes was
added because someone needed it for processing their XML files.

I'm going to leave it disabled for HTML for now, but if you do make that
change, it would be helpful if you reported back any problems, or
whether you notice that indexing is slower.

The nice thing about open source software is that you can make a change
to fit your needs.

Bill Moseley
Received on Thu Sep 4 17:07:34 2003