At 08:54 1998-07-13 -0700, Robert Rothenberg wrote:
>Is there any way to tell SWISH-E to ignore the <!DOCTYPE > declaration of
>SGML/XML files? It's particularly irritating to find out that the word
>"PUBLIC" is too common to be indexed because it occurs in the declaration.
Not with the current version, at least: it detects "comments" by looking for
<! alone (it does not look for the -- -- comment delimiters), so every <!
declaration is treated as a comment. There is also currently no way to tell
the program NOT to index comments.
>For that matter, it'd be a good idea to have an option to ignore SGML
>functions such as <!CDATA>, <!IGNORE>, etc., if there is not already an
>option to do so.
I'm working on the code (which I started doing because I wanted some added
functionality, but I soon spotted some problems as well); I currently have a
reasonably stable intermediate version that solves this and a number of
other problems. If you or anyone else is interested, let me know: I have a
ZIP file with all the source (all my changes commented) and a readme
outlining the changes and improvements. I can mail it or post it somewhere.
Please realize that this is NOT in any way finished, but if you want to use
or test the code, feel free. I don't give real support (or guarantees!) for
this, but I would certainly appreciate comments.
BTW, ignoring <!CDATA> etc. is certainly covered in my version simply
because such tags are no longer treated as comments (you can also ignore
Meanwhile I'm working on the next stage...
Marjolein Katsma firstname.lastname@example.org
Java Woman - http://javawoman.com/
Received on Mon Jul 13 09:45:07 1998