Re: Indexing xml files that has another included xml

From: Bernhard Weisshuhn <bkw(at)not-real.weisshuhn.de>
Date: Fri Sep 10 2004 - 14:08:47 GMT
On Fri, Sep 10, 2004 at 06:54:30AM -0700, Peter Karman <karman@cray.com> wrote:

> Bill Moseley wrote on 9/9/04 2:04 AM:
> 
> > Which, of course, we use the SAX interface.  I also see on
> > 
> >   http://www.xmlsoft.org/html/index.html
> > 
> > that our SAX usage of libxml2 is deprecated.  Looks like a trip to the
> > xml list might be in my future.
> 
> If you do consider rewriting swish-e to use the DOM interface, consider 
> making it optional/configurable. I suspect that folks use swish-e with 
> XML that might be derived from a database (which SAX seems better for), 
> as well as 'real' XML documents (which DOM seems better for -- as in 
> this case with resolving entities).

I seriously doubt that using the DOM interface would solve more problems
than it would create. Some XML files get *huge*, and might be indexed
for exactly that reason. Indexing huge files via DOM will drive
indexing speed down and resource requirements up, because DOM holds the
whole document tree in memory. Maintaining both interfaces within
swish-e drives the load on our cherished developers up, something we
also don't want, do we?
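
To illustrate the streaming point, here is a minimal sketch in Python using the standard xml.sax module. It is not swish-e's parser; the class and function names are hypothetical and chosen only for this example. It counts words as character data streams past, so memory use does not grow with the size of the document the way a DOM tree would.

```python
import io
import xml.sax


class WordCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts words without keeping a document tree."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self._buffer = []

    def startElement(self, name, attrs):
        # Flush any text seen before this element opened.
        self._flush()

    def endElement(self, name):
        # Flush at each element boundary so the buffer stays small.
        self._flush()

    def characters(self, content):
        # The parser may deliver text in several chunks; collect them.
        self._buffer.append(content)

    def _flush(self):
        text = "".join(self._buffer)
        self._buffer.clear()
        self.word_count += len(text.split())


def count_words_streaming(source):
    """Parse a filename or file-like object and return the word count."""
    handler = WordCounter()
    xml.sax.parse(source, handler)
    return handler.word_count


if __name__ == "__main__":
    sample = io.BytesIO(b"<doc><p>one two</p><p>three</p></doc>")
    print(count_words_streaming(sample))
```

A DOM-based version would have to finish building the whole tree before the first word could be counted.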

I personally find filtering stuff through xmllint acceptable; swish-e
users are used to filtering all kinds of documents prior to indexing.
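
A rough sketch of such a filter step, assuming xmllint is on the PATH and that its --noent option (substitute entity references with their values) is what is wanted here. The function names are hypothetical, and how the output gets handed to swish-e is left out.

```python
import subprocess


def xmllint_command(path):
    """Build the xmllint call; --noent substitutes entity references."""
    return ["xmllint", "--noent", path]


def expand_entities(path):
    """Run xmllint and return the expanded document as bytes.

    Raises subprocess.CalledProcessError if xmllint reports a failure.
    """
    result = subprocess.run(
        xmllint_command(path), check=True, capture_output=True
    )
    return result.stdout
```

The expanded output can then be indexed in place of the original file, so the indexer itself never has to resolve the included XML.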

just my 2 cents though,
  Bernie
Received on Fri Sep 10 07:09:15 2004