Some Questions about 2.2RC1 and XML

From: Bill Humphries <whump(at)not-real.apple.com>
Date: Fri Sep 06 2002 - 00:46:39 GMT
I've built 2.2RC1 with LIBXML 2.23.4 on Mac OS 10.1.5, and have been 
experimenting with it the past two days.

I have a few questions about swish-e's XML support:

1) I plan to return search results as XML. However, the Configuration File 
Directives documentation 
(http://swish-e.org/2.2/docs/SWISH-CONFIG.html#Document_Contents_Directives) 
says that entities in XML documents are converted regardless of the value of 
ConvertHTMLEntities:

	"NOTE: Entities within XML files and files parsed with libxml2 are 
converted regardless of this setting."

My current workaround for this is to build an XML result string, then pass 
it through Tidy (http://tidy.sourceforge.net/) to re-escape entities.

I'd rather not do this if at all possible.
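
For illustration, here is a minimal sketch of escaping each value while the result string is built, rather than running Tidy over the finished output. It assumes the results are assembled in Python; the function name, the (path, title, description) tuple layout, and the element names are all hypothetical and not part of swish-e.

```python
from xml.sax.saxutils import escape


def build_result_xml(hits):
    """Build an XML result string from (path, title, description) tuples.

    Every value is escaped as it is inserted, so '&', '<' and '>' that
    were converted to plain characters at index time are entities again.
    The tuple layout and element names are assumptions for this sketch.
    """
    parts = ["<results>"]
    for path, title, description in hits:
        parts.append("  <hit>")
        parts.append("    <path>%s</path>" % escape(path))
        parts.append("    <title>%s</title>" % escape(title))
        parts.append("    <description>%s</description>" % escape(description))
        parts.append("  </hit>")
    parts.append("</results>")
    return "\n".join(parts)
```

This only covers text content; attribute values would need quote characters escaped as well.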

2) I'm indexing XML source documents in the file system. I can configure 
swish-e to use the first 100 characters of the document's root element, 
'page', as the description:

PropertyNamesMaxLength 100 swishdescription
PropertyNameAlias swishdescription page

However, when swish-e builds the index, it takes the attribute values as 
well as the text nodes of 'page'.

It's not clear how I could turn that off in the configuration file.

I'd also like to specify a location in the document to use as the 
description, i.e. /page/section[1]/para[1].

The workaround here would be to use the prog method to load pages and use 
an XPath tool to extract that location for use as the page description.
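
A rough sketch of the extraction step in that workaround, assuming Python's standard ElementTree module stands in for the XPath tool. The function name and defaults are hypothetical. The path is written relative to the 'page' root element, and only text nodes are collected, so attribute values are left out. Feeding the result to swish-e through the prog method is not shown.

```python
import xml.etree.ElementTree as ET


def extract_description(xml_text, path="section[1]/para[1]", max_len=100):
    """Return up to max_len characters of text from one location.

    The path is relative to the root element, so the default corresponds
    to /page/section[1]/para[1]. Attribute values are never included,
    because only text nodes are gathered.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "page":
        return ""
    node = root.find(path)
    if node is None:
        return ""
    # Join all text nodes under the element and collapse whitespace runs.
    text = " ".join("".join(node.itertext()).split())
    return text[:max_len]
```

For example, given a 'page' whose first section's first para reads "Hello <b>world</b>", the function returns "Hello world", whatever attributes the surrounding elements carry.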

Thanks,

----
Bill Humphries <whump@apple.com>
Webmaster, HR Systems
Apple Computer
Received on Fri Sep 6 00:50:12 2002