At 01:50 PM 01/29/02 -0800, Gordon Jessop wrote:
>what is the difference between DefaultContents HTML and DefaultContents
>HTML2?
IndexContents maps file extensions to a parser, and DefaultContents sets
the parser for everything else. From the docs:
DefaultContents [TXT|HTML|XML|WML|TXT2|HTML2|XML2]
This sets the default parser for documents that are not specified in
IndexContents. If not specified the default is HTML.
The XML2, HTML2, and TXT2 parsers are currently only available when swish
is configured to use libxml2.
Example:
DefaultContents HTML
The DefaultContents directive should be used when spidering, as HTML files
may be returned without a file extension (such as when requesting a
directory and the default index.html is returned).
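As a rough sketch of how the two directives fit together (the extension
lists here are only illustrative, and the "2" parsers assume swish was
built with libxml2), a config might look like:

```
# Map known extensions to a parser (illustrative extension lists)
IndexContents HTML2 .htm .html
IndexContents TXT2 .txt
# Everything else, e.g. spidered documents with no extension
DefaultContents HTML2
```

Swap HTML2 and TXT2 for HTML and TXT if libxml2 is not available.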
As long as I'm here, I'll mention that although the default is the HTML
parser, that doesn't mean that the default document type is HTML. What
that means is that if you don't use DefaultContents or IndexContents, files
will be parsed by the HTML parser, but things like StoreDescription that
need a file type won't work.
In other words, if you have a config like:
StoreDescription HTML <body> 1000
and you index files, it will still say:
Using DEFAULT (HTML) parser
but the file type has not been assigned, so the description will not be stored.
But with:
StoreDescription HTML <body> 1000
DefaultContents HTML
indexing will report:
Using HTML parser
and store the description as expected.
I'm not sure if that should be considered a bug or not....
--
Bill Moseley
mailto:moseley@hank.org
Received on Tue Jan 29 23:43:21 2002