Re: MetaName search not working, yet

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jan 29 2002 - 23:42:18 GMT
At 01:50 PM 01/29/02 -0800, Gordon Jessop wrote:
>what is the difference between DefaultContents HTML and DefaultContents
>HTML2?

IndexContents maps file extensions to a parser, and DefaultContents sets
the parser for everything else.  From the docs:

DefaultContents [TXT|HTML|XML|WML|TXT2|HTML2|XML2]

This sets the default parser for documents that are not specified in
IndexContents. If not specified the default is HTML. 

The XML2, HTML2, and TXT2 parsers are currently only available when swish
is configured to use libxml2. 

Example: 

       DefaultContents HTML
 


The DefaultContents directive should be used when spidering, as HTML files
may be returned without a file extension (such as when requesting a
directory and the default index.html is returned). 
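
A minimal sketch of how the two directives might be combined in a config
file.  The extension lists here are illustrative assumptions, not taken
from the docs quoted above, so check the IndexContents entry in the docs
for the exact syntax your version expects:

    # Map known extensions to a parser (extension lists are hypothetical)
    IndexContents HTML .htm .html .shtml
    IndexContents TXT .txt .log

    # Everything else, including spidered URLs with no extension
    DefaultContents HTML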


As long as I'm here, I'll mention that although the default is the HTML
parser, that doesn't mean that the default document type is HTML.  That is,
if you don't use DefaultContents or IndexContents, files will be parsed by
the HTML parser, but things like StoreDescription that need a file type
won't work.

In other words if you have a config like:

    StoreDescription HTML <body> 1000

and you index files it will still say:

    Using DEFAULT (HTML) parser

but the file type has not been assigned, so the description will not be
stored.  But with

    StoreDescription HTML <body> 1000
    DefaultContents HTML

indexing will report

    Using HTML parser

and store the description as expected.

I'm not sure if that should be considered a bug or not....



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Jan 29 23:43:21 2002