At 09.07.2002 08:57 -0700, you wrote:
>At 02:20 PM 07/09/02 +0200, Guido Adam wrote:
>[...]
> >And the metatags are not read, if you leave out IndexContents.
>
>Both metanames and propertynames work for me without IndexContents. You
>have to use a parser that knows how to extract the metanames. The
>default parser is HTML if you do not specify a parser, and that will parse
><meta> tags only (not fake html <tag> meta tags). If you have a header
>Document-Type: TXT then it won't parse the metanames.
>
>[hum, I think the default parser should be HTML2 if available]
>
> >My database records contain html pages.
> >
> >Looks like the "Document-Type:" field is not read correctly by the indexer,
> >if you use the "-S prog" switch. The indexer should use that field and not
> >the filetype it extracts from the URL.
>
>Check again. If you have in the -S prog program's output:
>
> Path-Name: foo.html
> Document-Type: HTML2
>
>and in your swish config you say:
>
> IndexContents TXT .html
>
>it will still use the parser specified in the prog's headers (HTML2), not
>the TXT parser.
Okay.
Here we are: I changed my Document-Types from HTML to HTML2 and all is as
you say.
I can leave out IndexContents and meta tags and properties are as they
should be.
The HTML parser seems to have problems here.
That's knotty.
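
For the archive, a minimal sketch of what a "-S prog" program might emit
for one record, with the Document-Type header set to HTML2 as discussed
above. It is in Python only for illustration (prog programs are usually
Perl). The function name format_record and the sample page are made up.
I am assuming Content-Length is counted in bytes and that a blank line
separates the headers from the document body.

```python
import sys


def format_record(path, html, doc_type="HTML2"):
    """Return one document in the -S prog header format, as bytes.

    format_record is a hypothetical helper, not part of swish-e.
    """
    body = html.encode("utf-8")
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # assumed: length in bytes
        f"Document-Type: {doc_type}\n"    # selects the parser per document
        "\n"                              # blank line ends the headers
    )
    return headers.encode("utf-8") + body


if __name__ == "__main__":
    # Sample record; a real program would loop over the database rows.
    page = (
        '<html><head><meta name="author" content="Example"></head>'
        "<body>Hello</body></html>"
    )
    sys.stdout.buffer.write(format_record("foo.html", page))
```

With the header set this way, the parser should be chosen by the
Document-Type line rather than by the .html suffix in Path-Name.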
Guido
>--
>Bill Moseley
>mailto:moseley@hank.org
Received on Tue Jul 9 18:10:47 2002