At 10:56 AM 09/11/02 -0700, Jody Cleveland wrote:
>> Regardless, switch to <swishdocpath> and <swishdescription> would probably
>> fix.
>
>I'm not understanding. Switch it where?
http://swish-e.org/2.2/docs/SWISH-RUN.html#Searching_Command_Line_Arguments
See the section on -x.
>> You need two config options. I think this is all described in the
>> swish.cgi docs, too.
>>
>> IndexContents HTML .html
>> StoreDescription HTML <body>
>
>When I index, I run swish-e -S prog -c spider.config
>In that config file, I have this:
>StoreDescription HTML <body> 200000
>DefaultContents HTML
>IndexContents HTML2 .htm .html
>IndexContents TXT .txt .conf
So all your .htm, .html are type HTML2, and .txt and .conf are type TXT,
but StoreDescription is only saving the <body> for docs of type HTML.
I'd try:
DefaultContents HTML2
IndexContents TXT2 .txt .conf
StoreDescription HTML2 <body> 200000
StoreDescription TXT2 200000
That's saying all docs are HTML2, with the exception of .txt and .conf
which are TXT2. And then two StoreDescription lines are needed because docs
are now of type HTML2 or type TXT2.
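
To make the type matching concrete, here is a rough Python model of the
lookup. It is only a sketch, not swish-e code: the function names and the
plain extension-matching rule are assumptions made for illustration.

```python
import os

def content_type(path, index_contents, default_contents):
    """Return the document type a path would get (simplified model).

    index_contents: list of (type, set of extensions), like IndexContents lines.
    default_contents: fallback type, like DefaultContents.
    """
    ext = os.path.splitext(path)[1]
    for doc_type, extensions in index_contents:
        if ext in extensions:
            return doc_type
    return default_contents

def has_description(path, index_contents, default_contents, store_description):
    """True if a StoreDescription line exists for the path's document type."""
    doc_type = content_type(path, index_contents, default_contents)
    return doc_type in store_description

# The config from the question: descriptions stored only for type HTML.
original = dict(
    index_contents=[("HTML2", {".htm", ".html"}), ("TXT", {".txt", ".conf"})],
    default_contents="HTML",
    store_description={"HTML"},
)

# The suggested config: one StoreDescription per type actually in use.
suggested = dict(
    index_contents=[("TXT2", {".txt", ".conf"})],
    default_contents="HTML2",
    store_description={"HTML2", "TXT2"},
)

for path in ("index.html", "notes.txt"):
    print(path, has_description(path, **original), has_description(path, **suggested))
```

In this model, neither index.html nor notes.txt gets a description under the
original settings, because their types (HTML2 and TXT) have no matching
StoreDescription line. Under the suggested settings both do.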
>Which I believe I took right from the docs.
Quite possible. I'll fix if you can point it out.
Sorry for all the confusion about the document types. That's all due to having
two sets of parsers possible -- not to mention that we talk about HTML docs
in the general sense, and also HTML and HTML2 "types" as far as swish-e
processing is concerned.
--
Bill Moseley
mailto:moseley@hank.org
Received on Wed Sep 11 18:25:27 2002