Re: DEFAULT_CONFIG_FILE in 2.2 question

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Sep 11 2002 - 16:54:28 GMT
At 09:37 AM 09/11/02 -0700, Jody Cleveland wrote:
>>     swish-e -w not dkdkd -m 2 -x "%%p %%d\n"
>
>Ran great, got this:
># SWISH format: 2.2rc1
># Search words: not dkdkd
># Number of hits: 337
># Search time: 0.040 seconds
># Run time: 3.645 seconds
>%p %d
>%p %d
>.

I'm testing on Windows 98, so maybe you have yet another Windows shell that
behaves differently.  Regardless, switching to <swishdocpath> and
<swishdescription> in the -x format would probably fix it.  Someone who
knows Windows should be answering how to quote % and which quote characters
work.

>> You should install bash on your Windows machine.  Command.com 
>> is no place to be.
>
>Didn't realize you could install that on Windows. Is cygwin.com the place to
>get it?

Sounds right.  I just plugged "bash windows" into Google and that's what
showed up first.  (Of course, for writing Web applications and running
Apache I'd install Linux.)


>> E:\Program Files\SWISH-E2.2rc1>swish-e -b 2 -m 1 -T index_files
>
>showed path, title, size, NO description

Well, there's your problem.

>I searched the docs, and the only place I saw where it mentioned the
>description is in the config file for indexing. Does it go somewhere else,
>or am I missing something else?

You need two config options.  I think this is all described in the
swish.cgi docs, too.

IndexContents HTML .html
StoreDescription HTML <body>

The first says that .html files are of doc type HTML.  That tells swish
which parser to use.

Then StoreDescription says to extract the contents of <body> from docs that
are of type HTML and store it as the description.

Note, if you don't use IndexContents or DefaultContents, then documents do
not have a type assigned, and even though they are parsed by the HTML
parser by default, StoreDescription won't work because the docs are not
type HTML.
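
Put together, a minimal config along these lines might look like the
sketch below.  The extra .htm extension and the DefaultContents line are
illustrative assumptions, not something taken from your setup; adjust them
to match the files you actually index.

```
# Assign doc type HTML to files with these extensions (.htm is an assumption).
IndexContents HTML .html .htm

# Assumed fallback: give every other file the HTML type as well, so that
# StoreDescription still applies to documents with unlisted extensions.
DefaultContents HTML

# Store the text found inside <body> as the description for HTML-type docs.
StoreDescription HTML <body>
```

Without at least one of the first two lines a document has no type, so the
StoreDescription line never matches it.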

Index a test document with -T properties and you can make sure that a
description is being added.  Also, -v3 or some such setting will show you
what parser is being used for each document as it's indexed.

Also note, for future reference, in CVS since yesterday you can specify the
types as HTML* (and XML*, TXT*) and swish-e will use the libxml2 parser
(HTML2) if installed, otherwise it will use the HTML parser.  And the
default parser will be the libxml2 HTML2 parser if the document is not
assigned a type.

Most people won't care about all that since they know what they installed,
but it makes it handy for me to make the spider.pl program automatically
select the correct parser based on the Content-Type: header returned from
web servers.
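
For what it's worth, the HTML*, XML* and TXT* types mentioned above might
be used like this.  It is only a sketch: it assumes a build from CVS newer
than 2.2rc1, and the extensions listed are examples.

```
# The trailing * asks swish-e to use the libxml2-based parser when it is
# installed and to fall back to the built-in parser otherwise.
IndexContents HTML* .html .htm
IndexContents XML*  .xml
IndexContents TXT*  .txt
```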


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Wed Sep 11 16:58:00 2002