Re: Re: Config files and spider.pl

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jun 03 2004 - 16:45:24 GMT
On Thu, Jun 03, 2004 at 12:22:38PM -0400, adivey1@cox.net wrote:
> Documentation not too helpful :(

Nor the question.  ;)

Here's the secret when having a problem.  Create a **short** config
containing only the directives you are having problems with, plus a
short source file, and copy-n-paste both to the list along with the
command you are running and its output.  That way others can easily
reproduce the problem.

> In my SWISH-E configuration file, I have
> 
> StoreDescription HTML* <body> 20000
> 
> Using spider.pl, with -v 3, I can see that it's using the HTML2
> parser. So shouldn't it then pass that onto my StoreDescription line?
> When I run a swish.cgi through my browser, I don't get any text to
> highlight (summary, whatever you wanna call it) for anything other
> than HTML files. I haven't tried TXT or PPT, but PDF and DOC don't
> have anything. They don't have null though.

Here's another tip when using swish-e.  You described four parts above:
1) the spider, 2) swish-e, 3) the swish.cgi script, 4) the web server
running the swish.cgi script.

What I do is

1) run the spider and look at its output -- make sure swish-e is seeing
what I expect as input

2) run swish-e and use "-T indexed_words properties" to make sure what I
think is indexed is really in the index.

3) run swish.cgi from the command line and check output.  You can make
swish.cgi tell you what command it's passing to swish, so you can then
run that command directly.

> Anyway, what do I have to change, and where do I change it, to get the
> snapshot (summary, highlightable area, etc) to display on the search
> page?

Well, back to that "not too helpful" documentation:

The docs for StoreDescription:

  http://swish-e.org/current/docs/SWISH-CONFIG.html#item_StoreDescription

      Again, note that documents must be assigned a document type with
      IndexContents or DefaultContents to use this feature.

Since you didn't post a complete example I can't be sure, but I'll guess
you didn't follow that advice.
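
A minimal sketch of what following that advice might look like in the
config.  The extension list is only an assumption for illustration, so
adjust it to your site, and normally you would pick whichever of
IndexContents or DefaultContents fits.  The StoreDescription line is the
one from your message, unchanged.

```
# Assign a document type by extension so StoreDescription applies.
# (The extensions listed here are an assumption -- use your own.)
IndexContents HTML* .html .htm

# Or: treat any document without an assigned type as HTML.
DefaultContents HTML*

# Store up to 20000 characters of <body> as the description.
StoreDescription HTML* <body> 20000
```

Whether this alone covers your PDF and DOC files depends on what the
spider and its filters actually hand to swish-e, so check the spider
output first.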

That catches enough people that I wonder why it can't default to
DefaultContents HTML*?  Maybe it's in there just to make people suffer
reading the docs.



-- 
Bill Moseley
moseley@hank.org
Received on Thu Jun 3 09:45:24 2004