Re: <swishdescription> returning blank?

From: Antonio Barrera <abarrera(at)>
Date: Thu Oct 28 2004 - 14:14:02 GMT
Would this apply similarly to using xpdf to parse PDF docs?

IndexContents HTML* .htm .html .shtml .php
IndexContents TXT*  .txt .log .text .pdf
IndexContents XML*  .xml

StoreDescription TXT* 10000
StoreDescription HTML* <body>


On Thu, Oct 28, 2004 at 12:47:04AM -0700, Tim Hartley wrote:
> Ok I just tried swapping StoreDescription HTML2 <tbody> 2000, with the 
> extra line PropertyNames tbody added also, and I'm still getting the 
> blanks where swishdescription should be. My config file now looks 
> like:

Look at the StoreDescription line.  It's saying store all HTML2 type of
files (HTML2 is one of the available parsers).  But you are not telling
swish that .asp is an HTML2 type of file.

This is a confusing issue because swish will use the HTML2 parser by
default, but that doesn't mean the document is classified as an HTML2
type of document.

You need to use either DefaultContents or IndexContents.
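A minimal sketch of the fix, assuming the .asp pages are ordinary HTML that
the HTML2 parser can handle: adding one IndexContents line classifies .asp
files as HTML2, so the existing StoreDescription HTML2 rule applies to them.
A DefaultContents HTML2 line should have the same effect for any file not
matched by an IndexContents rule.

# Classify .asp files as HTML2 so the HTML2 StoreDescription rule matches them
IndexOnly .asp
IndexContents HTML2 .asp
StoreDescription HTML2 <tbody> 2000

The rest of the config (FileRules, ReplaceRules and so on) stays as it is.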

Go back and look at the StoreDescription examples in the docs.  Let me
know if there's a place you were looking that doesn't use DefaultContents
or IndexContents so that it can be fixed.

> ---Config file---
> IndexFile c:\swish-e\ForumVirtualIndex.index
> IndexDir C:/Inetpub/VirtualRoot/planetpdfforumarchive
> IndexOnly .asp
> StoreDescription HTML2 <tbody> 2000
> FileRules filename is forum6\.asp
> FileRules filename is forum52\.asp
> FileRules filename is forum2\.asp
> FileRules filename is forum3\.asp
> FileRules filename is forum5\.asp
> FileRules filename is forum34\.asp
> FileRules filename is forum9\.asp
> FileRules filename is forum68\.asp
> FileRules filename is forum18\.asp
> FileRules filename is forum73\.asp
> FileRules filename is forum4\.asp
> FileRules filename is forum7\.asp
> FileRules filename is forum12\.asp
> FileRules filename is attachlist\.asp
> IndexReport 3
> PropertyNames tbody
> ReplaceRules Replace "C:/Inetpub/VirtualRoot/planetpdfforumarchive"
> ---end file---
> --excerpt of results----
> (of the format
> <swishrank>|<swishtitle>|<swishdocpath>|<swishlastmodified>|<swishdescription>)
> # SWISH format: 2.4.2
> # Search words: eat
> # Removed stopwords:
> # Number of hits: 30
> # Search time: 0.110 seconds
> # Run time: 0.125 seconds
> 1000|Planet PDF Forum Archive - Is that all there is to it?||2004-09-03 13:23:52 AUS Eastern Standard Time|
> 633|Planet PDF Forum Archive - explain||2004-09-03 13:06:36 AUS Eastern Standard Time|
> Note the blanks after the <swishlastmodified>| section. :(
> -t
> > Hi Bill, all.
> > 
> > I'm using the File Access method to index a folder of .asp files (as
> > it can do it waaaaaaaay quicker than spidering them). Anyway, I'm having
> > problems in that it doesn't seem to be getting any values in the
> > <swishdescription>, so my results are coming back with the
> > <swishrank><swishtitle><swishdocpath><swishlastmodified>, but NOT
> > <swishdescription>. All my other indexes return it. Mind you, they use
> > either the or to create the indexes. Anyway, details:
> > StoreDescription HTML2 <swishdescription> 2000
> I believe that the StoreDescription <tag> syntax is for the tag in the
> source you want to include from, not for the swish property name. Is
> there a tag called 'swishdescription' in your source files? Otherwise,
> you likely want <body> instead.
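For instance, assuming the text wanted in the description sits inside the
<body> element of the .asp pages, the directive would name that source tag
rather than the property; swish-e then returns the stored text as
<swishdescription> in search results:

# Name the tag in the source documents, not the swishdescription property
StoreDescription HTML2 <body> 2000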

