Re: <swishdecription> returning blank?

From: Antonio Barrera <abarrera(at)not-real.princeton.edu>
Date: Thu Oct 28 2004 - 14:14:02 GMT
Would this apply similarly to using xpdf to parse PDF docs?

IndexContents HTML* .htm .html .shtml .php
IndexContents TXT*  .txt .log .text .pdf
IndexContents XML*  .xml

StoreDescription TXT* 10000
StoreDescription HTML* <body>

Thanks,
Antonio

-----Original Message-----
From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
On Behalf Of Bill Moseley
Sent: Thursday, October 28, 2004 10:06 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: <swishdecription> returning blank?

On Thu, Oct 28, 2004 at 12:47:04AM -0700, Tim Hartley wrote:
> Ok I just tried swapping StoreDescription HTML2 <tbody> 2000, with the 
> extra line PropertyNames tbody added also, and I'm still getting the 
> blanks where swishdescription should be. My config file now looks 
> like:

Look at the StoreDescription line.  It says to store a description for
all HTML2-type files (HTML2 is one of the available parsers).  But you are
not telling swish that .asp is an HTML2 type of file.

This is a confusing issue because swish will use the HTML2 parser for
parsing by default, but that doesn't mean the document is classified as an
HTML2 type of document.

You need to use either DefaultContents or IndexContents.

Go back and look at the StoreDescription examples in the docs.  Let me
know if there's a place you were looking that doesn't use DefaultContents
or IndexContents so that can be fixed.
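
A minimal sketch of that change for the config quoted below, assuming the
.asp pages are ordinary HTML and that <body> (rather than <tbody>) is the
element the description should come from; the 2000 character limit is
carried over from the original config:

```
# Classify .asp files as HTML2 documents so the StoreDescription
# line for HTML2 actually applies to them.
IndexContents HTML2 .asp

# Store up to 2000 characters of text, starting at <body>, as the
# swishdescription property.
StoreDescription HTML2 <body> 2000
```

If every file being indexed is HTML, "DefaultContents HTML2" in place of
the IndexContents line should have the same effect.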


> ---Config file---
> IndexFile c:\swish-e\ForumVirtualIndex.index
> IndexDir C:/Inetpub/VirtualRoot/planetpdfforumarchive
> IndexOnly .asp
> StoreDescription HTML2 <tbody> 2000
> FileRules filename is forum6\.asp
> FileRules filename is forum52\.asp
> FileRules filename is forum2\.asp
> FileRules filename is forum3\.asp
> FileRules filename is forum5\.asp
> FileRules filename is forum34\.asp
> FileRules filename is forum9\.asp
> FileRules filename is forum68\.asp
> FileRules filename is forum18\.asp
> FileRules filename is forum73\.asp
> FileRules filename is forum4\.asp
> FileRules filename is forum7\.asp
> FileRules filename is forum12\.asp
> FileRules filename is attachlist\.asp
> IndexReport 3
> PropertyNames tbody
> ReplaceRules Replace "C:/Inetpub/VirtualRoot/planetpdfforumarchive" "http://www.planetpdf.com/forumarchive"
> ---end file---
> 
> --excerpt of results----
> (of the format
> <swishrank>|<swishtitle>|<swishdocpath>|<swishlastmodified>|<swishdescription>)
> 
> # SWISH format: 2.4.2
> # Search words: eat
> # Removed stopwords:
> # Number of hits: 30
> # Search time: 0.110 seconds
> # Run time: 0.125 seconds
> 1000|Planet PDF Forum Archive - Is that all there is to it?|http://www.planetpdf.com/forumarchive/23527.asp|2004-09-03 13:23:52 AUS Eastern Standard Time|
> 633|Planet PDF Forum Archive - explain|http://www.planetpdf.com/forumarchive/91118.asp|2004-09-03 13:06:36 AUS Eastern Standard Time|
> 
> Note the blanks after the <swishlastmodified>| section. :(
> 
> -t
> 
> 
> 
> -----Original Message-----
> From: Peter Karman [mailto:karpet@peknet.com]
> Sent: Thursday, 28 October 2004 5:01 PM
> To: Tim Hartley
> Cc: Multiple recipients of list
> Subject: Re: [SWISH-E] <swishdecription> returning blank?
> 
> 
> 
> 
> Tim Hartley wrote on 10/27/04 9:07 PM:
> 
> > Hi Bill, all.
> > 
> > I'm using the File Access method to index a folder of .asp files (as 
> > it can do it waaaaaaaay quicker than spidering them) Anyway, I'm having
> > problems in that it doesn't seem to be getting any values in the
> > <swishdescription>, so my results are coming back with the
> > <swishrank><swishtitle><swishdocpath><swishlastmodified>, but NOT
> > <swishdescription>. All my other indexes return it. Mind you they use either
> > the dirtree.pl or swishspider.pl to create the indexes. Anyway, details
> > follow:
> 
> > StoreDescription HTML2 <swishdescription> 2000
> 
> 
> I believe that the StoreDescription <tag> syntax is for the tag in the 
> source you want to include from, not for the swish property name. Is 
> there a tag called 'swishdescription' in your source files? otherwise, 
> you likely want <body> instead.
> 

--
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Thu Oct 28 07:14:02 2004