Would this apply similarly to using xpdf to parse PDF docs?
IndexContents HTML* .htm .html .shtml .php
IndexContents TXT* .txt .log .text .pdf
IndexContents XML* .xml
StoreDescription TXT* 10000
StoreDescription HTML* <body>
Thanks,
Antonio
-----Original Message-----
From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
On Behalf Of Bill Moseley
Sent: Thursday, October 28, 2004 10:06 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: <swishdecription> returning blank?
On Thu, Oct 28, 2004 at 12:47:04AM -0700, Tim Hartley wrote:
> Ok I just tried swapping StoreDescription HTML2 <tbody> 2000, with the
> extra line PropertyNames tbody added also, and I'm still getting the
> blanks where swishdescription should be. My config file now looks
> like:
Look at the StoreDescription line. It says to store a description for all
documents of type HTML2 (HTML2 is one of the available parsers). But you are
not telling swish that .asp is an HTML2 type of file.
This is a confusing issue because swish will use the HTML2 parser for
parsing by default, but that doesn't mean the document is classified as an
HTML2 type of document.
You need to use either DefaultContents or IndexContents.
Go back and look at the StoreDescription examples in the docs. Let me
know if there's a place you were looking that doesn't use DefaultContents
or IndexContents so that can be fixed.
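As a minimal sketch of that fix (assuming the .asp files should be classified as HTML2 and that <body> is the element to take the description from; the tag and the 2000-character limit are illustrative and untested against this site), the relevant lines might look like:

```
# Classify .asp files as HTML2 documents so that StoreDescription HTML2 applies to them
IndexOnly .asp
IndexContents HTML2 .asp
# Store up to 2000 characters of text from <body> as swishdescription
StoreDescription HTML2 <body> 2000
```

DefaultContents HTML2 would be the alternative if every indexed file should be treated as HTML2 regardless of extension.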
> ---Config file---
> IndexFile c:\swish-e\ForumVirtualIndex.index
> IndexDir C:/Inetpub/VirtualRoot/planetpdfforumarchive
> IndexOnly .asp
> StoreDescription HTML2 <tbody> 2000
> FileRules filename is forum6\.asp
> FileRules filename is forum52\.asp
> FileRules filename is forum2\.asp
> FileRules filename is forum3\.asp
> FileRules filename is forum5\.asp
> FileRules filename is forum34\.asp
> FileRules filename is forum9\.asp
> FileRules filename is forum68\.asp
> FileRules filename is forum18\.asp
> FileRules filename is forum73\.asp
> FileRules filename is forum4\.asp
> FileRules filename is forum7\.asp
> FileRules filename is forum12\.asp
> FileRules filename is attachlist\.asp
> IndexReport 3
> PropertyNames tbody
> ReplaceRules Replace "C:/Inetpub/VirtualRoot/planetpdfforumarchive" "http://www.planetpdf.com/forumarchive"
> ---end file---
>
> --excerpt of results----
> (of the format
> <swishrank>|<swishtitle>|<swishdocpath>|<swishlastmodified>|<swishdescription>)
>
> # SWISH format: 2.4.2
> # Search words: eat
> # Removed stopwords:
> # Number of hits: 30
> # Search time: 0.110 seconds
> # Run time: 0.125 seconds
> 1000|Planet PDF Forum Archive - Is that all there is to it?|http://www.planetpdf.com/forumarchive/23527.asp|2004-09-03 13:23:52 AUS Eastern Standard Time|
> 633|Planet PDF Forum Archive - explain|http://www.planetpdf.com/forumarchive/91118.asp|2004-09-03 13:06:36 AUS Eastern Standard Time|
>
> Note the blanks after the <swishlastmodified>| section. :(
>
> -t
>
>
>
> -----Original Message-----
> From: Peter Karman [mailto:karpet@peknet.com]
> Sent: Thursday, 28 October 2004 5:01 PM
> To: Tim Hartley
> Cc: Multiple recipients of list
> Subject: Re: [SWISH-E] <swishdecription> returning blank?
>
>
>
>
> Tim Hartley wrote on 10/27/04 9:07 PM:
>
> > Hi Bill, all.
> >
> > I'm using the File Access method to index a folder of .asp files (as
> > it can do it waaaaaaaay quicker than spidering them). Anyway, I'm having
> > problems in that it doesn't seem to be getting any values in the
> > <swishdescription>, so my results are coming back with the
> > <swishrank><swishtitle><swishdocpath><swishlastmodified>, but NOT
> > <swishdescription>. All my other indexes return it. Mind you, they use
> > either the dirtree.pl or swishspider.pl to create the indexes. Anyway,
> > details follow:
>
> > StoreDescription HTML2 <swishdescription> 2000
>
>
> I believe that the StoreDescription <tag> syntax is for the tag in the
> source you want to include from, not for the swish property name. Is
> there a tag called 'swishdescription' in your source files? otherwise,
> you likely want <body> instead.
>
--
Bill Moseley
moseley@hank.org
Unsubscribe from or help with the swish-e list:
http://swish-e.org/Discussion/
Help with Swish-e:
http://swish-e.org/current/docs
swish-e@sunsite.berkeley.edu
Received on Thu Oct 28 07:14:02 2004