
Re:

From: Shaffer, Chris <Chris.Shaffer(at)not-real.bellsouth.com>
Date: Fri Feb 11 2005 - 19:34:34 GMT
That did it!  Thanks...

Chris Shaffer


-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu] On Behalf Of Bill Moseley
Sent: Friday, February 11, 2005 2:03 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: 


On Fri, Feb 11, 2005 at 10:52:14AM -0800, Shaffer, Chris wrote:
> Hi...  I've gotten swish-e (using spider.pl) to crawl a couple of our 
> intranet sites.  The filters seem to be working okay for excel.  And 
> it seems to be looking at word documents.  However, (using swish.cgi),
> I don't get any descriptions for those word docs.

> Any idea where I can look?  I have no idea where to begin digging.

Sure.  spider.pl just writes to stdout, so you can run it on a few test
docs and see what it outputs.  Do it on a file that generates a
description and then another that doesn't and compare.
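To make that comparison quicker, a small shell helper can pull just the Document-Type header out of spider.pl's output. This is a sketch, not part of swish-e: the function name doc_type is hypothetical. It assumes the output format shown in this message, where header lines such as Path-Name and Document-Type come first and a blank line separates them from the document body.

```shell
# Hypothetical helper (not part of swish-e): print the Document-Type header
# of the first document in spider.pl-style output read from stdin.
# Assumes headers are "Name: value" lines that end at the first blank line.
doc_type() {
    awk '
        /^$/ { exit }                       # end of headers: stop reading
        /^Document-Type:/ {
            sub(/^Document-Type: */, "")    # strip the header name
            print                           # e.g. HTML* or TXT*
            exit
        }
    '
}
```

To use it, pipe the SPIDER_QUIET=1 spider.pl command below into doc_type instead of head. Run it once for a document that gets a description and once for one that doesn't. If the two print different types, each type needs its own StoreDescription line.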

> StoreDescription HTML* <body> 200000

Check the spider.pl output to make sure the document's Document-Type
header is indeed HTML*:

$ SPIDER_QUIET=1 /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test.doc | head
Path-Name: http://localhost/apache/test.doc
Content-Length: 1713
Last-Mtime: 1108148269
Document-Type: TXT*

That output says the document is TXT*, so you would need to add another
StoreDescription line for TXT*.
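For example, a config along these lines would store descriptions for both document types. This is a sketch that assumes, as the swish-e configuration docs describe, that a TXT* StoreDescription takes only a size and no tag, since plain text has no tags to extract from. The 200000 limit is simply copied from the HTML* line quoted above.

```
StoreDescription HTML* <body> 200000
StoreDescription TXT* 200000
```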

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu

Received on Fri Feb 11 11:34:39 2005