On Fri, Feb 11, 2005 at 10:52:14AM -0800, Shaffer, Chris wrote:
> Hi... I've gotten swish-e (using spider.pl) to crawl a couple of our
> intranet sites. The filters seem to be working okay for excel. And it
> seems to be looking at word documents. However, (using swish.cgi), I
> don't get any descriptions for those word docs.
> Any idea where I can look? I have no idea where to begin digging.
Sure. spider.pl just writes to stdout, so you can run it on a few
test docs and see what it outputs. Run it on a file that gets a
description and on one that doesn't, then compare the two outputs.
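
A rough sketch of that comparison, assuming the spider output carries
Path-Name and Document-Type header lines as in the swish-e "prog" input
format. The function name show_doc_types is made up for illustration,
and a document body that happens to contain such a line would also match:

```shell
# Hypothetical helper: print only the Path-Name and Document-Type
# header lines from a spider.pl stream read on stdin.
# -a makes grep treat binary document bodies as text.
show_doc_types() {
    grep -a -E '^(Path-Name|Document-Type):'
}

# Usage (the path and URL are examples; substitute your own):
#   SPIDER_QUIET=1 /usr/local/lib/swish-e/spider.pl default \
#       http://localhost/apache/test.doc | show_doc_types
```

Run it once against a document that gets a description and once against
one that doesn't, and see whether the reported types differ.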
> StoreDescription HTML* <body> 200000
Make sure in the spider.pl output that the document's header is indeed
HTML*. For example:
$ SPIDER_QUIET=1 /usr/local/lib/swish-e/spider.pl default http://localhost/apache/test.doc | head
The header in that output says the document is TXT*, so you would need
to add another StoreDescription line for TXT*.
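
Something like the following in the swish-e config file might do it. This
is a sketch: the 200000 size simply mirrors your HTML* line and is an
assumption, and text documents take only a size, with no tag to start from:

```
# Existing line: descriptions for HTML documents, starting at <body>
StoreDescription HTML* <body> 200000
# Added line: descriptions for documents the filters turn into plain text
StoreDescription TXT* 200000
```

You would need to reindex after changing the config for the descriptions
to show up in swish.cgi.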
Received on Fri Feb 11 11:03:25 2005